Suggestions on how to use git to cleanup commit history before opening a PR

Discussions about the wiki documentation of FreeCAD and its translation.
Forum rules
Be nice to others! Respect the FreeCAD code of conduct!
Post Reply
carlopav
Veteran
Posts: 2062
Joined: Mon Dec 31, 2018 1:49 pm
Location: Venice, Italy

Suggestions on how to use git to cleanup commit history before opening a PR

Post by carlopav »

matthijskooijman wrote: ping
Kunda1 wrote: ping


Hello, recently @matthijskooijman mentored me to make some changes to Draft Edit. Apart the great help he provided on the code itself, I found his suggestions on git really clear. It would be nice if someone could take inspiration to add them to the wiki: at the moment the pages on git (this and this) are quite tecnical, but lack in transferring to newby developers (like me) the aim and common process of preparing commits for easy review on a PR.
Here is the link to the PR 4811 and the text of the good post by @matthijskooijman:



ME: I agree that separated commits would have been better, but can I keep the suggestion for the next contribution? :-)

MATTHI: Well, ultimately, that's not up to me, but I would think that splitting/restructuring the commits to improve reviewability would be helpful to reduce load on maintainers and make it easier to merge this. Especially now that you're adding commits that amend to original thing, it becomes more difficult to review this commit-by-commit, leaving the only option of reviewing the complete PR as one, which I would not be happy about as a maintainer.
If you're intimidated by this restructuring because git can be a challenge to work with (I'm quite proficient with it and love its power, but it's certainly not easy if you're less experienced) but you a're open to investing the extra time needed to improve the commit structure, I'll be happy to share some tips on how I'd approach this using some git commands to reduce the amount of work needed.

ME: ok, let's do it (i feel a bit guilty cause you are loosing a lot of time). I'm using GitKraken because the command line interface is quite scaring to me as a Windows user...

MATTHI: Ok, here goes. I can't really comment on gitkraken (I always use the commandline since there it is, to me, clearest what happens under the hood and you can always use all features, rather than the subset exposed by the GUI), but I'll try to also explain the concept so you can find the same thing in gitkraken.

First off: I think it would be good to first further update the code to look like you want it to look (e.g. generalizing the getObjectInfo stuff), since you'd otherwise need to make those changes twice later.

Intended commit history
Then, to plan what we want the commit history to look at eventually, I think something like below. Note that I'm just talking about what the resulting commits should look like, not yet about how to get there from the current history. So the starting point of the below list is the master commit you started from.

Remove all the empty pass functions
Remove the event parameter from startEditing (you added this commit last, but since evaluate_context_menu_action already has an obj and node_idx parameter, I think you can do this change earlier, preventing the intermediate breakage by removing references to edit_command.event before removing the variable itself).
Remove the App.DraftWorkingPlane.restore() line (I do not really understand this change, but you mentioned it should be its own commit?)
Introduce a getSpecificObjectInfo function and update the original addPoint function to use it (or whatever other approach you choose to extract the object-type-agnostic code from addPoint so it does not have to be duplicated).
Split out addPoint to multiple add_point methods in the GuiTools subclasses (and replacing the call to addPoint in evaluate_menu_action with a call to obj_gui_tools.add_point).
Rewriting evaluate_menu_action and display_tracker_menu to use the new callback structure, and rewrite all GuiTools subclasses to also adapt (you could split this into two commits for clarity, but then the first commit would break things, which is usually not a good idea, so I'd keep this as one commit, unless you want to add the backward compatibility wrapper).
Add the reverse wire command

Basic approach
Given that most of these changes are pretty independent, the basic workflow I would choose here is to:

Reset you branch back to the original master commit you started working from. The default "mixed" flavor of the git reset command resets your branch and the index/staging area but not the working copy, so this would put you in a position where you would have been if you had done all the coding in your checkout, but not committed any of it yet. On the commandline, this would be git reset origin/master (assuming that you origin/master still points to the commit you branched off from, if not you might need to rebase first, or replace origin/master with the SHA of the branch-off point).
Now, you can use interactive staging to select a subset of all your changes and create a new commit for that. You can do this for each commit to be created in turn. On the commandline, you would use git add -p for this, but I suspect that in kraken you can do a similar thing in e.g. a diff view where you select lines of the diff to be staged. On the commandline you can even edit the to-be-staged diff in a text editor (useful in case you need to commit one change in a line but not another, or in adjacent lines), but if Kraken doesn't support this, and you'd need it, you can always just revert one of the changes in the file manually, commit that, ^Z in your editor and commit the other change.

Note that especially the git add -p / interactive staging is IMHO a very powerful tool, which allows me to have a much improved workflow where I can (in a lot of cases) just focus on coding and making things work first, and then still be able to split the changes in proper commits at a later stage (sometimes at the end, somethings I already commit things halfway when I'm confident that a particular change is somewhat complete and will be included in the final history).

Additional work for commit 4 and 5
This approach will not work for all commits above. In particular, number 4 and 5 need to make changes to code in gui_edit.py that is already removed in the current PR (so with the above approach, that code will be gone from your working copy already). For this, I would suggest first making a backup copy of gui_edit.py and then use git checkout to throw away all uncommitted changes from gui_edit.py (or alternatively, only throw away a selection of changes, git checkout has an interactive -p option too). After this, you should again have the original addPoint and evaluate_menu_action code, so you can make changes to those and commit that (this is the part where you'll need to do some manual code editing, and will be writing/copying back some code that will be thrown away again in the next commit, but that's ok). After you created these two commits, you can restore your gui_edit.py backup again, and start interactively committing the rest of the commits.

Verifying commits work
Note that you might want to verify in between that your separated changes actually work. One easy way for that is to first commit them (even when you're not entirely sure that you got everything, you can always amend the commit later) and then use git stash to stash away all uncommitted changes, so you can test the code as you committed it. Then unstash and continue with the next commit.

Interactive rebase and fixup commits
One additional thing I would want to point out, in case you haven't used it before, is the interactive rebase in git. Though rebase is intended to be used to "rewind" a number of commits you made and "replay" them onto another starting point (e.g. a more recent version of the master branch), you can also rebase on top of an earlier commit in your existing history (instead of some newer master version), so you end up rewinding to that earlier commit, and then simply replaying the same commits on top of the same earlier base commit again (i.e. nothing changes). But if you then use rebase in interactive mode, you can select which commits should be replayed in which order. This allows you to easily reorder your commits or remove some of them.

Even more, you can tell git to squash some commits together in the process, which means it becomes very easy to fix mistakes later: If you already committed something, but spot a mistake in it, you just make a new commit with just the fix, and then use an interactive rebase to move that "fixup" commit backwards in history to be right after the original commit, and then ask git to squash them together into one commit, as if the mistake never happened.

Git even automates a part of this process using git commit --fixup, which marks such "fixup" commits in such a a way that the commit message indicates what previous commit they are fixing, so an interactive rebase (with --autosquash) can automatically reorder and squash them. I use this workflow often where I put fixes for earlier mistakes into such fixup commits, but do not do the interactive rebase right away, but wait until some later point (so I can rebase/squash a few of them at the same time to save time, or so I can push out the fixup commits to a PR so reviewers that already saw the mistake can see the fixup more clearly before I make both disappear again). Since git remembers for me which fixup is meant to fix which mistake, I can easily postpone the actual rebasing/squashing since I do not have to remember :-) Of course if multiple commits touch the same code and you reverse their order (i.e. a fixup commit that fixes a mistake in code that was modified again after the mistake commit), such a rebase is likely to cause conflicts, so be prepared to fix some conflicts along the way (but knowing how to fix conflicts is useful in any case, of course).

Force pushing
One word of caution, though: All this rebase/fixup (and also the reset + recommit stuff) rewrites history, which is typically not appropriate once commits have been merged to master, or some other branch that is intended to have some stability. If you rewrite history on a branch that you already pushed somewhere, you'll have to force push to overwrite the previous history, which means that pulling from such a branch might cause problems (since as far as git is concerned, the original and rewritten commit histories look like separately developed commits which happen to touch all the same code, leading to conflicts). Force pushing is not acceptable for master usually (though this is a matter of convention, git doesn't care), but for pull request branches, this is usually ok (people pulling from PR branches should pull carefully, though again, project-specific convention might differ).

Commit messages
One additional advantage of splitting code into different commits, is that you can use each commit's message to more specifically motivate the changes in that commit. In this particular case, some of the changes may seem meaningless when the commit is viewed in isolation (i.e. commit 4 would split off some code into its own function which is then called only once), but makes sense in the whole. What I usually do for such commits is explicitly note this in the commit message (e.g. something like "This commit prepares for splitting out the addPoint code into multiple versions in a future commit, reducing the amount of code to duplicate when that happens").

Final words
I hope this will add some tools to your toolbox, and allow you to restructure this PR in a easier-to-review manner. If things are still unclear or you're stuck somewhere, just shout and I'll try to help a bit more :-)

On another note: given you're still working on this PR, maybe you should mark it as a draft for now. Then once everything is cleaned up and we're happy, it can be marked as done, maybe marking a more clear point for the maintainers to have a look at the PR.
follow my experiments on BIM modelling for architecture design
Post Reply