What’s this about Macro-commits?
This article tries the bundle and direct my woolly thoughts on an article called What’s this about Micro-commits by Tim Ottinger into something coherent and ultimately useful by summarising and extending upon a lengthy Twitter discussion (German language, beware!) I had about it earlier.
First off, if you are working in a small, cohesive, and agile team of closely collaborating engineers who iterate fast and collectively and systematically chew through the burn-down chart, stop reading and keep doing what you’re doing — micro-commits are for you, the end.
(actually, maybe don’t stop reading just yet)
Micro-commits: The Good, the Bad and the Ugly
For everybody else, I want to give some perspective on why I think “fat” macro-commits aren’t all bad and for the right project even desirable. Someone with more enterprise experience may disagree with me, but this is my experience from four years of herding coding cats as a maintainer on a growing open source project called KeePassXC.
The main benefits of treading the mill that is “red, green, refactor, commit” are clear:
- You focus on one thing at a time,
- your workflow is stable and you rarely end up standing on a burning floor with a crumbling roof above your head,
- you “save your game” (as the original author put it) and can fall back into your safety net (a.k.a.
git reset --hard) at any time instead of having to start from scratch,
- you are encouraged to share changes with your team early, reducing the risk of a total merge catastrophe,
- a regular reviewer can easily follow your thought process, making reviewing quick and less of a chore,
- rebasing small commits on top of a moving branch is easy,
- the green bathroom tiles on your GitHub profile make you look like a super-hero.
However, there are also equally many downsides to this approach:
- You focus on only one thing at a time,
- the larger your test suite and the longer your build times are, the less attractive it becomes to run the whole thing every few minutes,
- if you only test changes locally and with only parts of the test suite, breakage may still occur when your changes hit the CI server,
- if you get carried away, you start spending more time thinking about your commit messages than your actual code,
- and if you don’t, your commit messages become meaningless and over time your whole history turns into utter rubbish that might as well be truncated,
- an occasional reviewer wastes a lot of time following your convoluted thought processes from last week, which not even you understand anymore,
- bisecting a longer history takes longer,
- bisecting a longer history in which not every commit is green and atomic is a catastrophic waste of time,
- rebasing a long tail of small commits on top of a moving branch is virtually impossible.
Some of these caveats can be overcome by pure discipline, some are systemic, and some become an outright show stopper when you are not working in your small cohesive team anymore, but as an open source maintainer with outside contributors. In fact, it gets so bad that you waste a huge amount of time attempting to disentangle other people’s code trying to fix bugs in a giant cesspool of useless back-and-forth commits for code that is virtually undebuggable two weeks after it’s been merged. Let me explain.
Where Macro-commits save your day
Macro-commits are not coding “landfill-style”
To get this misconception out of the way before it even arises: macro-commits do not mean you dump all your code of the last weeks and months into a single giant revision, slap a label on the sack with “Done stuff”, and whack the next reviewer who doesn’t run fast enough. That is utterly ridiculous and silly. You still want to commit regularly and “save your game”. Yet you do not want to submit your full save-game history when you are done. Macro-commits are about submitting the polished results of your work in logical and digestible chunks.
Micro-commits do not necessarily let you rebase
Let us start with the rebasing point, because it is perhaps the most obvious one. The author writes:
There is one other odd effect of micro-commits: rebase works great.
A lot of people on the internet have been telling me that they never do git pull –rebase because the merge always fails and leaves them in a weird (“headless”) state in their local codebase.
This statement is particularly funny, because it is both right and totally wrong at the same time. So wrong even that it becomes outright dangerous advice.
Small changes generally rebase well and most of the time without manual intervention even. Long histories of commits of whatever size, however, do not. In fact, this is where you turn your perfectly curated and atomic all-green commit history into a bloody red mess of broken commits, because your merge and conflict resolution strategy produces intermediate commits that should have never existed like that in the first place with inconsistent code and in the worst case even syntax errors. The least you must do after such a rebase is to run your test suite (or better the full CI pipeline) on each new commit again and then go back and fix the broken commits, which will introduce new broken commits in the more recent history, so you keep tweaking commits until you either come out on top or abort the rebase, squash everything and try again. By committing in larger chunks, you would have avoided this mess (almost) entirely, because a conflict has to be solved only once and no cascading errors and inconsistent states occur.
You may object that this wouldn’t happen if you played your changes back to the main branch regularly and that is true, but let me remind you that we are in cat herding mode where a small number of maintainers manages outside contributors without push access to the main branch. So any change that goes to develop has to go through me.
That said, rebase works totally fine for macro-commits and I never use
git pull --merge (like ever, it’s the worst way of updating your branch, trust me).
I do not care what you were thinking six days ago
Since we have established that every change has to go through me, we can firmly say that below me there is a bottle and I am its neck and if one is maintaining an open source project, the contributor landscape is usually vastly asymmetric. That is, there are many more contributors than there are authorised reviewers who can merge your stuff. Hence, I as a member of the maintainer minority have to manage my time well.
Managing my time means that I can only review in batches, yet every patch (or pull request, which is the same for me) has to be reviewed before it gets merged into develop. If you now dump the yield of your 15 CPH (commits per hour) workflow on me in the form of 700 individual meaningless commits, rest assured that I will desk-reject you immediately no matter the merits of your work. My only route to ever reviewing such a PR is either to go through every commit trying to follow your (no offence) convoluted, confusing, and probably more-than-once-derailed train of thought — or I can review the mere end result, in which case you might as well have squashed it all.
On the other hand, I also cannot hold your hand and pair program with you and blocking your progress because I haven’t yet reviewed your most recent five commits is neither productive nor a sustainable workflow for both my mental health and your patience as a valued spare-time contributor.
Further, the documental value of a long micro-commit history is near zero. I much prefer a well-drafted and skilfully crafted descriptive summary of what the purpose of your patch is over a bunch of back-and-forth nonsense that says nothing about what the entirety of your code is supposed to do.
Here is an example of a bunch of “in-the-moment” commits on an old pull request of mine:
Each commit has the green checkmark saying the commit is fine, but the history is utterly useless for future documentation or any batch reviewer for that matter (particularly the first two and the last one in the list). The changes are from back when we were still in the process of figuring things out in general. Today I would have created more focused commits with dedicated tests, but for illustration purposes it shall suffice (in fact, the passing tests never touched any of the changes, since it is just an external helper script for building binaries, so the green ticks are a mere placebo).
Compare that to one of my more recent pull requests where I backported changes from develop to the stable branch:
Behind each of the grey dots hides a detailed description of what the patch does and what consequences you may expect (more on that later). Hence, the co-maintainer who has to review the patches has an easy time figuring out what is going on even if the individual change sets are a bit larger (usually, they aren’t massive, though). In fact, it is rather rare that we have a PR with more than one commit and if we do (yes, locally, we also “save our game”), we tend to squash them unless it makes sense to keep them separate. If we need more individual small patches, we simply submit more PRs. This way we keep the workload for the reviewers manageable while also maintaining a clean and expressive commit history for the future debugging of regressions.
Perhaps I should add that this is not to discourage you from submitting your PR early and then adding further commits to it. On the contrary! Add the WIP label and we can get an early impression of what you are trying to submit, so we can work with you and steer you in the right direction. Just don’t expect continuous in-depth reviews and merges into develop along the way.
Intermediate commits which do not pass should not exist
This is something that wouldn’t happen with sufficient discipline on the part of the submitter and a strict red/green workflow, but it does happen all the time with outside contributors (and even with us if we are working in our spare time).
The rule is: If you do not build and test each individual intermediate commit, I can guarantee you that your history is un-bisectable. I as a reviewer certainly do not have the time to ensure that your commits make sense and are atomic and green. But worst of all, our CI system has no time to do it either. We build 14 different configurations of your patch on various different platforms (including one where we lint your wretched coding style) and that takes time and clogs up the CI pipeline for other submissions, so we only build the top commit, leaving the others to rot. If one of your commits does not compile, we have a problem and I will curse you when I have to bisect a bug and get stranded in a void that does not let me determine whether the code works because I cannot compile it or — only marginally better — the tests do not pass. My only game here is to crawl linearly from commit to commit trying to find one that works and continue bisecting from there.
So if I did not care what you were thinking when you committed “Do X”, “Do Y”, “Do Z”, “Undo X” when I reviewed your code, why would I ever want to get back there when I have to bisect it? Macro-commits avoid this altogether. It may take a little longer to spot the exact line where a bug occurs once I found the offending commit, but that is easily offset by staying clear of the trench of undebuggable and — worst of all —non-descriptive commits.
Commit messages matter (a lot)
As an outside reviewer who is external to your thought process and did not pair-program with you, I need help and guidance understanding your code and your motivation behind it and you can assist me by writing clean and simple code (probably the hardest), adding comments inside the code, composing a good description for your PR (very useful, but doesn’t help me anymore when I am bisecting), and writing a descriptive commit message.
Unfortunately, the smaller your commits are, the less descriptive commit messages tend to become. An extreme example would be a message such as “Add else branch”, which is a prime example for a totally meaningless and uninformative commit message. If I look at the message alone, I get no information at all. We have thousands of conditional branches in our code, which one did you add an else branch to? Even on a file level it is non-descriptive and highly ambiguous. If, on the other hand, I look at the diff itself, I can already see that you added an else branch. It is all your commit is about and your message only states the obvious, so it is entirely redundant and you may as well leave it empty. What I want to know is not simply what you did, but also why.
On the other extreme are excessively long commits where it becomes increasingly difficult to describe each detail accurately and the once clear mapping of what part of the commit message maps to which part of the code becomes blurred. You want to avoid this as well. Thus, the goal is to strike a good balance between these extremes and to maximise the value of the commit message.
Here is an example of a descriptive and helpful commit message for a macro-commit. You may argue that there is (way) too much going on for one commit and, depending on the structure of your team and workflow, I may even agree with your assessment, but as a batch reviewer, it helps me more than a series of micro-commits and therefore saves me a lot of time:
Here, on the other hand, is an example of a very small and focused commit which you may even consider a typical micro-commit, yet I took the time to also document it thoroughly, which I wouldn’t have done if I committed only at this granularity (at least not if I am not paid for it):
And lastly: The changelog
This one is easy. Assembling a human-readable changelog from micro-commits is futile and you have to resort to other sources such as your burn-down chart or generally your system specification. We do not have that. We only have pull requests and commits. While a pull request could logically bundle multiple commits, it becomes easier if the commits themselves are also somewhat descriptive. An alternative is to maintain a separate changelog document along the way, which we are also experimenting with. Yet such a document may easily get out of sync.
To end this long rambling, let me summarise that micro-commit workflows work fine in fast-moving, controlled environments, but are utterly misplaced in asymmetric open-world scenarios where much of the code comes from outside (volunteer) contributors. Not only could we not impose this strict workflow on them and still expect to receive patches, even if they complied, they would most likely still get it wrong and impose more work upon us than if we had simply squashed their commits. So it makes total sense that (at least to my knowledge) very few, if any, medium-sized or large public open source projects have adopted this workflow and generally prefer contributions in larger chunks that have been split up by what they do and not by how they came to be. At least the Linux kernel folks do and we do, too.
So let me end with the original author’s argument
When I find people doing very large commits, or “squashing” all their intermediate commits together into one mega-commit, I flinch. I don’t want to do the code review on that bundle.
to which I can firmly retort that I flinch when I have to review pull requests whose commits have not been (sensibly) squashed. Macro-commits certainly do have a place, even if it means that my GitHub profile page looks like that of a lazy loser.