Show HN: Build the habit of writing meaningful commit messages
github.com
Too often I find myself being lazy with commit messages. But I don't want AI to write them for me... only I truly know why I wrote the code I did.
So why don't I get AI to help me get that into words from my head?
That's what I built: smartcommit asks you questions about your changes, then helps you articulate what you already know into a proper commit message. Captures the what, how, and why.
Built this after repeatedly being confused 6 months into a project as to why I made the change I had made...
Would love feedback!
It probably depends on the codebase, but I find the best motivation for writing solid commit messages is reading commit messages. Tools like gitlens make this really easy.
Almost daily, I use commit messages and history as part of understanding why a decision was made, why a seemingly obvious alternative wasn’t chosen, etc. Seeing the commit title on every line, and hovering to see the full message, has become a core editor feature for me.
It’s kind of like testing, the more I do it, the more I want to do it because the value is so consistently reinforced.
There’s nothing like being able to track down exactly why a decision was made 6 years ago in a part of the code base you are struggling to understand written by someone who left before you joined the team.
10,000% this. Attaching JIRA tickets, etc. to the commit helps for searching as well. I've worked with a number of people who do not believe in this and it drives me insane; I try to enforce it, but there are a lot of messages like "fixed bug" that have zero context or detail associated with them.
I don't understand why so many engineers are like this.
Attaching ticket numbers has always been enforced by automated checks wherever I have worked, so it is not necessary to “try” to enforce it.
Similarly with AI it is fairly simple to have e.g. a pre-merge check that validates the commit message is somewhat useful. This could be implemented, for example, with GitHub org-level checks that must run in a PR.
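For what it's worth, the non-AI half of such a check can be tiny. A minimal sketch, assuming a JIRA-style key such as `PROJ-123`; the pattern, the list of lazy subjects, and the function name `check_commit_message` are all made up for illustration, not anyone's actual rule:

```python
import re

# Hypothetical JIRA-style key, e.g. "PROJ-123"; adjust to your tracker.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")
# Subjects that carry no context on their own (illustrative list).
LAZY_SUBJECTS = {"fix", "fixed", "fixed bug", "wip", "update", "changes"}


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.strip().splitlines()
    subject = lines[0].strip() if lines else ""
    if not subject:
        problems.append("empty subject line")
    elif subject.lower().rstrip(".!") in LAZY_SUBJECTS:
        problems.append("subject has no context")
    if not TICKET_RE.search(message):
        problems.append("no ticket reference found")
    return problems
```

Something like this could run from a `commit-msg` hook (git passes the path of the message file as the first argument) or from a CI job over the commits in a PR. It only catches the obvious cases; judging whether a message is actually useful is the part people would hand to a reviewer or a model.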
YOU should be writing your commit messages, not an AI.
You can always generate a new commit message (or summary, alternative summary, etc) down the road with AI. You can never replace your mind being in the thick of a changeset.
The author of the commit doesn't matter per se. If someone is just having AI summarize their changes and using that as the commit message, I agree that they're doing it wrong.
These days, lots of my commit messages are drafted by AI after having chatted at length about the requirements. If the commit message is wrong or incomplete, I'll revise it by hand or maybe guide the AI in the right direction. That tends to be a much more useful and comprehensive description of the commit's intent than what I would naturally find worthwhile to write on my own.
OP's approach is interesting as well, at least in principle, and if it works well it might be the next best option in the absence of a chat log. It should just make sure to focus on extracting the "why" more than describing the "what".
If we assume, as many do, that we are going to delegate the work of "understanding the code" to AI in the coming years, this surely becomes even more important.
AI writing code and commit messages becomes a loop divorced from human reasoning. Future AIs will need to read the commit history to understand the evolution of the code, and if they're reading poor summaries from other AIs it's just polluting the context window.
Commit messages are documentation for humans and machines.
I have the completely opposite opinion on this.
Writing commit messages is one of those mundane chores I’d gladly delegate to LLMs, which are very, very good at this kind of thing.
I mean, if you really know your code, you know it; there is not much value in reinforcing it in your head one more time by writing comprehensive commit messages - it’s a waste of time, imho.
I agree in principle, but in practice, it's horrible right now.
Most AI-generated commit messages and PR descriptions are much too verbose and contain 0 additional informational value that couldn't be parsed from the code directly. Most of the time I'd rather read 2 sentences written by a human than a wall of text with redundant information.
Neither the code nor the AI knows WHY a commit is being made.
This context should at the very least be linked.
Man, 99% of non-bug-fix commits don't have a why other than "advance the current task".
Almost all commits live in tandem with some large feature or change being made. The reason for absolutely all of them is the same - build the thing.
>other than "advance the current task"
How do you expect someone to know what “the current task” was when they’re tracking down a bug 2 years down the line?
Perhaps this is about commit granularity. If keeping the history about advancing the task is not useful, then I’d merge these commits together before merging the PR; in some workflows this is set up to happen automatically too.
Then write that and link to the current task. That's the why. You don't need an LLM for that.
Sounds like you haven't been working long enough to forget your decisions, which you WILL do eventually. In such cases, where you're looking at code you wrote 10 years ago and you find a weird line, when you view the git blame and read the commit message, you'll be very thankful that you explained not just "what" you did, but "why" you did it, something an AI will have a very hard time doing.
You don't have to if you don't want to, but if you think "this commit message is just a summary of the changes made", you'll never write a useful commit message.
I’ve been working in the industry for two decades, and I think commit messages are not the best place for storing decisions and associated context. I personally prefer ADRs.
Two decades and you don't see any value in writing down what's currently in your head?
Anyhow, ADRs are good, but they stand for Architectural decisions, not every decision is at that level.
In general, if there's a better place to store explanations, do use it, but often, in many projects, commit messages are the least bad place; and it's enormously better to write there than nowhere at all.
That’s why you put the comment in the code
A lot of developers are afraid of referring to themselves in a commit message. I think this is a mistake. Treat a commit like an email accompanying a patch. Explain to the reader (probably you, five years from now!) why you’re making the changes you are in a conversational, but technical tone.
This reminds me of the time when I put extra effort into writing good commit messages, explaining 'the theory'[1] behind them when it made sense, naming Chesterton's fences and generally providing guidance to the future engineer who has to look through this to understand why something was done in a certain way.
This was a delayed project running out of budget and everything had to be completed within a few months. However, the management in its infinite wisdom also did a complete source code/issue management platform change.
The person who did the repo migration went with the default settings. (I believe this was the case. I forget the details. I would also only half blame the person because everything was rushed)
Everything up to that point was committed with "Made by so and so bot".
This was well before Google was allegedly committing "well over 30%" of code by AI. I witnessed the true pioneers in this space.
[1] https://pages.cs.wisc.edu/~remzi/Naur.pdf
Most of the time ROI is not there just like with unit tests.
You don’t need 98% of commit messages ever again.
Yes, when you do need that 2%, it is most likely for important reasons, but usually not important enough to justify mulling over all the others.
I don't find commit messages useful for historical reasons (git log); I squash them anyway when merging a PR. Commit messages are very helpful for giving a reviewer the necessary context and intent behind the change. Without them, I need to figure it out myself when reviewing. It is very easy to get into a habit of committing often, and in the AI era I don't write commits anymore. So there is no excuse.
https://github.com/arpxspace/smartcommit/commit/cc677f7bd210...
This is just a completely braindead commit. Without looking at the code, I could probably spend 5 minutes making sense of the commit message, intrigued that something interesting or important is happening. The message is massively over the top; it has way more text than actual code changes. It wastes time.
I am not against AI as a helper in various places. But if possible it should be an opt-in tool if deemed useful. If someone wants to get a summary of a non-trivial commit, that can be useful. Even better if the committer writes about the intentions and reasons for the commit, so an AI could match those with the actual code. Don't reiterate what's happening in a patch. Give the meta that isn't there or is less obvious. Please.
You see the same with code and text generated by LLMs. Overly verbose, comments that repeat code, and commit messages that repeat what is done in code, but not WHY.
For that matter,
> The full path specification in `go build` was redundant given the context of how Go modules are structured. Simplifying the instructions improves clarity and reduces confusion for new users or contributors.
The explanation doesn't seem quite right. The module mentioned in that command was moved to the project root, in such a way that the command no longer needs to specify a path. So the full path specification wasn't redundant; the updated version of it became redundant.
And all of this was done in a single commit. Better (disclaimer, I have no experience using Go. Actual Go developers probably don't even need to be told this much):
and then: and then: (No explanation required for the last one.)
There's no need to justify that your changes are "in accordance with best practices", tell a story about "ongoing efforts" (unless you actually have other recent commits that you want to group together like that conceptually), etc. Commit messages are for other developers. Another developer who reads, in effect, "this change was made in the hopes that YOU will have an easier time contributing to the project"... is going to feel patronized.
But making fine-grained commits with short messages will help in the long run. No amount of prose in commit messages can actually organize the commits. Meanwhile, the AI's summary completely ignored a change I would recommend splitting out into a separate (third) commit.
This is great advice.
> There's no need to justify that your changes are "in accordance with best practices", tell a story about "ongoing efforts" (unless you actually have other recent commits that you want to group together like that conceptually), etc.
LLMs are very prone to generalisation and marketing language like this. Despite being sycophants, they are also trained to speak as if they constantly have to justify and persuade.
This is called out in the Wikipedia meta I linked to in another comment. They're great red flags to look out for in any writing; humans, myself included, often use this kind of lazy construction!
A bit of prompt engineering should do the trick.
If you are interested in writing better commit messages on your own, Google's advice makes for a good starting point:
https://google.github.io/eng-practices/review/developer/cl-d...
I also recommend Zulip's guidelines.
https://zulip.readthedocs.io/en/latest/contributing/commit-d...
Thanks for sharing this! Quick, easy-to-grasp one-pagers like this are how this type of thing should be done. We should drum it up loud whenever we can.
Core AI model interaction is here https://github.com/arpxspace/smartcommit/blob/main/internal/...
It is funny that you have to pre-prompt with: "You are an expert software developer."
yes lol...
I also found i had to hold llama3.1's hand more than gpt4o but i suppose that is a given since it's a much smaller model.
No.
Honest to God, this would discourage frequent commits.
Which will lead to a lot of work not being committed.
Thoughtful messages are for PRs.
Alternative: I do a lot of commits just marked "wip" as I go, then when I'm ready to consider a PR, I rebase my branch into something for public consumption. Some commits will get thoughtful messages when warranted, others will get one line, and the PR description will tie it all together with screenshots and links to the most interesting parts.
(On a good day, that is. Though even on a bad day I don't let "wip" commits into the main branch.)
Nice! Most of my commit messages end up being something like "fixes and stuff, this works now!". This can be pretty helpful
> Built this after repeatedly being confused 6 months in a project as to why i made the change i had made...
That might indicate an underlying problem that can’t be fixed with AI. ;)
> strictly enforces the Conventional Commits specification (feat, fix, chore, etc.).
Nope. Waste of bytes in my commit message header that are better done by git trailers.
Otherwise, I love the idea of the tool. I personally try to answer “why does this commit exist?” when I create commits.
I have a coworker who makes every branch into a story about wizards, elves, or whatever. There's a whole arc that explains the story of the commit in a fun way. I have no idea how he has kept coming up with it all for the past 10 years.
rotfl can you upload some of that somewhere?
I find that the CC structure helps me scan a list of recent commits, e.g. when squashing something into a past commit or trying to find where to navigate.
I don't spend a lot of time on trying to come up with scopes etc, I just make sure that my commit does one thing that fits the label.
These categories can be useful, as they indicate part of the “why”. I just heavily dislike the value judgement of “chore”.
Interesting, never knew about git trailers - will have a look!
I agree, I hate conventional commits. Why the hell do I care if changes are chores or features? I want to know what the change was.
I'm surprised to read that someone not only finds no value in conventional commits, but actively hates it. Wow.
A few reasons to care about CCs:
- The first few characters of a commit message tell you immediately the type of change you should expect. This tells you part of the "what" at a glance. If you're looking for a bug fix, for example, you can safely ignore any other type of commit.
- Thinking about the type of change you're committing helps you create atomic commits. Anything that is not strictly related should go in a separate commit. Hopefully you already know why you should care about this.
- A conventional commit message also often includes the change scope. This is a handy way to indicate the subsystem that was changed, which is also useful for filtering, searching, aggregating, etc.
- They help with writing change logs. I'm a strong proponent of the idea that change logs shouldn't be just autogenerated dumps of commit messages, but carefully redacted for the intended audience, and CCs can help with grouping changes by type or scope. These days LLMs do a decent job at generating this type of changelog (even though it should still be manually reviewed and tweaked), and the additional metadata provided by CCs helps them make it more accurate.
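To make the grouping point concrete, here is a rough sketch of sorting commit subjects by their Conventional Commits type as a first pass for a changelog. The regex follows the `type(scope)!: description` shape from the spec; the function name `group_by_type` and the sample subjects in the usage note are invented for illustration:

```python
import re
from collections import defaultdict

# "type(scope)!: description"; the scope and the "!" marker are optional.
CC_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<desc>.+)$"
)


def group_by_type(subjects):
    """Group commit subjects by Conventional Commits type.

    Subjects that do not follow the convention land under "other".
    """
    groups = defaultdict(list)
    for subject in subjects:
        m = CC_RE.match(subject)
        if m:
            scope = m.group("scope")
            entry = f"{scope}: {m.group('desc')}" if scope else m.group("desc")
            groups[m.group("type")].append(entry)
        else:
            groups["other"].append(subject)
    return dict(groups)
```

Feeding it made-up subjects like `feat(cli): add --dry-run flag`, `fix: handle empty diff` and `tidy up` puts the first under `feat`, the second under `fix`, and the last under `other`. The output is raw material to edit for the audience, not a finished changelog.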
I'm with Ish on this one.
> The first few characters of a commit message tell you immediately the type of change you should expect.
1. Why do I care about this particular classification of "type" of change?
2. "The first few characters" of the message aren't actually what I necessarily see first, anyway.
> If you're looking for a bug fix, for example, you can safely ignore any other type of commit.
1. If I'm looking for a bug fix, I'm using tools like git blame and git bisect.
2. How often do bugs actually get fixed by a single commit that has that bug fix as its sole purpose, and which is recognized as a bug fix at the time of writing? I'm guessing it's much lower than one would naively expect.
3. If I'm looking for a bug fix, I'm looking for the fix for a specific bug, which is probably most recognizable by some bug tracker issue ID. (And if not, it's most searchable by figuring out an ID and looking that up.) So I'm scanning lines for a # symbol and a number, which I would definitely not expect to be at the start of the line.
> Thinking about the type of change you're committing helps you create atomic commits. Anything that is not strictly related should go in a separate commit. Hopefully you already know why you should care about this.
Yes. And I do this by thinking about a verb that naturally belongs at the beginning of the sentence (fragment) describing the commit. "Bugfix", "feature", and "enhancement" aren't actions.
The discipline of organizing commits is orthogonal to the discipline of labeling them.
> A conventional commit message also often includes the change scope.
One that is thoughtfully written by hand will naturally include the scope of the change any time that this concept is meaningful.
Eh, I wouldn't say it's a waste of bytes. Conventional Commits are useful for many scenarios.
This metadata could also be added via trailers, but most Git UIs don't show them prominently, or at all. So prefixing the subject is still the way to go.
I’d rather not hack my process around the deficiencies of someone else’s poor UI choices. As long as the data is stored in a powerful, structured format, you can serve that up in any way that’s sensible to you with `git log` and other shell friends.
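As an illustration of how little it takes to read that structured data back out, here is a simplified sketch of pulling `Key: value` trailers from the last paragraph of a commit message. Git ships its own parser (`git interpret-trailers --parse`), which handles more cases; the function name `parse_trailers` and the `Type` and `Issue` keys in the usage note are assumptions for the example, not a convention anyone in this thread proposed:

```python
import re

TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z0-9-]+): (?P<value>.+)$")


def parse_trailers(message: str) -> dict[str, list[str]]:
    """Read "Key: value" trailers from the last paragraph of a commit message.

    Simplified on purpose: git's own parser also handles folded
    (multi-line) values and other separators.
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return {}  # a lone subject line is never a trailer block
    trailers: dict[str, list[str]] = {}
    for line in paragraphs[-1].splitlines():
        m = TRAILER_RE.match(line)
        if not m:
            return {}  # last paragraph is prose, not trailers
        trailers.setdefault(m.group("key"), []).append(m.group("value"))
    return trailers
```

A message that ends with the lines `Type: fix` and `Issue: 123` would come back as `{"Type": ["fix"], "Issue": ["123"]}`, which is easy to filter or aggregate however you like.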
Meaningful commit messages are meant to be read later and be useful. I'm not going to read through 8 sentences of generated slop, would rather look through the diff.
Obligatory "show me the prompt" https://news.ycombinator.com/item?id=39374249
I'm assuming the commits in the repo were generated with the tool itself. In that case, commit `cc677f7` has a (in my opinion) terrible commit message. It starts out with a listing of stuff I could just as well read from the patch. It then contains another list that tries to explain the why, but it ends up being useless fluff like `The full path specification in `go build` was redundant given the context of how Go modules are structured.` and `streamlining the project structure and reducing unnecessary directory complexity.` which tells me exactly nothing about why those changes were made.
It generates a whole lot of text that makes me none the wiser as to why you wanted to do any of those changes. It feels like a robot trying to justify the changes post hoc. Which it of course is, so that's understandable.
Don't take this comment as rudeness BTW. It's cool that you're making a fun little tool. I'm assuming you care about writing more useful commit messages, so I thought I'd give you some feedback on that part.
Yeah totally see where you're coming from, i seemed to have been slightly lazy with that commit... However, the tool does ensure that the dev has the final say; it will open the user's editor with the commit message that the ai has 'drafted' so the dev can make necessary changes - it provides a starting point that a dev can then tailor.
> However, the tool does ensure that the dev has the final say; it will open the user's editor with the commit message that the ai has 'drafted' so the dev can make necessary changes - it provides a starting point that a dev can then tailor.
The problem with this is that it still biases people towards including useless fluff. I'd almost rather have no commit message whatsoever (so I at least know there's nothing of value there) than have to read through paragraphs of text to determine that there was nothing useful to read. I'd much rather have a terse one-line summary that includes the gist of the intent of the change than a bunch of waffle.
(I'd rather have 1-2 paragraphs of a well-written, accurate description of the content than any of that, but AI unfortunately isn't capable of that).
> The problem with this is that it still biases people towards including useless fluff.
The developer now has to choose "do I spend the time to make this commit message better?" or just skim it and say "yeah that's good enough."
I applaud your desire to write better commit messages and not be lazy. Not every commit deserves the attention, but being able to turn on "I am definitely going to leave a precise record for the next person to see this diff" is a great skill to have.
However, I feel like your approach here is a little backwards. By getting the AI to come up with the commit messages, you're actually removing the chance for the human, you, to practise and improve.
I'm a real fan of Kahneman's "thinking fast" and "thinking slow" paradigm. By asking the human to review and approve the commit message, you're allowing them to "think fast", instead of doing the challenging, deliberative "thinking slow" of actually writing what you mean.
While getting the LLM to ask you questions about what you did and why is better than just one-shotting the commit message from the diff, it still lets you reply "reactively" and instinctually, using your "fast" gut thinking, instead of engaging the slower attentive processes required to write from scratch.
Now there are a couple of other posters here critiquing the commit messages in this repo's history. I think that's fair, but by your own admission you are learning, and this is a small and new project! Probably most commits should be along the lines of "getting a thing working", not essays about the intricacies of character encoding:
https://dhwthompson.com/2019/my-favourite-git-commit
But the commits we can see are already demonstrating some of the pitfalls of LLM generated language.
From a recent commit,
"This update enhances user interaction by explicitly addressing scenarios with large diffs, directing users towards feasible actions and maintaining workflow continuity."
This comes after a detailed breakdown of the diff. It is too vague to stand alone without the preceding detail (e.g. 40k character limit) but also doesn't explain them. Why 40k characters? Why any limit at all? Words like "enhances" and "feasible" are filler - be concrete instead.
This article on wiki has fantastic advice about ways that LLM writing fails, more along the lines of what I've just pointed out:
https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
Writing well is hard, never "effortless" as your readme advertises. Sadly, good results have to come from hundreds of hours of hard and uncomfortable work. Truth is rare and precious and difficult to come by, and even when we glimpse it, turning it into words is a whole nother story. I hope you can continue to develop this tool to help you learn and train your own writing, rather than avoid it.
More drive-by criticism of your commits, meant in a sincere attitude of helping you learn.
As best I can tell, lots of your commits seem to be including several unrelated changes.
This means commit messages become longer as they have to explain more things, and they also end up explaining the diff so that you can fit more on one page.
I'd suggest getting in the habit of making coherent commits with one change each. Some changes will be trivial, and the diff will be self explanatory. Then you can save your writing effort for commits that are challenging.
On the other hand, if I'm wrong and many changes do have to get bundled, then the commit message would be a good place to explain why.
I wrote more on the "primitives" and what I think of as the "physics" of commits here: https://crabmusket.net/2024/thoughts-on-git-commits-branches...
> Probably most commits should be along the lines of "getting a thing working", not essays about the intricacies of character encoding:
I'm not a fan of that commit as a commit, although it would make a great start for a blog post. The explanation of how the issue was tracked down is not helpful in understanding what the issue is. On the other hand, while the author describes finding (and replacing) a non-ASCII character masquerading as a space, it would have been more interesting to know what character it was.
I agree that explaining the "why" is useful, but in this case I don't think it deserves much more detail than "ensure the file uses only ASCII characters to avoid a text encoding error while running tests in this specific manner". (I guess I can also see the argument for showing the error message for later searches, too....)
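For anyone hunting a similar problem, naming the offending character is cheap. A small sketch that reports every non-ASCII character with its position and Unicode name; the function name `find_non_ascii` is made up, and the no-break space in the usage note is only an example input, not a claim about which character that commit replaced:

```python
import unicodedata


def find_non_ascii(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual line separators are reported, not eaten.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, unicodedata.name(ch, "UNKNOWN")))
    return hits
```

Given the text `"a b\nc\u00a0d"` it reports a `NO-BREAK SPACE` at line 2, column 2, which is the sort of detail that would fit in one line of a commit message.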
I agree with you.
More on the subject: https://mtlynch.io/no-longer-my-favorite-git-commit/
I'd rather have that commit message than one that doesn't explain anything, but it's a bit verbose to my taste because I don't really care how he discovered the issue. I really just need to know what and why.
So let me link to my favorite author of consistently excellent commit messages, Jeff King on the git project itself:
https://github.com/git/git/commits?author=peff
To pick just one, here's a well explained single-line code change. It's subtle, so besides the excellent commit messages, he also adds a comment and a couple tests:
https://github.com/git/git/commit/1940a02dc1122d15706a7051ee...
Another example with an even greater ratio of explanation (10 paragraphs) to code (partial line change):
https://github.com/git/git/commit/8f32a5a6c050766bfa2827869e...
Oh these are great examples, thank you!
This is amazing, thank you! Will definitely take this on board for the next iteration of this tool. I wholeheartedly agree with the perspective of:
> develop this tool to help you learn and train your own writing, rather than avoid it
Will be striving for this for sure.
Best of luck! A couple of clarifications as I was writing from the train and didn't do a second draft:
By "backwards" I meant to suggest, have the LLM critique a commit message you wrote. Have it point out vague language, weasel words, generalisation and marketing terms.
The wiki article is good advice for writing in general, not limited to LLMs.
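Even without an LLM, the crudest version of that critique is a word list. A rough sketch, with an illustrative list of filler terms taken from phrases quoted in this thread; this is a plain word match swapped in for the LLM critique suggested above, and the names `FILLER_TERMS` and `flag_filler` are made up:

```python
import re

# Illustrative list only, taken from phrases criticised in this thread.
FILLER_TERMS = ["enhances", "feasible", "streamlining", "best practices",
                "ongoing efforts", "improves clarity"]


def flag_filler(message: str) -> list[str]:
    """Return the filler terms that appear in a commit message, in list order."""
    lowered = message.lower()
    return [t for t in FILLER_TERMS
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)]
```

Run against the "This update enhances user interaction..." sentence quoted earlier, it flags `enhances` and `feasible`. It cannot tell you what to write instead; that part stays with the author.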
I was expecting a tool like this to exist. Kudos for actually caring about commit messages. As you can see from some of the comments here, there's a growing cohort of developers who simply don't, which is a shame.
I like the implementation, and how it asks you questions to get you to answer why a change was made, instead of making things up, or simply regurgitating what the code does.
I still wouldn't trust it to be accurate and would have to review it, and I personally dislike the default "LLM style", and I wouldn't want to read these messages or subject other people to them, so I won't be using your tool, but thanks for building and sharing.
All good - appreciate the feedback!
Writing meaningful commit messages is absolute time-wasting nonsense.
All my commit messages are a mess and I spend the time that I have designing and writing code, not figuring out how to make the commit message look pretty. That's what merges are for. I don't see the value of this tbh. Just pedantic time wasting.
When these people say "commit" they're referring to the same concept you call "merge".
I have branches that I work in and fork out from in every direction before merging everything back into master. My branches are messy because they're works in progress, so I don't care about the commit messages. For the final merge back into master it's a high-level overview of changes. I don't try to detail every change. I honestly think that devs obsess too much over this clean commit history thing and that it looks to me like pedantic OCD.
Not really, necessarily
ok, but did they have to make commit message required, or is there a way to disable it? i think of git as checkpoints, nothing more. the day i have to explore history is the day i quit.
To understand your point of view, is this because you have seen very bad commit messages (I have too!) like "fixed" repeating 700 times and now come to believe it's a pointless thing in the grand scheme of things anyway?
> the day i have to explore history is the day i quit.
Huh? I do that all the time, and it's really useful. What is difficult or problematic about it?
One particular cynical reading of that could be "the day I'm held responsible for my code is the day I quit".
totally. the fact git-quicksave isn't a standard command that commits with an "Autosave" message is pretty short-sighted.
This thread makes me weep for software engineering.
If it wasn't obvious I am not serious. On the contrary I'm actively making contributions to jujutsu (none involving a quicksave command... yet)