Git is a flexible and powerful version control system. While Git offers significant functionality over legacy centralized tools like CVS and Subversion, it also presents so many options for workflow that it can be difficult to determine what is the best method to commit code to a project. The following are the guidelines I like to use for most software projects contained within a Git repository. They aren't applicable to every Git project (especially those hosted on drupal.org or GitHub), but I've found that they help ensure that our own projects end up with a reasonable repository history.
CVS and Subversion encouraged large, single commits due to limitations in their branching model. This is especially apparent with the Drupal project, where a single-commit patch-based workflow is still in use. There are a few problems with large monolithic commits:
- git blame becomes much less useful, as the commit message on given lines of code will usually be something like "Ticket #123: Add progress bars to video series." instead of "Ticket #123: Add updated jQuery UI library for progress bars."
- History of the development of code is lost. With large commits, the process of code development is obscured and not discoverable within git.
- git bisect becomes near useless. Even when debugging manually by checking out different commits, having granular commits makes it much simpler to find the lines of code that are actually the source of the bug.
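To make the git bisect point concrete, here is a minimal sketch in a throwaway repository. The file name app.txt, the "BUG" marker, and the commit messages are all made up for illustration; in a real project the test command would be whatever reproduces the bug.

```shell
set -e
# Throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Ten small commits. The fifth one introduces the pretend bug.
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -eq 5 ]; then
    echo "BUG" >> app.txt
  else
    echo "change $i" >> app.txt
  fi
  git add app.txt
  git commit -q -m "Step $i"
done

# HEAD is known bad, the very first commit is known good.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null

# The test command exits 0 (good) as long as the BUG line is absent.
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit.
culprit_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "$culprit_subject"   # prints: Step 5
```

Because each commit holds a single small change, the commit that bisect lands on is already most of the diagnosis. With one monolithic commit, the same search would only report that the bug is somewhere in the whole change.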
For a Drupal project, a few guidelines I use for commit size include:
- Always add or update modules in their own commits. Never bundle multiple modules in the same commit unless there is tight coupling between the modules.
- When writing new modules, write stub functions and phpdoc comments first. Then, come back to each function and fill them out, committing along the way.
- Always write and commit API-level functions before writing and committing consumers of those functions (such as forms, menu callbacks, and theme code).
- If a commit is more than 100 lines of code, re-evaluate it to see if it's actually a few different logical changes.
- Always commit unrelated bug fixes to your branch as separate commits, or as a separate commit on a new branch.
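The 100-line rule of thumb above can be checked before committing. This is only a rough sketch: the file big.txt and the variable names are hypothetical, and counting added plus deleted lines is just one way to measure the size of a staged change.

```shell
set -e
# Throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Stage a deliberately large change: a new 150-line file.
seq 1 150 > big.txt
git add big.txt

# --numstat prints "added<TAB>deleted<TAB>path" for each staged file;
# sum the first two columns to get the total number of touched lines.
staged_lines=$(git diff --cached --numstat |
  awk '{ total += $1 + $2 } END { print total + 0 }')

limit=100   # the rule of thumb from the guidelines above
if [ "$staged_lines" -gt "$limit" ]; then
  echo "Staged change touches $staged_lines lines; consider splitting it."
fi
```

A large number is not automatically wrong (adding a contributed module can easily be thousands of lines), so treat the output as a prompt to look again rather than a hard limit.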
It's common for introductory Git tutorials to suggest always committing code with git commit -a without accurately explaining what the command does. Treating the commit command the way legacy version control systems do ignores one of the most useful features of Git.
Git introduces a new tool to the version control workflow interchangeably called the "staging area" or the "index". It sounds complicated, but it's actually a very simple tool. In essence, the index is a place to indicate what exactly to include in the next commit. This can be as broad as an entire directory or file, or as granular as specific changes in a file (excluding other unrelated changes).
git commit -a is just shorthand for telling git to add all uncommitted changes (except for new files) to the index, and then immediately start the commit process.
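A small sketch of that behaviour, run in a throwaway repository. The file names are hypothetical. The modified tracked file is committed, while the brand-new file is left behind as untracked.

```shell
set -e
# Throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2" > tracked.txt      # change to a file Git already tracks
echo "new" > untracked.txt   # brand-new file Git has never seen

# -a stages every change to tracked files, then commits.
git commit -q -a -m "Update tracked.txt"

status_after=$(git status --porcelain)
echo "$status_after"   # prints: ?? untracked.txt
```

Running git add -u followed by git commit should give the same result here, which is a reasonable way to think about what -a does.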
A much better method for the commit process is to explicitly review what is to be committed. Git includes an awesome tool for this in the form of git add --patch. This command will show each change in your code and ask whether it should be staged for the next commit. Sometimes, Git will show a large diff that is actually a few small changes. In this case, "s" will split the diff into smaller hunks that can be individually acted on. If needed, the change can be manually edited (with "e") to indicate exactly what should be committed. The commit process ends up looking something like this:
1. git add --patch to add changes to the index.
2. git diff --cached to do a final review of what is to be committed.
3. git commit to commit what is in the index.
4. Repeat from step 1 until there is nothing left to commit, or there are uncommitted changes that need more work.
One small caveat is that git add --patch will not add brand-new files to the index, but only files that have previously been added to the repository. In that case (such as adding a new module) use git add directly to start tracking the new files.
Rebasing is a powerful feature of Git that is both awesome and dangerous. Rebasing allows you to rewrite the history of a branch into something new. Most commonly git rebase is used to move where a branch appears to start from and to rewrite it to be on top of a new commit. git rebase --interactive is also an excellent tool for rewriting commits to amend in typo fixes, rewrite commit messages, or change the order of commits to accurately describe dependencies in code. With GitHub projects in particular, rebasing is commonly used to keep history straight and free of merge commits.
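Here is a rough sketch of both uses in a throwaway repository. Branch and file names are hypothetical. The interactive rebase is driven by --autosquash with GIT_SEQUENCE_EDITOR=true, which accepts the generated todo list unchanged so the example can run without opening an editor; day to day you would edit that list by hand.

```shell
set -e
# Throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"
base=$(git symbolic-ref --short HEAD)   # the default branch, whatever it is called

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Two commits on a feature branch; the first one contains a typo.
git checkout -q -b feature
echo "feture" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
echo "docs" > docs.txt
git add docs.txt
git commit -q -m "Add feature docs"

# Meanwhile the base branch moves on.
git checkout -q "$base"
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

# 1. Move the feature branch so it starts from the new tip of the base branch.
git checkout -q feature
git rebase -q "$base"

# 2. Fold a typo fix into the commit that introduced the typo.
echo "feature" > feature.txt
git commit -q -a --fixup HEAD~1
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

history=$(git log --format=%s)
echo "$history"
```

The resulting history is linear (Add feature docs, Add feature, Upstream change, Initial commit), and the typo fix no longer appears as a commit of its own.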
The issue with rebasing shared (or pushed) commits is that doing so requires a "forced push" and automatically invalidates any work others might be doing on that branch of code. It obscures the actual development history of a branch in favour of arbitrary cleanliness of the Git history graph. Rather than forbid rebasing entirely, I have a rule that I never rebase commits that I've pushed to a remote repository. This ensures I don't break anyone else's code that they've committed to my branch, and keeps a log of any bugs or mistakes I've fixed.
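One way to apply that rule is to ask Git which commits have not been pushed yet, and only ever rebase those. The sketch below assumes the branch has an upstream configured; the bare repository standing in for the remote and all file names are made up for illustration.

```shell
set -e
work=$(mktemp -d)

# A bare repository stands in for the shared remote.
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.email "dev@example.com"
git config user.name "Example Dev"
git remote add origin "$work/remote.git"

echo "shared" > shared.txt
git add shared.txt
git commit -q -m "Shared work"
git push -q -u origin HEAD    # pushed: hands off from now on

echo "local" > local.txt
git add local.txt
git commit -q -m "Local only" # not pushed: still safe to rewrite

# Commits reachable from HEAD but not from the upstream branch.
unpushed=$(git log --format=%s '@{upstream}..HEAD')
echo "$unpushed"   # prints: Local only
```

Running git rebase --interactive '@{upstream}' then limits the rewrite to exactly those unpushed commits, so nothing that others may have built on is touched.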
One serious difference between Git and Subversion is that branch addition and removal are not commits themselves. A Git branch is just a pointer to a commit. While in Subversion a deleted branch can be restored just by checking out an old revision, in Git a commit that is no longer reachable from any branch or tag will eventually be removed by the garbage collection process. So, how do we handle obsolete branches so they can be referenced if needed, without cluttering up the git branch listing?
git merge -s ours obsolete-branch
This will merge obsolete-branch into the current branch while completely discarding the changes in the obsolete branch. I usually make it clear in the commit message for the merge that the branch is being discarded rather than truly merged.
git merge -s ours --edit obsolete-branch
If the old changes are ever needed for reference or to be resurrected, it's as easy as checking out the last commit on the merged branch and creating a new branch pointing to it.
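The whole round trip might look something like the sketch below, run in a throwaway repository. The file names, commit messages, and the "resurrected" branch name are hypothetical, and -m is used instead of --edit so the example runs without opening an editor.

```shell
set -e
# Throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"
base=$(git symbolic-ref --short HEAD)   # the default branch, whatever it is called

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# An experiment that will later be abandoned.
git checkout -q -b obsolete-branch
echo "abandoned idea" > experiment.txt
git add experiment.txt
git commit -q -m "Try an experiment"
old_tip=$(git rev-parse HEAD)

git checkout -q "$base"
echo "more" >> base.txt
git commit -q -a -m "Continue main work"

# Record the branch in history without taking any of its changes.
git merge -q -s ours -m "Discard obsolete-branch (kept for reference only)" obsolete-branch

# The branch is now reachable from the merge, so it can be deleted safely.
git branch -q -d obsolete-branch

# Later: resurrect the old work from the merge's second parent.
git branch resurrected 'HEAD^2'
```

After the merge, experiment.txt is absent from the working tree, but the commit that added it stays reachable through the merge commit, so garbage collection will leave it alone.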
Git is easily extensible and configurable. It's possible to add custom Git commands as aliases in ~/.gitconfig or to write entirely new top-level commands in whatever language you prefer. Git is best thought of as being more like another Unix shell than a monolithic program. I'm partial to git-sh, a tool that exposes git commands directly ("commit" instead of "git commit", for example) and works nicely with autocomplete and a bash prompt. Other extensions to simplify common Git commands and workflows are easily searchable on Google or GitHub.
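As a small sketch of both kinds of extension: the alias name "staged" and the command "git hello" are invented for this example. The alias is written to the repository's own config here to keep the example self-contained; adding --global would put it in ~/.gitconfig instead.

```shell
set -e
# Throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

# 1. An alias: "git staged" lists the files currently in the index.
git config alias.staged "diff --cached --name-only"

echo "hi" > note.txt
git add note.txt
staged_files=$(git staged)
echo "$staged_files"   # prints: note.txt

# 2. A new top-level command: any executable named git-<name> on PATH
#    can be run as "git <name>".
bindir=$(mktemp -d)
cat > "$bindir/git-hello" <<'EOF'
#!/bin/sh
echo "hello from a custom git command"
EOF
chmod +x "$bindir/git-hello"
PATH="$bindir:$PATH"

hello_output=$(git hello)
echo "$hello_output"   # prints: hello from a custom git command
```

The custom command here is a shell script, but since Git only looks for an executable with the right name, it could be written in any language.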
These guidelines are just some of what I use to help organize and manage projects in Git. Have any great tools or suggestions I missed? Hop on down to the comments and let us know!