by Andrew Berry on June 14, 2012

Git Best Practices: Workflow Guidelines

Git is a flexible and powerful version control system. While Git offers significantly more functionality than legacy centralized tools like CVS and Subversion, it also presents so many workflow options that it can be difficult to determine the best way to commit code to a project. The following are the guidelines I like to use for most software projects contained within a Git repository. They aren't applicable to every Git project (especially those hosted on drupal.org or GitHub), but I've found that they help ensure that our own projects end up with a reasonable repository history.

Small, logical commits

CVS and Subversion encouraged large, single commits due to limitations in their branching model. This is especially apparent with the Drupal project, where a single-commit patch-based workflow is still in use. There are a few problems with large monolithic commits:

  • git blame becomes much less useful, as the commit message attached to a given line of code will usually be something broad like "Ticket #123: Add progress bars to video series." instead of something specific like "Ticket #123: Add updated jQuery UI library for progress bars."
  • History of the development of code is lost. With large commits, the process of code development is obscured and not discoverable within Git.
  • git bisect becomes nearly useless. Even when debugging manually by checking out different commits, having granular commits makes it much simpler to find the lines of code that are actually the source of the bug.
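As a rough sketch of why granular commits help git bisect, the script below builds a throwaway repository in which the fourth of six small commits introduces a "bug", then lets Git find it. The file name app.txt, the commit messages, and the grep-based check are all placeholders made up for this illustration.

```shell
# Throwaway repository; every name below is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Six small commits; the fourth one introduces the "bug".
for i in 1 2 3 4 5 6; do
  if [ "$i" -eq 4 ]; then
    echo "bug" >> app.txt
    msg="Step 4: introduce the bug"
  else
    echo "step $i" >> app.txt
    msg="Step $i: harmless change"
  fi
  git add app.txt
  git commit -q -m "$msg"
done

# Mark HEAD as bad and the root commit as good, then let Git search.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good"
# The check exits 0 when the commit is good and 1 when the bug is present.
git bisect run sh -c '! grep -q bug app.txt'

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
git log -1 --format='%h %s' "$first_bad"
git bisect reset
```

Because each commit here changes a single line, the commit that bisect reports is itself the explanation of the bug. With one monolithic commit there would be nothing to search.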

For a Drupal project, a few guidelines I use for commit size include:

  • Always add or update modules in their own commits. Never bundle multiple modules in the same commit unless there is tight coupling between the modules.
  • When writing new modules, write stub functions and phpdoc comments first. Then, come back to each function and fill it out, committing along the way.
  • Always write and commit API-level functions before writing and committing consumers of those functions (such as forms, menu callbacks, and theme code).
  • If a commit is more than 100 lines of code, re-evaluate it to see if it's actually a few different logical changes.
  • Always commit unrelated bug fixes to your branch as separate commits, or as a separate commit on a new branch.
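The 100-line guideline can be checked before committing. The following is only a sketch under assumptions: the threshold, the file names, and the warning text are illustrative, and it counts added plus deleted lines in the index using git diff --cached --numstat.

```shell
# Throwaway repository; file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "readme" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Stage a change that is larger than the guideline suggests.
seq 1 150 > big_change.txt
git add big_change.txt

# --numstat prints "added<TAB>deleted<TAB>path" for each staged file.
changed=$(git diff --cached --numstat | awk '{ total += $1 + $2 } END { print total + 0 }')
if [ "$changed" -gt 100 ]; then
  echo "Staged change touches $changed lines; consider splitting it."
fi
```

A check like this could live in a pre-commit hook, though a warning is probably more useful than a hard failure, since some large commits (such as adding a contributed module) are legitimately one logical change.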

Always review code before committing it

It's common for introductory Git tutorials to suggest always committing code with git commit -a without accurately explaining what the command does. Treating the commit command the way legacy version control systems do ignores one of the most useful features of Git.

Git introduces a new tool to the version control workflow called, interchangeably, the "staging area" or the "index". It sounds complicated, but it's actually a very simple tool. In essence, the index is a place to indicate exactly what to include in the next commit. This can be as broad as an entire directory or file, or as granular as specific changes in a file (excluding other unrelated changes). git commit -a is just shorthand for telling Git to add all uncommitted changes to tracked files (but not new files) to the index, and then immediately start the commit process.
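To make the behaviour of git commit -a concrete, here is a minimal sketch in a throwaway repository. The file names are placeholders; the point is only that the modified tracked file is committed while the brand-new file is left alone.

```shell
# Throwaway repository; file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2" > tracked.txt        # modification to a tracked file
echo "draft" > untracked.txt   # brand-new file Git has never seen

# -a stages every change to tracked files, then commits.
git commit -q -a -m "Update tracked.txt"

# untracked.txt is still listed as "??" because -a skips new files.
git status --short
```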

A much better method for the commit process is to explicitly review what is to be committed. Git includes an awesome tool for this in the form of git add --patch. This command shows each change in your code and asks whether it should be staged for the next commit. Sometimes, Git will show a large diff that is actually a few small changes. In this case, "s" will split the diff into smaller chunks that can be individually acted on. If needed, the change can be manually edited to indicate exactly what should be committed. The commit process ends up looking something like this:

  1. git add --patch to add changes to the index.
  2. git diff --cached to do a final review of what is to be committed.
  3. git commit to commit what is in the index.
  4. Repeat at step 1 until there is nothing left to commit, or there are uncommitted changes that need more work.

One small caveat is that git add --patch will not add brand-new files to the index, but only files that have previously been added to the repository. In that case (such as adding a new module) use git add directly to start tracking the new files.
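A small sketch of this caveat, assuming a hypothetical module directory called new_module: with only untracked files present, git add --patch has nothing to offer, and a plain git add is what starts tracking them.

```shell
# Throwaway repository; new_module is a hypothetical module name.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "readme" > README.txt
git add README.txt
git commit -q -m "Initial commit"

mkdir new_module
echo "name = New module" > new_module/new_module.info

# Nothing tracked has changed, so --patch finds nothing to stage.
git add --patch < /dev/null
git status --short    # new_module/ is still untracked ("??")

# Start tracking the new files directly.
git add new_module
git status --short    # the new file is now staged ("A")
```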

Never rebase shared commits

Rebasing is a powerful feature of Git that is both awesome and dangerous. Rebasing allows you to rewrite the history of a branch into something new. Most commonly git rebase is used to move where a branch appears to start from and to rewrite it to be on top of a new commit. git rebase --interactive is also an excellent tool for rewriting commits to amend in typo fixes, rewrite commit messages, or change the order of commits to accurately describe dependencies in code. With GitHub projects in particular, rebasing is commonly used to keep history straight and free of merge commits.
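As an illustration of the most common case, the sketch below moves a local, unpushed feature branch so that it starts from the new tip of the main branch. The branch name feature and the file names are placeholders, and the main branch name is read from the repository because it may be master or main depending on configuration.

```shell
# Throwaway repository; branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# Work happens on a feature branch...
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# ...while the main branch moves ahead.
git checkout -q "$main_branch"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix unrelated bug"

# Replay the feature commit on top of the new tip of the main branch.
git checkout -q feature
git rebase -q "$main_branch"
git log --oneline --graph
```

After the rebase the history is a straight line of three commits, with no merge commit, and the feature commit has a new hash because it was rewritten.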

The issue with rebasing shared (or pushed) commits is that doing so requires a "forced push" and automatically invalidates any work others might be doing on that branch of code. It obscures the actual development history of a branch in favour of arbitrary cleanliness of the Git history graph. Rather than forbid rebasing entirely, I have a rule that I never rebase commits that I've pushed to a remote repository. This ensures I don't break anyone else's code that they've committed to my branch, and keeps a log of any bugs or mistakes I've fixed.
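One way to apply this rule is to list which commits have not been pushed yet before rewriting anything. The sketch below is an assumption about how that check might look, not a required procedure: it uses a local bare repository (shared.git, a placeholder) as the remote and compares HEAD against the upstream branch.

```shell
# A bare repository stands in for the shared remote; all names are placeholders.
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare shared.git
git clone -q shared.git work 2>/dev/null   # cloning an empty repository prints a warning
cd work
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
git push -q -u origin HEAD            # this commit is now shared

echo "two" >> file.txt
git commit -q -a -m "Second commit"   # this one exists only locally

# Commits reachable from HEAD but not from the upstream branch are unpushed,
# so they are the only ones that are safe to rewrite.
git log --oneline '@{upstream}..HEAD'
unpushed=$(git rev-list --count '@{upstream}..HEAD')
echo "$unpushed commit(s) can still be rebased or amended"
```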

Never delete unmerged remote branches

One serious difference between Git and Subversion is that branch addition and removal are not commits themselves. A Git branch is just a pointer to a commit. While in Subversion a deleted branch can be restored just by checking out an old revision, in Git a commit that is no longer reachable from any branch or tag will eventually be removed by the garbage collection process. So, how do we handle obsolete branches so they can be referenced if needed, without cluttering up the git branch listing?

git merge -s ours obsolete-branch

This will merge obsolete-branch into the current branch while completely discarding the changes in the obsolete branch. I usually make it clear in the commit message for the merge that the branch is being discarded rather than truly merged.

git merge -s ours --edit obsolete-branch

If the old changes are ever needed for reference or to be resurrected, it's as easy as checking out the last commit on the merged branch and creating a new branch pointing to it.
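The whole cycle can be sketched in a throwaway repository. The branch name resurrected, the file names, and the commit messages are placeholders; the sketch passes the message with -m instead of --edit so that it runs without opening an editor, and it relies on the fact that the second parent of a merge commit is the tip of the branch that was merged in.

```shell
# Throwaway repository; branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# An experiment that will later be abandoned.
git checkout -q -b obsolete-branch
echo "experiment" > experiment.txt
git add experiment.txt
git commit -q -m "Try an experiment"

# Meanwhile the main line of development continues.
git checkout -q -
echo "more" >> base.txt
git commit -q -a -m "Continue main work"

# Record the branch in history while discarding its changes, then drop the pointer.
git merge -q -s ours -m "Discard obsolete-branch (kept for reference only)" obsolete-branch
git branch -q -d obsolete-branch
merge_commit=$(git rev-parse HEAD)

# Later: the second parent of the merge is the old branch tip.
git branch resurrected "$merge_commit^2"
git show resurrected:experiment.txt
```

For a branch that also exists on a remote, the remote copy would still need to be deleted separately once the merge has been pushed.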

Make your Git toolbox your own

Git is easily extensible and configurable. It's possible to add custom git commands to ~/.gitconfig or to write entirely new top-level commands in whatever language you prefer. Git is best thought of as being more like another Unix shell than a monolithic program. I'm partial to git-sh, a tool that exposes git commands directly ("commit" instead of "git commit", for example) and works nicely with autocomplete and a bash prompt. Other extensions to simplify common Git commands and workflows are easily searchable on Google or GitHub.
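Both kinds of extension can be sketched in a few lines. The alias name staged and the command name hello are invented for this example, and the alias is stored in the throwaway repository's own configuration; adding --global would write it to ~/.gitconfig instead.

```shell
# Throwaway repository; the alias and command names are made up for illustration.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "readme" > README.txt
git add README.txt
git commit -q -m "Initial commit"
echo "hello" > hello.txt
git add hello.txt

# An alias: "git staged" becomes shorthand for "git diff --cached".
git config alias.staged "diff --cached"
git staged --name-only

# A new top-level command: any executable named git-<name> on PATH runs as "git <name>".
bin=$(mktemp -d)
cat > "$bin/git-hello" <<'EOF'
#!/bin/sh
echo "Hello from a custom Git command"
EOF
chmod +x "$bin/git-hello"
PATH="$bin:$PATH"
git hello
```

The custom command here is a shell script, but any executable works, which is what makes it possible to write new commands in whatever language you prefer.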

These guidelines are just some of what I use to help organize and manage projects in Git. Have any great tools or suggestions I missed? Hop on down to the comments and let us know!

Andrew Berry

Senior Drupal Architect


Comments

Anonymous


This is a great list of do's and don'ts, but it doesn't really suggest a best practice workflow for branches.
How do you best make use of branching to aid development?
Do you break specific features out into branches?
Do you fix all bugs in branches? Or only more complex ones?
Do you ever work directly off master branch?
How do you go about testing/approving changes in branches, so they can be merged back to master?

Cheers!


Clint Randall

"Best" practices vs practical practices

Great questions from the previous poster!

I've found it difficult to find a solution for branch management that is always best. Some projects fit neatly into the branch-per-feature model, while others are a string of frequent hot fixes and minor tweaks that make branching unnecessarily cumbersome. It also has a lot to do with whether there's one developer or many on a given project, and how many overlapping development efforts are going on at any given time. Where we've often landed is a philosophy of keeping master in line with production (aside from the occasional need to revert temporarily and fix something). Master is where we make a lot of minor tweaks that affect one or a few files and that are pushed into production urgently as hot fixes. However, anything that might possibly affect other areas of the application or interfere with another developer gets a branch, which is the majority of cases. On the other hand, some clients have every production change bundled with scheduled releases, and in that case we always branch everything. My point is that I think it's a case-by-case determination where you start with a preference and then adjust to the reality of the situation.

Great topic!
