Using git on subversion projects

Despite all the noise lately about distributed version control systems, the chances are that any given project you want to work on today will be using Subversion. But that’s OK: you can still get the benefit of all the advanced features of git by using it as a “front end” to Subversion.

Before I get into the “how”, why would you want to do this?

The most obvious benefits are having a full local history, and cheap local branching. It’s trivial in git to create branches for features you’re working on, and then easily switch between them. Say you’re working on a feature for the next release, and an urgent bug for 1.0 comes in. Simply:

$ git commit -m "work in progress"
$ git checkout --track -b fix-urgent-bug-1234 release-1.0

...hack hack...

$ git commit -m "fixed bug #1234"
$ git checkout cool-feature-foo

and continue where you left off.

There’s also a bunch of other neat stuff in git that I miss whenever I have to use something else. (Keep in mind that I’m no svn guru, so there may be similar things in svn if you look hard enough, but I very much doubt they’re as fast.) There is git grep for rapidly searching source trees, gitk for visualising branches and interactively searching commit messages and changes, and local commits. Oh, and everything is much faster.

OK, on to the how.

We start by checking out the svn repo:

$ git svn clone -s http://svn.example.com/svn/cool-project

The -s switch means “standard layout”, i.e. the recommended subversion usage of trunk/branches/tags. If your project doesn’t follow this convention, you can specify the names of the subdirectories used:

$ git svn clone --trunk=MAIN --branches=branches --tags=releases \
    http://svn.example.com/svn/cool-project

There are lots of other options to clone that can help if you have a really non-standard repo to work with. Check the init command in man git-svn(1).

You should now have the HEAD of trunk in a directory called “cool-project”. (You can specify a different target directory name by appending it to the git svn clone command.)

The full power of git is now at your command! You can grep the source tree:

$ git grep '^class Model('
django/db/models/base.py:class Model(object):
tests/modeltests/invalid_models/models.py:class Model(models.Model):

Find the git commit corresponding to a subversion revision:

$ git svn find-rev r1234
c5dfec042453672a27fd19ff81131edd01145584

$ git show c5dfec0
commit c5dfec042453672a27fd19ff81131edd01145584
Author: Michael Rowe <mrowe@mojain.com>
Date:   Sat Feb 16 10:14:57 2008 +1100
...

And interrogate the full history of the repo:

$ cd ~/src/django
$ git log '@{3 weeks ago}' -1
commit 696a3322d6709ebffcc436eb6188ea4d769ebfc5
Author: mtredinnick <mtredinnick@bcc190cf-cafb-0310-a4f2-bffc1f526a37>
Date:   Mon Feb 4 04:57:56 2008 +0000

    Fixed a simple TODO item in one error path of the "extends" tag.

In the time we’ve been playing with this, maybe some changes have been committed upstream. To make sure our local repository is up to date, we rebase:

$ git svn rebase

You could also just use git svn fetch to fetch the upstream changes into the repo without rebasing your working tree. In general, I would avoid this unless you know what you are doing, since it can make things complicated when you go to merge and push your changes upstream. If you are working on the main trunk of the svn repo, rebase is almost always what you want.

So now we have an up-to-date checkout, let’s get to work! As you work, add files to git’s “index” and commit to the repo. Commits in git are fast, and should be used almost as frequently as saving a file in your editor. You can always consolidate these “micro-commits” into larger feature or bug fix commits later.

...hack hack...

$ git add src/module.py src/other.py
$ git commit -m "I did stuff"

and repeat.

Note for subversion users: you have to tell git about every file you change, even if it’s not a new file. Details of git usage are beyond the scope of this article (there are some excellent starting points), but be aware that you have to git add each file you want included in a commit.

(Note for lazy git users: This can be combined into a single command for existing files:

$ git commit -m "I did stuff" src/module.py src/other.py

but I tend to prefer the two-step approach for anything but the most trivial changes.)

As you work, you can periodically sync with the upstream subversion repo to get other people’s changes:

$ git svn rebase

This won’t work if you have any local uncommitted changes. However, you can “stash” them away temporarily (in git 1.5.3 and later):

$ git stash
$ git svn rebase
$ git stash apply

In any case, as mentioned above, you want to commit locally as often as possible.

When you have finished work on a feature or bug fix that you want to push back to the subversion repository, make sure all your changes are committed locally to git (git status), then review what you’ve done:

$ git log origin/trunk..HEAD   # origin/trunk by default; otherwise whatever svn branch you're on
$ git diff origin/trunk

Finally, when you are happy with the work you’ve done and are ready to push it up to subversion:

$ git svn dcommit

This will create one svn check-in for each git commit since the last upstream revision. If you want to combine local commits into one large svn check-in (e.g. because you followed my advice above and made frequent local commits), the interactive rebase command will help:

$ git rebase --interactive origin/trunk

Interactive rebase opens an editor with a list of all the commits since the revision you specify (origin/trunk in our example):

pick d79a908 A small change to a file
pick c5dfec0 An unrelated change
pick db0346b Fix typo in hello

To combine the typo fix into the first commit, move its line directly below the line for the first commit and change “pick” to “squash”:

pick d79a908 A small change to a file
squash db0346b Fix typo in hello
pick c5dfec0 An unrelated change

The result will be two commits: the first combines the changes from d79a908 and db0346b, and the second is c5dfec0. (Because rebase rewrites history, the resulting commits get new hashes.) You can do this for multiple consecutive lines if you want to combine many commits into one. See man git-rebase(1) for full details.
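To make the edit concrete, here is a minimal Python sketch of the same transformation applied to the todo list as plain text. The function name squash_into is hypothetical and not part of git; it only illustrates the "move the line, change pick to squash" step that you would normally do by hand in the editor.

```python
def squash_into(todo, target, fixup):
    """Return a new todo list with `fixup` moved directly below `target`
    and its action changed from "pick" to "squash".

    `todo` is a list of lines such as "pick d79a908 A small change to a file".
    This helper is hypothetical; git expects you to edit the list yourself.
    """
    # Split each line into [action, abbreviated hash, subject].
    entries = [line.split(None, 2) for line in todo]
    moved = next(e for e in entries if e[1] == fixup)
    result = []
    for entry in entries:
        if entry[1] == fixup:
            continue  # skipped here; it is re-inserted below its target
        result.append(" ".join(entry))
        if entry[1] == target:
            result.append(" ".join(["squash", moved[1], moved[2]]))
    return result


todo = [
    "pick d79a908 A small change to a file",
    "pick c5dfec0 An unrelated change",
    "pick db0346b Fix typo in hello",
]

for line in squash_into(todo, "d79a908", "db0346b"):
    print(line)
```

Run on the three-commit list above, this prints the edited list shown earlier, with the typo fix sitting directly under the commit it belongs to.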

Now use git svn dcommit as above to push the revised commits upstream.

We’ve been working on a single branch so far, but one of the big benefits of using git is the cheap branching. Let’s start work on a new experimental feature:

$ git checkout -b my-wacky-feature

The -b switch means create a new branch. Without that, git checkout switches to an existing branch.

...hack hack...

$ git add ...
$ git commit ...

At any time, we can commit locally and switch to another branch:

$ git checkout other-thing-to-work-on

...hack hack...

$ git add ...
$ git commit ...

then switch back and continue where we were:

$ git checkout my-wacky-feature

All of the commands we’ve discussed operate on the current branch (unless you specify otherwise). So you can grep for strings, get change logs and diffs and view visual history all in the context of the branch. You can also diff the current branch with another. To get a diff from release-1.0 to the current working tree (on branch fix-urgent-bug-1234):

$ git checkout fix-urgent-bug-1234
$ git diff release-1.0

Or to get diffs between arbitrary branches and revisions (without having to check out either branch):

$ git diff release-1.0..my-wacky-feature 

See man git-diff(1) for all the options to diff.

git svn dcommit will only push changes on the current branch up to the subversion repository, so you can clean up and consolidate your commits using rebase, then push them back to subversion when they’re ready.


I hope this quick introduction has whetted your appetite for combining the power of git with the ubiquity of subversion. There is much more to git (we haven’t touched on merging at all), and once you’ve dipped your feet in, I recommend reading the intros and man pages at the git site.

Please let me know if you have any suggestions or notice any errors.

Aperture 2

I’m a little disappointed that Apple are charging AUD129 for the upgrade to Aperture 2. Sure, there are a bunch of new, and very attractive, features that would otherwise make paying for the upgrade acceptable, but it seems a bit rich given that the full retail price has dropped from USD300 to USD200. Effectively, people who bought 1.x are getting hit twice.

But I guess it doesn’t bother me enough to stop me buying the upgrade…

PHP namespace

This comment from the Drupal Theme developer’s guide is an example of why, whatever your question, PHP is not the answer:

An important note- when developing a theme using any of the methods described here, you must be sure that the name of the theme is not the same as the name of any module being used on the site because the function names may collide and your site may no longer function correctly.

New home for a blog

My blog has moved to a dedicated new home: http://www.mikerowecode.com/

All appropriate redirects are in place, but please check your feed reader to be sure. I’ve done a far-reaching survey of a wide range of users and clients (OK, well, actually myself and one friend, both using NetNewsWire) and it seems that it works fine when the feed is accessed directly, but if you have it syncing via NewsGator it doesn’t correctly propagate the new feed URL. It does follow the redirect to get the feed content, but doesn’t push the changed URL back to the client. I’d be interested to hear about experiences with other readers.

A word about what’s behind the curtain

The new site is built from text files using the blosxom publishing system. The text files are formatted using John Gruber’s Markdown, with punctuation fixed by his SmartyPants.

I use a number of plugins for blosxom to get things working the way I want. These include archives and recententries to provide the navigation options in the sidebar, entries_index to maintain article time stamps and atomfeed to produce, er, an atom feed. :)

Blosxom runs in “static” mode to generate the site locally, and then I rsync it to my web server, where it’s served as static HTML.

Why blosxom?

It probably seems like a strange choice, when there are so many “advanced” alternatives such as Drupal (which was my previous system), WordPress, MovableType, Blogger, etc., etc. But a couple of things convinced me that blosxom was the way to go.

First, my needs are minimal. I just want to publish the stuff I write with the minimum of fuss and overhead. I wanted a publishing system that would get out of the way.

Second, there is something very appealing about keeping things in plain text. I can write in emacs (which is of course the One True Editor), manage changes with git, search with grep (or spotlight). The directory layout is the same on my hard disk as on the public server. There’s no database to worry about backing up.

Finally, since I’m serving static HTML, in the (admittedly far-fetched) event that this site becomes wildly popular and sees huge amounts of traffic, scaling will be trivial. :)

Reviewboard git mirror

For some months now, I’ve been maintaining a git mirror of the Reviewboard project’s svn repository. The git-svn tool works really well for this, except for one small wrinkle: the Reviewboard project uses svn:externals to include an external module, djblets, and git-svn provides no transparent way to support this.

For now, I manage this manually. Whenever I notice an update to djblets (which is thankfully rather rare), I use the following process to merge the changes into a branch (with-djblets) in the git repo:

$ cd ~/src/djblets
$ git svn rebase
$ git log -1 | grep -v '^commit' > /tmp/djblets.log

Note: change “1” to whatever number of commits have happened in djblets since the last time I did this. The grep command removes the git-specific “commit” lines from the log, which won’t be interesting enough to include in the commit message below.
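For readers less familiar with grep, the filter in that pipeline can be sketched in Python. The function name strip_commit_lines and the sample log text are made up for illustration; the behaviour is meant to mirror grep -v '^commit', dropping every line that starts with "commit".

```python
def strip_commit_lines(log_text):
    """Drop lines starting with "commit", like grep -v '^commit'."""
    kept = [line for line in log_text.splitlines()
            if not line.startswith("commit")]
    return "\n".join(kept) + "\n"


# Sample input shaped like `git log -1` output (the values are invented).
sample = (
    "commit 0123456789abcdef0123456789abcdef01234567\n"
    "Author: someone <someone@example.com>\n"
    "Date:   Mon Feb 4 04:57:56 2008 +0000\n"
    "\n"
    "    Example commit message.\n"
)

print(strip_commit_lines(sample), end="")
```

Message lines are indented in git log output, so a message that happens to begin with the word "commit" is left alone.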

$ cd ~/src/reviewboard-with-djblets
$ git status # make sure working dir is clean
$ cp -rp ~/src/djblets/* reviewboards/djblets/

At this point, I do a git status and manual sanity check to make sure the changes I’m about to commit here match the incoming change to djblets.

$ git add <files that are changed/new>
$ git commit -F /tmp/djblets.log
$ git push public-repo with-djblets

Done! Simple, no? Well, no… This process has a number of problems, the main one being that it’s manual, and I have to do it. I’m hoping that I’ll be able to bend git-submodule to my will enough to take care of this.

Job search update

It’s been a while coming, but here is a quick update on my job search:

Whether it was my letter to recruiters, or just dumb luck, I ended up finding and accepting a pretty good contract job back in November. A product company, smart people, great relaxed environment. More or less everything on my list. Even a kick-ass coffee machine in the office. As expected, it was a smaller “boutique” recruiter that came through.

I’ve had a happy and productive couple of months.

Then this week, the company was bought by Microsoft and my contract terminated early. *sigh* More job search news to come, I guess.

Multiple instances of the iiNet Usage Widget

There is a very handy usage widget for iiNet available at LemonJar. However, the way it stores its preferences for iiNet account and password means that you can’t run multiple instances of the widget to monitor multiple iiNet accounts.

This patch fixes it so you can:

--- MAIN.js.ORIG    2007-11-08 11:35:07.000000000 +1100
+++ MAIN.js 2007-11-08 11:33:59.000000000 +1100
@@ -144,14 +144,18 @@

 }

+function keyForUsername() { return widget.identifier + "-" + "userName"; }
+function keyForPassword() { return widget.identifier + "-" + "psword"; }
+function keyForAlertCol() { return widget.identifier + "-" + "alertColorOn"; }
+
 //Read in Username & Password Stored in OS .plist. Updates Global Variables.
 function readPrefs(){
    debug("Function: readPrefs() run.");

    if(window.widget) { 
-       var TMPuserName = widget.preferenceForKey("userName"); 
-       var TMPpsword = widget.preferenceForKey("psword"); 
-       var TMPalertColorOn = widget.preferenceForKey("alertColorOn"); 
+        var TMPuserName = widget.preferenceForKey(keyForUsername()); 
+       var TMPpsword = widget.preferenceForKey(keyForPassword()); 
+       var TMPalertColorOn = widget.preferenceForKey(keyForAlertCol()); 

        if ( TMPuserName && TMPuserName.length > 0) { 
            userName = TMPuserName;
@@ -199,10 +203,10 @@
    alertColorOn = document.getElementById("alertColorPref").checked;

    if(window.widget){  
-       widget.setPreferenceForKey(document.getElementById("userNamePref").value, "userName");
-       widget.setPreferenceForKey(rot13(document.getElementById("pswordPref").value), "psword");
-       widget.setPreferenceForKey(document.getElementById("alertColorPref").checked, "alertColorOn");
-   }   
+      widget.setPreferenceForKey(document.getElementById("userNamePref").value, keyForUsername());
+      widget.setPreferenceForKey(rot13(document.getElementById("pswordPref").value), keyForPassword());
+      widget.setPreferenceForKey(document.getElementById("alertColorPref").checked, keyForAlertCol());
+   }
 }

LemonJar also make widgets for other Australian ISPs. I suspect this patch would also work for those widgets (on the assumption that the code in MAIN.js is common), but I haven’t tested it.

For what it’s worth, I filed a bug in their issue tracker.

Another patch for pyblosxom entrycache - normalised keys

The entrycache plugin uses the absolute path of a file as the key for caching its date. This is problematic if the file is moved (e.g. your data dir is in a different location locally than on your web server).

This patch normalises the key to remove the “datadir” component. It also cleans up how the cache is written to disk:

diff --git a/entrycache.py b/entrycache.py
index 0cc3196..b46f89d 100644
--- a/entrycache.py
+++ b/entrycache.py
@@ -52,19 +52,18 @@ def cb_filestat(args):
    request = args["request"]
    data = request.getData()
    cache = data["cache"]
-   if cache.has_key(args['filename']):
+   config = request.getConfiguration()
+   key = args['filename'].replace(config['datadir'], '')
+   if cache.has_key(key):
        mtime = []
        for i in args['mtime']:
            mtime.append(i)
-       mtime[8] = cache[args['filename']]
+       mtime[8] = cache[key]
        args['mtime'] = tuple(mtime)
    else:
-       cache[args['filename']] = args['mtime'][8]
+       cache[key] = args['mtime'][8]
        f = open(data['cachefile'],'w')
-       f.write("{\n")
-       f.write("\t'%s' : %i,\n" % (args['filename'], args['mtime'][8]))
-       for i in cache:
-           f.write("\t'%s' : %i,\n" % (i, cache[i]))
-       f.write("}")
+       import pprint
+       pprint.pprint(cache, f)
        f.close()
    return args

I’ll get around to publishing my git repo of this soon.
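As a rough illustration of what the patch changes, here is a standalone Python 3 sketch (the plugin itself is Python 2 code, and the function names and paths here are invented for the example). It strips the datadir component from a filename with str.replace, as the patch does, and writes the cache with pprint so that it can be read back as a dict literal. The sketch reads it back with ast.literal_eval, whereas the plugin uses eval.

```python
import ast
import io
import pprint


def cache_key(filename, datadir):
    """Remove the datadir component so the key no longer depends on
    where the data directory lives on a particular machine."""
    return filename.replace(datadir, '')


def dump_cache(cache, stream):
    """Write the cache as a Python dict literal, as the patch does."""
    pprint.pprint(cache, stream)


# The same entry seen from two machines with different data dirs.
local_key = cache_key('/Users/me/blog/data/tech/git.txt', '/Users/me/blog/data')
server_key = cache_key('/var/www/blog/data/tech/git.txt', '/var/www/blog/data')
print(local_key == server_key)

# Round trip: what pprint writes can be parsed back into a dict.
buf = io.StringIO()
dump_cache({local_key: 1203117297}, buf)
restored = ast.literal_eval(buf.getvalue())
print(restored)
```

One thing to keep in mind: str.replace removes every occurrence of the datadir string, not just a leading one, so this relies on the datadir path not reappearing further along in the filename.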

Leopard breaking MacPorts git over ssh

As I have been twitting recently, git over ssh stopped working for me after the upgrade to Leopard:

$ git pull
percent_expand: NULL replacement
fatal: The remote end hung up unexpectedly
Cannot get the repository state from ssh://git.mojain.com/...

A quick Google search turned up the answer. The problem was not with git, but with ssh. Specifically, ssh from MacPorts. It’s worth noting that ssh in OS X 10.5 is not broken (which made my initial troubleshooting harder, as ssh-ing from the command line worked just fine). But the ssh in MacPorts is:

$ which ssh
/usr/bin/ssh
$ ssh git.mojain.com
Last login: Mon Oct 29 20:17:47 2007 from ...
$ ^D

$ /opt/local/bin/ssh git.mojain.com
percent_expand: NULL replacement

You can follow the google results above for the details, but essentially it seems that two things cause the git problem:

  1. Leopard changed some environment variables that caused the MacPorts version of ssh to get a NULL when it tried to determine what “identity” to use.

  2. git looks for ssh in the same directory as the git binary, causing it to find the MacPorts version before the “native” OS X version.

There are a number of ways to work around this problem (all found in the aforementioned google results):

  1. Set GIT_SSH to the OS X version (/usr/bin/ssh). This works for git only of course.

  2. Rename your ssh key files so MacPorts ssh can find them (I didn’t try this).

  3. Tell ssh which key file to use by adding the following line to $HOME/.ssh/config (creating that file if it doesn’t exist):

    IdentityFile ~/.ssh/id_dsa

This last option is the one I chose, as it has the advantage of working for all versions and invocations of ssh, and is probably a good idea anyway. Presumably the MacPorts ssh package will be fixed at some point, but this is working for me now.

Patch for pyblosxom entrycache plugin to make cache location configurable

The entrycache plugin for pyblosxom is really cool. I only wish I could configure the location of the file it uses to store its cached dates.

So here’s a patch:

--- a/entrycache.py
+++ b/entrycache.py
@@ -21,24 +21,31 @@ __url__ = "http://joe.terrarum.net"

 import os.path

+def _get_cache_filename(args):
+   request = args["request"]
+   config = request.getConfiguration()
+   if config.has_key('entrycache_cachefile'):
+       return config['entrycache_cachefile']
+   else:
+       return os.path.join(config['datadir'],'.entrycache')
+
 def cb_start(args):
    t = { }
    request = args["request"]
-   config = request.getConfiguration()
    data = request.getData()
-   if os.path.isfile(os.path.join(config['datadir'],'.entrycache')):
-       data['cachefile'] = os.path.join(config['datadir'],'.entrycache')
-       f = file(os.path.join(config['datadir'],'.entrycache'))
+   if os.path.isfile(_get_cache_filename(args)):
+       data['cachefile'] = _get_cache_filename(args)
+       f = file(_get_cache_filename(args))
        t = eval(f.read())
        f.close()
         data['cache'] = t
    request.addData(data)

    if not data.has_key('cachefile'):
-       f = file(os.path.join(config['datadir'],'.entrycache'),'w')
+       f = file(_get_cache_filename(args),'w')
        f.write("{ }")
        f.close()
-       data['cachefile'] = os.path.join(config['datadir'],'.entrycache')
+       data['cachefile'] = _get_cache_filename(args)
        request.addData(data)

 def cb_filestat(args):
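The lookup that _get_cache_filename performs can be sketched on its own, with a plain dict standing in for the pyblosxom configuration. This is a Python 3 illustration under that assumption (the function name cache_filename and the sample paths are hypothetical), not the plugin code itself.

```python
import os.path


def cache_filename(config):
    """Return the configured cache file, falling back to
    <datadir>/.entrycache when entrycache_cachefile is not set."""
    if 'entrycache_cachefile' in config:
        return config['entrycache_cachefile']
    return os.path.join(config['datadir'], '.entrycache')


# A plain dict stands in for the pyblosxom config here.
default_config = {'datadir': '/srv/blog/entries'}
custom_config = {
    'datadir': '/srv/blog/entries',
    'entrycache_cachefile': '/srv/blog/data/.entrycache',
}

print(cache_filename(default_config))
print(cache_filename(custom_config))
```

Falling back to the old location means existing setups should keep working unchanged when the new option is absent.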

Then add a line like this to your pyblosxom config:

py["entrycache_cachefile"] = \
    "/Users/mrowe/Sites/blog/data/.entrycache"