Importing a Blosxom blog into Jekyll

As mentioned, I recently decided to move my blog from a self-hosted, Blosxom-driven, mostly manual setup to github pages.

This involved these main steps:

  • Set up a github repo to hold the templates and source text
  • Migrate templates from Blosxom’s templating language to Jekyll/Liquid
  • Import the content

I won’t cover the first two in detail here. Setting up a repository for pages is well documented by github, and migrating the templates was relatively straightforward–I used the code behind Simon Harris’s blog as a starting point. (Getting the archive page working was slightly more interesting. I’ll write more on this later.)

There were two parts to importing the content. Firstly, the directory layout expected by Jekyll is slightly different from the one I was using in Blosxom.

Here is what I had:

.
|-- 2009
|   `-- 04
|       |-- an-interesting-story.txt
|       `-- something-else.txt
|-- 2010
|   |-- 01
|   |   |-- happy-new-year.txt
|   |   `-- headache.txt
|   `-- 08
|       `-- migrating-blog.txt

Jekyll wants a much flatter directory layout, with all the files in a single directory and the date as part of the file name:

.
`-- _posts
    |-- 2009-04-01-an-interesting-story.md
    |-- 2009-04-19-something-else.md
    |-- 2010-01-01-happy-new-year.md
    |-- 2010-01-02-headache.md
    `-- 2010-08-04-migrating-blog.md

The trick was that Jekyll wanted a day, but I only encoded the year and month in my Blosxom file structure. Luckily, I was using the Blosxom entries_index plugin, which stores Unix-style timestamps for every entry it publishes. So I wrote a little Clojure program to read the entries_index cache and derive a Jekyll-style file name for every entry:

(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.duck-streams)

(import 'java.util.Date 'java.text.SimpleDateFormat)

(def entry-index
  (read-lines (first *command-line-args*)))

(defn parse-line [line]
  (let [[_ filename timestamp] (re-matches #".*'(.+)'.*\s+(\d+).*" line)]
    {:filepath filename :timestamp timestamp}))

(defn date [timestamp] (Date. (* 1000 (Long/valueOf timestamp))))
(defn date-str [date] (. (SimpleDateFormat. "yyyy-MM-dd") format date))
(defn filename [path] (last (re-split #"/" path)))
(defn md-ext [s] (re-sub #"\.txt$" ".md" s)) ; escape the dot so only a literal ".txt" suffix matches
(defn valid? [line] (not (nil? (:timestamp line))))

(defn target-file-name [entry]
  (str (date-str (date (entry :timestamp))) "-" (md-ext (filename (entry :filepath)))))

(def entries (filter valid? (map parse-line entry-index)))

(defn copy-command [entry]
  (str "cp " (entry :filepath) " " (target-file-name entry)))

(println (str-join "\n" (map copy-command entries)))

Note that this program doesn’t actually copy anything; it just outputs a bunch of “cp” commands that you can feed into a shell.
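
For readers who don’t speak Clojure, the file name derivation can be sketched in a few lines of Python. This is an illustration rather than the code I ran: the function name jekyll_name is made up, and it formats the date in UTC, whereas the Clojure version above uses the machine’s local time zone.

```python
import os
import re
from datetime import datetime, timezone


def jekyll_name(path, timestamp):
    """Build a Jekyll-style file name from a Blosxom path and a Unix timestamp."""
    # Assumption: interpret the timestamp as UTC (the Clojure version uses local time).
    day = datetime.fromtimestamp(int(timestamp), tz=timezone.utc).strftime("%Y-%m-%d")
    # Keep only the last path component and swap the .txt suffix for .md.
    base = os.path.basename(path)
    return day + "-" + re.sub(r"\.txt$", ".md", base)


print(jekyll_name("2010/08/migrating-blog.txt", "1280923200"))
```

The timestamp in the example is an arbitrary value chosen for illustration, not one taken from my entries_index cache.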

The second step is to add a block of YAML “front matter” to each file that Jekyll uses to parse the file and generate the appropriate output. This front matter is of the form:

---
layout: post
title: Blog migration
---

This tells Jekyll which template to use, and what to use for a title. The Blosxom source files don’t contain any such front matter, but do have the post’s title as their first line. A simple bit of sed wrote the appropriate opening lines of each file:

1,1 s/\([^-].*\)/---\
layout: post\
title: \1\
---/g

I invoked it like this:

for f in _posts/*
    do sed -f ~/Projects/migrate-blosxom-to-jekyll/insert_front_matter.sed -i "" "$f"
done
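
If sed isn’t your thing, the same transformation is easy to sketch in Python. This is a simplified, hypothetical equivalent (the name add_front_matter is mine): it treats the first line as the title and leaves the text alone if that line is empty or already starts with a dash, which is close to, but not exactly, what the sed pattern does.

```python
def add_front_matter(text, layout="post"):
    """Replace the first line (the post title) with a YAML front matter block."""
    first, _, rest = text.partition("\n")
    # Leave the text untouched if there is no title or front matter seems present.
    if not first or first.startswith("-"):
        return text
    return "---\nlayout: {}\ntitle: {}\n---\n{}".format(layout, first, rest)


print(add_front_matter("Blog migration\n\nI've moved my blog!\n"))
```

Note that a title containing a colon would need quoting to remain valid YAML; neither the sed script nor this sketch handles that.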

And that was more or less that! The above code is available on github at http://github.com/mrowe/migrate-blosxom-to-jekyll, and of course the entire content of my blog is at http://github.com/mrowe/mrowe.github.com.

Blog migration

And in the latest installment in an ongoing tradition… I’ve moved my blog! This time, to github pages. Now they can worry about keeping servers running, and generating HTML from my text when I commit and all of those little details.

The migration was relatively painless–more details on the mechanics to follow. But if you can see this, it worked!

(Aside: all the templates and content that run this blog are available on github.)

First adventures in Clojure

I’ve been banging on to anyone who’d listen for ages now about how Clojure is going to be the Next Big Thing. I read a fair way into Stuart Halloway’s Programming Clojure, and I played in the REPL a bit here and there, but I never got around to doing anything serious with it.

Today I finally found an excuse to use Clojure at work for a real-world problem. I needed to write a small program to read a product feed in CSV format, and cross-check that all the products in the feed actually exist in the live product catalogue database.

Here is my somewhat naïve attempt at implementing a solution:

;;
;; Read a CSV file and look up the product ids it contains in a
;; database. Report all the products in the CSV that do not exist in
;; the database.
;;
;; Usage: $0 <path-to-csv-file>
;;

(import 'java.io.FileReader 'au.com.bytecode.opencsv.CSVReader)

(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.sql)

;; OpenCSV gives us a List of String[]s... ugh.
(defn read-csv [file-name]
  (with-open [reader (CSVReader. (FileReader. file-name))]
     (rest ;; skip the header row
      (map seq (seq (. reader readAll))))))

;; extract interesting fields from a CSV row
(defn product-from [row]
  {:product-id (nth row  0 "")
   :title      (nth row  1 "")})

;; set up the db connection
(def db {:classname   "org.h2.Driver"
         :subprotocol "h2"
         :subname (str "file:///Users/mrowe/.h2data/mydata")
         :user     "sa"
         :password ""})

(defn sql-query [q]
  (with-query-results res q (doall res)))

(defn count-products [product-id]
  (:count
   (first
    (sql-query ["select count(1) as count from product where id = ?" product-id]))))

(defn exists? [product-id]
   (>= (count-products product-id) 1))

(defn product-missing? [csv-row]
  (let [product (product-from csv-row)]
    (not (exists? (product :product-id)))))

;;;;;;;;;;

(def filename (first *command-line-args*))
(def feed (read-csv filename))

(defn report-product-id [row]
  (let [product (product-from row)]
    (format "Not in product catalog: %s - %s" (product :product-id) (product :title))))

(with-connection db 
  (println (str-join "\n" (map report-product-id (filter product-missing? feed)))))

This was purely an exercise in thinking functionally, and figuring out the basics of driving Clojure and getting it to interact with the world around it. I’ve made no attempt to actually use one of Clojure’s headline features, concurrency. (For what it’s worth, it happily processes an input of 2500 rows in a few seconds, most of which is spent in the database–I doubt there’s much to be gained from parallelising it.) But I think it reads pretty well, and is at least as concise and expressive as the equivalent Ruby would have been–once you learn to see through all the parentheses. ;-)
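
For comparison, here is roughly the same logic as a Python sketch. It is not a port I actually used: it swaps the H2 database for an in-memory SQLite one so that it runs on its own, and the function name missing_products and the sample data are invented for the example. The product table and its id column match the query above.

```python
import csv
import io
import sqlite3


def missing_products(csv_text, conn):
    """Return (product_id, title) pairs from the feed that are not in the product table."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    missing = []
    for row in reader:
        if not row:
            continue
        product_id = row[0]
        title = row[1] if len(row) > 1 else ""
        (count,) = conn.execute(
            "select count(1) from product where id = ?", (product_id,)
        ).fetchone()
        if count == 0:
            missing.append((product_id, title))
    return missing


# Hypothetical sample data standing in for the real catalogue and feed.
conn = sqlite3.connect(":memory:")
conn.execute("create table product (id text primary key)")
conn.execute("insert into product (id) values ('A1')")
feed = "product_id,title\nA1,Widget\nB2,Gadget\n"
for product_id, title in missing_products(feed, conn):
    print("Not in product catalog: %s - %s" % (product_id, title))
```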

Let me know what you think!

Update: I’ve put the above code on github: http://gist.github.com/505633

Using VMWare Fusion shared folders with a Linux guest

VMWare Fusion has a “shared folders” feature which allows you to seamlessly share folders on the host Mac system with the virtualised guest OS. With a Linux guest, vmware-tools will install the “Host-Guest File System” (hgfs) driver and add an entry to /etc/fstab to automagically mount all shared folders under /mnt/hgfs.

This is great, but unless your user id in the Linux guest happens to match your user id on OS X, you will not be able to access the mounted directories as a regular user. Luckily, you can get the hgfs driver to mount the shared folders as your user. Edit /etc/fstab as root:

$ sudo vi /etc/fstab

and look for a section like:

# Beginning of the block added by the VMware software
.host:/ /mnt/hgfs vmhgfs defaults,ttl=5 0 0
# End of the block added by the VMware software

Add options for uid and gid:

# Beginning of the block added by the VMware software
.host:/ /mnt/hgfs vmhgfs defaults,ttl=5,uid=1000,gid=1000 0 0
# End of the block added by the VMware software

The values I’ve used, 1000 for uid and gid, are the defaults for the first user created on an Ubuntu desktop install. To find the correct values for your user, run the id command in the guest OS:

$ id
uid=1000(mrowe) gid=1000(mrowe) groups=...
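
If you’d rather not hand-edit the line, the change is mechanical enough to script. Here is a minimal Python sketch, assuming the entry has the usual six whitespace-separated fstab fields; the function name with_owner is made up for this example, and it only builds the new line (it does not write to /etc/fstab).

```python
def with_owner(fstab_line, uid, gid):
    """Append uid/gid mount options to a vmhgfs fstab entry; return other lines unchanged."""
    fields = fstab_line.split()
    # fstab fields: device, mount point, fs type, options, dump, pass
    if len(fields) != 6 or fields[2] != "vmhgfs":
        return fstab_line
    fields[3] += ",uid={},gid={}".format(uid, gid)
    return " ".join(fields)


print(with_owner(".host:/ /mnt/hgfs vmhgfs defaults,ttl=5 0 0", 1000, 1000))
```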

Filtering lists

Recently, my friend Gav wrote about using STL to filter a vector of values in C++ in which he explained a surprising gotcha. I’m sure he knows what he’s talking about, but it struck me how ugly this (presumably idiomatic) code was. So I figured I’d see what it would look like in a few more “modern” languages:

Ruby

>> numbers = 1..9
=> 1..9
>> numbers.reject { |n| n.even? }
=> [1, 3, 5, 7, 9]

Or, if you skip the separate assignment of the input data:

>> (1..9).reject { |n| n.even? }
=> [1, 3, 5, 7, 9]

Python

>>> numbers = range(1,10)
>>> [n for n in numbers if n % 2]
[1, 3, 5, 7, 9]

or

>>> [n for n in range(1, 10) if n % 2]
[1, 3, 5, 7, 9]
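
Python also has a built-in filter, which reads much like the Clojure version. A minimal sketch, assuming Python 3, where filter returns a lazy iterator and needs list() to show its contents:

```python
numbers = range(1, 10)
# Keep the values for which n % 2 is non-zero, i.e. the odd numbers.
odds = list(filter(lambda n: n % 2, numbers))
print(odds)  # prints [1, 3, 5, 7, 9]
```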

Clojure

user=> (def numbers (range 1 10))
#'user/numbers
user=> (filter odd? numbers)
(1 3 5 7 9)

or

user=> (filter odd? (range 1 10))
(1 3 5 7 9)

Yeah, I get that this wasn’t the point of the original post–sometimes you’re just stuck with C++. But if you do have the choice, other languages can be far more expressive for this common kind of list processing.

If you have examples in other languages (or improvements to my efforts) send them in and I’ll post them here.

Update: From Julian Doherty:

Erlang

1> Numbers = lists:seq(1,9).
[1,2,3,4,5,6,7,8,9]
2> [X || X <- Numbers, X rem 2 =/= 0].
[1,3,5,7,9]

Update: From Ben MacLeod:

C#

using System;
using System.Linq;

// ...

    // Enumerable.Range takes (start, count), so this yields 1..9 like the other examples
    var numbers = Enumerable.Range(1, 9).Where(n => n % 2 != 0);
    // or, equivalently:
    //var numbers = (from n in Enumerable.Range(1, 9) where n % 2 != 0 select n);
    foreach(var number in numbers) {
        Console.WriteLine(number);
    }

// ...

Update: From John Carney:

PHP

5.2

function not_even($x) {
    return $x & 1 ;
}

$numbers = array(1, 2, 3, 4, 5, 6, 7, 8, 9) ;
$numbers = array_filter($numbers, "not_even") ;

5.3

$numbers = array(1, 2, 3, 4, 5, 6, 7, 8, 9) ;
$numbers = array_filter($numbers, function($x) { return $x & 1 ; }) ;

Enabling git bash completion on OS X

Bash completion, the magic that allows you to start typing the name of a file, directory, etc. in bash then press TAB to complete it, can be taught new tricks, including knowing about your git repository. But if you’re on a Mac, the magic is not installed by default.

If you are running git from MacPorts, you probably don’t have the bash_completion variant installed. You can install it with:

sudo port install git-core +bash_completion

If you do already have git installed without this variant, you’ll probably need to deactivate it first:

sudo port deactivate git-core

Then reinstall with the variants you need:

sudo port install git-core +bash_completion +gitweb +svn +doc

You can then activate completion by adding the following to your ~/.bash_profile:

if [ -f /opt/local/etc/bash_completion ]; then
    . /opt/local/etc/bash_completion
fi

Thanks to Denis Barushev for this tip.

A potted history of JAOO 2009

A couple of weeks ago I attended the JAOO 2009 conference in Brisbane. What follows is a biased, incomplete and probably misleading account of my impressions of the two days.

Keynote

I always assumed conference keynotes were meant to be broad, sweeping and inspiring. This one was narrow, technical and delivered in a mind-numbing monotone. Maybe it’s just the way they do things now?

Introduction to Objective-C

This was clearly targeted at people who have no exposure to Objective-C, but rather than just being a dry survey of the language syntax and libraries, Glenn Vanderburg provided a nice historical overview of Objective-C and its heritage.

My takeaway: Objective-C is basically Smalltalk, and Smalltalk is basically Lisp.

Google App Engine: Building an App the Google Way

Pamela got rave reviews in Sydney, and she’s certainly an entertaining speaker. If you’d never heard of GAE, or never looked at its capabilities, this would have been a very good introduction. I’ve built a couple of small GAE apps though (in Python), and other than seeing the Java version of some of the APIs, this talk really told me nothing new.

1,001 Iterations: Product Design, Illustrated

This was a recounting of the process Avi Bryant went through taking a new idea from its inception through many refinements to a polished product.

Perhaps the most interesting part for me was Avi’s assessment of the relative strengths of the various languages he ended up using to implement the product:

  • Squeak - for “thinking” in (i.e. the interesting problems and their solutions)

  • Java - for nuts-and-bolts computing (crunching numbers)

  • Ruby - for interfacing with external libraries and APIs (e.g. twitter)

  • JavaScript - for interacting with the user

I’m not sure it’s always a good idea to mix so many technologies in the one product, but it certainly makes some sense to not get hung up on the One True Language, and just use each where they’re best suited.

Speeding Ducks

Avi again. Much more technical this time. Avi’s main point: Ruby really is slow, but there’s no reason it has to be.

He began with an interesting history of Java’s Hotspot VM, which was based on technology developed for Smalltalk and Self in the 1980s. But Google’s V8 was built by three people in about three months–surely we can do the same for Ruby!

At the end of the talk, Avi was challenged by Joshua Bloch. Josh disputed Avi’s claim that because V8 was built in three months, all optimising “hotspot” VMs should be easy to build. Java’s current VM has been constantly improved over many years, and solves many non-trivial problems.

Of course, this sort of interaction between notable figures in our industry is exactly why you go to conferences like JAOO.

Hey You! Get On To My Cloud! - Application Development in the Clouds

Dave Thomas gave us some thought-provoking ideas about current languages and development platforms. Is JavaScript the way of the future? I’m not so sure, but I think one of Dave’s main points is worth paying attention to: functional programming is the way forward if we want to improve the speed with which we can build software.

Atlassian

Mike Cannon-Brookes gave us a bit of background of Atlassian’s history (they’ve gone from two people and one product to nearly 100 engineers and ten products in eight years), then listed what he thought were the ten key practices that have made them successful. I’ll excerpt just the ones I think are worth talking about, and add my thoughts (not necessarily agreeing with Mike):

  • Agile - it’s the principles that are important, not any particular methodology or set of tools

  • Code review - there’s plenty of hard evidence that code review/inspection is one of the best ways to reduce the number of defects in software. Of course, pairing is the ultimate form of code review.

  • Optimise tests - the main goal: get feedback to developers as fast as possible. Some of the things Atlassian do to achieve this include selectively running only tests that could possibly be affected by a code change (by doing static analysis on coverage), and splitting functional tests into parallel builds.

This is a common problem–functional test suites that take so long to test an application that the pipeline from code check-in to the “you broke the build!” feedback can be hours. Atlassian’s solution is to split the tests into chunks that run in a maximum of ten minutes, and have enough build agents to run all the chunks in parallel.

  • Put everything in a wiki. Yeah, they would say that, wouldn’t they? :-)

  • “Dev speed posse” - Atlassian have a small team that spend a fixed amount of time every week just focusing on removing things that slow down development. This is a great idea (although not one that’s unique to Atlassian), and something more organisations should consider. One of the more interesting goals they have is that the “checkout loop” (the time it takes a developer to go from a clean machine to having a checked out app running locally and ready to work on) should be no more than ten minutes. How many large development shops can achieve that?
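
To make the test-splitting idea above concrete, here is a rough Python sketch of one way to pack tests into chunks that each fit a time budget. This is not Atlassian’s tooling (Mike didn’t go into that level of detail); it is a simple greedy first-fit packing, and the function name, test names and durations are all invented.

```python
def split_into_chunks(durations, limit=600):
    """Greedily pack tests (name -> seconds) into chunks of at most `limit` seconds."""
    chunks = []  # each entry is [total_seconds, [test names]]
    # Place the longest tests first; each goes into the first chunk with room.
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        for chunk in chunks:
            if chunk[0] + secs <= limit:
                chunk[0] += secs
                chunk[1].append(name)
                break
        else:
            # No existing chunk has room (or the test alone exceeds the limit).
            chunks.append([secs, [name]])
    return [names for _, names in chunks]


timings = {"checkout": 400, "search": 300, "login": 200, "profile": 100}
print(split_into_chunks(timings))
```

Each resulting chunk could then be handed to its own build agent, so the wall-clock time for the whole suite is roughly that of the slowest chunk.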

Josh Bloch - Effective Java

This was basically a summary of some of the new things in the second edition of Effective Java. About a third of the talk was all about generics. Good grief. Surely someone has noticed by now that this has all gone horribly wrong.

“Concurrency is hard” - even if you use the right APIs (for example, always use ConcurrentHashMap not Collections.synchronized*()) it’s still easy to get it wrong. Read Brian Goetz’s Java Concurrency in Practice.

And finally: Serializable is bad, since it allows objects to be created without using constructors. This can lead to invariants and other assumptions being violated. Josh says to use serialization proxies instead.

Doug Crockford on JavaScript

One of the classic Doug Crockford JavaScript talks. Probably nothing new if you’d listened to his talks from Yahoo’s YUI Theater, but still well worth spending 45 minutes listening to in person.

Some of Doug’s comments, observations and tips:

  • JavaScript has the widest range of programmer skill of any language, from computer scientists to cut-and-pasters

  • JavaScript has many influences, including: Self (prototypes, dynamic typing), Scheme (lambda, loose typing), Java (syntax), Perl (regexps)

  • it is commonly being used as a functional language–you’ll write better JavaScript if you think functionally

  • eval is the most misused feature - just don’t do it!

  • always use ===. You’ll be tempted to use == instead, but it’s broken–it causes type coercion, which leads to unexpected and buggy results

  • manage the divide between client and server (don’t recreate the server in the browser)

Software Visualization and Model Generation

Erik Doernenburg is a consultant at ThoughtWorks, and I’d heard him talk before about some of the cool code visualisation tools he’s put together. The basic idea is that by visualising certain attributes of a code base, it’s much easier to focus on the trouble spots without getting lost in the detail of thousands of lines of code.

Interestingly, Erik uses both common tools (e.g. CheckStyle) and the more exotic (CodeCrawler, CodeCity). Those last two are more or less self-contained, but Erik does really cool things with CheckStyle and Graphviz, and a bit of XSL to glue them together. As a general approach, use whatever analysis tool is closest to what you need, then map the output into a format your visualisation tool can read.

Smart Software with F#

An overview of F#, Microsoft’s functional language for the CLR, along with a small sample app. The main message:

  • F# is great for data-intensive applications

  • smart algorithms are (relatively) easy in F#

Both of which apply to any functional language of course.

You try to give Microsoft people the benefit of the doubt… but Joel Pobar, despite obviously being very knowledgeable about F# and functional programming, still managed a couple of clangers. Most egregious: he called Python an “elementary imperative language”. Fair enough if your background is Visual Basic and you’d never heard of functional programming… but this guy is the F# expert.

Anyway, it was good to see a bit of F# in action. If it gets more people thinking about functional programming, great. But it doesn’t offer anything you can’t get in Clojure, Smalltalk, Scheme, etc., unless you’re stuck in the Microsoft ecosystem.


Overall, a great couple of days. I learnt new things and expanded my mind about things I already knew. I hope to go again next year, and hopefully it will come to Melbourne!

A new start

I have a new job, and contrary to what I’ve said previously, it’s at a consulting company. With a difference.

Cogent is a consulting company, but with ambitions to be a product company. In fact, I’ve spent my first few weeks here working on our first publicly available product, Runway. (Runway is a task management app that supports the principles of Getting Things Done®.)

I’m really excited to be here. Cogent has an explicit goal of treating its employees and its customers humanely. It’s very open and free of bureaucratic nonsense. And it’s full of really smart people–although the average probably just went down a bit… :-)

Emacs full-screen shortcut

When you’re writing or coding, you want to remove as many distractions as possible. In addition to obvious things like shutting down your email, IM and twitter clients, it can be helpful to put your editor in full-screen mode. This way, the editor is the only thing visible, so your attention isn’t drawn to menu bars, flashing notifications or bouncing dock icons.

To create a shortcut for fullscreen mode in emacs, put this in your ~/.emacs file:

(defun toggle-fullscreen ()
  (interactive)
  (set-frame-parameter nil 'fullscreen (if (frame-parameter nil 'fullscreen)
                                           nil
                                           'fullboth)))
(global-set-key [(meta return)] 'toggle-fullscreen)

Now pressing M-return (usually alt + return on Windows/Linux or ⌘ + return on a Mac) will toggle Emacs between normal and full screen mode.

Thanks to Vebjorn Ljosa in this thread for this code snippet.

Spaces becomes usable in OS X 10.5.3

Spaces was one of the most anticipated features in Leopard, at least for Unix/X11 refugees like myself. X has had virtual desktops for decades, but users of “mainstream” desktop operating systems (i.e. Windows and Mac OS X) have had to rely on third-party utilities to get the same functionality.

In the case of OS X, Leopard was set to change that with Spaces. Unfortunately, the implementation was broken in such a way as to make it incredibly frustrating to use the way I’m used to using X11. I typically have Terminal and Safari (and often Emacs) windows open on multiple desktops. But on a desktop dedicated to a particular task, I want to be able to ⌘-⇥ (command-tab) between application windows on that desktop. Prior to 10.5.3, this would invariably do precisely the opposite of what I wanted, and flip to another desktop that had a window of that application open. This resulted in Spaces being about 5% as useful as X11 for serious keyboard-oriented work.

(For what it’s worth, this whole thing is mostly an issue because of the distinction OS X makes between apps and windows of apps–in X11, alt-tab usually cycles between all windows equally, regardless of what application they belong to. On OS X however, command-tab cycles between applications–⌘-` can be used to cycle between windows of an application.)

But good news! The recent 10.5.3 update to Mac OS X fixes it! Contrary to what Gruber says:

[Y]ou shouldn’t notice any changes, because the default behavior remains the same in 10.5.3

the default behaviour has changed: command-tabbing between applications now stays on the same desktop if the target application has a window there, and jumps to another desktop otherwise.

This is just about perfect. I actually like the jump-to-desktop behaviour for applications that aren’t on multiple desktops (e.g. iTunes), but now the default is to stay on-desktop for apps that are. (I still think I’d be slightly more comfortable if OS X behaved the same way as X11, and treated all windows as equal–but that could be Just What I’m Used To.)

Thanks Apple!