A Clojure library to authenticate with LDAP

My employer has released a small Clojure library I wrote that allows you to easily authenticate users against an LDAP server:

https://github.com/realestate-com-au/clj-ldap-auth

It uses the UnboundID LDAP SDK for Java to look up a user name in an LDAP server and attempt to bind with specified credentials.

The simplest usage looks like:

(require '[clj-ldap-auth.ldap :as ldap])

(if (ldap/bind? username password)
  (do something-great)
  (unauthorised))

That works, but isn't very helpful when authentication fails. So you can also pass a function that will be called with a diagnostic message in the event that authentication fails:

(let [reason (atom nil)]
  (if (ldap/bind? username password #(reset! reason %1))
    (do something-great)
    (unauthorised @reason)))

The provided function should take a single argument, which will be a string.

Configuration of the library (i.e. the LDAP server to connect to, etc.) is via system properties. See the README for details.

Implementation

The library first establishes a connection to the server, optionally using SSL. If a bind-dn is configured (i.e. credentials with which to connect to the LDAP server), it is used to bind to the server. If that's successful, we then look up the provided username (in the attribute uid). If found, the entry's distinguished name (DN) is extracted and this DN and the provided password are used to bind a new connection.

If any of these steps fails (e.g. the bind-dn is unauthorised, the username can't be found, or the looked-up DN and password can't bind) the function returns false (and calls the provided sink function to say why). If everything works and the connection can be bound with the target DN and password, it returns true (and the sink function is not called).
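The control flow described above can be sketched in plain Clojure. This is an illustration only, not the library's actual code: the four functions in the `ldap` map (`connect`, `bind-as-service`, `find-dn`, `bind-as-user`) are hypothetical stand-ins for the real UnboundID calls, and each is assumed to return nil or false on failure.

```clojure
(defn authenticate
  "Sketch of the bind flow. Returns true on success; on any failure,
  calls sink with a reason string and returns false."
  [ldap username password sink]
  (let [{:keys [connect bind-as-service find-dn bind-as-user]} ldap
        fail (fn [reason] (sink reason) false)]
    (if-let [conn (connect)]
      ;; bind-as-service is assumed to return true when no bind-dn is configured
      (if (bind-as-service conn)
        (if-let [dn (find-dn conn username)]
          (if (bind-as-user dn password)
            true
            (fail (str "could not bind as " dn)))
          (fail (str "no entry found for uid " username)))
        (fail "bind-dn credentials were rejected"))
      (fail "could not connect to the LDAP server"))))

;; Stubbed "server" that knows one user, for trying the flow at the REPL.
(def stub-ldap
  {:connect         (fn [] :connection)
   :bind-as-service (fn [_conn] true)
   :find-dn         (fn [_conn username]
                      (when (= username "alice")
                        "uid=alice,ou=people,dc=example,dc=com"))
   :bind-as-user    (fn [_dn password] (= password "secret"))})
```

Each failure path reports exactly one reason through the sink, and the success path never touches it, which matches the behaviour described for bind?.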

Limitations

It would probably be useful to be able to specify what attribute(s) to use for looking up the username, but for now it is hard coded to uid. Also, current test coverage (using midje) is minimal. UnboundID provide an in-memory LDAP server implementation, which could probably be used to build some fast-running integration tests.

Do you have anything to declare, sir?

One of the cornerstones of modern software engineering is the dependency management system. Think Bundler, Leiningen, or (forgive me) Maven. We stand on the shoulders of giants when we write our apps, and we need a way of specifying which giants. Modern systems like RubyGems are pretty good at this. But not perfect.

I have a dream

I have a simple dream. All I want is this: I want to be able to check out your project on any computer, install the appropriate language runtime and dependency manager and type make (or rake, or lein, or ./build.sh, or ...) and have a running system.

None of this is new. Joel said it, Twelve Factor App says it. But surprisingly few people seem to actually do it.

Undeclared dependencies are the root of all evil

It's very easy as a developer to introduce dependencies into your project without even realising. Our workstations get all sorts of cruft installed on them over time, and the chances are something lying around fulfills an undeclared transitive dependency for the library you just installed. But the next developer may not be so lucky.

I don't want to have to find out by trial and error what native libraries your code depends on. I don't want to belatedly discover some assumptions you made about what else would be running on my computer. I just want to type make.

So what's the point?

Just this:

Your project's build system has the responsibility to install any library that is required to support your code.

Hopefully most of these will be taken care of by your dependency management system. But for those that aren't (e.g. native libraries that are required by Ruby gems) your build system needs to make sure they are installed.

New release of clj-aws-ec2

I have released version 0.2.0 of clj-aws-ec2. This version contains no changes from 0.1.11. I'm just trying to adhere more closely to semantic versioning, having been fairly slack about it so far.

This version does however contain many changes since I last mentioned it here. It can now describe, create and delete tags on resources, and create and deregister images (AMIs).

I consider this more or less "feature complete" for my current purposes. Of course, it only covers a very small fraction of the available EC2 SDK but hopefully it is on the right side of the 80/20 rule. :-) I am open to feature requests—or even better pull requests—for further elements of the API that you would like to see supported.

A simple polling function in Clojure

One of my projects at work is to build an internal web service around AWS to support our internal tooling. (This led to the development of my clj-aws-ec2 library.)

The web service needs "integration" tests that exercise its RESTful API to manipulate AWS resources (i.e. create instances, add tags, etc.). This sort of testing is fraught for many reasons and should be kept to a minimum, but it does provide a bit of an assurance that the service will actually respond to its published interface when deployed.

One of the reasons this sort of testing is fraught is that it depends on an external service that is beyond our control (i.e. AWS). Many things can go wrong when talking to AWS, and everything takes time. So my test needs to invoke the service to perform an action, then wait until the expected state is achieved (or a timer elapses causing the test to fail). What I'd like to be able to write is something like:

(deftest ^:integration instance-lifecycle
  (testing "create instance"
    ;; bind the response with let rather than def-ing a global var inside the test
    (let [result (POST "/instances" (with-principal {:name "rea-ec2-tests/int-test-micro", :instance-type "t1.micro"}))
          id     (first (:body result))]
      (has-status result 200)
      (prn (str "Created instance " id))

      (testing "get instance"
        (has-status (GET (str "/instances/" id)) 200)
        (is (wait-for-instance-state id "running")))

      (testing "stop instance"
        (has-status (PUT (str "/instances/" id "/stop")) 200)
        (is (wait-for-instance-state id "stopped")))

      (testing "start instance"
        (has-status (PUT (str "/instances/" id "/start")) 200)
        (is (wait-for-instance-state id "running")))

      (testing "delete instance"
        (has-status (DELETE (str "/instances/" id)) 200)
        (is (wait-for-instance-state id "terminated"))))))

But how do you write a polling loop in Clojure? A bit of clicking around on Google led me to a function written by Chas Emerick for his bandalore library:

;; https://github.com/cemerick/bandalore/blob/master/src/main/clojure/cemerick/bandalore.clj#L124
(defn polling-receive
  [client queue-url & {:keys [period max-wait]
                       :or {period 500
                            max-wait 5000}
                       :as receive-opts}]
  (let [waiting (atom 0)
        receive-opts (mapcat identity receive-opts)
        message-seq (fn message-seq []
                      (lazy-seq
                        (if-let [msgs (seq (apply receive client queue-url receive-opts))]
                          (do
                            (reset! waiting 0)
                            (concat msgs (message-seq)))
                          (do
                            (when (<= (swap! waiting + period) max-wait)
                              (Thread/sleep period)
                              (message-seq))))))]
    (message-seq)))

That seems pretty close! I generalised it a bit to remove dependencies on Chas's messaging routines and just take a predicate function:
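The gist itself is not reproduced in this text. As a rough sketch of what such a generalisation might look like (the option names `interval` and `timeout` and their defaults are assumptions modelled on `period` and `max-wait` above, and the waiting counter is threaded through the loop rather than held in an atom):

```clojure
(defn wait-for
  "Sketch only. Calls predicate until it returns a truthy value, sleeping
  interval ms between attempts. Gives up once timeout ms of sleep time has
  accumulated. Returns the truthy value, or nil on timeout."
  [predicate & {:keys [interval timeout]
                :or {interval 500
                     timeout 5000}}]
  (loop [waited 0]
    (if-let [result (predicate)]
      result
      (when (< waited timeout)
        (Thread/sleep (long interval))
        (recur (+ waited interval))))))
```

Note that only sleep time is counted here, so time spent evaluating the predicate is not charged against the timeout.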

Finally, a couple of helper functions to tie it all together and enable the tests to be written as above:

(defn get-instance-state [id] (:state (:body (GET (str "/instances/" id)))))
(defn wait-for-instance-state [id state] (wait-for #(= (get-instance-state id) state)))

There are a couple of improvements that could be made to wait-for, the most obvious being to use a "wall clock" for the timeout. The current implementation will actually wait for timeout + (time-to-evaluate-predicate * number-of-invocations) which is probably not what you want, especially when the predicate could take a non-trivial amount of time to evaluate because it is invoking an external service.

Comments and improvements welcome!

UPDATE: My colleague Eric Entzel pointed out that there is no need to use an atom to store and update the "waiting" counter; its state can just be passed around with function invocations (and recursion). The above gist has been simplified to reflect this observation.

UPDATE: Even better, when I went to implement the "wall clock" timeout, I realised there is no need to maintain any state at all, since the absolute timeout time can be calculated up front and compared to the system clock on each evaluation. (I also flipped the timeout test and the sleep, to more accurately reflect the intent of a timeout.) Gist updated again.

UPDATE: And finally, Adam Fitzpatrick noticed that there's no longer any need to let-bind the poller function to a symbol; we can just put its contents in the main function body. Gist updated again.
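For readers without access to the gist, here is a minimal sketch of a wall-clock version along the lines the updates describe. It is a reconstruction under assumptions, not the gist's actual contents: the option names and defaults are guesses, and the deadline is computed once with System/currentTimeMillis.

```clojure
(defn wait-for
  "Sketch only. Calls predicate until it returns a truthy value or the
  wall-clock deadline passes. Sleeps interval ms after each failed
  attempt, then checks the deadline. Returns the truthy value, or nil."
  [predicate & {:keys [interval timeout]
                :or {interval 500
                     timeout 5000}}]
  (let [end-time (+ (System/currentTimeMillis) timeout)]
    (loop []
      (if-let [result (predicate)]
        result
        (do
          (Thread/sleep (long interval))
          (when (< (System/currentTimeMillis) end-time)
            (recur)))))))
```

Because the deadline is absolute, time spent inside a slow predicate now counts towards the timeout, and no counter needs to be carried between iterations.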

Introducing clj-aws-ec2

We use Amazon's AWS quite heavily at work, and part of my job involves building internal tools that wrap the public AWS API to provide customised internal services.

I am building some of these tools in Clojure, and I needed a way to call the Amazon API. Amazon provide a Java SDK so it's a fairly simple matter to wrap this in Clojure. In fact James Reeves had already done so for the S3 API. So I took his good work and adapted it to work with the EC2 components of the API:

https://github.com/mrowe/clj-aws-ec2

The library tries to stay true to Amazon's official Java SDK, but with an idiomatic Clojure flavour. In particular, it accepts and returns pure Clojure data structures (seqs of maps mostly). For example:

user=> (require '[aws.sdk.ec2 :as ec2])
user=> (def cred {:access-key "..." :secret-key "..."})
user=> (ec2/describe-instances cred (ec2/instance-id-filter "i-b3385c89"))

({:instances
    ({:id "i-b3385c89",
      :state {:name "running",
              :code 272},
      :type "t1.micro",
      :placement {:availability-zone "ap-southeast-2a",
                  :group-name "",
                  :tenancy "default"}, 
      :tags {:node-name "tockle",
             :name "mrowe/tockle",
             :environment "mrowe"},
      :image "ami-df8611e5",
      :launch-time #<Date Tue Nov 13 08:23:09 EST 2012>}),
  :group-names (),
  :groups ({:id "sg-338f1909", :name "quicklaunch-1"})})
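Because the result is ordinary Clojure data, it can be picked apart with core functions alone. A small sketch, using a hand-copied and trimmed version of the output above as sample data (the helper name `instance-summaries` is made up for this example and is not part of the library):

```clojure
;; Trimmed copy of the describe-instances output shown above,
;; written with vectors so it can be pasted straight into a REPL.
(def sample-reservations
  [{:instances [{:id    "i-b3385c89"
                 :state {:name "running" :code 272}
                 :type  "t1.micro"
                 :tags  {:node-name "tockle" :name "mrowe/tockle"}}]
    :group-names []
    :groups [{:id "sg-338f1909" :name "quicklaunch-1"}]}])

(defn instance-summaries
  "Flattens reservations into one small map per instance."
  [reservations]
  (for [reservation reservations
        instance    (:instances reservation)]
    {:id    (:id instance)
     :state (get-in instance [:state :name])
     :name  (get-in instance [:tags :name])}))

(instance-summaries sample-reservations)
```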

This is still a work in progress. So far, you can describe instances and images, and stop and start EBS-backed instances. I plan to work on adding create/terminate instances next.

UPDATE: I just released v0.1.6 which includes run_instance and terminate_instance support.

Find files in a git project redux

In my previous post I described an update to an Emacs Anything source to "Find files in a git project". This works great if you are inside a git-managed project, but fails horribly if you are not.

Here is a version that fixes that:

(defvar anything-c-source-git-project-files-cache nil "(path signature cached-buffer)")
(defvar anything-c-source-git-project-files
  '((name . "Files from Current GIT Project")
    (init . (lambda ()
              (let* ((git-top-dir (magit-get-top-dir (if (buffer-file-name)
                                                         (file-name-directory (buffer-file-name))
                                                       default-directory)))
                     (top-dir (if git-top-dir
                                  (file-truename git-top-dir)
                                default-directory))
                     (default-directory top-dir)
                     (signature (magit-rev-parse "HEAD")))

                (unless (and anything-c-source-git-project-files-cache
                             (third anything-c-source-git-project-files-cache)
                             (equal (first anything-c-source-git-project-files-cache) top-dir)
                             (equal (second anything-c-source-git-project-files-cache) signature))
                  (if (third anything-c-source-git-project-files-cache)
                      (kill-buffer (third anything-c-source-git-project-files-cache)))
                  (setq anything-c-source-git-project-files-cache
                        (list top-dir
                              signature
                              (anything-candidate-buffer 'global)))
                  (with-current-buffer (third anything-c-source-git-project-files-cache)
                    (dolist (filename (mapcar (lambda (file) (concat default-directory file))
                                              (magit-git-lines "ls-files")))
                      (insert filename)
                      (newline))))
                (anything-candidate-buffer (third anything-c-source-git-project-files-cache)))))

    (type . file)
    (candidates-in-buffer)))

As a diff from the previous version:

@@ -2,9 +2,12 @@
 (defvar anything-c-source-git-project-files
   '((name . "Files from Current GIT Project")
     (init . (lambda ()
-              (let* ((top-dir (file-truename (magit-get-top-dir (if (buffer-file-name)
-                                                                    (file-name-directory (buffer-file-name))
-                                                                  default-directory))))
+              (let* ((git-top-dir (magit-get-top-dir (if (buffer-file-name)
+                                                         (file-name-directory (buffer-file-name))
+                                                       default-directory)))
+                     (top-dir (if git-top-dir
+                                  (file-truename git-top-dir)
+                                default-directory))
                      (default-directory top-dir)
                      (signature (magit-rev-parse "HEAD")))

Emacs Anything - Find files in a git project

If you use Emacs you really should take a look at Anything. When you do, you'll probably want to use it to replicate TextMate's fabled "Go to file...". Ken Wu wrote a nice little anything-source that uses git to derive a file list for a project, but he was obviously using an old version of magit. Here's a tweaked version of his code that works with Magit v1.1.1:

(defvar anything-c-source-git-project-files-cache nil "(path signature cached-buffer)")
(defvar anything-c-source-git-project-files
  '((name . "Files from Current GIT Project")
    (init . (lambda ()
              (let* ((top-dir (file-truename (magit-get-top-dir (if (buffer-file-name)
                                                                    (file-name-directory (buffer-file-name))
                                                                  default-directory))))
                     (default-directory top-dir)
                     (signature (magit-rev-parse "HEAD")))

                (unless (and anything-c-source-git-project-files-cache
                             (third anything-c-source-git-project-files-cache)
                             (equal (first anything-c-source-git-project-files-cache) top-dir)
                             (equal (second anything-c-source-git-project-files-cache) signature))
                  (if (third anything-c-source-git-project-files-cache)
                      (kill-buffer (third anything-c-source-git-project-files-cache)))
                  (setq anything-c-source-git-project-files-cache
                        (list top-dir
                              signature
                              (anything-candidate-buffer 'global)))
                  (with-current-buffer (third anything-c-source-git-project-files-cache)
                    (dolist (filename (mapcar (lambda (file) (concat default-directory file))
                                              (magit-git-lines "ls-files")))
                      (insert filename)
                      (newline))))
                (anything-candidate-buffer (third anything-c-source-git-project-files-cache)))))

    (type . file)
    (candidates-in-buffer)))

I tried to update the Emacs Wiki page to include this fix but couldn't. Not sure what I was doing wrong... The changes I made to Ken's code:

@@ -6,7 +6,7 @@
                                                                     (file-name-directory (buffer-file-name))
                                                                   default-directory))))
                      (default-directory top-dir)
-                     (signature (magit-shell (magit-format-git-command "rev-parse --verify HEAD" nil))))
+                     (signature (magit-rev-parse "HEAD")))
 
                 (unless (and anything-c-source-git-project-files-cache
                              (third anything-c-source-git-project-files-cache)
@@ -20,10 +20,14 @@
                               (anything-candidate-buffer 'global)))
                   (with-current-buffer (third anything-c-source-git-project-files-cache)
                     (dolist (filename (mapcar (lambda (file) (concat default-directory file))
-                                              (magit-shell-lines (magit-format-git-command "ls-files" nil))))
+                                              (magit-git-lines "ls-files")))
                       (insert filename)
                       (newline))))
                 (anything-candidate-buffer (third anything-c-source-git-project-files-cache)))))

Chef doesn't lock node data when updating

We came across an exciting Chef bug today.

Chef tracks metadata about nodes in its database. This includes operational facts about the node (uptime, memory, etc.), and Chef-related things like when the node last checked in. It also includes intentional data such as what run list should be applied to the node.

Periodically, a node polls its server for updates. What happens is:

  • node checks in with server

  • node gets current metadata from server, including its run list of recipes and roles

  • node performs actions as per the run list

  • node saves its metadata back to the server, including the run list it just applied

All well and good, except that step three can potentially be long running. There's plenty of time for an administrator to change the node's desired run list (or other intentional metadata) using the knife tool or the web interface. But now, when the node's run completes, it saves its old state back to the server, overwriting whatever updates an administrator applied while it was running. And you won't know unless you look.
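The lost update is easy to reproduce in miniature. The following is a toy model in Clojure, not Chef's code: the "server" is just an atom holding one node's metadata, and all the function names are invented for the illustration. The naive run writes back the whole snapshot it read at the start; the merging run only updates the key the node actually owns.

```clojure
(defn new-server []
  (atom {:run-list ["recipe[base]"] :uptime 0}))

(defn admin-edit!
  "An administrator adds a recipe while the node is mid-run."
  [server]
  (swap! server update :run-list conj "recipe[nginx]"))

(defn naive-run!
  "Read everything, do the work, write everything back."
  [server]
  (let [snapshot @server]                          ; node fetches its metadata
    (admin-edit! server)                           ; admin change lands during the run
    (reset! server (assoc snapshot :uptime 100)))) ; stale snapshot overwrites it

(defn merging-run!
  "Same run, but the node only writes the operational data it owns."
  [server]
  (admin-edit! server)
  (swap! server assoc :uptime 100))
```

After naive-run! the administrator's recipe has silently vanished from :run-list; after merging-run! it is still there.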

This is unfortunate.

There's a bug that more or less describes this in the project's tracker. It was raised quite recently, so hopefully someone from the Chef team will take a look at it soon. There's also a thread on the Chef mailing list.

Jekyll archives grouped by date

One thing Jekyll doesn't provide out of the box (as far as I can tell) is any sort of archive functionality. (Aside: I really like what Tumblr does for archives.)

I would have liked something a bit more flexible, but for now this site's archive displays a list of all entries grouped by year. Here's the template code I'm using:

<h2>Archives</h2>
<ul>
  {% for post in site.posts %}

    {% unless post.next %}
      <h3>{{ post.date | date: '%Y' }}</h3>
    {% else %}
      {% capture year %}{{ post.date | date: '%Y' }}{% endcapture %}
      {% capture nyear %}{{ post.next.date | date: '%Y' }}{% endcapture %}
      {% if year != nyear %}
        <h3>{{ post.date | date: '%Y' }}</h3>
      {% endif %}
    {% endunless %}

    <li>{{ post.date | date:"%b" }} <a href="{{ post.url }}">{{ post.title }}</a></li>
  {% endfor %}
</ul>

which was shamelessly ripped off from http://blog.tracefunc.com/2009/12/04/jekyll-custom-liquid-tags/

Importing a Blosxom blog into Jekyll

As mentioned, I recently decided to move my blog from a self-hosted, Blosxom-driven, mostly manual setup to github pages.

This involved these main steps:

  • Set up a github repo to hold the templates and source text
  • Migrate templates from Blosxom's templating language to Jekyll/Liquid
  • Import the content

I won't cover the first two in detail here. Setting up a repository for pages is well documented by github, and migrating the templates was relatively straightforward--I used the code behind Simon Harris's blog as a starting point. (Getting the archive page working was slightly more interesting. I'll write more on this later.)

There were two parts to importing the content. Firstly, the directory layout expected by Jekyll is slightly different to that I was using in Blosxom.

Here is what I had:

.
|-- 2009
|   `-- 04
|       |-- an-interesting-story.txt
|       `-- something-else.txt
|-- 2010
|   |-- 01
|   |   |-- happy-new-year.txt
|   |   `-- headache.txt
|   `-- 08
|       `-- migrating-blog.txt

Jekyll wants a much flatter directory layout, with all the files in a single directory and the date as part of the file name:

.
`-- _posts
    |-- 2009-04-01-an-interesting-story.md
    |-- 2009-04-19-something-else.md
    |-- 2010-01-01-happy-new-year.md
    |-- 2010-01-02-headache.md
    `-- 2010-08-04-migrating-blog.md

The trick was that Jekyll wanted a day, but I only encoded the year and month in my Blosxom file structure. Luckily, I was using the Blosxom entries_index plugin, which stores Unix-style timestamps for every entry it publishes. So I wrote a little Clojure program to read the entries_index cache and derive a Jekyll-style file name for every entry:

(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.duck-streams)

(import 'java.util.Date 'java.text.SimpleDateFormat)

(def entry-index
  (read-lines (first *command-line-args*)))

(defn parse-line [line]
  (let [[_ filename timestamp] (re-matches #".*'(.+)'.*\s+(\d+).*" line)]
    {:filepath filename :timestamp timestamp}))

(defn date [timestamp] (Date. (* 1000 (Long/valueOf timestamp))))
(defn date-str [date] (. (SimpleDateFormat. "yyyy-MM-dd") format date))
(defn filename [path] (last (re-split #"/" path)))
(defn md-ext [s] (re-sub #"\.txt$" ".md" s)) ;; escape the dot so only a literal ".txt" suffix matches
(defn valid? [line] (not (nil? (:timestamp line))))

(defn target-file-name [entry]
  (str (date-str (date (entry :timestamp))) "-" (md-ext (filename (entry :filepath)))))

(def entries (filter valid? (map parse-line entry-index)))

(defn copy-command [entry]
  (str "cp " (entry :filepath) " " (target-file-name entry)))

(println (str-join "\n" (map copy-command entries)))

Note that this program doesn't actually do anything; it just outputs a bunch of "cp" commands that you can feed into a shell.
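The clojure.contrib libraries used above have since been retired, so here is a sketch of the same name derivation using only clojure.string and Java interop. Two things are assumptions: the shape of the sample index line (a quoted path followed by a Unix timestamp), and the use of UTC when formatting the date, which the original program did not set and which is pinned here only to make the result reproducible.

```clojure
(require '[clojure.string :as str])
(import 'java.util.Date 'java.text.SimpleDateFormat 'java.util.TimeZone)

(defn parse-line
  "Extracts the quoted path and trailing timestamp from one index line."
  [line]
  (let [[_ filepath timestamp] (re-matches #".*'(.+)'.*\s+(\d+).*" line)]
    {:filepath filepath :timestamp timestamp}))

(defn target-file-name
  "Builds a Jekyll-style yyyy-MM-dd-name.md file name for an entry."
  [{:keys [filepath timestamp]}]
  (let [fmt  (doto (SimpleDateFormat. "yyyy-MM-dd")
               (.setTimeZone (TimeZone/getTimeZone "UTC"))) ; assumption: UTC
        day  (.format fmt (Date. (* 1000 (Long/parseLong timestamp))))
        base (last (str/split filepath #"/"))]
    (str day "-" (str/replace base #"\.txt$" ".md"))))

;; Hypothetical index line, for illustration only.
(target-file-name
  (parse-line "  '/blog/2010/08/migrating-blog.txt' => 1280880000,"))
```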

The second step is to add a block of YAML "front matter" to each file that Jekyll uses to parse the file and generate the appropriate output. This front matter is of the form:

---
layout: post
title: Blog migration
---

This tells Jekyll which template to use, and what to use for a title. The Blosxom source files don't contain any such front matter, but do have the post's title as their first line. A simple bit of sed wrote the appropriate opening lines of each file:

1,1 s/\([^-].*\)/---\
layout: post\
title: \1\
---/g

I invoked it like this:

for f in _posts/*
    do sed -f ~/Projects/migrate-blosxom-to-jekyll/insert_front_matter.sed -i "" "$f"
done
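The same transformation could also be written in Clojure, which makes it easy to try on a string before touching any files. This is a sketch rather than a drop-in replacement for the sed script: it assumes the intent is to skip any file whose first line already starts with a dash, and it does not quote titles that contain characters YAML treats specially (such as a colon).

```clojure
(require '[clojure.string :as str])

(defn add-front-matter
  "Turns the first line of text into a YAML front matter block.
  Text whose first line already starts with \"-\" is returned unchanged."
  [text]
  (let [[title & body] (str/split text #"\n" -1)] ; -1 keeps trailing empty lines
    (if (str/starts-with? title "-")
      text
      (str/join "\n" (concat ["---"
                              "layout: post"
                              (str "title: " title)
                              "---"]
                             body)))))
```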

And that was more or less that! The above code is available on github at http://github.com/mrowe/migrate-blosxom-to-jekyll, and of course the entire content of my blog is at http://github.com/mrowe/mrowe.github.com.

Blog migration

And in the latest installment in an ongoing tradition... I've moved my blog! This time, to github pages. Now they can worry about keeping servers running, and generating HTML from my text when I commit and all of those little details.

The migration was relatively painless--more details on the mechanics to follow. But if you can see this, it worked!

(Aside: all the templates and content that run this blog are available on github.)

First adventures in Clojure

I've been banging on to anyone who'd listen for ages now about how Clojure is going to be the Next Big Thing. I read a fair way into Stuart Halloway's Programming Clojure, and I played in the REPL a bit here and there, but I never got around to doing anything serious with it.

Today I finally found an excuse to use Clojure at work for a real-world problem. I needed to write a small program to read a product feed in CSV format, and cross-check that all the products in the feed actually exist in the live product catalogue database.

Here is my somewhat naïve attempt at implementing a solution:

;;
;; Read a CSV file and look up the product ids it contains in a
;; database. Report all the products in the CSV that do not exist in
;; the database.
;;
;; Usage: $0 <path-to-csv-file>
;;

(import 'java.io.FileReader 'au.com.bytecode.opencsv.CSVReader)

(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.sql)

;; OpenCSV gives us a List of String[]s... ugh.
(defn read-csv [file-name]
  (with-open [reader (CSVReader. (FileReader. file-name))]
     (rest ;; skip the header row
      (map seq (seq (. reader readAll))))))

;; extract interesting fields from a CSV row
(defn product-from [row]
  {:product-id (nth row  0 "")
   :title      (nth row  1 "")})

;; set up the db connection
(def db {:classname   "org.h2.Driver"
         :subprotocol "h2"
         :subname (str "file:///Users/mrowe/.h2data/mydata")
         :user     "sa"
         :password ""})

(defn sql-query [q]
  (with-query-results res q (doall res)))

(defn count-products [product-id]
  (:count
   (first
    (sql-query ["select count(1) as count from product where id = ?" product-id]))))

(defn exists? [product-id]
   (>= (count-products product-id) 1))

(defn product-missing? [csv-row]
  (let [product (product-from csv-row)]
    (not (exists? (product :product-id)))))

;;;;;;;;;;

(def filename (first *command-line-args*))
(def feed (read-csv filename))

(defn report-product-id [row]
  (let [product (product-from row)]
    (format "Not in product catalog: %s - %s" (product :product-id) (product :title))))

(with-connection db 
  (println (str-join "\n" (map report-product-id (filter product-missing? feed)))))

This was purely an exercise in thinking functionally, and figuring out the basics of driving Clojure and getting it to interact with the world around it. I've made no attempt to actually use one of Clojure's headline features, concurrency. (For what it's worth, it happily processes an input of 2500 rows in a few seconds, most of which is spent in the database--I doubt there's much to be gained from parallelising it.) But I think it reads pretty well, and is at least as concise and expressive as the equivalent Ruby would have been--once you learn to see through all the parentheses. ;-)
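The filtering and reporting logic can be exercised without OpenCSV or a database. In the sketch below the catalogue is assumed to have been loaded into a set of ids up front (`known-ids` is a hypothetical stand-in for the per-row SQL lookup), and rows are plain vectors of strings:

```clojure
(defn product-from [row]
  {:product-id (nth row 0 "")
   :title      (nth row 1 "")})

(defn missing-products
  "Returns the products from rows whose id is not in the known-ids set."
  [known-ids rows]
  (remove #(contains? known-ids (:product-id %))
          (map product-from rows)))

(defn report-line [product]
  (format "Not in product catalog: %s - %s"
          (:product-id product) (:title product)))

;; Made-up feed rows and catalogue ids.
(doseq [line (map report-line
                  (missing-products #{"p1"}
                                    [["p1" "Widget"] ["p2" "Gadget"] ["p3"]]))]
  (println line))
```

Swapping the set for a real lookup function would not change the shape of the rest of the code, since a Clojure set can itself be called as a predicate.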

Let me know what you think!

Update: I've put the above code on github: http://gist.github.com/505633