One of the cornerstones of modern software engineering is the dependency management system. Think Bundler, Leiningen, or (forgive me) Maven. We stand on the shoulders of giants when we write our apps, and we need a way of specifying which giants. Modern systems like RubyGems are pretty good at this. But not perfect.
I have a dream
I have a simple dream. All I want is this: I want to be able to check out your project on any computer, install the appropriate language runtime and dependency manager, type make (or rake, or lein, or ./build.sh, or ...) and have a running system.
None of this is new. Joel said it, Twelve Factor App says it. But surprisingly few people seem to actually do it.
Undeclared dependencies are the root of all evil
It's very easy as a developer to introduce dependencies into your project without even realising. Our workstations get all sorts of cruft installed on them over time, and the chances are something lying around fulfills an undeclared transitive dependency for the library you just installed. But the next developer may not be so lucky.
I don't want to have to find out by trial and error what native libraries your code depends on. I don't want to belatedly discover some assumptions you made about what else would be running on my computer. I just want to type make.
So what's the point?
Just this:
Your project's build system has the responsibility to install any library that is required to support your code.
Hopefully most of these will be taken care of by your dependency management system. But for those that aren't (e.g. native libraries that are required by Ruby Gems) your build system needs to make sure they are installed.
I have released version 0.2.0 of clj-aws-ec2. This version contains no changes from 0.1.11. I'm just trying to adhere more closely to semantic versioning, having been fairly slack about it so far.
This version does however contain many changes since I last mentioned it here. It can now describe, create and delete tags on resources, and create and deregister images (AMIs).
I consider this more or less "feature complete" for my current purposes. Of course, it only covers a very small fraction of the available EC2 SDK but hopefully it is on the right side of the 80/20 rule. :-) I am open to feature requests—or even better pull requests—for further elements of the API that you would like to see supported.
One of my projects at work is to build an internal web service around AWS to support our internal tooling. (This led to the development of my clj-aws-ec2 library.)
The web service needs "integration" tests that exercise its RESTful API to manipulate AWS resources (i.e. create instances, add tags, etc.). This sort of testing is fraught for many reasons and should be kept to a minimum, but it does provide a bit of an assurance that the service will actually respond to its published interface when deployed.
One of the reasons this sort of testing is fraught is that it depends on an external service that is beyond our control (i.e. AWS). Many things can go wrong when talking to AWS, and everything takes time. So my test needs to invoke the service to perform an action, then wait until the expected state is achieved (or a timer elapses causing the test to fail). What I'd like to be able to write is something like:
(deftest ^:integration instance-lifecycle
  (testing "create instance"
    (def result (POST "/instances" (with-principal {:name "rea-ec2-tests/int-test-micro", :instance-type "t1.micro"})))
    (has-status result 200)
    (let [id (first (:body result))]
      (prn (str "Created instance " id))
      (testing "get instance"
        (has-status (GET (str "/instances/" id)) 200)
        (is (wait-for-instance-state id "running")))
      (testing "stop instance"
        (has-status (PUT (str "/instances/" id "/stop")) 200)
        (is (wait-for-instance-state id "stopped")))
      (testing "start instance"
        (has-status (PUT (str "/instances/" id "/start")) 200)
        (is (wait-for-instance-state id "running")))
      (testing "delete instance"
        (has-status (DELETE (str "/instances/" id)) 200)
        (is (wait-for-instance-state id "terminated"))))))
But how do you write a polling loop in Clojure? A bit of clicking around on Google led me to a function written by Chas Emerick for his bandalore library:
;; https://github.com/cemerick/bandalore/blob/master/src/main/clojure/cemerick/bandalore.clj#L124
(defn polling-receive
  [client queue-url & {:keys [period max-wait]
                       :or {period 500
                            max-wait 5000}
                       :as receive-opts}]
  (let [waiting (atom 0)
        receive-opts (mapcat identity receive-opts)
        message-seq (fn message-seq []
                      (lazy-seq
                        (if-let [msgs (seq (apply receive client queue-url receive-opts))]
                          (do
                            (reset! waiting 0)
                            (concat msgs (message-seq)))
                          (do
                            (when (<= (swap! waiting + period) max-wait)
                              (Thread/sleep period)
                              (message-seq))))))]
    (message-seq)))
That seems pretty close! I generalised it a bit to remove dependencies on Chas's messaging routines and just take a predicate function:
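A minimal sketch of what such a function might look like, assuming a hypothetical signature of a predicate plus optional :interval and :timeout values in milliseconds. The defaults are illustrative and are not taken from the original gist. It calls the predicate and, if the result is falsey, sleeps for the interval and tries again until the accumulated sleep time reaches the timeout:

```clojure
(defn wait-for
  "Calls predicate until it returns a truthy value or roughly timeout
  milliseconds of sleeping have accumulated. Returns the truthy value,
  or nil on timeout. The defaults are illustrative assumptions."
  [predicate & {:keys [interval timeout]
                :or {interval 5000
                     timeout 120000}}]
  ;; waited is the total time spent sleeping so far; it is passed
  ;; through recur rather than stored in an atom.
  (loop [waited 0]
    (if-let [result (predicate)]
      result
      (when (< waited timeout)
        (Thread/sleep (long interval))
        (recur (+ waited interval))))))
```

Because only the sleeps are counted, time spent inside the predicate is not charged against the timeout.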
Finally, a couple of helper functions to tie it all together and enable the tests to be written as above:
(defn get-instance-state [id] (:state (:body (GET (str "/instances/" id)))))
(defn wait-for-instance-state [id state] (wait-for #(= (get-instance-state id) state)))
There are a couple of improvements that could be made to wait-for, the most obvious being to use a "wall clock" for the timeout. The current implementation will actually wait for timeout + (time-to-evaluate-predicate * number-of-invocations) which is probably not what you want, especially when the predicate could take a non-trivial amount of time to evaluate because it is invoking an external service.
Comments and improvements welcome!
UPDATE: My colleague Eric Entzel pointed out that there is no need to use an atom to store and update the "waiting" counter: its state can just be passed around with function invocations (and recursion). The above gist has been simplified to reflect this observation.
UPDATE: Even better, when I went to implement the "wall clock" timeout, I realised there is no need to maintain any state at all, since the absolute timeout time can be calculated up front and compared to the system clock on each evaluation. (I also flipped the timeout test and the sleep, to more accurately reflect the intent of a timeout.) Gist updated again.
UPDATE: And finally, Adam Fitzpatrick noticed that there's no longer any need to let bind the poller function to a symbol, we can just put its contents in the main function body. Gist updated again.
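For illustration, here is a sketch of a wall-clock variant along the lines these updates describe. It is not a copy of the gist: the :interval and :timeout options (in milliseconds) and their defaults are assumptions made for the example. The deadline is computed once up front, and after each sleep the system clock is compared against it:

```clojure
(defn wait-for
  "Calls predicate until it returns a truthy value or the wall-clock
  deadline passes. Returns the truthy value, or nil on timeout.
  Times are in milliseconds; the defaults are illustrative assumptions."
  [predicate & {:keys [interval timeout]
                :or {interval 5000
                     timeout 120000}}]
  (let [deadline (+ (System/currentTimeMillis) timeout)]
    (loop []
      (if-let [result (predicate)]
        result
        (do
          (Thread/sleep (long interval))
          ;; Sleep first, then consult the clock. The deadline is absolute,
          ;; so time spent inside the predicate counts towards it.
          (when (< (System/currentTimeMillis) deadline)
            (recur)))))))
```

With this shape a slow predicate uses up the timeout just as the sleeps do, so the total wait stays close to the requested timeout (plus at most one interval and one predicate call).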
We use Amazon's AWS quite heavily at work, and part of my job
involves building internal tools that wrap the public AWS API to
provide customised internal services.
I am building some of these tools in Clojure, and I needed a way
to call the Amazon API. Amazon provide a Java SDK so it's a fairly
simple matter to wrap this in Clojure. In fact James Reeves had
already done so for the S3 API. So I took his good work and
adapted it to work with the EC2 components of the API:
https://github.com/mrowe/clj-aws-ec2
The library tries to stay true to Amazon's official Java SDK, but with
an idiomatic Clojure flavour. In particular, it accepts and returns
pure Clojure data structures (seqs of maps mostly). For example:
user=> (require '[aws.sdk.ec2 :as ec2])
user=> (def cred {:access-key "..." :secret-key "..."})
user=> (ec2/describe-instances cred (ec2/instance-id-filter "i-b3385c89"))
({:instances
  ({:id "i-b3385c89",
    :state {:name "running",
            :code 272},
    :type "t1.micro",
    :placement {:availability-zone "ap-southeast-2a",
                :group-name "",
                :tenancy "default"},
    :tags {:node-name "tockle",
           :name "mrowe/tockle",
           :environment "mrowe"},
    :image "ami-df8611e5",
    :launch-time #<Date Tue Nov 13 08:23:09 EST 2012>}),
  :group-names (),
  :groups ({:id "sg-338f1909", :name "quicklaunch-1"})})
This is still a work in progress. So far, you can describe instances
and images, and stop and start EBS-backed instances. I plan to work on
adding create/terminate instances next.
UPDATE: I just released v0.1.6 which includes run_instance and
terminate_instance support.
In my previous post I described an update to an Emacs Anything source to "Find files in a git project". This works great if you are inside a git-managed project, but fails horribly if you are not.
Here is a version that fixes that:
(defvar anything-c-source-git-project-files-cache nil "(path signature cached-buffer)")
(defvar anything-c-source-git-project-files
  '((name . "Files from Current GIT Project")
    (init . (lambda ()
              (let* ((git-top-dir (magit-get-top-dir (if (buffer-file-name)
                                                         (file-name-directory (buffer-file-name))
                                                       default-directory)))
                     (top-dir (if git-top-dir
                                  (file-truename git-top-dir)
                                default-directory))
                     (default-directory top-dir)
                     (signature (magit-rev-parse "HEAD")))
                (unless (and anything-c-source-git-project-files-cache
                             (third anything-c-source-git-project-files-cache)
                             (equal (first anything-c-source-git-project-files-cache) top-dir)
                             (equal (second anything-c-source-git-project-files-cache) signature))
                  (if (third anything-c-source-git-project-files-cache)
                      (kill-buffer (third anything-c-source-git-project-files-cache)))
                  (setq anything-c-source-git-project-files-cache
                        (list top-dir
                              signature
                              (anything-candidate-buffer 'global)))
                  (with-current-buffer (third anything-c-source-git-project-files-cache)
                    (dolist (filename (mapcar (lambda (file) (concat default-directory file))
                                              (magit-git-lines "ls-files")))
                      (insert filename)
                      (newline))))
                (anything-candidate-buffer (third anything-c-source-git-project-files-cache)))))
    (type . file)
    (candidates-in-buffer)))
As a diff from the previous version:
@@ -2,9 +2,12 @@
 (defvar anything-c-source-git-project-files
   '((name . "Files from Current GIT Project")
     (init . (lambda ()
-              (let* ((top-dir (file-truename (magit-get-top-dir (if (buffer-file-name)
-                                                                    (file-name-directory (buffer-file-name))
-                                                                  default-directory))))
+              (let* ((git-top-dir (magit-get-top-dir (if (buffer-file-name)
+                                                         (file-name-directory (buffer-file-name))
+                                                       default-directory)))
+                     (top-dir (if git-top-dir
+                                  (file-truename git-top-dir)
+                                default-directory))
                      (default-directory top-dir)
                      (signature (magit-rev-parse "HEAD")))
If you use Emacs you really should take a look at Anything. When you do, you'll probably want to use it to replicate TextMate's fabled "Go to file...". Ken Wu wrote a nice little anything-source that uses git to derive a file list for a project, but he was obviously using an old version of magit. Here's a tweaked version of his code that works with Magit v1.1.1:
(defvar anything-c-source-git-project-files-cache nil "(path signature cached-buffer)")
(defvar anything-c-source-git-project-files
  '((name . "Files from Current GIT Project")
    (init . (lambda ()
              (let* ((top-dir (file-truename (magit-get-top-dir (if (buffer-file-name)
                                                                    (file-name-directory (buffer-file-name))
                                                                  default-directory))))
                     (default-directory top-dir)
                     (signature (magit-rev-parse "HEAD")))
                (unless (and anything-c-source-git-project-files-cache
                             (third anything-c-source-git-project-files-cache)
                             (equal (first anything-c-source-git-project-files-cache) top-dir)
                             (equal (second anything-c-source-git-project-files-cache) signature))
                  (if (third anything-c-source-git-project-files-cache)
                      (kill-buffer (third anything-c-source-git-project-files-cache)))
                  (setq anything-c-source-git-project-files-cache
                        (list top-dir
                              signature
                              (anything-candidate-buffer 'global)))
                  (with-current-buffer (third anything-c-source-git-project-files-cache)
                    (dolist (filename (mapcar (lambda (file) (concat default-directory file))
                                              (magit-git-lines "ls-files")))
                      (insert filename)
                      (newline))))
                (anything-candidate-buffer (third anything-c-source-git-project-files-cache)))))
    (type . file)
    (candidates-in-buffer)))
I tried to update the Emacs Wiki page to include this fix but couldn't. Not sure what I was doing wrong... The changes I made to Ken's code:
@@ -6,7 +6,7 @@
                                                                     (file-name-directory (buffer-file-name))
                                                                   default-directory))))
                      (default-directory top-dir)
-                     (signature (magit-shell (magit-format-git-command "rev-parse --verify HEAD" nil))))
+                     (signature (magit-rev-parse "HEAD")))
                 (unless (and anything-c-source-git-project-files-cache
                              (third anything-c-source-git-project-files-cache)
@@ -20,10 +20,14 @@
                               (anything-candidate-buffer 'global)))
                   (with-current-buffer (third anything-c-source-git-project-files-cache)
                     (dolist (filename (mapcar (lambda (file) (concat default-directory file))
-                                              (magit-shell-lines (magit-format-git-command "ls-files" nil))))
+                                              (magit-git-lines "ls-files")))
                       (insert filename)
                       (newline))))
                 (anything-candidate-buffer (third anything-c-source-git-project-files-cache)))))
We came across an exciting Chef bug today.
Chef tracks metadata about nodes in its database. This includes
operational facts about the node (uptime, memory, etc.), and
chef-related things like when the node last checked in. It also
includes intentional data such as what run list should be applied to
the node.
Periodically, a node polls its server for updates. What happens is:
1. node checks in with server
2. node gets current metadata from server, including its run list of
   recipes and roles
3. node performs actions as per the run list
4. node saves its metadata back to the server, including the run list
   it just applied
All well and good, except that step three can potentially be long
running. There's plenty of time for an administrator to change the
node's desired run list (or other intentional metadata) using the
knife tool or the web interface. But now, when the node's run
completes, it saves its old state back to the server, over-writing
whatever updates an administrator applied while it was running. And
you won't know unless you look.
This is unfortunate.
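The lost update can be reproduced in a few lines of Clojure. This is only a toy model: the atom stands in for the server's copy of the node's metadata, and the map keys and the node-run! function are made up for the illustration rather than taken from Chef.

```clojure
;; Toy model of the server's record for one node (not Chef's real schema).
(def server (atom {:run-list ["role[base]"] :last-check-in nil}))

(defn node-run!
  "Simulates one client run. during-run stands in for whatever else
  happens on the server while step three is executing."
  [server during-run]
  (let [node-data @server]                     ; step 2: fetch metadata
    (during-run)                               ; step 3: long-running work
    (reset! server                             ; step 4: save everything back
            (assoc node-data :last-check-in "run-1"))))

;; An administrator adds a role while the run is in progress.
(node-run! server #(swap! server update :run-list conj "role[web]"))

(:run-list @server)
;; => ["role[base]"]   the administrator's change has been overwritten
```

The write in step four replaces the whole record with the copy read in step two, so anything changed in between is silently discarded.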
There's a bug that more or less describes this in the project's
tracker. It was raised quite recently, so hopefully someone from the
Chef team will take a look at it soon. There's also a thread on
the Chef mailing list.
One thing Jekyll doesn't provide out of the box (as far as I can
tell) is any sort of archive functionality. (Aside: I really like what
Tumblr does for archives.)
I would have liked something a bit more flexible, but for now this
site's archive displays a list of all entries grouped by year.
Here's the template code I'm using:
<h2>Archives</h2>
<ul>
{% for post in site.posts %}
{% unless post.next %}
<h3>{{ post.date | date: '%Y' }}</h3>
{% else %}
{% capture year %}{{ post.date | date: '%Y' }}{% endcapture %}
{% capture nyear %}{{ post.next.date | date: '%Y' }}{% endcapture %}
{% if year != nyear %}
<h3>{{ post.date | date: '%Y' }}</h3>
{% endif %}
{% endunless %}
<li>{{ post.date | date:"%b" }} <a href="{{ post.url }}">{{ post.title }}</a></li>
{% endfor %}
</ul>
which was shamelessly ripped off from
http://blog.tracefunc.com/2009/12/04/jekyll-custom-liquid-tags/
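The heading logic in the template amounts to grouping a newest-first list of posts by year and emitting a heading at the start of each group. As a sketch of the same idea outside Liquid, here it is in Clojure, using made-up post maps with :year, :month and :title keys (these are assumptions for the example, not Jekyll's data model):

```clojure
;; Hypothetical posts, newest first, in the order site.posts supplies them.
(def posts
  [{:title "Migrating blog" :year 2010 :month "Aug"}
   {:title "Headache"       :year 2010 :month "Jan"}
   {:title "Something else" :year 2009 :month "Apr"}])

(defn archive-lines
  "Returns the archive as a flat seq of strings: a year heading
  followed by that year's entries."
  [posts]
  (mapcat (fn [group]
            (cons (str (:year (first group)))
                  (map #(str "  " (:month %) " " (:title %)) group)))
          (partition-by :year posts)))

(archive-lines posts)
;; => ("2010" "  Aug Migrating blog" "  Jan Headache" "2009" "  Apr Something else")
```

partition-by starts a new group whenever the year changes between neighbouring posts, which is the same comparison the template makes between post.date and post.next.date.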
As mentioned, I recently decided to move my blog from a
self-hosted, Blosxom-driven mostly-manual set up to github pages.
This involved these main steps:
- Set up a github repo to hold the templates and source text
- Migrate templates from Blosxom's templating language to
Jekyll/Liquid
- Import the content
I won't cover the first two in detail here. Setting up a repository
for pages is well documented by github, and migrating the templates
was relatively straightforward--I used the code behind Simon
Harris's blog as a starting point. (Getting the archive
page working was slightly more interesting. I'll write more on this
later.)
There were two parts to importing the content. Firstly, the directory
layout expected by Jekyll is slightly different to that I was using in
Blosxom.
Here is what I had:
.
|-- 2009
|   `-- 04
|       |-- an-interesting-story.txt
|       `-- something-else.txt
|-- 2010
|   |-- 01
|   |   |-- happy-new-year.txt
|   |   `-- headache.txt
|   `-- 08
|       `-- migrating-blog.txt
Jekyll wants a much flatter directory layout, with all the files in a
single directory and the date as part of the file name:
.
`-- _posts
    |-- 2009-04-01-an-interesting-story.md
    |-- 2009-04-19-something-else.md
    |-- 2010-01-01-happy-new-year.md
    |-- 2010-01-02-headache.md
    `-- 2010-08-04-migrating-blog.md
The trick was that Jekyll wanted a day, but I only encoded the year
and month in my Blosxom file structure. Luckily, I was using the
Blosxom entries_index plugin, which stores Unix-style timestamps for
every entry it publishes. So I wrote a little Clojure program to
read the entries_index cache and derive a Jekyll-style file name for
every entry:
(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.duck-streams)
(import 'java.util.Date 'java.text.SimpleDateFormat)

(def entry-index
  (read-lines (first *command-line-args*)))

(defn parse-line [line]
  (let [[_ filename timestamp] (re-matches #".*'(.+)'.*\s+(\d+).*" line)]
    {:filepath filename :timestamp timestamp}))

(defn date [timestamp] (Date. (* 1000 (Long/valueOf timestamp))))
(defn date-str [date] (. (SimpleDateFormat. "yyyy-MM-dd") format date))
(defn filename [path] (last (re-split #"/" path)))
;; escape the dot so that only a literal ".txt" suffix is rewritten
(defn md-ext [s] (re-sub #"\.txt$" ".md" s))
(defn valid? [line] (not (nil? (:timestamp line))))

(defn target-file-name [entry]
  (str (date-str (date (entry :timestamp))) "-" (md-ext (filename (entry :filepath)))))

(def entries (filter valid? (map parse-line entry-index)))

(defn copy-command [entry]
  (str "cp " (entry :filepath) " " (target-file-name entry)))

(println (str-join "\n" (map copy-command entries)))
Note that this program doesn't actually do anything, it just outputs a
bunch of "cp" commands that you can feed into a shell.
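The monolithic clojure.contrib library has since been retired, so here is a sketch of the same file name derivation using only clojure.string and java.time. Two things are assumptions: the sample index line, whose shape is inferred from the regular expression rather than from the plugin's documentation, and the use of UTC for the date (the original formats in the machine's local time zone).

```clojure
(require '[clojure.string :as str])
(import 'java.time.Instant 'java.time.ZoneOffset 'java.time.format.DateTimeFormatter)

(defn parse-line
  "Extracts the quoted path and the trailing Unix timestamp from one
  index line, or returns nil if the line does not match."
  [line]
  (when-let [[_ path ts] (re-matches #".*'(.+)'.*\s+(\d+).*" line)]
    {:filepath path :timestamp (Long/parseLong ts)}))

(defn date-str
  "Formats seconds since the epoch as yyyy-MM-dd in UTC (an assumption)."
  [epoch-seconds]
  (.format DateTimeFormatter/ISO_LOCAL_DATE
           (.atOffset (Instant/ofEpochSecond epoch-seconds) ZoneOffset/UTC)))

(defn target-file-name [{:keys [filepath timestamp]}]
  (str (date-str timestamp) "-"
       (str/replace (last (str/split filepath #"/")) #"\.txt$" ".md")))

;; Hypothetical index line; the real entries_index format may differ.
(target-file-name
 (parse-line "  '/blog/2010/08/migrating-blog.txt' => 1280880000,"))
;; => "2010-08-04-migrating-blog.md"
```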
The second step is to add a block of YAML "front matter" to each file
that Jekyll uses to parse the file and generate the appropriate
output. This front matter is of the form:
---
layout: post
title: Blog migration
---
This tells Jekyll which template to use, and what to use for a title.
The Blosxom source files don't contain any such front matter, but do
have the post's title as their first line. A simple bit of sed
wrote the appropriate opening lines of each file:
1,1 s/\([^-].*\)/---\
layout: post\
title: \1\
---/g
I invoked it like this:
for f in _posts/*
do sed -f ~/Projects/migrate-blosxom-to-jekyll/insert_front_matter.sed -i "" "$f"
done
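For comparison, the same transformation can be sketched in Clojure. The add-front-matter function below is a hypothetical helper written for this illustration, not part of the migration repository. It works on a string rather than editing a file in place, and it does not preserve a trailing newline.

```clojure
(require '[clojure.string :as str])

(defn add-front-matter
  "Replaces the title on the first line of text with a front matter
  block. Text that already starts with a dash is returned unchanged."
  [text]
  (if (str/starts-with? text "-")
    text
    (let [[title & body] (str/split-lines text)]
      (str/join "\n" (concat ["---" "layout: post" (str "title: " title) "---"]
                             body)))))

(println (add-front-matter "Blog migration\nAnd in the latest installment..."))
;; ---
;; layout: post
;; title: Blog migration
;; ---
;; And in the latest installment...
```

Applying it to a file would be a matter of wrapping it in slurp and spit.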
And that was more or less that! The above code is available on github at
http://github.com/mrowe/migrate-blosxom-to-jekyll,
and of course the entire content of my blog is at
http://github.com/mrowe/mrowe.github.com.
And in the latest installment in an ongoing tradition...
I've moved my blog! This time, to github pages. Now they can worry
about keeping servers running, and generating HTML from my text when I
commit and all of those little details.
The migration was relatively painless--more details on the mechanics
to follow. But if you can see this, it worked!
(Aside: all the templates and content that run this blog are available
on github.)
I've been banging on to anyone who'd listen for ages now about how
Clojure is going to be the Next Big Thing. I read a fair way
into Stuart Halloway's Programming Clojure, and I played in the
REPL a bit here and there, but I never got around to doing anything
serious with it.
Today I finally found an excuse to use Clojure at work for a
real-world problem. I needed to write a small program to read a
product feed in CSV format, and cross-check that all the products in
the feed actually exist in the live product catalogue database.
Here is my somewhat naïve attempt at implementing a solution:
;;
;; Read a CSV file and look up the product ids it contains in a
;; database. Report all the products in the CSV that do not exist in
;; the database.
;;
;; Usage: $0 <path-to-csv-file>
;;
(import 'java.io.FileReader 'au.com.bytecode.opencsv.CSVReader)
(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.sql)

;; OpenCSV gives us a List of String[]s... ugh.
(defn read-csv [file-name]
  (with-open [reader (CSVReader. (FileReader. file-name))]
    (rest ;; skip the header row
     (map seq (seq (. reader readAll))))))

;; extract interesting fields from a CSV row
(defn product-from [row]
  {:product-id (nth row 0 "")
   :title (nth row 1 "")})

;; set up the db connection
(def db {:classname "org.h2.Driver"
         :subprotocol "h2"
         :subname (str "file:///Users/mrowe/.h2data/mydata")
         :user "sa"
         :password ""})

(defn sql-query [q]
  (with-query-results res q (doall res)))

(defn count-products [product-id]
  (:count
   (first
    (sql-query ["select count(1) as count from product where id = ?" product-id]))))

(defn exists? [product-id]
  (>= (count-products product-id) 1))

(defn product-missing? [csv-row]
  (let [product (product-from csv-row)]
    (not (exists? (product :product-id)))))

;;;;;;;;;;

(def filename (first *command-line-args*))
(def feed (read-csv filename))

(defn report-product-id [row]
  (let [product (product-from row)]
    (format "Not in product catalog: %s - %s" (product :product-id) (product :title))))

(with-connection db
  (println (str-join "\n" (map report-product-id (filter product-missing? feed)))))
This was purely an exercise in thinking functionally, and figuring out
the basics of driving Clojure and getting it to interact with the
world around it. I've made no attempt to actually use one of Clojure's
headline features, concurrency. (For what it's worth, it happily
processes an input of 2500 rows in a few seconds, most of which is
spent in the database--I doubt there's much to be gained from
parallelising it.) But I think it reads pretty well, and is at least
as concise and expressive as the equivalent Ruby would have been--once
you learn to see through all the parentheses. ;-)
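The filtering step can be tried at the REPL without a database or a CSV file. In this sketch a set of known ids stands in for the exists? lookup and a vector of vectors stands in for the parsed feed. The missing-products function and the sample data are made up for the illustration.

```clojure
;; extract interesting fields from a CSV row (as in the program above)
(defn product-from [row]
  {:product-id (nth row 0 "")
   :title      (nth row 1 "")})

(defn missing-products
  "Returns the products in rows whose id is not in the set known-ids."
  [known-ids rows]
  (remove #(contains? known-ids (:product-id %))
          (map product-from rows)))

;; Hypothetical feed rows: product id first, then title.
(def rows [["p1" "Widget"] ["p2" "Gadget"] ["p3" "Gizmo"]])

(missing-products #{"p1" "p3"} rows)
;; => ({:product-id "p2", :title "Gadget"})
```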
Let me know what you think!
Update: I've put the above code on github: http://gist.github.com/505633
VMware Fusion has a "shared folders" feature which allows you to
seamlessly share folders on the host Mac system with the virtualised
guest OS. With a Linux guest, vmware-tools will install the
"Host-Guest File System" (hgfs) driver and add an entry to
/etc/fstab to automagically mount all shared folders under
/mnt/hgfs.
This is great, but unless your user id in the Linux guest happens to
match your user id on OS X, you will not be able to access the mounted
directories as a regular user. Luckily, you can get the hgfs driver to
mount the shared folders as your user. Edit /etc/fstab as root:
$ sudo vi /etc/fstab
and look for a section like:
# Beginning of the block added by the VMware software
.host:/ /mnt/hgfs vmhgfs defaults,ttl=5 0 0
# End of the block added by the VMware software
Add options for uid and gid:
# Beginning of the block added by the VMware software
.host:/ /mnt/hgfs vmhgfs defaults,ttl=5,uid=1000,gid=1000 0 0
# End of the block added by the VMware software
The values I've used, 1000 for uid and gid, are the defaults for the
first user created on an Ubuntu desktop install. To find the correct
values for your user, run the id command in the guest OS:
$ id
uid=1000(mrowe) gid=1000(mrowe) groups=...