Chef doesn't lock node data when updating

We came across an exciting Chef bug today.

Chef tracks metadata about nodes in its database. This includes operational facts about the node (uptime, memory, etc.) and Chef-related things like when the node last checked in. It also includes intentional data, such as the run list that should be applied to the node.

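Periodically, a node polls its server for updates. What happens is:

1. The node fetches its node data, including its run list, from the server.
2. It downloads the cookbooks it needs.
3. It runs the recipes in its run list.
4. It saves its node data back to the server.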

All well and good, except that step three can potentially be long-running. There’s plenty of time for an administrator to change the node’s desired run list (or other intentional metadata) using the knife tool or the web interface. But now, when the node’s run completes, it saves its old state back to the server, overwriting whatever updates an administrator applied while it was running. And you won’t know unless you look.
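
To make the race concrete, here is a toy model of it in Python. It is only a sketch of the read, long run, write-back pattern described above. It is not Chef's code or API: ToyServer, client_run, admin_edit and the node name "web1" are all invented for illustration.

```python
import copy


class ToyServer:
    """Hypothetical stand-in for a server that stores node data
    with no locking and no version check."""

    def __init__(self):
        self.nodes = {}

    def load(self, name):
        # Hand back a private copy, as a client would receive over the network.
        return copy.deepcopy(self.nodes[name])

    def save(self, name, data):
        # Last write wins: the stored document is replaced unconditionally.
        self.nodes[name] = copy.deepcopy(data)


def client_run(server, name, during_run):
    node = server.load(name)        # fetch node data at the start of the run
    during_run()                    # stands in for the long-running step
    node["last_checkin"] = "run-2"  # the client updates its own operational data
    server.save(name, node)         # ...and writes back the stale run list too


server = ToyServer()
server.save("web1", {"run_list": ["recipe[base]"], "last_checkin": "run-1"})


def admin_edit():
    # An administrator changes the run list while the client run is in progress.
    node = server.load("web1")
    node["run_list"].append("recipe[nginx]")
    server.save("web1", node)


client_run(server, "web1", during_run=admin_edit)
final = server.load("web1")
print(final["run_list"])  # prints ['recipe[base]']
```

The administrator's recipe[nginx] entry is gone after the run, and nothing in the sequence raises an error or leaves a trace.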

This is unfortunate.

There’s a bug that more or less describes this in the project’s tracker. It was raised quite recently, so hopefully someone from the Chef team will take a look at it soon. There’s also a thread on the Chef mailing list.