Chef doesn't lock node data when updating
We came across an exciting Chef bug today.
Chef tracks metadata about nodes in its database. This includes operational facts about the node (uptime, memory, etc.), and chef-related things like when the node last checked in. It also includes intentional data such as what run list should be applied to the node.
Periodically, a node polls its server for updates. What happens is:
node checks in with server
node gets current metadata from server, including its run list of recipes and roles
node performs actions as per the run list
node saves its metadata back to the server, including the run list it just applied
All well and good, except that step three can potentially be long
running. There’s plenty of time for an administrator to change the
node’s desired run list (or other intentional metadata) using the
knife
tool or the web interface. But now, when the node’s run
completes, it saves its old state back to the server, over-writing
whatever updates an administrator applied while it was running. And
you won’t know unless you look.
This is unfortunate.
There’s a bug that more or less describes this in the project’s tracker. It was raised quite recently, so hopefully someone from the Chef team will take a look at it soon. There’s also a thread on the Chef mailing list.