Someone added a required customer_id field to an event and deployed the producer. Half the consumers hadn’t shipped the change yet. For twenty minutes, every message that older consumers had handled fine for a year started failing validation — until the rollout caught up. No code was “wrong.” A contract was changed out from under the people bound by it.

A message’s data shape is an API. The fact that it’s JSON on a queue instead of a response body doesn’t make it less of a contract — it makes it a harder one, because you can’t change both sides at once.

You can’t change a distributed contract atomically

An HTTP API has one server you control. A published message has many consumers, deployed independently, on their own schedules. And because a queue buffers and redelivers (at-least-once delivery), messages produced under the old shape are still arriving while you roll out the new one. There is no moment where “everyone is on the new schema.” So the only safe changes are the ones where old and new can coexist.

That single constraint gives you the whole rule set.

What’s safe, and what isn’t

A change is backward-compatible if data valid under the old schema is still valid under the new one — so a consumer that upgraded early still accepts messages a not-yet-upgraded producer emits.

  • Add an optional field — safe. Old data simply lacks it.
  • Drop a required constraint (make a field optional) — safe. Old data still satisfies the looser rule.
  • Widen an enum, relax a minimum — safe.

And the ones that bite:

  • Add a required field, or make an optional field required — breaking. Old data omits it.
  • Remove, rename, or retype a field — breaking. A rename is just a remove plus an add.
  • Tighten the rules — drop an enum value, raise a minimum, close additionalProperties — breaking.

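The asymmetry is the point: loosening is safe, tightening is not. Loosening only widens the set of messages the schema accepts, so everything that was valid stays valid. Tightening shrinks that set, and a message that passed yesterday can be rejected today for missing something the schema now demands.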
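To make the asymmetry concrete, here is a minimal sketch in Python. The validator covers only a toy subset of JSON Schema (required, type, enum, minimum, additionalProperties), and every field name other than customer_id is invented for illustration. A real setup would use a full schema validator rather than this one.

```python
# Toy validator for a small JSON-Schema-like subset (illustrative only).
TYPES = {"string": str, "integer": int, "number": (int, float)}

def validate(schema: dict, data: dict) -> list[str]:
    """Return a list of violations; an empty list means the data is valid."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in data:
            errors.append(f"missing required field: {name}")
    for name, value in data.items():
        rule = props.get(name)
        if rule is None:
            # Unknown fields are allowed unless additionalProperties is closed.
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected field: {name}")
            continue
        expected = TYPES.get(rule.get("type"))
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            errors.append(f"wrong type for {name}")
            continue
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name} not in enum")
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name} below minimum")
    return errors

# Hypothetical original schema and a message that has been valid under it.
V1 = {
    "required": ["order_id"],
    "properties": {
        "order_id": {"type": "string"},
        "status": {"type": "string", "enum": ["new", "paid"]},
        "quantity": {"type": "integer", "minimum": 1},
    },
}
old_message = {"order_id": "o-1", "status": "paid", "quantity": 1}

# Loosened: adds an optional field, widens the enum, relaxes the minimum.
LOOSENED = {
    "required": ["order_id"],
    "properties": {
        "order_id": {"type": "string"},
        "status": {"type": "string", "enum": ["new", "paid", "refunded"]},
        "quantity": {"type": "integer", "minimum": 0},
        "coupon": {"type": "string"},
    },
}

# Tightened: adds a required field, the change from the incident at the top.
TIGHTENED = {
    "required": ["order_id", "customer_id"],
    "properties": {**V1["properties"], "customer_id": {"type": "string"}},
}

loosened_errors = validate(LOOSENED, old_message)
tightened_errors = validate(TIGHTENED, old_message)
print(loosened_errors)   # []
print(tightened_errors)  # ['missing required field: customer_id']
```

The same year-old message passes the loosened schema untouched and fails the tightened one, which is the twenty-minute outage in miniature.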

When it’s breaking, version the identity — don’t mutate

The instinct on a breaking change is to “just update the schema.” Don’t. Mutating the shape behind an existing identity is exactly what broke the rollout above. Instead, mint a new identity — a new message URN (orders.created.v2), a new topic, a new event name — and run both in parallel:

  1. Producers keep emitting v1; you publish v2 alongside it.
  2. Consumers migrate to v2 on their own schedules.
  3. When v1 has no consumers left, retire it.

The rule underneath every step: consumers upgrade before producers. Never emit a version no deployed consumer understands. (This is the same shape as designing the handler so a duplicate delivery is a no-op — you make the change safe to apply in any order, because you don’t control the order.)
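The three steps can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the v1 URN name, the envelope with urn and payload keys, and the consumer names. It is not how any particular broker or registry models subscriptions.

```python
from typing import Optional

ORDERS_V1 = "orders.created.v1"  # hypothetical name for the original identity
ORDERS_V2 = "orders.created.v2"  # the newly minted identity

def publish_both(order_id: str, customer_id: str) -> list[dict]:
    """Step 1: emit the old shape and the new shape side by side."""
    return [
        {"urn": ORDERS_V1, "payload": {"order_id": order_id}},
        {"urn": ORDERS_V2,
         "payload": {"order_id": order_id, "customer_id": customer_id}},
    ]

def consume(envelope: dict, subscribed_urn: str) -> Optional[dict]:
    """Step 2: a consumer reads only the version it has subscribed to."""
    if envelope["urn"] != subscribed_urn:
        return None  # not this consumer's contract, so skip it
    return envelope["payload"]

def can_retire(urn: str, subscriptions: dict[str, str]) -> bool:
    """Step 3: a version is retirable once no consumer subscribes to it."""
    return urn not in subscriptions.values()

messages = publish_both("o-1", "c-9")
legacy_view = [p for m in messages if (p := consume(m, ORDERS_V1)) is not None]
migrated_view = [p for m in messages if (p := consume(m, ORDERS_V2)) is not None]

# Hypothetical consumers at two points in the migration.
mid_migration = {"billing": ORDERS_V2, "shipping": ORDERS_V1}
fully_migrated = {"billing": ORDERS_V2, "shipping": ORDERS_V2}

print(can_retire(ORDERS_V1, mid_migration))   # False
print(can_retire(ORDERS_V1, fully_migrated))  # True
```

Neither consumer ever sees a shape it did not sign up for, and v1 only goes away once the subscription map says nobody is left on it.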

Make the rule mechanical

None of this is judgment you want to re-derive in a code review at 5 p.m. on a Friday. The compatibility rules above are deterministic — given the old and new schema, a tool can tell you “additive, ship it” or “breaking, mint a new version” with no opinion involved. So I gate it: a check at the boundary that diffs the two schemas and fails the build on a breaking change, before it reaches anyone downstream. Catching it there costs a comment on a PR; catching it in production costs a rollout window like the one above.
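As a rough illustration of what such a gate computes, here is a sketch of a schema diff in Python. It is not the bqschema tool and makes no claim about how that tool works. It handles only the handful of keywords discussed above, the schemas are invented, and it is deliberately conservative: any type change counts as breaking, even one that a smarter check could prove is a widening.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """List the reasons `new` could reject data that `old` accepted."""
    reasons = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    # A newly required field is breaking whether or not the field is new.
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        reasons.append(f"{name}: now required")

    for name, old_rule in sorted(old_props.items()):
        new_rule = new_props.get(name)
        if new_rule is None:
            reasons.append(f"{name}: removed")
            continue
        if old_rule.get("type") != new_rule.get("type"):
            reasons.append(f"{name}: type changed")
        if "enum" in new_rule:
            # Adding an enum where none existed, or dropping a value, tightens.
            dropped = set(old_rule.get("enum", [])) - set(new_rule["enum"])
            if "enum" not in old_rule or dropped:
                reasons.append(f"{name}: enum narrowed")
        if "minimum" in new_rule:
            if "minimum" not in old_rule or new_rule["minimum"] > old_rule["minimum"]:
                reasons.append(f"{name}: minimum raised")

    # Closing additionalProperties rejects fields the old schema let through.
    was_open = old.get("additionalProperties", True) is not False
    now_closed = new.get("additionalProperties", True) is False
    if was_open and now_closed:
        reasons.append("additionalProperties closed")
    return reasons

def check(old: dict, new: dict) -> int:
    """Return a process exit code: 0 for additive, 1 for breaking."""
    reasons = breaking_changes(old, new)
    for reason in reasons:
        print(f"BREAKING {reason}")
    return 1 if reasons else 0

OLD = {
    "required": ["order_id"],
    "properties": {
        "order_id": {"type": "string"},
        "status": {"type": "string", "enum": ["new", "paid"]},
    },
}
ADDITIVE = {
    "required": ["order_id"],
    "properties": {
        "order_id": {"type": "string"},
        "status": {"type": "string", "enum": ["new", "paid", "refunded"]},
        "coupon": {"type": "string"},
    },
}
BREAKING = {
    "required": ["order_id", "customer_id"],
    "properties": {
        "order_id": {"type": "string"},
        "status": {"type": "string", "enum": ["new"]},
        "customer_id": {"type": "string"},
    },
}

additive_code = check(OLD, ADDITIVE)  # prints nothing
breaking_code = check(OLD, BREAKING)  # prints two BREAKING lines
```

A CI wrapper would load the old and new schema files and hand the return value of check to sys.exit, so a non-zero code fails the build.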

This is the boring-infrastructure end of the same thesis as “the bottleneck moved downstream”: the schema is cheap to change and expensive to change wrongly, so you spend a little tooling to keep the second cost off the table.

When does this stop mattering?

If a message has exactly one producer and one consumer that deploy together (a single service’s internal queue, say), the contract isn’t really distributed, and you can change both sides at once. Then the ceremony is overhead. The rules earn their keep the moment a second, independently deployed consumer exists. Most events that outlive their first month reach that point.


A schema isn’t a struct you own; it’s a promise other services planned around. Evolve it the way you’d evolve any promise you can’t unmake — additively, or under a new name. Never by quietly redefining the old one.


See also: the BabelQueue schema-validation spec writes these compatibility rules down, and the babelqueue-registry is where per-URN schemas live — its bqschema tool (and packaged Action) is the boundary check that fails the build on a breaking change.