Distributed Tracing Across a Polyglot Queue

A message is produced by a PHP service, lands on a queue, is consumed by a Go worker, which publishes a follow-up that a Python service handles. Four hops, three languages, one or more brokers in between. When that flow is slow, or one message in ten thousand dies, the question is always the same: where did this message actually go, and what happened at each step?

The honest answer, for most polyglot queue systems, is that nobody knows. You have a correlation id in the logs — if you remembered to log it in all three languages — and you have the patience to grep across three log streams on different hosts. What you do not have is a picture. You cannot see “produced by PHP in 2ms → sat in Redis for 40ms → processed by Go in 210ms, retried twice, dead-lettered” as one connected thing.

That picture is a distributed trace, and OpenTelemetry is the standard way to draw it. This piece is about adding it to a message standard whose wire format is frozen and whose cores carry zero dependencies — two constraints that, together, make the obvious approach illegal and force a more interesting one.

The one rule up front

Let me state it before anything else: do not add a field to the envelope.

The instinct, when you want distributed tracing across a message bus, is to carry a W3C traceparent: the standard 55-character string that encodes a version, a trace id, a span id, and flags. HTTP does exactly this in a header. The instinct is correct for HTTP and wrong here, because the envelope is a frozen contract. Every SDK in every language emits the byte-identical shape: job, trace_id, data, meta, attempts. Adding a traceparent field, even an optional one, changes that shape, which means a version bump, which means coordinating a wire change across six language implementations and every broker binding. For a feature that is supposed to be optional observability, that is an absurd price.
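For reference, this is what those 55 characters contain. The sketch below is a minimal parser for the version-00 layout only, using the example value commonly shown in the W3C Trace Context documentation; the function name parse_traceparent is made up for illustration and is not part of any SDK.

```python
import re

# Version-00 layout: 2 hex (version) - 32 hex (trace id) - 16 hex (span id) - 2 hex (flags).
# 2 + 1 + 32 + 1 + 16 + 1 + 2 = 55 characters.
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> dict:
    """Split a traceparent string into its four fields (hypothetical helper)."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    fields = match.groupdict()
    # An all-zero trace id or span id is invalid.
    if set(fields["trace_id"]) == {"0"} or set(fields["span_id"]) == {"0"}:
        raise ValueError("trace id and span id must be non-zero")
    return fields


example = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
fields = parse_traceparent(example)
```

The point of the sketch is the span id: it is the one piece of information in a traceparent that the envelope has no existing field for.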

So the rule is the constraint: solve tracing without touching the wire. Everything below follows from taking that seriously.

The insight nobody uses

Here is the thing the envelope already gives you for free. The trace_id field is a correlation id — a UUID, minted at produce time and forwarded unchanged across every hop. It is already on the wire, already propagated, already the one value that ties the whole flow together.

Now look at what OpenTelemetry uses to tie a trace together: a TraceID. In the spec, a TraceID is exactly 16 bytes.

A UUID is exactly 16 bytes.

That is the whole trick. A trace_id UUID maps one-to-one onto an OTel TraceID: strip the hyphens, read the 32 hex characters as the 16-byte id, done. (A trace_id that is not a UUID, say one that came from a non-BabelQueue producer, is hashed with SHA-256 and the 32-byte digest reduced to 16 bytes, deterministically.) Every hop that shares a trace_id therefore derives the same OTel TraceID, with no agreement protocol and no new field. The correlation id you were already carrying is the distributed trace.
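A minimal sketch of that mapping, using only the standard library. The function names are hypothetical, and taking the first 16 bytes of the SHA-256 digest is an assumption: the text does not say how the 32-byte digest becomes 16 bytes. The TraceID is represented as a 128-bit int.

```python
import hashlib
import uuid


def trace_id_to_otel(trace_id: str) -> int:
    """Derive a 128-bit TraceID from an envelope trace_id (hypothetical helper)."""
    try:
        # UUID case: the 32 hex characters are the 16-byte id.
        return uuid.UUID(trace_id).int
    except ValueError:
        # Non-UUID case: hash deterministically. Keeping the first 16 bytes
        # of the digest is an assumption made for this sketch.
        digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:16], "big")
    # Note: a nil UUID would yield an all-zero TraceID, which OpenTelemetry
    # treats as invalid; a real implementation has to decide how to handle it.


def otel_to_trace_id(otel_trace_id: int) -> str:
    """Format a 128-bit TraceID back into the hyphenated UUID shape."""
    return str(uuid.UUID(int=otel_trace_id))


sample = "123e4567-e89b-12d3-a456-426614174000"
as_hex = format(trace_id_to_otel(sample), "032x")
```

Because both branches are pure functions of the trace_id string, every hop can run this independently and still agree on the result.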

So the design writes itself:

On the consumer side, wrap the handler. Before it runs, start a span named process <urn>, but force that span into the trace derived from the message’s trace_id. Tag it with the messaging conventions — messaging.system, messaging.destination.name, messaging.message.id, and messaging.message.conversation_id set to the trace_id itself — run the handler, and record any exception as the span’s error status. The runtime’s retry / dead-letter behaviour is untouched; the span just observes it.
On the producer side, the mirror. Open a publish <urn> span, take its trace id, format it back into a UUID, and stamp that into the message’s trace_id as you build the envelope. The downstream consumer, deriving its TraceID from that same trace_id, lands in the same trace.

Wire a TracerProvider for Jaeger, Tempo, Honeycomb or Datadog and the flow shows up as one waterfall, across all three languages, with per-hop timing and the retry/error markers in place. Don’t wire one, and nothing changes — it is entirely opt-in.

The mechanic, and a phantom

There is one detail worth being precise about, because it is where the design is both clever and limited.

To start a span inside a specific trace in OpenTelemetry, you give it a remote parent — a span context carrying the trace id you want. But a span context is only valid if it has both a trace id and a span id. We have the trace id (from trace_id); we do not have the upstream span’s id, because we deliberately did not propagate one. So we synthesize a deterministic, non-zero span id by hashing the trace_id. The parent is valid, the consumer span lands in the right trace — but its parent points at a span that never existed. A phantom.

This is the honest limit, and it is worth stating plainly. Cross-hop spans all share one trace — you can see every step of a message’s life grouped together, timed, with errors marked. What you do not get is exact parent-child linkage between hops: the consumer’s span is not wired as the child of the producer’s span, because the producer’s real span id was never carried across the wire. Within a single process the hierarchy is correct; across the queue it is flat under one trace.

Getting true cross-hop parent-child back means propagating a span id, which means a traceparent — and since the envelope is frozen, that traceparent has to ride out of band, as a transport header alongside the bq-trace-id the brokers already carry. That is a real feature, but it is a different scale of work: it touches every transport binding in every SDK. So it is a deliberate phase two. The phase-one design above delivers correlation, per-hop timing and error/retry visibility — the 90% — with zero wire change and a few hundred lines per language. Shipping the 90% now and the last 10% behind a bigger lift is the right order, the same way you do the cheap scaling steps before the expensive ones.

One semantic, six packaging idioms

The constraint that the core stays dependency-free has a consequence: the OpenTelemetry code cannot live in the core, because it has to import the OTel API. So in each language it lives beside the core, reached only when you opt in — and “beside the core, optional” is spelled differently in every ecosystem. The semantics are identical in all six; the packaging is where each language shows its personality:

Go — a separate module (babelqueue-go/otel, its own go.mod), exactly like the transport submodules. The core module never sees the OTel dependency.
Python — an [otel] extra. pip install babelqueue[otel] pulls opentelemetry-api; the module imports it, so it is only importable when you asked for it. A TraceID here is a 128-bit int, not bytes — same value, different shape.
Node — a subpath export, @babelqueue/core/otel, with @opentelemetry/api as an optional peer dependency. Critically, the tracing code is not re-exported from the package root: if it were, importing @babelqueue/core would eagerly load the OTel import and break for anyone who didn’t install the peer. The subpath keeps the main entry truly dependency-free.
Java — an optional Maven dependency on opentelemetry-api. Optional dependencies are not transitive, so a consumer who never touches the tracing classes never pulls OTel onto their classpath.
.NET — the interesting one. The idiomatic tracing primitive in .NET is System.Diagnostics.ActivitySource, which lives in the base class library — and it is exactly what OpenTelemetry .NET is built on. So the .NET module needs no dependency at all: it emits Activity objects, and the consumer’s OTel pipeline collects them by calling AddSource("BabelQueue"). The core stays zero-dep not by isolating the dependency but by not having one.
PHP — a Composer suggest plus a dev requirement, mirroring the existing optional helpers. The tracing namespace is there; open-telemetry/api is only needed if you use it.

Six idioms, one rule held in all of them: using observability is a choice the consumer makes, never a tax the core charges.
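For the Python flavour of this rule, one common way to keep an optional dependency out of the core is to import it only at the point of use and fail with a pointer to the extra. This is a generic sketch of that pattern, not the SDK's actual code; require_optional is a hypothetical helper, and only the babelqueue[otel] extra name comes from the text above.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise ImportError(
            f"{module_name!r} is not installed; "
            f"run `pip install babelqueue[{extra}]` to enable this feature"
        ) from exc


# A module that is always present stands in for the optional one here.
json_module = require_optional("json", "otel")
```

Code that never calls the helper never triggers the import, so the core stays importable on a machine where the tracing dependency was never installed.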

What you actually get

Strip away the mechanics and the payoff is small to describe and large to have. In your existing tracing backend, a message that used to be a needle in three log haystacks becomes a single waterfall: which service produced it, how long it waited on the broker, how long each consumer took, whether it was retried, whether it was dead-lettered — across languages and brokers you never had to make agree on anything but a UUID they were already passing around.

And the cost of that, on the wire, is nothing. The envelope that shipped before tracing existed and the envelope that ships with it are byte-for-byte identical. The trace was hiding in the trace_id the whole time; all the work was in noticing that a UUID and a TraceID are the same sixteen bytes.

See also: the BabelQueue observability spec writes this design down as a standard, and the SDKs that implement it — across PHP, Go, Python, Node, Java and .NET — live in the BabelQueue ecosystem.