The next decade of software is streaming. Most of our frameworks still think in turns. Here’s what that mismatch cost us, and why it’s about to cost everyone.
The thirty-second background
If you serve HTTP in any modern stack, you’ve probably been told: propagate the request context everywhere. When the client goes away, the context cancels, your database query gets cut off, your goroutines unwind, and nothing leaks. It’s a beautiful pattern — for normal request/response work.
It is also exactly the wrong pattern for Server-Sent Events.
SSE is a streaming protocol: one HTTP request stays open for minutes or hours while the server pushes events as they happen. Log tails, deployment status, AI completions — anything live. The request lifecycle and the work lifecycle look the same from the outside (both are “the client is connected”), but they aren’t. Conflating them broke three different things for us in production.
Here’s what we learned the hard way.
Bug 1 — The phantom panic
panic: runtime error: invalid memory address or nil pointer dereference
goroutine 255 [running]:
bufio.(*Writer).Flush(...)
The server would crash with no warning. The stack trace pointed at flushing the response body — a thing we’d done a million times.
What was actually happening: a client disconnected, the OS closed the TCP socket, and the next time our SSE worker tried to write a heartbeat the underlying buffered writer dereferenced a nil connection. The crash wasn’t in our code; it was underneath our code, in the standard library.
The fix is small and boring, which is exactly why it works: every goroutine that writes to an HTTP response needs panic recovery. Not because panics are good, but because the writer’s internal state is not part of your API contract. The connection can die between your “is this still open?” check and the actual write. There is no language feature that prevents this. You have to wrap it.
defer func() {
    if r := recover(); r != nil {
        log.Warn().Interface("panic", r).Msg("client likely disconnected")
    }
}()
The takeaway isn’t “Go is unsafe.” It’s: the network is a peer, not a function call. Anything you write to it can fail in ways the type system can’t help you with. Long-lived connections multiply the surface area where that matters.
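One way to package that recovery is a small helper that turns a panic into an ordinary error. This is a minimal sketch rather than production code: safeWrite and panickyWriter are hypothetical names, and panickyWriter only imitates a writer whose connection has gone away. Note that net/http already recovers panics raised on the handler's own goroutine, so a helper like this matters most for goroutines you start yourself, where an unrecovered panic takes down the whole process.

```go
package main

import (
	"fmt"
	"io"
)

// safeWrite writes p to w and converts a panic raised underneath the write
// into an ordinary error, so a dead connection cannot crash the process.
// (Hypothetical helper, for illustration only.)
func safeWrite(w io.Writer, p []byte) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("write panicked (client likely disconnected): %v", r)
		}
	}()
	_, err = w.Write(p)
	return err
}

// panickyWriter stands in for a response writer whose connection is gone.
type panickyWriter struct{}

func (panickyWriter) Write([]byte) (int, error) {
	panic("nil connection")
}

func main() {
	err := safeWrite(panickyWriter{}, []byte(": heartbeat\n\n"))
	fmt.Println("recovered:", err != nil)
}
```

Returning an error instead of only logging lets the caller decide to stop the stream, which is usually what you want once a write has failed.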
Bug 2 — The 404s for deployments that existed
A user would open a status stream for a deployment, the page would load, and then the API would return Deployment not found. The deployment existed. The user had created it ten seconds ago.
The logs told us:
ERR Failed to get deployment: context canceled
Here’s what we’d written:
deployment, err := repo.GetByID(r.Context(), id)
if err != nil {
    http.NotFound(w, r)
    return
}
Looks fine. Is fine, for a normal endpoint. But this was the SSE handler, and on a flaky network the request context could be cancelled before we’d even finished the initial database lookup. Postgres would receive a cancel, return an error, and we’d map “context canceled” to “404 not found” because we’d lumped every error into the same path.
Two things were wrong:
- We used the request context for setup work that wasn’t writing to the client. The database doesn’t care if the client is still listening; it just needs to answer.
- We treated cancellation as a “this thing doesn’t exist” error. Those are very different states.
The fix was to split the contexts. Setup work that prepares the stream uses context.Background(). Only the loop that actually writes events back to the client uses the request context, because only that loop should stop when the client leaves.
bgCtx := context.Background()
deployment, err := repo.GetByID(bgCtx, id) // not cancelled when the client disconnects
// ... later, inside the event loop:
select {
case <-r.Context().Done():
    return // client left, stop pushing events
// ... other cases push events to the client
}
The principle generalizes beyond Go. WebSockets, gRPC streaming, even chunked HTTP responses — the lifetime of the connection and the lifetime of the work behind the connection are not the same thing. If you tie them together you’ll get spurious errors every time a phone goes through a tunnel.
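The second problem from this section, treating cancellation as absence, can be sketched as a small error-to-status mapping. Everything named here is an assumption made for illustration: errNotFound is a hypothetical sentinel, getByID is a stand-in for repo.GetByID, and 499 is nginx's nonstandard "client closed request" code rather than anything defined by Go or the HTTP specification.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// errNotFound is a hypothetical sentinel a repository might return for a missing row.
var errNotFound = errors.New("deployment not found")

// statusClientClosed is nginx's nonstandard 499 "client closed request" code.
const statusClientClosed = 499

// statusFor maps a lookup error to an HTTP status without collapsing
// absence, cancellation, and timeout into a single 404.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errNotFound):
		return http.StatusNotFound
	case errors.Is(err, context.Canceled):
		return statusClientClosed
	case errors.Is(err, context.DeadlineExceeded):
		return http.StatusGatewayTimeout
	default:
		return http.StatusInternalServerError
	}
}

// getByID is a hypothetical stand-in for repo.GetByID: it fails with the
// context's error if ctx is already done, and otherwise reports a missing row.
func getByID(ctx context.Context, id string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("get deployment %s: %w", id, err)
	}
	return fmt.Errorf("get deployment %s: %w", id, errNotFound)
}

// canceledLookup simulates Bug 2: the client is gone before the query runs.
func canceledLookup() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return getByID(ctx, "d-123")
}

func main() {
	fmt.Println(statusFor(canceledLookup()))
	fmt.Println(statusFor(getByID(context.Background(), "d-123")))
}
```

If setup work runs on a detached context, it may also be worth bounding it with context.WithTimeout so a stuck query cannot hang forever; that is a design choice this sketch leaves out.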
Bug 3 — The vanishing background workers
This one was the worst because it looked like a feature.
We had a background log streamer: when a deployment came up, a goroutine would attach to its pods, read their stdout, and fan logs out to any connected viewer. Standard pub/sub stuff. The goroutine was started from inside an SSE handler the first time someone subscribed.
Predictably, when the first subscriber disconnected, the goroutine died. The next subscriber would attach to a dead pipe and see nothing. The deployment looked broken.
The cause was the same shape as Bug 2: we’d started the worker with r.Context() because that’s what was in scope. The worker’s lifetime was now bound to the first viewer’s HTTP connection.
The fix is conceptually simple — start workers with context.Background() so they outlive any individual viewer — but it forced us to articulate a rule that’s easy to violate in a code review:
The context you pass to a function should describe whose deadline matters, not what’s convenient in scope.
If a goroutine should keep running after the current request ends, it does not get the request’s context. It gets its own, owned by whatever does its bookkeeping (a manager, a service, an errgroup).
We added a follow-up: stop the worker when the last subscriber leaves, not the first. That’s a reference-counting problem, not a context problem, and it deserves its own logic — not an accidental coupling.
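The reference-counting idea can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: streamer, Subscribe, Unsubscribe, Running, and newDemoStreamer are hypothetical names, and the demo worker simply blocks until it is cancelled instead of reading pod logs.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// streamer owns one background worker shared by all subscribers.
type streamer struct {
	mu     sync.Mutex
	subs   int
	cancel context.CancelFunc
	run    func(ctx context.Context) // worker body
}

// Subscribe registers a viewer and starts the worker for the first one.
// The worker's context derives from context.Background(), not from a request.
func (s *streamer) Subscribe() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.subs++
	if s.subs == 1 {
		ctx, cancel := context.WithCancel(context.Background())
		s.cancel = cancel
		go s.run(ctx)
	}
}

// Unsubscribe removes a viewer and stops the worker only when none remain.
func (s *streamer) Unsubscribe() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.subs == 0 {
		return
	}
	s.subs--
	if s.subs == 0 {
		s.cancel()
		s.cancel = nil
	}
}

// Running reports whether the worker is currently meant to be alive.
func (s *streamer) Running() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.cancel != nil
}

// newDemoStreamer builds a single-use streamer whose worker blocks until
// cancelled, then closes the returned channel.
func newDemoStreamer() (*streamer, <-chan struct{}) {
	stopped := make(chan struct{})
	s := &streamer{run: func(ctx context.Context) {
		<-ctx.Done() // a real worker would read pod logs until cancelled
		close(stopped)
	}}
	return s, stopped
}

func main() {
	s, stopped := newDemoStreamer()
	s.Subscribe()
	s.Subscribe()
	s.Unsubscribe() // first viewer leaves
	fmt.Println("running after first leaves:", s.Running())
	s.Unsubscribe() // last viewer leaves
	<-stopped
	fmt.Println("running after last leaves:", s.Running())
}
```

The count and the cancel function live behind one mutex, so the decision to start or stop the worker is made in exactly one place and never by a request context.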
The mental model that prevents all three
After cleaning up, we found ourselves drawing the same picture over and over. It’s worth internalizing:
[ client ] ────── HTTP request ──────► [ handler ]
                                            │
                                            │ starts / subscribes
                                            ▼
                                       [ worker ]
                                            │
                                            │ reads from
                                            ▼
                            [ source: DB, k8s, redis, queue ]
There are at least three independent lifetimes in that picture:
- The client’s connection — short, fragile, cancels for any reason.
- The work that produces events — should usually outlive any single viewer.
- The data sources themselves — outlive everything; they have their own deadlines.
A single r.Context() cannot represent all three. Pretending it can is the bug.
Concretely, in our SSE handlers we now follow three rules:
- Setup (DB lookups, k8s reads, subscribing to a stream) uses a fresh background context. It must not be cancelled by a flaky client.
- The event-push loop uses r.Context(). When the client leaves, this loop stops, and only this loop.
- Background workers own their own context, started by whatever spawned them. They never inherit a request context unless they’re genuinely scoped to a single request.
We also distinguish cancellation from absence at every boundary: a context.Canceled error from the database is not a “not found.” It’s a “you didn’t wait long enough.” Treating them as the same hides bugs and lies to users.
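The second rule, an event loop that is the only thing bound to the client's context, might look roughly like this. It is a sketch under assumptions: pushEvents and demo are hypothetical names, the writer is a plain io.Writer so the example runs without a server, and a real handler would also flush after each write (for example through http.Flusher).

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"time"
)

// pushEvents writes SSE frames from events to w until ctx is done or the
// channel closes. ctx is meant to be the request context (r.Context()).
func pushEvents(ctx context.Context, w io.Writer, events <-chan string, heartbeat time.Duration) error {
	ticker := time.NewTicker(heartbeat)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err() // client left: stop this loop, and only this loop
		case ev, ok := <-events:
			if !ok {
				return nil // producer finished
			}
			if _, err := fmt.Fprintf(w, "data: %s\n\n", ev); err != nil {
				return err
			}
		case <-ticker.C:
			// SSE comment line, used as a keep-alive.
			if _, err := io.WriteString(w, ": heartbeat\n\n"); err != nil {
				return err
			}
		}
	}
}

// demo runs the loop against an in-memory buffer. With cancelFirst set, the
// "client" is already gone and no event is ever sent.
func demo(cancelFirst bool) (string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	events := make(chan string, 1)
	if cancelFirst {
		cancel()
	} else {
		events <- "deployed"
		close(events)
	}
	var buf bytes.Buffer
	err := pushEvents(ctx, &buf, events, time.Hour)
	return buf.String(), err
}

func main() {
	out, err := demo(false)
	fmt.Printf("%q %v\n", out, err)
	out, err = demo(true)
	fmt.Printf("%q %v\n", out, err)
}
```

Because the loop returns ctx.Err() when the client leaves, the caller can tell a normal disconnect apart from a write failure and log each accordingly.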
Takeaways if you don’t write Go
The specifics here are Go-flavored, but the lesson is universal.
- Long-lived streams break assumptions baked into request/response frameworks. SSE, WebSockets, gRPC streaming, AI token streaming — all of them tempt you to staple short-lived primitives onto long-lived problems.
- Pick the right deadline for each piece of work. “How long should the database wait?” and “how long should we keep pushing events?” have different answers. Anything that treats them as the same will eventually misbehave.
- Disconnects are normal, not exceptional. Build the system as if every connection is going to drop in the next ten seconds, because most of them will.
- Synthesize the right user-facing error from the right internal state. “The client gave up” should never present to a user as “the resource is gone.”
The bigger picture
The web we learned to build was a sequence of small, polite exchanges: a request, a response, a clean handoff. Our frameworks, our middleware, our tracing, our error models — all of it was shaped by that rhythm. Cancellation propagating cleanly from a closed connection was a feature, because the connection was the whole interaction.
That assumption is quietly expiring.
The interfaces that define the next decade — live agents, collaborative editors, AI tokens streaming as they’re generated, dashboards that update themselves, IoT fleets reporting in — are not request/response. They are long, intermittent, partial, resumable conversations. The connection is no longer the interaction; it is just the current pipe carrying part of an interaction that began before it and will continue after it. Treating these as enhanced requests is the same category mistake as treating email as enhanced telegrams.
The teams that ship reliable streaming products over the next few years won’t be the ones with cleverer code. They will be the ones who model time and lifetime as first-class concepts in their systems: who can say, out loud and on the whiteboard, which clock owns this work and which deadline applies to which step. Everyone else will spend their roadmap chasing bugs that look like ours did — flaky, intermittent, blamed on the network — without ever naming the underlying confusion.
Our fix wasn’t a library or a pattern. It was a vocabulary. The moment we could distinguish the client’s lifetime, the work’s lifetime, and the source’s lifetime in plain English, the bugs we’d been chasing for weeks evaporated, and the new code was simpler than what it replaced.
If your stack still pretends those three are the same thing, you don’t have a streaming product. You have a request/response product with a streaming-shaped wound. The cure is not a better framework. It’s a clearer story about time.