Building Real-Time Streaming Chat with Server-Sent Events
One of the most-requested features during our beta was real-time streaming — seeing tokens appear as the model generates them, rather than waiting for the full response. In this post we'll walk through how we implemented it end-to-end.
The architecture
Our streaming pipeline has three layers:
- Backend — The API returns a `text/event-stream` response. Each SSE event carries a JSON payload with a token chunk and, when complete, source citations; the stream ends with a `[DONE]` sentinel.
- Client SDK — Our `@egret/api` package exposes a `streamQuery()` method that wraps the native `EventSource` API and returns an async iterator.
- React hook — `useStreamingQuery` consumes the iterator, buffers tokens, and exposes `{ text, citations, isStreaming }` as reactive state.
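To make the backend layer concrete, here is a minimal sketch of parsing a raw `text/event-stream` body into chunks. The payload shape (`{ token, citations }`) is an assumption for illustration, not the API's documented schema:

```typescript
// Hedged sketch: split an SSE body into events, decode each JSON payload,
// and stop at the [DONE] sentinel. Field names are illustrative.
interface StreamChunk {
  token?: string;
  citations?: { title: string; url: string }[];
}

function parseSSE(raw: string): StreamChunk[] {
  const chunks: StreamChunk[] = [];
  // SSE events are separated by a blank line.
  for (const event of raw.split("\n\n")) {
    const line = event.trim();
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // final sentinel ends the stream
    chunks.push(JSON.parse(payload) as StreamChunk);
  }
  return chunks;
}
```

In the real pipeline this decoding happens incrementally as bytes arrive, but the event framing is the same.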
Why SSE over WebSockets?
For our use case SSE was the better fit. LLM responses are unidirectional — the client sends one request and receives a long stream of tokens. SSE is built on plain HTTP, works through proxies and CDNs, and reconnects automatically. There's no need for the bidirectional channel that WebSockets provide.
Buffering and rendering
Rendering every individual token as a React state update would be wasteful. Instead, we batch tokens in 50 ms windows using `requestAnimationFrame` and flush them as a single state update. This keeps the UI smooth even on lower-end devices.
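The batching idea can be sketched as a small buffer class (names are illustrative, not the real hook internals). In the browser the flush would be scheduled via `requestAnimationFrame`; here it is invoked manually so the logic stays self-contained:

```typescript
// Hedged sketch: accumulate tokens and emit one state update per flush
// window, rather than one update per token.
type Flush = (text: string) => void;

class TokenBatcher {
  private pending: string[] = [];
  private accumulated = "";

  constructor(private flush: Flush) {}

  push(token: string): void {
    this.pending.push(token);
  }

  // In the real hook this runs at most once per ~50 ms window; calling it
  // directly here keeps the sketch testable outside a browser.
  flushNow(): void {
    if (this.pending.length === 0) return;
    this.accumulated += this.pending.join("");
    this.pending = [];
    this.flush(this.accumulated); // single state update per window
  }
}
```

The key property is that N tokens arriving within one window cost one render, not N.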
We render the accumulated markdown with our `@egret/markdown` package, which uses a custom renderer built on unified/remark. Syntax highlighting, tables, and citation links are all handled inline.
Error handling
SSE connections can drop. Our hook implements exponential back-off with a maximum of 3 retries. If the connection fails permanently, we surface the partial response with a clear error banner so the user never loses context.
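The back-off schedule can be sketched as a pure function (the base delay of 500 ms is an assumption; the real hook's constants may differ):

```typescript
// Hedged sketch: exponential back-off with a hard retry cap. Returning null
// signals "give up and surface the partial response with an error banner".
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 500; // illustrative, not the production value

function retryDelay(attempt: number): number | null {
  if (attempt >= MAX_RETRIES) return null;
  return BASE_DELAY_MS * 2 ** attempt; // 500, 1000, 2000 ms
}
```

Keeping the schedule pure makes the retry policy trivial to unit-test, independent of any connection code.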
What's next
We're exploring the `ReadableStream` API as a potential replacement for `EventSource`, which would give us more control over request headers (including auth tokens) without the proxy workarounds SSE sometimes requires.
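As a sketch of why this matters (the endpoint and request shape are hypothetical), a `fetch()`-based stream request can attach an `Authorization` header directly, which the native `EventSource` API does not allow:

```typescript
// Hedged sketch: building a streaming request with custom headers.
// Endpoint, body shape, and auth handling are illustrative assumptions.
interface StreamRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

function buildStreamRequest(query: string, authToken: string): StreamRequest {
  return {
    url: "/api/stream", // hypothetical endpoint
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "text/event-stream",
      Authorization: `Bearer ${authToken}`, // not possible with EventSource
    },
    body: JSON.stringify({ query }),
  };
}
```

The response body would then be consumed via `response.body.getReader()` instead of `EventSource` events, trading automatic reconnection for full control over the request.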