Building Real-Time Streaming Chat with Server-Sent Events
One of the most-requested features during our beta was real-time streaming — seeing tokens appear as the model generates them, rather than waiting for the full response. In this post we'll walk through how we implemented it end-to-end.
The architecture
Our streaming pipeline has three layers:
- Backend — The API returns a `text/event-stream` response. Each SSE event carries a JSON payload with a token chunk and, when complete, source citations; the stream ends with a `[DONE]` sentinel.
- Client SDK — Our `@egret/api` package exposes a `streamQuery()` method that wraps the native `EventSource` API and returns an async iterator.
- React hook — `useStreamingQuery` consumes the iterator, buffers tokens, and exposes `{ text, citations, isStreaming }` as reactive state.
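To make the backend layer concrete, here is a minimal sketch of parsing a raw `text/event-stream` body into chunks. The payload shape (`{ token, citations }`) is an assumption for illustration, not the API's documented schema:

```typescript
// Hedged sketch: split an SSE body into events, decode each JSON payload,
// and stop at the [DONE] sentinel. Field names are illustrative.
interface StreamChunk {
  token?: string;
  citations?: { title: string; url: string }[];
}

function parseSSE(raw: string): StreamChunk[] {
  const chunks: StreamChunk[] = [];
  // SSE events are separated by a blank line.
  for (const event of raw.split("\n\n")) {
    const line = event.trim();
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // final sentinel ends the stream
    chunks.push(JSON.parse(payload) as StreamChunk);
  }
  return chunks;
}
```

In the real pipeline this decoding happens incrementally as bytes arrive, but the event framing is the same.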
Why SSE over WebSockets?
For our use case SSE was the better fit. LLM responses are unidirectional — the client sends one request and receives a long stream of tokens. SSE is built on plain HTTP, works through proxies and CDNs, and reconnects automatically. There's no need for the bidirectional channel that WebSockets provide.
Buffering and rendering
Rendering every individual token as a React state update would be wasteful. Instead, we batch tokens in 50 ms windows using `requestAnimationFrame` and flush them as a single state update. This keeps the UI smooth even on lower-end devices.
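The batching idea can be sketched as a small buffer class (names are illustrative, not the real hook internals). In the browser the flush would be scheduled via `requestAnimationFrame`; here it is invoked manually so the logic stays self-contained:

```typescript
// Hedged sketch: accumulate tokens and emit one state update per flush
// window, rather than one update per token.
type Flush = (text: string) => void;

class TokenBatcher {
  private pending: string[] = [];
  private accumulated = "";

  constructor(private flush: Flush) {}

  push(token: string): void {
    this.pending.push(token);
  }

  // In the real hook this runs at most once per ~50 ms window; calling it
  // directly here keeps the sketch testable outside a browser.
  flushNow(): void {
    if (this.pending.length === 0) return;
    this.accumulated += this.pending.join("");
    this.pending = [];
    this.flush(this.accumulated); // single state update per window
  }
}
```

The key property is that N tokens arriving within one window cost one render, not N.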
We render the accumulated markdown with our `@egret/markdown` package, which uses a custom renderer built on unified/remark. Syntax highlighting, tables, and citation links are all handled inline.
Error handling
SSE connections can drop. Our hook implements exponential back-off with a maximum of 3 retries. If the connection fails permanently, we surface the partial response with a clear error banner so the user never loses context.
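The back-off schedule can be sketched as a pure function (the base delay of 500 ms is an assumption; the real hook's constants may differ):

```typescript
// Hedged sketch: exponential back-off with a hard retry cap. Returning null
// signals "give up and surface the partial response with an error banner".
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 500; // illustrative, not the production value

function retryDelay(attempt: number): number | null {
  if (attempt >= MAX_RETRIES) return null;
  return BASE_DELAY_MS * 2 ** attempt; // 500, 1000, 2000 ms
}
```

Keeping the schedule pure makes the retry policy trivial to unit-test, independent of any connection code.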
What's next
We're exploring the `ReadableStream` API as a potential replacement for `EventSource`, which would give us more control over request headers (including auth tokens) without the proxy workarounds SSE sometimes requires.
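As a sketch of why this matters (the endpoint and request shape are hypothetical), a `fetch()`-based stream request can attach an `Authorization` header directly, which the native `EventSource` API does not allow:

```typescript
// Hedged sketch: building a streaming request with custom headers.
// Endpoint, body shape, and auth handling are illustrative assumptions.
interface StreamRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

function buildStreamRequest(query: string, authToken: string): StreamRequest {
  return {
    url: "/api/stream", // hypothetical endpoint
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "text/event-stream",
      Authorization: `Bearer ${authToken}`, // not possible with EventSource
    },
    body: JSON.stringify({ query }),
  };
}
```

The response body would then be consumed via `response.body.getReader()` instead of `EventSource` events, trading automatic reconnection for full control over the request.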