Streaming
Egret supports real-time response streaming. Set "stream": true in your query request and the response body becomes a newline-delimited stream of JSON objects (NDJSON): one object per line, each with a type field that identifies the event.
Enabling streaming
POST https://api.getegret.com/v1/query/
Content-Type: application/json
Authorization: Bearer egret_...

{
  "query": "What are the BCM governance requirements under FFIEC?",
  "domain": "business-continuity",
  "category": "us",
  "mode": "compliance",
  "model": "us.anthropic.claude-3-5-haiku-20241022-v1:0",
  "knowledge_scope": "private",
  "stream": true
}
Event types
The response stream emits three event types in order: sources → one or more text_delta → done.
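The ordering above can be checked on the client. The following is a minimal sketch, not part of the Egret API: checkEventOrder is a hypothetical helper that takes the type values of the events received and reports whether they follow sources → one or more text_delta → done.

```javascript
// Hypothetical helper (not part of the Egret API): verifies that a list of
// event types follows the documented order:
//   sources -> one or more text_delta -> done
function checkEventOrder(types) {
  let state = "start"; // start -> sources -> delta -> done
  for (const type of types) {
    if (state === "start" && type === "sources") {
      state = "sources";
    } else if ((state === "sources" || state === "delta") && type === "text_delta") {
      state = "delta";
    } else if (state === "delta" && type === "done") {
      state = "done";
    } else {
      return { ok: false, reason: `unexpected "${type}" in state "${state}"` };
    }
  }
  // A stream that stops before "done" is treated as incomplete.
  return state === "done"
    ? { ok: true }
    : { ok: false, reason: `stream ended in state "${state}"` };
}

console.log(checkEventOrder(["sources", "text_delta", "text_delta", "done"]));
console.log(checkEventOrder(["text_delta", "done"]));
```

A check like this is mostly useful for detecting a connection that dropped mid-response: the result then names the state the stream stopped in.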
1. sources
The first event. It carries every retrieved source document, together with retrieval metadata, and is emitted before generation begins.
{
  "type": "sources",
  "sources": [
    {
      "uri": "s3://egret-docs/business-continuity/pdf/us/ffiec_bcm_v3.pdf",
      "filename": "ffiec_bcm_v3.pdf",
      "excerpt": "BCM is the process for management to oversee and implement...",
      "score": 0.8438505,
      "download_url": "https://..."
    }
  ],
  "retrieval_ms": 2255,
  "chunks_retrieved": 5,
  "insufficient_context": false,
  "mode": "compliance",
  "session_id": "6143159f-ef21-4a1f-97f1-ee9cb7ac3f08"
}
2. text_delta
One event per generated chunk of text. Chunks can break mid-word, so concatenate the text values in arrival order, without adding separators, to build the full response.
{"type": "text_delta", "text": "# Business Cont"}
{"type": "text_delta", "text": "inuity Management"}
{"type": "text_delta", "text": " (BCM) According to FFIEC"}
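As a minimal sketch of that concatenation, the snippet below parses the three example lines above and joins their text values. assembleText is a hypothetical helper name, not part of the Egret API.

```javascript
// The three text_delta lines from the example above, as raw NDJSON.
const deltaLines = [
  '{"type": "text_delta", "text": "# Business Cont"}',
  '{"type": "text_delta", "text": "inuity Management"}',
  '{"type": "text_delta", "text": " (BCM) According to FFIEC"}',
];

// Hypothetical helper: joins the text of every text_delta event in arrival
// order and ignores any other event type.
function assembleText(ndjsonLines) {
  return ndjsonLines
    .map((line) => JSON.parse(line))
    .filter((event) => event.type === "text_delta")
    .map((event) => event.text)
    .join(""); // no separator: chunks may split words
}

const assembled = assembleText(deltaLines);
console.log(assembled); // "# Business Continuity Management (BCM) According to FFIEC"
```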
3. done
The final event. Contains token usage, latency metrics, the fully assembled response, sources, and suggestions.
{
  "type": "done",
  "input_tokens": 9042,
  "output_tokens": 598,
  "cost_usd": 0.003008,
  "model": "us.anthropic.claude-3-5-haiku-20241022-v1:0",
  "total_latency_ms": 15565,
  "retrieval_latency_ms": 2255,
  "generation_latency_ms": 13310,
  "full_text": "# Business Continuity Management (BCM)...",
  "sources": [ ... ],
  "suggestions": [
    "What are the key differences between BCP and BCM?",
    "How do organizations develop a BCM strategy?"
  ],
  "session_id": "6143159f-ef21-4a1f-97f1-ee9cb7ac3f08",
  "message_id": 122
}
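The usage and latency fields lend themselves to simple client-side metrics. The sketch below derives a few from the example values above; summarizeDone and the names of the values it returns are hypothetical, chosen for illustration only.

```javascript
// Hypothetical helper: derives summary metrics from a done event.
function summarizeDone(event) {
  const generationSeconds = event.generation_latency_ms / 1000;
  return {
    totalTokens: event.input_tokens + event.output_tokens,
    // Output tokens produced per second of generation time.
    outputTokensPerSecond: event.output_tokens / generationSeconds,
    // Fraction of the total latency spent generating rather than retrieving.
    generationShare: event.generation_latency_ms / event.total_latency_ms,
  };
}

// Numeric fields copied from the example done event above.
const doneSummary = summarizeDone({
  type: "done",
  input_tokens: 9042,
  output_tokens: 598,
  total_latency_ms: 15565,
  retrieval_latency_ms: 2255,
  generation_latency_ms: 13310,
});

console.log(doneSummary.totalTokens); // 9640
console.log(doneSummary.outputTokensPerSecond.toFixed(1)); // "44.9"
```

In this example, retrieval_latency_ms and generation_latency_ms add up to total_latency_ms (2255 + 13310 = 15565). Whether that holds for every response is an assumption, so the sketch reads each field independently.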
Client implementation
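Use fetch with a ReadableStream reader to process NDJSON lines as they arrive. The example prints with process.stdout.write, so it targets Node.js (18 or later, where fetch is available globally); in a browser, append event.text to the page instead.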
const response = await fetch("https://api.getegret.com/v1/query/", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer egret_..."
  },
  body: JSON.stringify({
    query: "What are the BCM governance requirements under FFIEC?",
    domain: "business-continuity",
    category: "us",
    mode: "compliance",
    model: "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    knowledge_scope: "all",
    stream: true
  })
});

// Fail early on HTTP errors rather than parsing an error body as event lines
if (!response.ok) {
  throw new Error(`Request failed with HTTP ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

// Parse one NDJSON line and dispatch on its event type
function handleLine(line) {
  if (!line.trim()) return; // skip blank lines
  const event = JSON.parse(line);
  if (event.type === "sources") {
    console.log("Sources retrieved:", event.sources.length);
  } else if (event.type === "text_delta") {
    process.stdout.write(event.text);
  } else if (event.type === "done") {
    console.log("\nTokens used:", event.input_tokens + event.output_tokens);
    console.log("Session:", event.session_id);
  }
}

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep incomplete line in buffer
  for (const line of lines) handleLine(line);
}

// Flush the decoder and handle a final line in case it has no trailing newline
buffer += decoder.decode();
handleLine(buffer);
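Network chunks do not line up with line boundaries, which is why the loop above keeps the trailing partial line in a buffer. The sketch below isolates that buffering in a hypothetical createNdjsonParser helper (not part of the Egret API) so it can be exercised without a network call. The simulated chunks are made up for illustration.

```javascript
// Hypothetical helper (not part of the Egret API): buffers decoded text
// chunks and calls onEvent once per complete NDJSON line.
function createNdjsonParser(onEvent) {
  let buffer = "";
  const emit = (line) => {
    if (line.trim()) onEvent(JSON.parse(line)); // skip blank lines
  };
  return {
    // Feed a decoded text chunk; it may end in the middle of a line.
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // last element is the incomplete remainder
      lines.forEach(emit);
    },
    // Call once at end of stream to handle a final line with no newline.
    end() {
      emit(buffer);
      buffer = "";
    },
  };
}

// Simulated stream: the first event is split across two chunks and the
// last line has no trailing newline.
const received = [];
const parser = createNdjsonParser((event) => received.push(event));
parser.push('{"type": "text_delta", "te');
parser.push('xt": "# Business Cont"}\n{"type": "text_delta", "text": "inuity"}\n');
parser.push('{"type": "done", "output_tokens": 598}');
parser.end();

console.log(received.map((event) => event.type)); // [ 'text_delta', 'text_delta', 'done' ]
```

Keeping the parser separate from the transport also makes it reusable with any source of text chunks, such as a file read or a test fixture.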
Streaming vs. non-streaming
Both modes use the same endpoint and return the same data. With streaming, sources arrive as soon as retrieval finishes, before generation starts, and text appears as it is generated. Without streaming, you receive a single JSON response only after generation completes.
The done event's full_text field is identical to the response field in a non-streaming query response.
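Because of this equivalence, downstream code can treat both modes uniformly. The sketch below is one way to do that, under a labeled assumption: a payload is taken to be a streaming done event when its type field equals "done", and a non-streaming response otherwise. getAnswerText is a hypothetical helper, and the sample payloads are trimmed to the fields it reads.

```javascript
// Hypothetical helper: returns the answer text from either a streaming
// done event (full_text) or a non-streaming query response (response).
// Assumption: only done events carry type === "done".
function getAnswerText(payload) {
  return payload.type === "done" ? payload.full_text : payload.response;
}

// Illustrative payloads containing only the fields used here.
const fromStream = { type: "done", full_text: "# Business Continuity Management (BCM)..." };
const fromSingle = { response: "# Business Continuity Management (BCM)..." };

console.log(getAnswerText(fromStream) === getAnswerText(fromSingle)); // true
```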