What you’ll build
Instead of waiting for the full AI response, stream it token-by-token for a ChatGPT-like experience.

How streaming works
When you set `stream: true` on the messages endpoint, the API returns a Server-Sent Events (SSE) stream instead of a single JSON response.
The stream emits these events:
| Event | Description |
|---|---|
| `message.delta` | A chunk of the AI’s response (partial text) |
| `message.complete` | The full message with usage stats |
| `error` | Something went wrong |
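On the wire, those events arrive as plain-text SSE frames. A sketch of what the stream might look like (the `data` payload fields and the message id are assumptions; check your API's actual schema):

```text
event: message.delta
data: {"text": "Hel"}

event: message.delta
data: {"text": "lo!"}

event: message.complete
data: {"id": "msg_123", "usage": {"output_tokens": 2}}
```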
TypeScript example
stream.ts
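A minimal sketch of consuming the stream with `fetch` and a `ReadableStream` reader. The endpoint URL and the `data` payload shape (`{"text": ...}`) are assumptions; adapt them to your API.

```typescript
// stream.ts -- sketch; endpoint URL and payload shape are assumptions.

// Parse one "field: value" line from an SSE stream.
// Returns null for blank lines and ":"-prefixed comment lines.
function parseSSELine(line: string): { field: string; value: string } | null {
  const idx = line.indexOf(":");
  if (idx <= 0) return null;
  return { field: line.slice(0, idx), value: line.slice(idx + 1).trimStart() };
}

async function streamMessage(
  prompt: string,
  onDelta: (chunk: string) => void,
): Promise<string> {
  const res = await fetch("https://api.example.com/v1/messages", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let fullText = "";
  let currentEvent = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE lines end with "\n"; keep any partial trailing line in the buffer.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      const parsed = parseSSELine(line);
      if (!parsed) continue;
      if (parsed.field === "event") currentEvent = parsed.value;
      else if (parsed.field === "data" && currentEvent === "message.delta") {
        const chunk = JSON.parse(parsed.value).text; // payload shape assumed
        fullText += chunk;
        onDelta(chunk);
      }
    }
  }
  return fullText;
}
```

Usage: `await streamMessage("Hello", (chunk) => process.stdout.write(chunk))` prints the response as it arrives and resolves with the full text.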
Python example
stream.py
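The same flow in Python, using only the standard library. As above, the endpoint URL and the `data` payload shape are assumptions.

```python
# stream.py -- sketch; endpoint URL and payload shape are assumptions.
import json
import urllib.request


def parse_sse_line(line: str):
    """Split one 'field: value' SSE line into (field, value).

    Returns None for blank lines and ':'-prefixed comment lines.
    """
    if ":" not in line or line.startswith(":"):
        return None
    field, _, value = line.partition(":")
    return field, value.lstrip()


def stream_message(prompt: str) -> str:
    req = urllib.request.Request(
        "https://api.example.com/v1/messages",
        data=json.dumps(
            {"messages": [{"role": "user", "content": prompt}], "stream": True}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    full_text = []
    current_event = ""
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # file-like responses iterate line by line
            parsed = parse_sse_line(raw.decode("utf-8").rstrip("\n"))
            if parsed is None:
                continue
            field, value = parsed
            if field == "event":
                current_event = value
            elif field == "data" and current_event == "message.delta":
                chunk = json.loads(value)["text"]  # payload shape assumed
                print(chunk, end="", flush=True)
                full_text.append(chunk)
    return "".join(full_text)
```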
React hook example
Build a streaming chat component in React:
useStreamingChat.ts
Tips
When should I use streaming?
Use streaming for user-facing chat interfaces where perceived speed matters. Use non-streaming (`stream: false`) for automated pipelines where you need the complete response before proceeding.

How do I handle errors in a stream?
Listen for the `error` event in the SSE stream. If the connection drops unexpectedly, retry the request. The message won’t be duplicated because the failed message isn’t saved.

What’s the latency like?
First token typically arrives in 300-800 ms, depending on the model. Total streaming time depends on response length.
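Since a failed message isn’t saved, retrying a dropped stream is safe. A sketch of a retry wrapper with exponential backoff; `streamMessage` here stands for any function that performs one streaming request and throws on failure:

```typescript
// Sketch: retry a dropped stream with exponential backoff.
// `streamMessage` is a hypothetical single-request streaming function.

// Pure helper: backoff delay in ms for attempt 0, 1, 2, ...
function backoffMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

async function streamWithRetry(
  streamMessage: (prompt: string) => Promise<string>,
  prompt: string,
  maxRetries = 3,
): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    try {
      // Safe to retry: a failed message isn't saved, so no duplicates.
      return await streamMessage(prompt);
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      await new Promise((r) => setTimeout(r, backoffMs(attempt)));
    }
  }
}
```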