We are speedrunning AI development. This week alone, Claude 3.7 Sonnet and GPT-4.5 were released. Before that came DeepSeek R1, Deep Research, and Grok 3.
This pace makes it almost impossible for developers to keep up. No sooner have you integrated an open-source DeepSeek model into your chatbot than the newest OpenAI/Mistral/Llama/Anthropic model comes out, and you are already behind the curve. You have to rip out and replace your code for the latest option.
There’s a different way. Instead of building your code around a specific LLM SDK, you can build LLM-agnostic code and plug and play with new models as they are released. One day you might use OpenAI; the next, an open-source model; or you can even give users the choice.
That is the pattern used with the Stream chatbot UI integration. Here’s how it works.
LLMs Are Chatbots
This seems like an obvious observation, but it's important to spell out if you are building a chatbot. LLMs are conversational by design: you send them messages, and they send messages back. That means you can quickly build a basic chatbot with nothing more than a provider's generic API calls. For instance, here is the basic chat completion call for the OpenAI API, the most straightforward possible integration with an LLM:
import OpenAI from "openai";

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    {
      role: "user",
      content: "Write a haiku about recursion in programming.",
    },
  ],
  store: true,
});

console.log(completion.choices[0].message);
If you have played around with the OpenAI API, then you have definitely used this–it is from the OpenAI quickstart.
It works. If you change the "Write a haiku about recursion in programming." prompt to take input from your UI, you instantly have a fully functional chatbot. This is great for building simple prototypes or scripts and learning how the API works, but it quickly becomes unwieldy when building a production-grade chatbot.
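To make that concrete, here is a minimal sketch of that same call wired to terminal input instead of a UI (the readline loop and prompt strings are illustrative, not part of any SDK):

import OpenAI from "openai";
import readline from "node:readline/promises";

const openai = new OpenAI();
const rl = readline.createInterface({ input: process.stdin, output: process.stdout });

// Keep the running conversation so the model sees prior turns
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
  { role: "system", content: "You are a helpful assistant." },
];

while (true) {
  const userInput = await rl.question("You: ");
  messages.push({ role: "user", content: userInput });

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
  });

  const reply = completion.choices[0].message.content ?? "";
  messages.push({ role: "assistant", content: reply });
  console.log(`Bot: ${reply}`);
}

Even this toy version hints at the problems that show up in production.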
First, conversation history management. The basic pattern requires you to manually construct and track conversation history:
// You'll need code like this for every message
previousMessages.push({ role: "user", content: userInput });
previousMessages.push({ role: "assistant", content: lastResponse });

// And then pass the growing history with each request
const completion = await openai.chat.completions.create({
  messages: previousMessages,
  // other params...
});
As conversations grow longer, you'll need to handle the following (a simple truncation sketch follows this list):
- Token limits (most models have context windows of 8K-128K tokens)
- Conversation summarization or truncation
- Persistence across sessions
- User-specific conversation histories
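As a minimal sketch of the first two items, assuming a crude character budget as a stand-in for real token counting (a tokenizer library such as tiktoken would be more accurate):

// Rough history truncation: keep the system prompt plus as many recent
// messages as fit within a character budget (a proxy for tokens).
function truncateHistory(
  messages: { role: string; content: string }[],
  maxChars = 24_000,
) {
  const [system, ...rest] = messages;
  const kept: typeof rest = [];
  let total = 0;

  // Walk backwards from the newest message until the budget runs out
  for (let i = rest.length - 1; i >= 0; i--) {
    total += rest[i].content.length;
    if (total > maxChars) break;
    kept.unshift(rest[i]);
  }

  return [system, ...kept];
}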
Second, error handling and resilience. Production systems need robust error handling and retry mechanisms. You need to deal with API rate limiting and quotas, network failures and timeouts, service outages, and invalid responses:
try {
  const completion = await openai.chat.completions.create({/*...*/});
} catch (error) {
  if (error.status === 429) {
    // Rate limited - implement exponential backoff
  } else if (error.status >= 500) {
    // Server error - retry with fallback options
  } else {
    // Handle other errors appropriately
  }
}
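A minimal retry wrapper with exponential backoff might look like the sketch below; the withRetries helper, attempt count, and delays are illustrative choices, not part of the OpenAI SDK:

// Retry an async call, backing off exponentially on retryable errors
async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await call();
    } catch (error: any) {
      attempt += 1;
      const retryable = error?.status === 429 || error?.status >= 500;
      if (!retryable || attempt >= maxAttempts) throw error;
      // Wait 1s, 2s, 4s, ... before the next attempt
      const delayMs = 1000 * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage
const completion = await withRetries(() =>
  openai.chat.completions.create({ /* ... */ }),
);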
Third, the user experience. This basic pattern blocks during generation: users see nothing until the entire response has been generated. For a responsive application, you need streaming responses, typing indicators, progress updates, and cancellation options:
// This becomes complex quickly
const stream = await openai.chat.completions.create({
  stream: true,
  // other params...
});

for await (const chunk of stream) {
  // Update UI incrementally
  // Handle pauses/cancellations
  // Manage partial message formatting
}
As you can see, the simple pattern has expanded dramatically to handle the real-world requirements of production chatbots. This is why a robust, modular architecture becomes essential as your application scales.
An LLM-Agnostic Chatbot Architecture
Modularity, with the ability to add fallbacks and switch between providers, is essential for building resilient production chatbots. You want:
- Separation of concerns–Isolate different responsibilities into distinct components
- Provider abstraction–Hide provider-specific details behind common interfaces
- Configuration flexibility–Support different models, parameters, and options
This is the pattern Stream uses with its LLM assistants. Instead of hardcoding each LLM SDK, we abstracted the code to provide users with an LLM-agnostic chatbot architecture. Let’s review the key components.
1. Agent Interface Abstraction
The AIAgent interface standardizes core functionality across LLM providers:
export interface AIAgent {
  init(): Promise<void>;
  dispose(): Promise<void>;
  getLastInteraction(): number;

  chatClient: StreamChat;
  channel: Channel;
}
This interface ensures all LLM implementations provide consistent lifecycle management and message handling capabilities, regardless of the underlying provider. It defines a minimal contract with essential lifecycle methods and required properties. The init() method handles initial setup and authentication, dispose() properly cleans up resources, and getLastInteraction() supports inactivity tracking for resource management.
2. Response Handler Pattern
Each LLM has a dedicated handler class that processes streaming responses:
// Base abstract pattern for handlers
export abstract class BaseResponseHandler {
  protected message_text = '';
  protected chunk_counter = 0;

  constructor(
    protected readonly chatClient: StreamChat,
    protected readonly channel: Channel,
    protected readonly message: MessageResponse
  ) {
    // Common setup for all handlers
    this.chatClient.on('ai_indicator.stop', this.handleStopGenerating);
  }

  abstract run(): Promise<void>;
  abstract dispose(): void;
  protected abstract handleStopGenerating(): Promise<void>;
}

// Provider-specific implementations
export class AnthropicResponseHandler extends BaseResponseHandler {
  constructor(
    private readonly anthropicStream: Stream<RawMessageStreamEvent>,
    chatClient: StreamChat,
    channel: Channel,
    message: MessageResponse
  ) {
    super(chatClient, channel, message);
  }

  // Anthropic-specific implementation
}

export class OpenAIResponseHandler extends BaseResponseHandler {
  constructor(
    private readonly openaiStream: AssistantStream,
    // Other dependencies
  ) {
    super(chatClient, channel, message);
  }

  // OpenAI-specific implementation
}
This pattern allows consistent response handling while accommodating each provider's unique streaming formats and event types. For instance, the abstract BaseResponseHandler provides a template for processing different streaming formats while standardizing message accumulation. Anthropic's Claude models emit text deltas with specific event types like content_block_delta, while OpenAI uses a different event structure with message.delta. These handler implementations transform provider-specific events into standardized chat updates.
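As a rough sketch of what that looks like on the Anthropic side (using the event shapes from Anthropic's streaming SDK; the update cadence and error handling here are illustrative, not the exact Stream implementation), the handler's run() method might be:

// Sketch of an AnthropicResponseHandler.run() body
async run(): Promise<void> {
  try {
    for await (const event of this.anthropicStream) {
      // Anthropic emits text as content_block_delta events with a text_delta payload
      if (
        event.type === 'content_block_delta' &&
        event.delta.type === 'text_delta'
      ) {
        this.message_text += event.delta.text;
        this.chunk_counter += 1;

        // Periodically push the accumulated text to the chat message
        if (this.chunk_counter % 20 === 0) {
          await this.chatClient.partialUpdateMessage(this.message.id, {
            set: { text: this.message_text, generating: true },
          });
        }
      }
    }

    // Final update once the stream completes
    await this.chatClient.partialUpdateMessage(this.message.id, {
      set: { text: this.message_text, generating: false },
    });
  } catch (error) {
    console.error('Error handling message stream event', error);
  }
}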
3. Factory Pattern
The createAgent function instantiates the appropriate agent based on the platform:
export const createAgent = async (
  user_id: string,
  platform: AgentPlatform,
  channel_type: string,
  channel_id: string,
): Promise<AIAgent> => {
  const client = new StreamChat(apiKey, { allowServerSideConnect: true });
  const token = serverClient.createToken(user_id);
  await client.connectUser({ id: user_id }, token);

  const channel = client.channel(channel_type, channel_id);
  await channel.watch();

  if (platform === AgentPlatform.OPENAI) {
    return new OpenAIAgent(client, channel);
  }
  return new AnthropicAgent(client, channel);
};
The factory encapsulates the connection setup, authentication, and agent instantiation logic behind a single function call. It uses dependency injection to provide the chat client and channel to the agent, allowing for flexible configuration while maintaining a consistent initialization process.
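Calling it is then the same regardless of provider. A hypothetical usage, assuming the AgentPlatform enum also defines an ANTHROPIC member:

// Spin up an AI agent for a channel, choosing the provider at runtime
const agent = await createAgent(
  'ai-bot-assistant',        // user_id for the bot user
  AgentPlatform.ANTHROPIC,   // or AgentPlatform.OPENAI
  'messaging',               // channel_type
  'travel-planning',         // channel_id (hypothetical)
);

await agent.init();
// ...later, when the conversation goes idle or the server shuts down
await agent.dispose();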
4. Event-Driven Communication
Once an agent is initialized, it communicates with the chat client through an event system:
// In agent implementation
this.chatClient.on('message.new', this.handleMessage);

// In response handler
this.chatClient.on('ai_indicator.stop', this.handleStopGenerating);

// Sending events to indicate state
await this.channel.sendEvent({
  type: 'ai_indicator.update',
  ai_state: 'AI_STATE_THINKING',
  message_id: channelMessage.id,
});
This event system uses a publisher-subscriber pattern, where agents register callbacks for specific events, such as new messages. The state indicators (AI_STATE_THINKING and AI_STATE_GENERATING) provide real-time feedback to users, making the chatbot interaction more responsive despite the asynchronous nature of LLM API calls.
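Inside a response handler, a typical indicator lifecycle looks like the small sketch below; the exact transition points and the ai_indicator.clear event at the end are assumptions based on this event scheme:

// Tell the UI the bot is thinking while the request is prepared
await this.channel.sendEvent({
  type: 'ai_indicator.update',
  ai_state: 'AI_STATE_THINKING',
  message_id: this.message.id,
});

// ...once the first tokens arrive from the provider
await this.channel.sendEvent({
  type: 'ai_indicator.update',
  ai_state: 'AI_STATE_GENERATING',
  message_id: this.message.id,
});

// ...and when the stream finishes (or the user hits stop)
await this.channel.sendEvent({
  type: 'ai_indicator.clear',
  message_id: this.message.id,
});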
5. Lifecycle Management
Implementations can then share similar initialization, message handling, and disposal patterns:
// Initialization
async init() {
  // Set up API client with credentials
  // Register event listeners
  this.chatClient.on('message.new', this.handleMessage);
}

// Disposal
async dispose() {
  // Remove event listeners
  this.chatClient.off('message.new', this.handleMessage);

  // Clean up connections
  await this.chatClient.disconnectUser();

  // Clean up handlers
  this.handlers.forEach((handler) => handler.dispose());
}
The consistent lifecycle methods ensure that all resources are properly managed throughout the agent's lifespan. Event listeners are registered during initialization and explicitly removed during disposal to prevent memory leaks, while the handlers array tracks all active response handlers to ensure complete cleanup.
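This is also where getLastInteraction() earns its keep. A server-side sweep like the sketch below can dispose of agents that have gone quiet; the registry, interval, and timeout values are assumptions for illustration, not part of the Stream SDK:

// Hypothetical registry of live agents, keyed by channel id
const activeAgents = new Map<string, AIAgent>();

const INACTIVITY_LIMIT_MS = 5 * 60 * 1000; // dispose after 5 minutes of silence

setInterval(async () => {
  const now = Date.now();
  for (const [channelId, agent] of activeAgents) {
    if (now - agent.getLastInteraction() > INACTIVITY_LIMIT_MS) {
      await agent.dispose();
      activeAgents.delete(channelId);
    }
  }
}, 60 * 1000); // check once a minute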
Provider-Specific Implementations
With the handlers created, the LLM-specific code can be implemented separately. For instance, this Anthropic agent handles message streaming and processing:
export class AnthropicAgent implements AIAgent {
  private anthropic?: Anthropic;
  private handlers: AnthropicResponseHandler[] = [];
  private lastInteractionTs = Date.now();

  constructor(
    readonly chatClient: StreamChat,
    readonly channel: Channel,
  ) {}

  async init() {
    // Initialize Anthropic client
    this.anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

    // Set up message handler
    this.chatClient.on('message.new', this.handleMessage);
  }

  private handleMessage = async (e: Event<DefaultGenerics>) => {
    // Process incoming messages

    // Create message stream
    const anthropicStream = await this.anthropic.messages.create({
      max_tokens: 1024,
      messages: this.getFormattedMessages(),
      model: 'claude-3-5-sonnet-20241022',
      stream: true,
    });

    // Set up response handler
    const handler = new AnthropicResponseHandler(
      anthropicStream,
      this.chatClient,
      this.channel,
      channelMessage,
    );
    void handler.run();
    this.handlers.push(handler);
  };
}
The Anthropic implementation initializes a Claude client and establishes event listeners for incoming messages. When a message arrives, it formats the conversation history according to Anthropic's expected structure, creates a streaming connection, and delegates the stream processing to a specialized handler that incrementally updates the UI as text is generated.
The OpenAI agent similarly handles its specific stream format:
export class OpenAIAgent implements AIAgent {
  private openai?: OpenAI;
  private assistant?: OpenAI.Beta.Assistants.Assistant;
  private openAiThread?: OpenAI.Beta.Threads.Thread;
  private lastInteractionTs = Date.now();
  private handlers: OpenAIResponseHandler[] = [];

  constructor(
    readonly chatClient: StreamChat,
    readonly channel: Channel,
  ) {}

  async init() {
    // Initialize OpenAI client and assistant
    this.openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    this.assistant = await this.openai.beta.assistants.create({
      name: 'Stream AI Assistant',
      instructions: 'You are an AI assistant. Help users with their questions.',
      model: 'gpt-4o',
    });
    this.openAiThread = await this.openai.beta.threads.create();

    // Set up message handler
    this.chatClient.on('message.new', this.handleMessage);
  }

  private handleMessage = async (e: Event<DefaultGenerics>) => {
    // Process incoming messages

    // Create run stream
    const run = this.openai.beta.threads.runs.stream(this.openAiThread.id, {
      assistant_id: this.assistant.id,
    });

    // Set up response handler
    const handler = new OpenAIResponseHandler(
      this.openai,
      this.openAiThread,
      run,
      this.chatClient,
      this.channel,
      channelMessage,
    );
    void handler.run();
    this.handlers.push(handler);
  };
}
The OpenAI implementation is more complex because it uses the Assistants API, which requires creating both an assistant and a thread. It maintains state across multiple messages using OpenAI's thread architecture, which differs from Anthropic's stateless messages API. The handler also processes a different event model, with events like thread.message.delta that require specialized parsing logic.
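A sketch of the OpenAI handler's run() method shows that translation; the event names come from the openai SDK's Assistants streaming events, while the update cadence and completion handling here are illustrative rather than the exact Stream implementation:

// Translate Assistants API events into partial chat message updates
async run(): Promise<void> {
  for await (const event of this.openaiStream) {
    if (event.event === 'thread.message.delta') {
      // Each delta carries an array of content parts; collect the text pieces
      for (const part of event.data.delta.content ?? []) {
        if (part.type === 'text' && part.text?.value) {
          this.message_text += part.text.value;
          this.chunk_counter += 1;
        }
      }

      if (this.chunk_counter % 15 === 0) {
        await this.chatClient.partialUpdateMessage(this.message.id, {
          set: { text: this.message_text, generating: true },
        });
      }
    }

    if (event.event === 'thread.message.completed') {
      // Final update once the message is complete
      await this.chatClient.partialUpdateMessage(this.message.id, {
        set: { text: this.message_text, generating: false },
      });
    }
  }
}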
Implementing Your Own AI Chatbot
There are a few important implementation details to consider when you are building an AI chatbot. Perhaps the most important is message normalization. Different providers use different message formats, and you need to normalize these for consistency:
private getFormattedMessages(): MessageParam[] {
  return this.channel.state.messages
    .slice(-5)
    .filter((msg) => msg.text && msg.text.trim() !== '')
    .map((message) => ({
      role: message.user?.id.startsWith('ai-bot') ? 'assistant' : 'user',
      content: message.text || '',
    }));
}
This function extracts the most recent messages from the channel state, filters out empty messages, and transforms them into a standardized format with correct role assignments. The slice(-5) call limits context to the five most recent messages to prevent token limit issues while still maintaining sufficient conversation context.
You also want to implement consistent error handling across providers:
try {
  // Provider-specific code
} catch (error) {
  console.error('Error handling message stream event', error);
  await this.channel.sendEvent({
    type: 'ai_indicator.update',
    ai_state: 'AI_STATE_ERROR',
    message_id: this.message.id,
  });
}
This try-catch pattern provides centralized error handling, transforming API-specific errors into user-friendly status indicators. The AI_STATE_ERROR event triggers UI updates that inform users of issues while preventing the interface from appearing frozen or unresponsive during API failures.
Another important consideration is ensuring smooth UI updates during streaming:
// Anthropic handler
if (
  this.chunk_counter % 20 === 0 ||
  (this.chunk_counter < 8 && this.chunk_counter % 2 !== 0)
) {
  await this.chatClient.partialUpdateMessage(this.message.id, {
    set: { text: this.message_text, generating: true },
  });
}

// OpenAI handler
if (
  this.chunk_counter % 15 === 0 ||
  (this.chunk_counter < 8 && this.chunk_counter % 2 === 0)
) {
  const text = this.message_text;
  await this.chatClient.partialUpdateMessage(id, {
    set: { text, generating: true },
  });
}
These conditional update patterns balance UI responsiveness with performance. Early in the response (when chunk_counter < 8), updates are more frequent to provide immediate feedback, while later updates occur at fixed intervals to reduce unnecessary network traffic and DOM updates. The slight differences between implementations account for provider-specific streaming behavior.
You can then easily extend this architecture to add support for new LLM providers (a skeleton sketch follows this list):
- Create a new agent class implementing the AIAgent interface
- Create a corresponding response handler for the provider's streaming format
- Update the AgentPlatform enum and createAgent factory function
- Implement provider-specific message formatting and error handling
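As a rough skeleton (the provider name, its SDK calls, and the NewProviderResponseHandler are placeholders, not a real integration), a new agent slots in like this:

// Hypothetical skeleton for a new provider; swap in the real SDK client and stream
export class NewProviderAgent implements AIAgent {
  private handlers: NewProviderResponseHandler[] = [];
  private lastInteractionTs = Date.now();

  constructor(
    readonly chatClient: StreamChat,
    readonly channel: Channel,
  ) {}

  async init() {
    // Initialize the provider's SDK client with its API key here
    this.chatClient.on('message.new', this.handleMessage);
  }

  async dispose() {
    this.chatClient.off('message.new', this.handleMessage);
    this.handlers.forEach((handler) => handler.dispose());
  }

  getLastInteraction(): number {
    return this.lastInteractionTs;
  }

  private handleMessage = async (e: Event<DefaultGenerics>) => {
    this.lastInteractionTs = Date.now();
    // Format history, open the provider's stream, and hand it to a
    // NewProviderResponseHandler, mirroring the agents above
  };
}

// And one new branch in the factory:
// if (platform === AgentPlatform.NEW_PROVIDER) {
//   return new NewProviderAgent(client, channel);
// }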
This architecture provides a solid foundation for building LLM-agnostic chatbots that can leverage multiple providers while maintaining a consistent interface and user experience.
The LLM-Agnostic Chatbot Advantage
We're in an era of LLM whiplash. As new models with better capabilities and performance profiles drop weekly, pinning your application to a single provider is increasingly risky. The key patterns here—interface abstraction, response handlers, factory creation, event-driven communication, and proper lifecycle management—create a robust foundation to weather the constant shifts in the AI landscape.
But production chatbots need more than LLM integration. They also need robust conversation management, graceful error handling, responsive UI updates, and the ability to swap providers when necessary.
Building these patterns into your architecture from the beginning will pay dividends as your application scales. Stream provides all of this, along with the LLM-agnostic pattern described here. The SDK handles the complex provider integrations, streaming responses, and fallback logic, so you can focus on creating a great user experience rather than wrestling with the peculiarities of each LLM provider's API.
Whether you build it yourself or leverage an existing solution, an LLM-agnostic architecture is no longer a nice-to-have—it's essential for future-proofing your AI applications in this rapidly evolving landscape.