We are speedrunning AI development. This week alone, Claude 3.7 Sonnet and GPT-4.5 were released. Before that came DeepSeek R1, Deep Research, and Grok 3.
This pace makes it almost impossible for developers to keep up. No sooner have you integrated an open-source DeepSeek model into your chatbot than the newest OpenAI/Mistral/Llama/Anthropic model comes out, and you are already behind the curve. You have to rip out and replace your code for the latest option.
There’s a different way. Instead of building your code around a specific LLM SDK, you can build LLM-agnostic code and plug and play with new models as they are released. One day you might use OpenAI; the next, an open-source model; or you can even give users the choice.
That is the pattern used with the Stream chatbot UI integration. Here’s how it works.
LLMs Are Chatbots
This seems like an obvious observation, but it's important to spell out if you are building a chatbot. LLMs are conversational by design: you send them messages, and they send messages back. That means you can quickly build a basic chatbot with nothing more than a provider's generic API calls. For instance, here is the basic chat completion call for the OpenAI API, the most straightforward possible integration with an LLM:
import OpenAI from "openai";

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    {
      role: "user",
      content: "Write a haiku about recursion in programming.",
    },
  ],
  store: true,
});

console.log(completion.choices[0].message);
If you have played around with the OpenAI API, then you have definitely used this–it is from the OpenAI quickstart.
It works. If you change the "Write a haiku about recursion in programming." prompt to take input from your UI, you instantly have a fully functional chatbot. This is great for building simple prototypes or scripts and learning how the API works, but it quickly becomes unwieldy when building a production-grade chatbot.
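To make that concrete, here is a minimal sketch of that same call wired to terminal input instead of a UI (the readline loop and prompt strings are illustrative, not part of any SDK):

import OpenAI from "openai";
import readline from "node:readline/promises";

const openai = new OpenAI();
const rl = readline.createInterface({ input: process.stdin, output: process.stdout });

// Keep the running conversation so the model sees prior turns
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
  { role: "system", content: "You are a helpful assistant." },
];

while (true) {
  const userInput = await rl.question("You: ");
  messages.push({ role: "user", content: userInput });

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
  });

  const reply = completion.choices[0].message.content ?? "";
  messages.push({ role: "assistant", content: reply });
  console.log(`Bot: ${reply}`);
}

Even this toy version hints at the problems that show up in production.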
First, conversation history management. The basic pattern requires you to manually construct and track conversation history:
// You'll need code like this for every message
previousMessages.push({ role: "user", content: userInput });
previousMessages.push({ role: "assistant", content: lastResponse });

// And then pass the growing history with each request
const completion = await openai.chat.completions.create({
  messages: previousMessages,
  // other params...
});
As conversations grow longer, you'll need to handle the following (a simple truncation sketch follows this list):
- Token limits (most models have context windows of 8K-128K tokens)
- Conversation summarization or truncation
- Persistence across sessions
- User-specific conversation histories
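As a minimal sketch of the first two items, assuming a crude character budget as a stand-in for real token counting (a tokenizer library such as tiktoken would be more accurate):

// Rough history truncation: keep the system prompt plus as many recent
// messages as fit within a character budget (a proxy for tokens).
function truncateHistory(
  messages: { role: string; content: string }[],
  maxChars = 24_000,
) {
  const [system, ...rest] = messages;
  const kept: typeof rest = [];
  let total = 0;

  // Walk backwards from the newest message until the budget runs out
  for (let i = rest.length - 1; i >= 0; i--) {
    total += rest[i].content.length;
    if (total > maxChars) break;
    kept.unshift(rest[i]);
  }

  return [system, ...kept];
}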
Second, error handling and resilience. Production systems need robust error handling and retry mechanisms. You need to deal with API rate limiting and quotas, network failures and timeouts, service outages, and invalid responses:
try {
  const completion = await openai.chat.completions.create({/*...*/});
} catch (error) {
  if (error.status === 429) {
    // Rate limited - implement exponential backoff
  } else if (error.status >= 500) {
    // Server error - retry with fallback options
  } else {
    // Handle other errors appropriately
  }
}
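A minimal retry wrapper with exponential backoff might look like the sketch below; the withRetries helper, attempt count, and delays are illustrative choices, not part of the OpenAI SDK:

// Retry an async call, backing off exponentially on retryable errors
async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await call();
    } catch (error: any) {
      attempt += 1;
      const retryable = error?.status === 429 || error?.status >= 500;
      if (!retryable || attempt >= maxAttempts) throw error;
      // Wait 1s, 2s, 4s, ... before the next attempt
      const delayMs = 1000 * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage
const completion = await withRetries(() =>
  openai.chat.completions.create({ /* ... */ }),
);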
Third, the user experience. This basic pattern blocks during generation: users see nothing until the entire response has been generated. For a responsive application, you need streaming responses, typing indicators, progress updates, and cancellation options:
// This becomes complex quickly
const stream = await openai.chat.completions.create({
  stream: true,
  // other params...
});

for await (const chunk of stream) {
  // Update UI incrementally
  // Handle pauses/cancellations
  // Manage partial message formatting
}
As you can see, the simple pattern has expanded dramatically to handle the real-world requirements of production chatbots. This is why a robust, modular architecture becomes essential as your application scales.
An LLM-Agnostic Chatbot Architecture
Modularity, with the ability to add fallbacks and switch between providers, is essential for building resilient production chatbots. You want:
- Separation of concerns–Isolate different responsibilities into distinct components
- Provider abstraction–Hide provider-specific details behind common interfaces
- Configuration flexibility–Support different models, parameters, and options
This is the pattern Stream uses with its LLM assistants. Instead of hardcoding each LLM SDK, we abstracted the code to provide users with an LLM-agnostic chatbot architecture. Let’s review the key components.
1. Agent Interface Abstraction
The AIAgent interface standardizes core functionality across LLM providers:
export interface AIAgent {
  init(): Promise<void>;
  dispose(): Promise<void>;
  getLastInteraction(): number;

  chatClient: StreamChat;
  channel: Channel;
}
This interface ensures all LLM implementations provide consistent lifecycle management and message handling capabilities, regardless of the underlying provider. It defines a minimal contract with essential lifecycle methods and required properties. The init() method handles initial setup and authentication, dispose() properly cleans up resources, and getLastInteraction() supports inactivity tracking for resource management.
2. Response Handler Pattern
Each LLM has a dedicated handler class that processes streaming responses:
// Base abstract pattern for handlers
export abstract class BaseResponseHandler {
  protected message_text = '';
  protected chunk_counter = 0;

  constructor(
    protected readonly chatClient: StreamChat,
    protected readonly channel: Channel,
    protected readonly message: MessageResponse
  ) {
    // Common setup for all handlers
    this.chatClient.on('ai_indicator.stop', this.handleStopGenerating);
  }

  abstract run(): Promise<void>;
  abstract dispose(): void;
  protected abstract handleStopGenerating(): Promise<void>;
}

// Provider-specific implementations
export class AnthropicResponseHandler extends BaseResponseHandler {
  constructor(
    private readonly anthropicStream: Stream<RawMessageStreamEvent>,
    chatClient: StreamChat,
    channel: Channel,
    message: MessageResponse
  ) {
    super(chatClient, channel, message);
  }

  // Anthropic-specific implementation
}

export class OpenAIResponseHandler extends BaseResponseHandler {
  constructor(
    private readonly openaiStream: AssistantStream,
    // Other dependencies
  ) {
    super(chatClient, channel, message);
  }

  // OpenAI-specific implementation
}
This pattern allows consistent response handling while accommodating each provider's unique streaming formats and event types. For instance, the abstract BaseResponseHandler provides a template for processing different streaming formats while standardizing message accumulation. Anthropic's Claude models emit text deltas with specific event types like content_block_delta, while OpenAI uses a different event structure with message.delta. These handler implementations transform provider-specific events into standardized chat updates.
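As a rough sketch of what that looks like on the Anthropic side (using the event shapes from Anthropic's streaming SDK; the update cadence and error handling here are illustrative, not the exact Stream implementation), the handler's run() method might be:

// Sketch of an AnthropicResponseHandler.run() body
async run(): Promise<void> {
  try {
    for await (const event of this.anthropicStream) {
      // Anthropic emits text as content_block_delta events with a text_delta payload
      if (
        event.type === 'content_block_delta' &&
        event.delta.type === 'text_delta'
      ) {
        this.message_text += event.delta.text;
        this.chunk_counter += 1;

        // Periodically push the accumulated text to the chat message
        if (this.chunk_counter % 20 === 0) {
          await this.chatClient.partialUpdateMessage(this.message.id, {
            set: { text: this.message_text, generating: true },
          });
        }
      }
    }

    // Final update once the stream completes
    await this.chatClient.partialUpdateMessage(this.message.id, {
      set: { text: this.message_text, generating: false },
    });
  } catch (error) {
    console.error('Error handling message stream event', error);
  }
}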
3. Factory Pattern
The createAgent function instantiates the appropriate agent based on the platform:
export const createAgent = async (
  user_id: string,
  platform: AgentPlatform,
  channel_type: string,
  channel_id: string,
): Promise<AIAgent> => {
  const client = new StreamChat(apiKey, { allowServerSideConnect: true });
  const token = serverClient.createToken(user_id);
  await client.connectUser({ id: user_id }, token);

  const channel = client.channel(channel_type, channel_id);
  await channel.watch();

  if (platform === AgentPlatform.OPENAI) {
    return new OpenAIAgent(client, channel);
  }
  return new AnthropicAgent(client, channel);
};
The factory encapsulates the connection setup, authentication, and agent instantiation logic behind a single function call. It uses dependency injection to provide the chat client and channel to the agent, allowing for flexible configuration while maintaining a consistent initialization process.
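Calling it is then the same regardless of provider. A hypothetical usage, assuming the AgentPlatform enum also defines an ANTHROPIC member:

// Spin up an AI agent for a channel, choosing the provider at runtime
const agent = await createAgent(
  'ai-bot-assistant',        // user_id for the bot user
  AgentPlatform.ANTHROPIC,   // or AgentPlatform.OPENAI
  'messaging',               // channel_type
  'travel-planning',         // channel_id (hypothetical)
);

await agent.init();
// ...later, when the conversation goes idle or the server shuts down
await agent.dispose();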
4. Event-Driven Communication
Once an agent is initialized, it communicates with the chat client through an event system:
// In agent implementation
this.chatClient.on('message.new', this.handleMessage);

// In response handler
this.chatClient.on('ai_indicator.stop', this.handleStopGenerating);

// Sending events to indicate state
await this.channel.sendEvent({
  type: 'ai_indicator.update',
  ai_state: 'AI_STATE_THINKING',
  message_id: channelMessage.id,
});
This event system uses a publisher-subscriber pattern, where agents register callbacks for specific events, such as new messages. The state indicators (AI_STATE_THINKING and AI_STATE_GENERATING) provide real-time feedback to users, making the chatbot interaction more responsive despite the asynchronous nature of LLM API calls.
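Inside a response handler, a typical indicator lifecycle looks like the small sketch below; the exact transition points and the ai_indicator.clear event at the end are assumptions based on this event scheme:

// Tell the UI the bot is thinking while the request is prepared
await this.channel.sendEvent({
  type: 'ai_indicator.update',
  ai_state: 'AI_STATE_THINKING',
  message_id: this.message.id,
});

// ...once the first tokens arrive from the provider
await this.channel.sendEvent({
  type: 'ai_indicator.update',
  ai_state: 'AI_STATE_GENERATING',
  message_id: this.message.id,
});

// ...and when the stream finishes (or the user hits stop)
await this.channel.sendEvent({
  type: 'ai_indicator.clear',
  message_id: this.message.id,
});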
5. Lifecycle Management
Implementations can then share similar initialization, message handling, and disposal patterns:
// Initialization
async init() {
  // Set up API client with credentials
  // Register event listeners
  this.chatClient.on('message.new', this.handleMessage);
}

// Disposal
async dispose() {
  // Remove event listeners
  this.chatClient.off('message.new', this.handleMessage);

  // Clean up connections
  await this.chatClient.disconnectUser();

  // Clean up handlers
  this.handlers.forEach((handler) => handler.dispose());
}
The consistent lifecycle methods ensure that all resources are properly managed throughout the agent's lifespan. Event listeners are registered during initialization and explicitly removed during disposal to prevent memory leaks, while the handlers array tracks all active response handlers to ensure complete cleanup.
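This is also where getLastInteraction() earns its keep. A server-side sweep like the sketch below can dispose of agents that have gone quiet; the registry, interval, and timeout values are assumptions for illustration, not part of the Stream SDK:

// Hypothetical registry of live agents, keyed by channel id
const activeAgents = new Map<string, AIAgent>();

const INACTIVITY_LIMIT_MS = 5 * 60 * 1000; // dispose after 5 minutes of silence

setInterval(async () => {
  const now = Date.now();
  for (const [channelId, agent] of activeAgents) {
    if (now - agent.getLastInteraction() > INACTIVITY_LIMIT_MS) {
      await agent.dispose();
      activeAgents.delete(channelId);
    }
  }
}, 60 * 1000); // check once a minute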
Provider-Specific Implementations
With the handlers created, the LLM-specific code can be implemented separately. For instance, this Anthropic agent handles message streaming and processing:
export class AnthropicAgent implements AIAgent {
  private anthropic?: Anthropic;
  private handlers: AnthropicResponseHandler[] = [];
  private lastInteractionTs = Date.now();

  constructor(
    readonly chatClient: StreamChat,
    readonly channel: Channel,
  ) {}

  async init() {
    // Initialize Anthropic client
    this.anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

    // Set up message handler
    this.chatClient.on('message.new', this.handleMessage);
  }

  private handleMessage = async (e: Event<DefaultGenerics>) => {
    // Process incoming messages

    // Create message stream
    const anthropicStream = await this.anthropic.messages.create({
      max_tokens: 1024,
      messages: this.getFormattedMessages(),
      model: 'claude-3-5-sonnet-20241022',
      stream: true,
    });

    // Set up response handler
    const handler = new AnthropicResponseHandler(
      anthropicStream,
      this.chatClient,
      this.channel,
      channelMessage,
    );
    void handler.run();
    this.handlers.push(handler);
  };
}
The Anthropic implementation initializes a Claude client and establishes event listeners for incoming messages. When a message arrives, it formats the conversation history according to Anthropic's expected structure, creates a streaming connection, and delegates the stream processing to a specialized handler that incrementally updates the UI as text is generated.
The OpenAI agent similarly handles its specific stream format:
export class OpenAIAgent implements AIAgent {
  private openai?: OpenAI;
  private assistant?: OpenAI.Beta.Assistants.Assistant;
  private openAiThread?: OpenAI.Beta.Threads.Thread;
  private lastInteractionTs = Date.now();
  private handlers: OpenAIResponseHandler[] = [];

  constructor(
    readonly chatClient: StreamChat,
    readonly channel: Channel,
  ) {}

  async init() {
    // Initialize OpenAI client and assistant
    this.openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    this.assistant = await this.openai.beta.assistants.create({
      name: 'Stream AI Assistant',
      instructions: 'You are an AI assistant. Help users with their questions.',
      model: 'gpt-4o',
    });
    this.openAiThread = await this.openai.beta.threads.create();

    // Set up message handler
    this.chatClient.on('message.new', this.handleMessage);
  }

  private handleMessage = async (e: Event<DefaultGenerics>) => {
    // Process incoming messages

    // Create run stream
    const run = this.openai.beta.threads.runs.stream(this.openAiThread.id, {
      assistant_id: this.assistant.id,
    });

    // Set up response handler
    const handler = new OpenAIResponseHandler(
      this.openai,
      this.openAiThread,
      run,
      this.chatClient,
      this.channel,
      channelMessage,
    );
    void handler.run();
    this.handlers.push(handler);
  };
}
The OpenAI implementation is more complex because it uses the Assistants API, which requires creating both an assistant and a thread. It maintains state across multiple messages using OpenAI's thread architecture, which differs from Anthropic's stateless messages API. The handler also processes a different event model, with events like thread.message.delta that require specialized parsing logic.
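A sketch of the OpenAI handler's run() method shows that translation; the event names come from the openai SDK's Assistants streaming events, while the update cadence and completion handling here are illustrative rather than the exact Stream implementation:

// Translate Assistants API events into partial chat message updates
async run(): Promise<void> {
  for await (const event of this.openaiStream) {
    if (event.event === 'thread.message.delta') {
      // Each delta carries an array of content parts; collect the text pieces
      for (const part of event.data.delta.content ?? []) {
        if (part.type === 'text' && part.text?.value) {
          this.message_text += part.text.value;
          this.chunk_counter += 1;
        }
      }

      if (this.chunk_counter % 15 === 0) {
        await this.chatClient.partialUpdateMessage(this.message.id, {
          set: { text: this.message_text, generating: true },
        });
      }
    }

    if (event.event === 'thread.message.completed') {
      // Final update once the message is complete
      await this.chatClient.partialUpdateMessage(this.message.id, {
        set: { text: this.message_text, generating: false },
      });
    }
  }
}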
Implementing Your Own AI Chatbot
There are a few important implementation details to consider when you are building an AI chatbot. Perhaps the most important is message normalization. Different providers use different message formats, and you need to normalize these for consistency:
private getFormattedMessages(): MessageParam[] {
  return this.channel.state.messages
    .slice(-5)
    .filter((msg) => msg.text && msg.text.trim() !== '')
    .map((message) => ({
      role: message.user?.id.startsWith('ai-bot') ? 'assistant' : 'user',
      content: message.text || '',
    }));
}
This function extracts the most recent messages from the channel state, filters out empty messages, and transforms them into a standardized format with correct role assignments. The slice(-5) call limits context to the five most recent messages to prevent token limit issues while still maintaining sufficient conversation context.
You also want to implement consistent error handling across providers:
try {
  // Provider-specific code
} catch (error) {
  console.error('Error handling message stream event', error);
  await this.channel.sendEvent({
    type: 'ai_indicator.update',
    ai_state: 'AI_STATE_ERROR',
    message_id: this.message.id,
  });
}
This try-catch pattern provides centralized error handling, transforming API-specific errors into user-friendly status indicators. The AI_STATE_ERROR event triggers UI updates that inform users of issues while preventing the interface from appearing frozen or unresponsive during API failures.
Another important consideration is ensuring smooth UI updates during streaming:
// Anthropic handler
if (
  this.chunk_counter % 20 === 0 ||
  (this.chunk_counter < 8 && this.chunk_counter % 2 !== 0)
) {
  await this.chatClient.partialUpdateMessage(this.message.id, {
    set: { text: this.message_text, generating: true },
  });
}

// OpenAI handler
if (
  this.chunk_counter % 15 === 0 ||
  (this.chunk_counter < 8 && this.chunk_counter % 2 === 0)
) {
  const text = this.message_text;
  await this.chatClient.partialUpdateMessage(id, {
    set: { text, generating: true },
  });
}
These conditional update patterns balance UI responsiveness with performance. Early in the response (when chunk_counter < 8), updates are more frequent to provide immediate feedback, while later updates occur at fixed intervals to reduce unnecessary network traffic and DOM updates. The slight differences between implementations account for provider-specific streaming behavior.
You can then easily extend this architecture to add support for new LLM providers (a skeleton sketch follows this list):
- Create a new agent class implementing the AIAgent interface
- Create a corresponding response handler for the provider's streaming format
- Update the AgentPlatform enum and createAgent factory function
- Implement provider-specific message formatting and error handling
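As a rough skeleton (the provider name, its SDK calls, and the NewProviderResponseHandler are placeholders, not a real integration), a new agent slots in like this:

// Hypothetical skeleton for a new provider; swap in the real SDK client and stream
export class NewProviderAgent implements AIAgent {
  private handlers: NewProviderResponseHandler[] = [];
  private lastInteractionTs = Date.now();

  constructor(
    readonly chatClient: StreamChat,
    readonly channel: Channel,
  ) {}

  async init() {
    // Initialize the provider's SDK client with its API key here
    this.chatClient.on('message.new', this.handleMessage);
  }

  async dispose() {
    this.chatClient.off('message.new', this.handleMessage);
    this.handlers.forEach((handler) => handler.dispose());
  }

  getLastInteraction(): number {
    return this.lastInteractionTs;
  }

  private handleMessage = async (e: Event<DefaultGenerics>) => {
    this.lastInteractionTs = Date.now();
    // Format history, open the provider's stream, and hand it to a
    // NewProviderResponseHandler, mirroring the agents above
  };
}

// And one new branch in the factory:
// if (platform === AgentPlatform.NEW_PROVIDER) {
//   return new NewProviderAgent(client, channel);
// }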
This architecture provides a solid foundation for building LLM-agnostic chatbots that can leverage multiple providers while maintaining a consistent interface and user experience.
The LLM-Agnostic Chatbot Advantage
We're in an era of LLM whiplash. As new models with better capabilities and performance profiles drop weekly, pinning your application to a single provider is increasingly risky. The key patterns here—interface abstraction, response handlers, factory creation, event-driven communication, and proper lifecycle management—create a robust foundation to weather the constant shifts in the AI landscape.
But production chatbots need more than LLM integration. They also need robust conversation management, graceful error handling, responsive UI updates, and the ability to swap providers when necessary.
Building these patterns into your architecture from the beginning will pay dividends as your application scales. Stream provides all of this, along with the LLM-agnostic pattern described here. The SDK handles the complex provider integrations, streaming responses, and fallback logic, so you can focus on creating a great user experience rather than wrestling with the peculiarities of each LLM provider's API.
Whether you build it yourself or leverage an existing solution, an LLM-agnostic architecture is no longer a nice-to-have—it's essential for future-proofing your AI applications in this rapidly evolving landscape.