Multi-Model AI Chat: How to Switch Between Different LLMs in Your Stream App

Raymond F
Published April 11, 2025

An interesting quirk of large language models (LLMs) is that they aren’t all the same. ChatGPT tends to be better for analysis, but in the words of Paul Graham, “writes like a kid doing an assignment.” Claude is a much better writer but loves a little bit of hallucination. All other models have their strong points and idiosyncrasies.

Therefore, offering users the flexibility to choose between different language models can significantly enhance your application's appeal and versatility. By integrating multiple models into your chat application, you can allow users to select the AI that best suits their specific needs and preferences.

That’s what we’re going to build today: a complete multi-model AI chat system using Stream's Chat API.

Our implementation will support switching between:

  • Anthropic's Claude - Known for its helpfulness, harmlessness, and writing
  • OpenAI's GPT - Powerful general-purpose model with function-calling capabilities
  • Meta's Llama - An open-source alternative that can be self-hosted

By the end of this tutorial, you'll have a system that allows users to switch between AI models in real time during a conversation while maintaining a consistent user experience.

All of the code for this tutorial can be found in this repo.

Create a Stream Account

To get started, you'll need a Stream account and API credentials. Head over to Stream's signup page to create your free account.

Once you've created your account, follow these steps to set up your project:

  1. Log into the Stream Dashboard
  2. Click the "Create App" button in the top right corner
  3. Give your app a name (e.g., "Model Switcher Demo")
  4. Choose "Development" mode - this provides free API calls for testing
  5. Click "Create App" to generate your project

After creating your app, you'll land on the app dashboard, where you can find your API credentials:

  • The Stream API Key - Used to initialize the Stream client
  • The API Secret - Required for backend token generation

Keep these credentials handy, as you'll need them throughout this tutorial. The API Secret should be kept secure and never exposed in your frontend code.

To use Stream, we’ll need the Stream Chat SDK:

npm install stream-chat
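Later in this tutorial, the backend code imports apiKey and serverClient from a shared serverClient module that never appears in the snippets. It's just a thin wrapper around the SDK; here's a minimal sketch, assuming your credentials live in STREAM_API_KEY and STREAM_API_SECRET environment variables (the variable names are our choice, not Stream's):

javascript
// serverClient.ts: a minimal sketch; the env variable names are assumptions
import { StreamChat } from 'stream-chat';

export const apiKey = process.env.STREAM_API_KEY!;
const apiSecret = process.env.STREAM_API_SECRET!;

// Server-side client: the API secret stays on the backend only
export const serverClient = new StreamChat(apiKey, apiSecret);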

Create Your AI Accounts

You’ll also need access to OpenAI, Anthropic, and Llama APIs.

OpenAI

To use OpenAI's models in your application, you'll need to set up an API key:

  1. Visit OpenAI's website and create an account if you don't already have one
  2. Navigate to the API section and sign up for API access
  3. Go to your API keys page
  4. Click "Create new secret key" and give it a name related to your project
  5. Copy your API key and store it securely - you won't be able to view it again

Once you have your API key, you'll need to install the OpenAI client library:

npm install openai

Anthropic

To access Anthropic's Claude models:

  1. Go to Anthropic's website
  2. Click "developer console" and create an account
  3. Once approved, navigate to your API keys
  4. Generate a new API key and copy it to a secure location

Next, install the Anthropic JavaScript SDK:

npm install @anthropic-ai/sdk

Ollama (for Llama models)

Ollama is an easy way to run Llama and other open-source models locally:

  1. Download and install Ollama from ollama.ai
  2. Open your terminal and pull the Llama model:
ollama pull llama3
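Once the model is pulled, it's worth sanity-checking that Ollama's local API is reachable (it listens on port 11434 by default). A quick request like this should return a JSON reply, assuming you pulled llama3:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [{ "role": "user", "content": "Say hello" }],
  "stream": false
}'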

With these three integrations set up, you now have all the necessary APIs configured to build your multi-model chat application using Stream's Chat API. The next section will explore how to create the backend infrastructure to handle model switching and message routing.

Building Our AI Model Switching Server

The backend of our code will be a simple Express server that creates and manages AI agents, routes incoming messages to the right model, and exposes endpoints for starting, stopping, and switching agents.

We’ll have agents and handlers for each of our three models. For Anthropic and OpenAI, the agents and handlers are similar to the AI assistants from this repo. Here, we’ll go into detail on the Llama implementation, which follows a similar pattern with some key differences.

Implementing the Llama Agent

When building a multi-model AI chat system, the agent is responsible for interacting with each AI model. Here's our implementation for the Llama agent:

Class Definition and Properties

javascript
export class LlamaAgent implements AIAgent {
  private apiEndpoint?: string;
  private modelName?: string;
  private handlers: LlamaResponseHandler[] = [];
  private lastInteractionTs = Date.now();

  constructor(
    readonly chatClient: StreamChat,
    readonly channel: Channel,
  ) {}

The class implements the AIAgent interface as the foundation of our Llama agent:

  • apiEndpoint: Stores the URL where Ollama is running (typically localhost)
  • modelName: Tracks which specific Llama model we're using (llama2, llama3, etc.)
  • handlers: Maintains an array of response handlers for managing active streams
  • lastInteractionTs: Records when the agent was last used (for potential timeout/cleanup)

The constructor requires just two parameters: the Stream Chat client for message management and the specific chat channel where messages will be sent.
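The AIAgent interface itself isn't shown in this post. Based on how the agents are used later (the factory returns them, and the server disposes them and reads their channel), a rough sketch looks like this; the exact definition in the repo may differ:

javascript
// types.ts (sketch): the shape inferred from how agents are used in this post
import type { Channel, StreamChat } from 'stream-chat';

export interface AIAgent {
  chatClient: StreamChat;
  channel: Channel;
  init(): Promise<void>;
  dispose(): Promise<void>;
  getLastInteraction(): number;
}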

Resource Management Methods

javascript
dispose = async () => {
  this.handlers.forEach((handler) => handler.dispose());
  this.handlers = [];
};

getLastInteraction = (): number => this.lastInteractionTs;

Next, we have a couple of utility methods to handle resource management. dispose() cleans up all active response handlers when switching models or closing the chat, while getLastInteraction() returns a timestamp of the last interaction (useful for implementing idle timeouts).

Initialization Logic

javascript
init = async () => {
  const apiEndpoint = process.env.LLAMA_API_ENDPOINT || 'http://localhost:11434';
  const modelName = process.env.LLAMA_MODEL_NAME || 'llama2';

  if (!apiEndpoint) {
    throw new Error('Llama API endpoint is required');
  }

  this.apiEndpoint = apiEndpoint;
  this.modelName = modelName;
};

The initialization method reads configuration from environment variables with sensible defaults: LLAMA_API_ENDPOINT defaults to 'http://localhost:11434', Ollama's standard port, and LLAMA_MODEL_NAME defaults to 'llama2' but can be set to any model you've pulled into Ollama (such as the llama3 model we pulled earlier).

Unlike cloud-based models, this configuration allows for targeting locally hosted models, giving you complete control over the deployment.
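Pulling the configuration together, the server's .env ends up looking something like this. The LLAMA_*, USE_MULTI_MODEL, ANTHROPIC_API_KEY, and OPENAI_API_KEY names all appear in code later in this post; the STREAM_* names are the ones assumed in the serverClient sketch earlier, and every value below is a placeholder:

# .env for the server (placeholder values)
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
ANTHROPIC_API_KEY=your_anthropic_key
OPENAI_API_KEY=your_openai_key
LLAMA_API_ENDPOINT=http://localhost:11434
LLAMA_MODEL_NAME=llama3
USE_MULTI_MODEL=true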

Message Processing

javascript
handleMessage = async (e: Event<DefaultGenerics>) => {
  if (!this.apiEndpoint || !this.modelName) {
    console.error('Llama API endpoint or model name is not initialized');
    return;
  }

  if (!e.message || e.message.ai_generated) {
    console.log('Skip handling ai generated message');
    return;
  }

  const message = e.message.text;
  if (!message) return;

  this.lastInteractionTs = Date.now();

This first part of the message handler:

  • Validates that the agent is initialized correctly
  • Checks if the incoming message is valid and not AI-generated (prevents feedback loops)
  • Updates the interaction timestamp for tracking when the agent was last used

Context Building

javascript
// Extract the last few messages for context
const messages = this.channel.state.messages
  .slice(-5)
  .filter((msg) => msg.text && msg.text.trim() !== '')
  .map((message) => ({
    role: message.user?.id.startsWith('ai-bot') ? 'assistant' : 'user',
    content: message.text || '',
  }));

// Add the current message if it's a reply
if (e.message.parent_id !== undefined) {
  messages.push({
    role: 'user',
    content: message,
  });
}

This crucial section builds the conversation context that Llama will use to generate its response. We take the last five messages from the channel for conversation history. After filtering out empty messages to avoid noise, we transform Stream's message format into the format expected by Ollama:

  • Messages from users with IDs starting with "ai-bot" are marked as "assistant"
  • All other messages are marked as "user"

We also handle threaded conversations by adding the current message if it's a reply to another message.
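To make the shape concrete, the messages array handed to Ollama ends up looking something like this (the contents are purely illustrative):

javascript
// Illustrative example of the context array built above
[
  { role: 'user', content: 'Which model are you?' },
  { role: 'assistant', content: 'I am an assistant running on a local Llama model.' },
  { role: 'user', content: 'Summarize our conversation so far.' },
]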

Response Placeholder and Indicators

javascript
// Create a placeholder message while we wait for the response
const { message: channelMessage } = await this.channel.sendMessage({
  text: '',
  ai_generated: true,
});

try {
  // Send thinking indicator
  await this.channel.sendEvent({
    type: 'ai_indicator.update',
    ai_state: 'AI_STATE_THINKING',
    message_id: channelMessage.id,
  });

This creates a great user experience by:

  • Creating an empty message as a placeholder for the coming AI response
  • Marking it as ai_generated: true to prevent message handling loops
  • Sending a custom event to display a "thinking" indicator to the user

The result is immediate feedback that the system is processing the user's request.

Stream Initialization and Error Handling

javascript
  // Start streaming process with Llama
  const response = await this.startLlamaStream(messages);

  // Create handler for the response
  const handler = new LlamaResponseHandler(
    response,
    this.chatClient,
    this.channel,
    channelMessage,
  );
  void handler.run();
  this.handlers.push(handler);
} catch (error) {
  console.error('Error creating Llama stream:', error);

  // Update the message to show the error
  await this.chatClient.partialUpdateMessage(channelMessage.id, {
    set: {
      text: 'Sorry, I encountered an error while processing your request.',
      generating: false,
    },
  });

  // Clear the indicator
  await this.channel.sendEvent({
    type: 'ai_indicator.clear',
    message_id: channelMessage.id,
  });
}

Here, we see robust error handling and stream setup. First, we call startLlamaStream to establish the connection to Ollama. Then, we create a specialized handler to process the incoming stream data. The void keyword indicates we're not awaiting the run() method (it runs asynchronously). We also add the handler to our tracking array for proper cleanup later.

If anything goes wrong, our comprehensive error handling kicks in:

  • Updates the placeholder message with an error notice
  • Clears the "thinking" indicator
  • Logs the error for debugging

API Connection to Ollama

javascript
private startLlamaStream = async (messages: any[]) => {
  try {
    // Make the API call to Ollama
    const response = await axios.post(
      `${this.apiEndpoint}/api/chat`,
      {
        model: this.modelName,
        messages: messages,
        stream: true,
      },
      { responseType: 'stream' },
    );

    return response.data;
  } catch (error) {
    console.error('Error calling Ollama API:', error);
    throw error;
  }
};

This method handles the direct API interaction with Ollama. We use Axios to make a POST request to Ollama's chat endpoint, passing the conversation messages in the format Ollama expects. This is one of the major differences between the Llama implementation and the OpenAI/Anthropic agents. For those, we are using the SDKs to connect. Here, we are just using a regular POST command. This gives you more leeway to plug in the different models you might want to use.

By setting stream: true, we enable progressive response streaming. We configure Axios to handle the response as a stream and return the raw stream data to be processed by the handler.
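For reference, Ollama streams its reply as newline-delimited JSON: each line is a small object whose message.content holds the next slice of text, and the final object is marked done: true (it also carries timing stats, omitted here). The chunks look roughly like this, which is why the handler we'll write next splits on newlines and pulls out message.content:

{"model":"llama3","created_at":"2025-04-11T12:00:00Z","message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"llama3","created_at":"2025-04-11T12:00:00Z","message":{"role":"assistant","content":" there!"},"done":false}
{"model":"llama3","created_at":"2025-04-11T12:00:01Z","message":{"role":"assistant","content":""},"done":true}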

Implementing the Llama Handler

Now let's look at the companion to our agent - the LlamaResponseHandler class that processes the streaming responses from Ollama:

javascript
// LlamaResponseHandler.ts
import type { Channel, MessageResponse, StreamChat } from 'stream-chat';
import { AIResponseHandler } from '../AIResponseHandler';

export class LlamaResponseHandler implements AIResponseHandler {
  private message_text = '';
  private chunk_counter = 0;
  private controller = new AbortController();

  constructor(
    private readonly llama_stream: NodeJS.ReadableStream,
    private readonly chatClient: StreamChat,
    private readonly channel: Channel,
    private readonly message: MessageResponse,
  ) {
    this.chatClient.on('ai_indicator.stop', this.handleStopGenerating);
  }

The handler class implements the AIResponseHandler interface, maintaining consistency across different model implementations. We track a few key pieces of state:

  • message_text: Accumulates the generated text as it streams in
  • chunk_counter: Counts how many chunks we've processed (for throttling updates)
  • controller: An AbortController to allow stopping generation mid-stream

The constructor takes four essential parameters to process the response. We also set up an event listener to handle stop requests from the user.

Stream Processing Method

javascript
run = async () => {
  try {
    // Process the stream
    for await (const chunk of this.llama_stream) {
      // Convert the chunk to text
      const chunkText = this.parseChunk(chunk);

      if (chunkText) {
        this.message_text += chunkText;
        this.chunk_counter++;

        // Update the message periodically
        // More frequent updates at the beginning for responsiveness
        if (
          this.chunk_counter % 15 === 0 ||
          (this.chunk_counter < 8 && this.chunk_counter % 3 === 0)
        ) {
          try {
            await this.chatClient.partialUpdateMessage(this.message.id, {
              set: { text: this.message_text, generating: true },
            });
          } catch (error) {
            console.error('Error updating message', error);
          }
        }
      }
    }

The run method is the heart of our handler, processing the stream from Ollama. For each chunk in the stream:

  1. We parse the chunk to extract the text using our helper method
  2. Add it to our accumulated message text
  3. Update our chunk counter
  4. Periodically update the UI message with our progress

Notice how we update more frequently at the beginning (every three chunks for the first eight chunks) to give the user immediate feedback, then throttle back to every 15 chunks to reduce API calls. This creates a responsive experience while managing resource usage.

Stream Completion and Error Handling

javascript
    // Final update when stream is complete
    await this.chatClient.partialUpdateMessage(this.message.id, {
      set: { text: this.message_text, generating: false },
    });

    // Clear the indicator
    await this.channel.sendEvent({
      type: 'ai_indicator.clear',
      message_id: this.message.id,
    });
  } catch (error) {
    console.error('Error handling Llama stream', error);

    // Update with error state
    await this.channel.sendEvent({
      type: 'ai_indicator.update',
      ai_state: 'AI_STATE_ERROR',
      message_id: this.message.id,
    });

    // Update the message with error
    if (this.message_text) {
      await this.chatClient.partialUpdateMessage(this.message.id, {
        set: {
          text: this.message_text + "\n\n[Message generation was interrupted]",
          generating: false,
        },
      });
    } else {
      await this.chatClient.partialUpdateMessage(this.message.id, {
        set: {
          text: "I'm sorry, but there was an error generating a response.",
          generating: false,
        },
      });
    }
  }
};

Once the stream completes, we make a final update to the message, setting generating:false and clearing the thinking indicator. If an error occurs, we provide appropriate feedback:

  1. We log the error for debugging
  2. Update the indicator to show an error state
  3. If we've already generated some text, we append an interruption notice
  4. If no text was generated, we show a complete error message

This approach ensures the user always gets appropriate feedback regardless of where the generation process fails.

Resource Management and Event Handling

javascript
dispose = () => {
  this.chatClient.off('ai_indicator.stop', this.handleStopGenerating);
};

private handleStopGenerating = async () => {
  console.log('Stop generating');

  // Abort the request
  this.controller.abort();

  // Update the message state
  await this.chatClient.partialUpdateMessage(this.message.id, {
    set: { generating: false },
  });

  // Clear the indicator
  await this.channel.sendEvent({
    type: 'ai_indicator.clear',
    message_id: this.message.id,
  });
};

The dispose method ensures we clean up event listeners when the handler is no longer needed. Our handleStopGenerating method responds to user requests to stop generation by:

  • Aborting the ongoing request via our controller
  • Updating the message to show it's no longer generating
  • Clearing the thinking indicator

This gives users direct control over the conversation and prevents unwanted lengthy outputs.

Parsing Ollama's Response Format

javascript
// Helper method to parse streaming chunks from Ollama
private parseChunk(chunk: any): string {
  try {
    const data = chunk.toString().trim();

    // Handle empty chunks
    if (!data) return '';

    // Split by newlines to handle multiple JSON objects
    const lines: string[] = data.split('\n').filter((line: string) => line.trim());
    let content = '';

    // Parse each line as a separate JSON object
    for (const line of lines) {
      try {
        const json = JSON.parse(line);

        // Extract the content from the message
        if (json.message && json.message.content) {
          content += json.message.content;
        }
      } catch (e) {
        console.error('Error parsing JSON line:', e);

        // If we can't parse as JSON, try to extract content directly
        const match = line.match(/"content":"([^"]+)"/);
        if (match) {
          content += match[1];
        }
      }
    }

    return content;
  } catch (error) {
    console.error('Error parsing chunk', error);
    return '';
  }
}

This method handles parsing Ollama's specific response format, which is one of the key differences from other providers. The function:

  1. Converts the chunk to a string and trims whitespace
  2. Handles empty chunks by returning an empty string
  3. Splits by newlines to process multiple JSON objects that might be in a single chunk
  4. Tries to parse each line as JSON and extract the content
  5. If parsing fails, falls back to a regex-based approach to extract content

This robust parsing is necessary because Ollama's streaming format can vary slightly and sometimes contain partial JSON objects or multiple objects in a single chunk.

The LlamaResponseHandler works in tandem with the LlamaAgent to provide a seamless experience for the user, handling all the complexities of working with local LLM streaming.

Creating Our Agents

With the individual agents in place, we need to consider using them within our Stream chat app. First, we need a factory function to create the appropriate agent based on the user's selection. Here's how we implement that:

javascript
// createAgent.ts
import { AgentPlatform, AIAgent } from './types';
import { StreamChat } from 'stream-chat';
import { OpenAIAgent } from './openai/OpenAIAgent';
import { AnthropicAgent } from './anthropic/AnthropicAgent';
import { LlamaAgent } from './llama/LlamaAgent';
import { MultiModelAgent } from './MultiModelAgent';
import { apiKey, serverClient } from '../serverClient';

export const createAgent = async (
  user_id: string,
  platform: AgentPlatform,
  channel_type: string,
  channel_id: string,
): Promise<AIAgent> => {
  const client = new StreamChat(apiKey, { allowServerSideConnect: true });
  const token = serverClient.createToken(user_id);
  await client.connectUser({ id: user_id }, token);
  console.log(`User ${user_id} connected successfully.`);

  const channel = client.channel(channel_type, channel_id);
  await channel.watch();

  // Check if we're using the multi-model approach or a specific model
  const useMultiModel = process.env.USE_MULTI_MODEL === 'true';

  if (useMultiModel) {
    return new MultiModelAgent(client, channel, platform);
  } else {
    // Use the specified platform
    switch (platform) {
      case AgentPlatform.OPENAI:
        return new OpenAIAgent(client, channel);
      case AgentPlatform.LLAMA:
        return new LlamaAgent(client, channel);
      case AgentPlatform.ANTHROPIC:
      default:
        return new AnthropicAgent(client, channel);
    }
  }
};

This factory function handles all the setup needed to create an agent for a specific chat channel. It takes four parameters: the user ID for the AI bot, the selected model platform, and the channel type and ID where the conversation will take place.

First, the function establishes a new Stream client connection with our API key, allowing server-side connections. It then generates an authentication token for the bot user and connects them to the Stream platform. The function also connects to the specified channel and activates channel watching to receive message events.

The most interesting part is how we handle model selection. We check an environment variable called USE_MULTI_MODEL to determine if we're using our multi-model approach or a single dedicated model. If we're using the multi-model approach, we create a MultiModelAgent that can dynamically switch between models. Otherwise, we use a switch statement to create the appropriate agent based on the specified platform.

By using this factory pattern, our application can easily create the correct type of agent for each conversation while keeping the underlying Stream and AI provider connections encapsulated within the agent itself.

Next, we need our MultiModelAgent to coordinate between the different AI models and handle dynamic switching during conversations. This class is the heart of our multi-model system:

Class Definition and Properties

javascript
export class MultiModelAgent implements AIAgent {
  private agents: Map<AgentPlatform, AIAgent> = new Map();
  private activeAgent: AgentPlatform;
  private lastInteractionTs = Date.now();
  private handlers: AIResponseHandler[] = [];

  constructor(
    readonly chatClient: StreamChat,
    readonly channel: Channel,
    initialPlatform: AgentPlatform = AgentPlatform.ANTHROPIC,
  ) {
    this.activeAgent = initialPlatform;
  }

The MultiModelAgent implements the same AIAgent interface as our individual model agents, creating a consistent pattern. We maintain several key pieces of state:

  • agents: A map connecting each platform type to its corresponding agent
  • activeAgent: Tracks which AI model is currently responding
  • lastInteractionTs: Records when the last message was processed
  • handlers: Maintains response handlers for proper cleanup

The constructor takes a Stream chat client, channel, and optional initial platform (defaulting to Claude if none specified).
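The AgentPlatform enum and the ModelInfo type used throughout come from the shared types module. A minimal sketch looks like this; the exact string values are assumptions, though lowercase names match the 'anthropic' value the client sends later:

javascript
// types.ts (sketch): the enum values are assumptions
export enum AgentPlatform {
  ANTHROPIC = 'anthropic',
  OPENAI = 'openai',
  LLAMA = 'llama',
}

export interface ModelInfo {
  id: AgentPlatform;
  name: string;
  description: string;
  iconUrl?: string;
  available: boolean;
}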

Initialization and Cleanup

javascript
init = async () => {
  // Initialize all supported agents
  this.agents.set(AgentPlatform.ANTHROPIC, new AnthropicAgent(this.chatClient, this.channel));
  this.agents.set(AgentPlatform.OPENAI, new OpenAIAgent(this.chatClient, this.channel));
  this.agents.set(AgentPlatform.LLAMA, new LlamaAgent(this.chatClient, this.channel));

  // Initialize all agents
  await Promise.all(
    Array.from(this.agents.values()).map((agent) => agent.init()),
  );

  // Set up event listeners
  this.chatClient.on('message.new', this.handleMessage);
  this.chatClient.on('custom_model_switch', this.handleModelSwitch);
  this.chatClient.on('custom_model_switch_error', this.handleModelSwitchError);
};

dispose = async () => {
  // Clean up event listeners
  this.chatClient.off('message.new', this.handleMessage);
  this.chatClient.off('custom_model_switch', this.handleModelSwitch);
  this.chatClient.off('custom_model_switch_error', this.handleModelSwitchError);

  // Clean up all agents
  await Promise.all(
    Array.from(this.agents.values()).map((agent) => agent.dispose()),
  );
  this.agents.clear();

  // Disconnect the chat client
  await this.chatClient.disconnectUser();

  // Dispose all handlers
  this.handlers.forEach((handler) => handler.dispose());
  this.handlers = [];
};

getLastInteraction = (): number => this.lastInteractionTs;

The init method creates and initializes instances of all three agent types upfront. This approach ensures that all models are ready to go when needed, without initialization delay when switching. We also set up event listeners for new messages and model-switching events.

The dispose method provides thorough cleanup by removing event listeners, disposing of all agents, disconnecting the chat client, and cleaning up response handlers. This prevents memory leaks and ensures resources are properly released.

Model Switching Handler

javascript
private handleModelSwitch = async (e: Event<DefaultGenerics>) => {
  const platform = (e as any).data?.platform as AgentPlatform;

  if (!platform) {
    console.error('No platform specified in model switch event');
    return;
  }

  if (!this.agents.has(platform)) {
    console.error(`Agent for platform ${platform} is not initialized`);

    // Send error event to the channel
    await this.channel.sendEvent({
      type: 'custom_model_switch_error' as any,
      data: {
        channel_id: this.channel.cid,
        error: `AI model ${platform} is not available`,
      },
    });
    return;
  }

  const agent = this.agents.get(platform);
  if (!agent) {
    await this.channel.sendEvent({
      type: 'custom_model_switch_error' as any,
      data: {
        channel_id: this.channel.cid,
        error: `Agent not initialized for platform: ${platform}`,
      },
    });
    return;
  }

  this.activeAgent = platform;
  console.log(`Switched to ${platform} agent`);

  // Send confirmation event to the channel
  await this.channel.sendEvent({
    type: 'custom_model_switched' as any,
    data: {
      channel_id: this.channel.cid,
      platform,
    },
  });
};

This method handles requests to switch between AI models. The function:

  1. Extracts the requested platform from the event data
  2. Verifies that the platform exists and has an initialized agent
  3. Updates the activeAgent to the requested platform
  4. Sends a confirmation event to notify all channel participants of the switch

We include error handling throughout the process, with meaningful error messages sent as channel events. This keeps users informed about the success or failure of their model-switching requests.

Error Handling

javascript
private handleModelSwitchError = async (e: Event<DefaultGenerics>) => {
  const error = (e as any).data?.error;
  if (!error) return;

  console.log('Model switch error:', error);

  // Try to fall back to a different model
  const currentPlatform = this.activeAgent;
  for (const [platform, agent] of this.agents.entries()) {
    if (platform !== currentPlatform && agent) {
      console.log(`Attempting to fall back to ${platform} agent`);
      this.activeAgent = platform;

      // Send confirmation event
      await this.channel.sendEvent({
        type: 'custom_model_switched' as any,
        data: {
          channel_id: this.channel.cid,
          platform,
        },
      });
      return;
    }
  }

  // If no fallback is available, send an error message
  await this.channel.sendMessage({
    text: "I'm sorry, but all AI models are currently unavailable. Please try again later.",
    ai_generated: true,
  });
};

The error handler provides graceful fallback when a model switch fails. It:

  1. Attempts to find any alternative model that's not the current one
  2. Switches to the first available alternative
  3. Sends a notification about the fallback
  4. If all models are unavailable, it sends a helpful error message

This robust error handling ensures that conversations can continue even if a particular model becomes unavailable.

Message Processing

javascript
private handleMessage = async (e: Event<DefaultGenerics>) => {
  // Update the last interaction timestamp
  this.lastInteractionTs = Date.now();

  // Get the active agent
  const agent = this.agents.get(this.activeAgent);
  if (!agent) {
    console.error(`Active agent ${this.activeAgent} is not initialized`);

    // Try to fall back to a different agent
    for (const [platform, fallbackAgent] of this.agents.entries()) {
      if (fallbackAgent) {
        this.activeAgent = platform;
        console.log(`Falling back to ${platform} agent`);

        // Process the message with the fallback agent
        await this.processMessageWithAgent(fallbackAgent, e);
        return;
      }
    }

    // If no fallback is available, send an error message
    await this.channel.sendMessage({
      text: "I'm sorry, but all AI models are currently unavailable. Please try again later.",
      ai_generated: true,
    });
    return;
  }

  // Process the message with the active agent
  await this.processMessageWithAgent(agent, e);
};

private processMessageWithAgent = async (agent: AIAgent, e: Event<DefaultGenerics>) => {
  if (!e.message || e.message.ai_generated) {
    // Skip AI-generated messages or events without messages
    return;
  }

  const message = e.message.text;
  if (!message) return;

  // Let the agent handle the message
  // Since each agent has its own message handler,
  // we simply forward the event and let the agent process it
  if (typeof (agent as any).handleMessage === 'function') {
    await (agent as any).handleMessage(e);
  }
};

The message handler is the core of our system, routing incoming messages to the appropriate agent:

  1. It first updates the interaction timestamp
  2. Gets the currently active agent
  3. Falls back to any available agent if the active one fails
  4. Forwards the message to the selected agent for processing

The processMessageWithAgent helper method filters out AI-generated messages to prevent feedback loops and ensures the agent has a proper handleMessage method before calling it.

Utility Methods

javascript
// Method to get available models
getAvailableModels = (): AgentPlatform[] => {
  return Array.from(this.agents.keys());
};

// Method to get the currently active model
getActiveModel = (): AgentPlatform => {
  return this.activeAgent;
};

These utility methods provide information about available models and the currently active model. They're useful for UI components that need to display which model is active or show available switching options to users.

By implementing the MultiModelAgent class following the same interface as our individual agents, we maintain a consistent architecture while adding the powerful ability to switch between models in real time. This approach gives users the flexibility to choose the right model for their needs at any point in the conversation.

Building the Express Server

Finally, on the backend (don't worry, the client is waaaay simpler), we have the actual Express server that ties everything together. Let's break down how it works:

Server Setup and Initialization

javascript
// index.ts
import 'dotenv/config';
import express from 'express';
import cors from 'cors';
import { AIAgent, AgentPlatform, ModelInfo } from './agents/types';
import { createAgent } from './agents/createAgent';
import { apiKey, serverClient } from './serverClient';
import { MultiModelAgent } from './agents/MultiModelAgent';
import { AnthropicAgent } from './agents/anthropic/AnthropicAgent';
import { OpenAIAgent } from './agents/openai/OpenAIAgent';
import { LlamaAgent } from './agents/llama/LlamaAgent';
import { AIModelSwitchEvent } from './types/stream';

const app = express();
app.use(express.json());
app.use(cors({ origin: '*' }));

// Map to store the AI Agent instances
// [cid: string]: AI Agent
const aiAgentCache = new Map<string, AIAgent>();
const pendingAiAgents = new Set<string>();

We start with the standard Express setup, importing our dependencies and configuring middleware. The most important part is the creation of two data structures:

  • aiAgentCache: A map that stores active AI agents indexed by user ID
  • pendingAiAgents: A set that helps prevent duplicate agent creation requests

This caching mechanism is essential for maintaining active connections and preventing resource duplication.

Agent Cleanup

javascript
// TODO: temporary set to 8 hours, should be cleaned up at some point
const inactivityThreshold = 480 * 60 * 1000;
setInterval(async () => {
  const now = Date.now();
  for (const [userId, aiAgent] of aiAgentCache) {
    if (now - aiAgent.getLastInteraction() > inactivityThreshold) {
      console.log(`Disposing AI Agent due to inactivity: ${userId}`);
      await disposeAiAgent(aiAgent, userId);
      aiAgentCache.delete(userId);
    }
  }
}, 5000);

app.get('/', (req, res) => {
  res.json({
    message: 'GetStream AI Server is running',
    apiKey: apiKey,
    activeAgents: aiAgentCache.size,
  });
});

We implement an automatic cleanup mechanism that runs every 5 seconds. It checks all agents and disposes of those that have been inactive for more than 8 hours. This prevents resource leaks from abandoned conversations.

We also provide a simple health check endpoint that shows the server is running and reports basic stats about active agents.

Starting an AI Agent

javascript
app.post('/start-ai-agent', async (req, res) => {
  const {
    channel_id,
    channel_type = 'messaging',
    platform = AgentPlatform.ANTHROPIC,
  } = req.body;

  // Simple validation
  if (!channel_id) {
    res.status(400).json({ error: 'Missing required fields' });
    return;
  }

  let channel_id_updated = channel_id;
  if (channel_id.includes(':')) {
    const parts = channel_id.split(':');
    if (parts.length > 1) {
      channel_id_updated = parts[1];
    }
  }

  const user_id = `ai-bot-${channel_id_updated.replace(/!/g, '')}`;

  try {
    if (!aiAgentCache.has(user_id) && !pendingAiAgents.has(user_id)) {
      pendingAiAgents.add(user_id);

      await serverClient.upsertUser({
        id: user_id,
        name: 'AI Bot',
        role: 'admin',
      });

      const channel = serverClient.channel(channel_type, channel_id_updated);
      try {
        await channel.addMembers([user_id]);
      } catch (error) {
        console.error('Failed to add members to channel', error);
      }
      await channel.watch();

      const agent = await createAgent(
        user_id,
        platform,
        channel_type,
        channel_id_updated,
      );

      await agent.init();
      if (aiAgentCache.has(user_id)) {
        await agent.dispose();
      } else {
        aiAgentCache.set(user_id, agent);
      }
    } else {
      console.log(`AI Agent ${user_id} already started`);
    }

    res.json({ message: 'AI Agent started', data: [] });
  } catch (error) {
    const errorMessage = (error as Error).message;
    console.error('Failed to start AI Agent', errorMessage);
    res
      .status(500)
      .json({ error: 'Failed to start AI Agent', reason: errorMessage });
  } finally {
    pendingAiAgents.delete(user_id);
  }
});

This endpoint handles starting a new AI agent for a specific channel. It:

  1. Validates and normalizes the channel ID
  2. Creates a unique bot user ID based on the channel
  3. Checks if an agent already exists to prevent duplicates
  4. Creates a Stream user for the bot and adds it to the channel
  5. Calls our createAgent function to instantiate the appropriate agent
  6. Initializes the agent and adds it to our cache

The pendingAiAgents set is used as a lock mechanism to prevent race conditions when multiple requests try to create the same agent.
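You can also exercise the endpoint directly, for example with curl (the channel ID here is illustrative, and the platform string depends on your AgentPlatform values):

curl -X POST http://localhost:3000/start-ai-agent \
  -H "Content-Type: application/json" \
  -d '{ "channel_id": "demo-channel", "platform": "anthropic" }'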

Stopping an AI Agent

javascript
app.post('/stop-ai-agent', async (req, res) => {
  const { channel_id } = req.body;
  try {
    const userId = `ai-bot-${channel_id.replace(/!/g, '')}`;
    const aiAgent = aiAgentCache.get(userId);
    if (aiAgent) {
      await disposeAiAgent(aiAgent, userId);
      aiAgentCache.delete(userId);
    }
    res.json({ message: 'AI Agent stopped', data: [] });
  } catch (error) {
    const errorMessage = (error as Error).message;
    console.error('Failed to stop AI Agent', errorMessage);
    res
      .status(500)
      .json({ error: 'Failed to stop AI Agent', reason: errorMessage });
  }
});

This simple endpoint gracefully stops and disposes of an AI agent. It:

  1. Finds the agent by its user ID
  2. Calls our helper function to dispose of it properly
  3. Removes it from the cache

Model Information

javascript
app.get('/available-models', (req, res) => {
  const models: ModelInfo[] = [
    {
      id: AgentPlatform.ANTHROPIC,
      name: 'Claude',
      description: 'Anthropic\'s Claude, known for helpfulness, harmlessness, and honesty',
      iconUrl: '/model-icons/claude.png',
      available: !!process.env.ANTHROPIC_API_KEY,
    },
    {
      id: AgentPlatform.OPENAI,
      name: 'GPT',
      description: 'OpenAI\'s GPT model with function calling capabilities',
      iconUrl: '/model-icons/openai.png',
      available: !!process.env.OPENAI_API_KEY,
    },
    {
      id: AgentPlatform.LLAMA,
      name: 'Llama',
      description: 'Meta\'s open-source Llama model',
      iconUrl: '/model-icons/meta.png',
      available: !!process.env.LLAMA_API_ENDPOINT,
    },
  ];

  res.json({ models });
});

This endpoint provides information about all supported models. For each model, we include:

  • id: The internal identifier
  • name: A user-friendly display name
  • description: A brief explanation of the model's capabilities
  • iconUrl: Path to an icon for UI display
  • available: A dynamic flag based on whether the necessary API key is configured

This makes it easy for the frontend to display only the available models.
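A trimmed-down response from this endpoint looks something like the following; the description and icon fields are omitted here, and the available flags depend on which keys you have configured:

{
  "models": [
    { "id": "anthropic", "name": "Claude", "available": true },
    { "id": "openai", "name": "GPT", "available": true },
    { "id": "llama", "name": "Llama", "available": false }
  ]
}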

Model Switching

javascript
app.post('/switch-model', async (req, res) => {
  const { channel_id, platform } = req.body;
  console.log('Switching model', channel_id, platform);

  if (!channel_id || !platform) {
    res.status(400).json({ error: 'Missing required fields' });
    return;
  }

  try {
    const userId = `ai-bot-${channel_id.replace(/!/g, '')}`;
    const aiAgent = aiAgentCache.get(userId);
    console.log('AI agent', aiAgent);

    if (!aiAgent) {
      res.status(404).json({ error: 'AI agent not found for this channel' });
      return;
    }

    // Check if this is a MultiModelAgent
    if (aiAgent instanceof MultiModelAgent) {
      // Send the model switch event to the channel
      await aiAgent.channel.sendEvent({
        type: 'custom_model_switch' as any,
        data: {
          channel_id: channel_id,
          platform,
        },
      });

      res.json({ message: 'Model switch initiated', platform });
    } else {
      res.status(400).json({
        error: 'Model switching is only available with multi-model agents',
        hint: 'Set USE_MULTI_MODEL=true in your environment variables',
      });
    }
  } catch (error) {
    const errorMessage = (error as Error).message;
    console.error('Failed to switch model', errorMessage);
    res
      .status(500)
      .json({ error: 'Failed to switch model', reason: errorMessage });
  }
});

This endpoint is the magic that enables switching between models during a conversation. It:

  1. Finds the agent for the specified channel
  2. Checks that it's a MultiModelAgent (which supports switching)
  3. Sends a custom event to the channel to trigger the model switch

Rather than directly calling a method on the agent, we send an event through the Stream channel. This approach allows the model switch to be visible to all participants in the conversation and ensures the switch happens in the correct sequence relative to other messages.
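As with starting an agent, you can trigger a switch by hand to test it, for example:

curl -X POST http://localhost:3000/switch-model \
  -H "Content-Type: application/json" \
  -d '{ "channel_id": "demo-channel", "platform": "openai" }'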

Getting Active Model Information

javascript
app.get('/active-model/:channel_id', (req, res) => {
  const { channel_id } = req.params;

  if (!channel_id) {
    res.status(400).json({ error: 'Missing channel_id' });
    return;
  }

  try {
    const userId = `ai-bot-${channel_id.replace(/!/g, '')}`;
    const aiAgent = aiAgentCache.get(userId);

    if (!aiAgent) {
      res.status(404).json({ error: 'AI agent not found for this channel' });
      return;
    }

    // Check if this is a MultiModelAgent
    if (aiAgent instanceof MultiModelAgent) {
      const activeModel = aiAgent.getActiveModel();
      const availableModels = aiAgent.getAvailableModels();

      res.json({ activeModel, availableModels });
    } else {
      // For single-model agents, return the type of agent
      let activeModel: AgentPlatform;

      if (aiAgent instanceof AnthropicAgent) {
        activeModel = AgentPlatform.ANTHROPIC;
      } else if (aiAgent instanceof OpenAIAgent) {
        activeModel = AgentPlatform.OPENAI;
      } else if (aiAgent instanceof LlamaAgent) {
        activeModel = AgentPlatform.LLAMA;
      } else {
        activeModel = AgentPlatform.ANTHROPIC; // Default
      }

      res.json({ activeModel, availableModels: [activeModel] });
    }
  } catch (error) {
    const errorMessage = (error as Error).message;
    console.error('Failed to get active model', errorMessage);
    res
      .status(500)
      .json({ error: 'Failed to get active model', reason: errorMessage });
  }
});

This endpoint returns information about the currently active model for a channel. It handles both multi-model and single-model scenarios:

  1. For a MultiModelAgent, it uses the agent's methods to get the active model and available models
  2. For single-model agents, it determines the type based on the agent's class and returns just that model

This allows the frontend to display the current model and available options to the user.

Helper Function and Server Start

javascript
async function disposeAiAgent(aiAgent: AIAgent, userId: string) {
  await aiAgent.dispose();

  const channel = serverClient.channel(
    aiAgent.channel.type,
    aiAgent.channel.id,
  );
  await channel.removeMembers([userId]);
}

app.listen(3000, () => {
  console.log('Server is running on port 3000');
});

Our helper function properly disposes of an agent by:

  1. Calling the agent's dispose method to clean up resources
  2. Removing the bot user from the channel it was participating in

Finally, we start the Express server on port 3000 and we're ready to go.

npm start

This Express server provides all the backend functionality needed to support our multi-model chat application, handling agent creation, model switching, and resource management in a clean, organized way.

Wow. That was a lot.

Building Our AI Model Switching Client

This won’t be as long. All the hard work is done on the server side. The client is just about allowing the user to gracefully switch models.

Main Chat Application

We’ll start with our main client code:

javascript
// App.tsx
import React, { useEffect, useState } from 'react';
import { StreamChat } from 'stream-chat';
import {
  Chat,
  Channel,
  ChannelHeader,
  MessageList,
  MessageInput,
  Window,
} from 'stream-chat-react';
import { ModelSwitcher } from './components/ModelSwitcher';

// Initialize Stream Chat client
const chatClient = StreamChat.getInstance(process.env.REACT_APP_STREAM_API_KEY || '');

The application starts by importing the necessary React and Stream Chat components. We initialize a Stream Chat client using an API key from environment variables. This client is what connects our frontend to Stream's infrastructure.

React Component and Setup

javascript
const App: React.FC = () => {
  const [channel, setChannel] = useState<any>(null);

  useEffect(() => {
    const setupChat = async () => {
      // Add check for API key
      if (!process.env.REACT_APP_STREAM_API_KEY) {
        console.error('Stream API key is not defined');
        return;
      }

      // Add check for user token
      if (!process.env.REACT_APP_STREAM_USER_TOKEN) {
        console.error('Stream user token is not defined');
        return;
      }

      try {
        // Connect user with role
        await chatClient.connectUser(
          {
            id: 'andrew356',
            role: 'admin',
          },
          process.env.REACT_APP_STREAM_USER_TOKEN,
        );
        console.log('User connected');

        // Create or join a channel
        const channel = chatClient.channel('messaging', 'demo-channel', {
          name: 'Demo Channel',
        });

        await channel.watch();
        setChannel(channel);
        console.log('Channel created or joined');
        console.log('Token available:', !!process.env.REACT_APP_STREAM_USER_TOKEN);
      } catch (error) {
        console.error('Error connecting to Stream:', error);
      }
    };

    setupChat();
  }, []);

The main App component uses React's useState and useEffect hooks to manage its state. Inside the useEffect:

  1. It performs validation checks to make sure we have the necessary API key and user token
  2. Connects a hardcoded user ('andrew356') to the Stream Chat service
  3. Creates or joins a demo channel and begins watching it for updates
  4. Updates the component's state with the channel object

The empty dependency array [] ensures this setup only runs once when the component mounts.
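One note on REACT_APP_STREAM_USER_TOKEN: user tokens have to be minted server-side with your API secret. For a quick demo, you can generate one for the hardcoded user with a small one-off script that reuses the serverClient sketched earlier (the file name is arbitrary):

javascript
// generateToken.ts: a one-off helper for local testing (hypothetical file)
import { serverClient } from './serverClient';

// createToken signs a token for this user ID using the API secret
const token = serverClient.createToken('andrew356');
console.log(token);

Paste the printed token into the client's environment before starting it.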

Rendering the Chat Interface

javascript
  if (!channel) {
    return <div>Loading...</div>;
  }

  return (
    <Chat client={chatClient}>
      <Channel channel={channel}>
        <Window>
          <ChannelHeader />
          <ModelSwitcher />
          <MessageList />
          <MessageInput />
        </Window>
      </Channel>
    </Chat>
  );
};

export default App;

The render method first shows a loading message while the channel is being set up. Once ready, it renders a complete chat interface using Stream's React components:

  1. The Chat component provides the context for the entire chat application
  2. Channel sets up the specific conversation we're viewing
  3. Window contains the visual container for all chat elements
  4. ChannelHeader shows information about the current conversation
  5. ModelSwitcher is our custom component that allows users to switch between AI models
  6. MessageList displays the conversation history
  7. MessageInput provides the text input for sending messages

The key part of our multi-model functionality is the ModelSwitcher component, which is inserted between the channel header and message list. This component will provide the interface for users to select different AI models during the conversation.

This client-side code is straightforward because it leverages Stream's React components to handle most chat UI complexities, while our backend handles the AI model management. The only custom element needed is the ModelSwitcher component, which would communicate with our API endpoints for switching between models.

The Model Switcher Component

Now let's examine the core component that enables users to switch between different AI models during a conversation:

javascript
// ModelSwitcher.tsx
import React, { useState, useEffect } from 'react';
import { useChannelStateContext } from 'stream-chat-react';
import axios from 'axios';
import './ModelSwitcher.css';

// Define type for model information
interface ModelInfo {
  id: string;
  name: string;
  description: string;
  iconUrl?: string;
  available: boolean;
}

// Your backend API base URL
const API_BASE_URL = process.env.REACT_APP_API_BASE_URL || 'http://localhost:3000';

The ModelSwitcher component starts with the necessary imports and type definitions. We use React hooks for state management, the Stream Chat context to access the current channel, and Axios for API requests. The ModelInfo interface defines the structure of the model data we'll receive from our backend.

Component Setup and API Initialization

javascript
export const ModelSwitcher: React.FC = () => {
  const { channel } = useChannelStateContext();
  const [models, setModels] = useState<ModelInfo[]>([]);
  const [activeModel, setActiveModel] = useState<string>('');
  const [loading, setLoading] = useState<boolean>(true);
  const [error, setError] = useState<string | null>(null);

  // Initialize AI agent and fetch models on component mount
  useEffect(() => {
    const initializeAI = async () => {
      try {
        setLoading(true);
        setError(null);

        // Start the AI agent if we have a channel
        if (channel.id) {
          try {
            await axios.post(`${API_BASE_URL}/start-ai-agent`, {
              channel_id: channel.id,
              channel_type: 'messaging',
              platform: 'anthropic' // Default to Anthropic
            });
          } catch (err) {
            // If the agent is already started, this will throw an error
            // We can ignore it as it means the agent is already running
            console.log('AI agent may already be running:', err);
          }
        }

        // Fetch available models
        const modelsResponse = await axios.get(`${API_BASE_URL}/available-models`);
        console.log('Models response:', modelsResponse.data);
        setModels(modelsResponse.data.models);

        // Fetch active model for this channel
        if (channel.id) {
          const activeModelResponse = await axios.get(`${API_BASE_URL}/active-model/${channel.id}`);
          setActiveModel(activeModelResponse.data.activeModel);
        }
      } catch (err) {
        console.error('Error initializing AI:', err);
        //setError('Failed to initialize AI. Please try again later.');
      } finally {
        setLoading(false);
      }
    };

    initializeAI();
  }, [channel.id]);

The component maintains several pieces of state:

  • models: List of available AI models
  • activeModel: Currently selected model's ID
  • loading: Flag to indicate when API requests are in progress
  • error: Any error messages to display

In the useEffect hook, we:

  1. Start an AI agent for the channel if needed (defaulting to Anthropic's Claude)
  2. Fetch the list of available models from our backend
  3. Get the currently active model for this channel

The catch block for starting the agent is interesting - it gracefully handles the case where an agent is already running, treating this as a normal condition rather than an error.

Model Switching Handler

javascript
  // Handle model selection
  const handleModelChange = async (modelId: string) => {
    try {
      setLoading(true);
      setError(null);

      // Call the API to switch the model
      await axios.post(`${API_BASE_URL}/switch-model`, {
        channel_id: channel.id,
        platform: modelId,
      });

      // Update the active model locally
      setActiveModel(modelId);
    } catch (err) {
      console.error('Error switching model:', err);
      setError('Failed to switch AI model. Please try again.');
    } finally {
      setLoading(false);
    }
  };

This function handles user selection of a new model. It:

  1. Sets loading state to show a visual indicator
  2. Makes an API call to our backend's switch-model endpoint
  3. Updates the local state to reflect the change
  4. Handles any errors that might occur during the process

This clean separation of concerns - with the UI component calling the API and our backend handling the actual model-switching logic - makes the code more maintainable.

Component UI Rendering

javascript
  if (loading && models.length === 0) {
    return <div className="model-switcher-loading">Loading AI models...</div>;
  }

  if (error && models.length === 0) {
    return <div className="model-switcher-error">{error}</div>;
  }

  return (
    <div className="model-switcher">
      <h3 className="model-switcher-title">Select AI Model</h3>
      {error && <div className="model-switcher-error">{error}</div>}
      <div className="model-switcher-models">
        {models.map((model) => (
          <div
            key={model.id}
            className={`model-option ${activeModel === model.id ? 'active' : ''} ${!model.available ? 'disabled' : ''}`}
            onClick={() => model.available && handleModelChange(model.id)}
          >
            {model.iconUrl && (
              <img src={model.iconUrl} alt={`${model.name} icon`} className="model-icon" />
            )}
            <div className="model-details">
              <h4 className="model-name">{model.name}</h4>
              <p className="model-description">{model.description}</p>
              {!model.available && <span className="model-unavailable">Unavailable</span>}
              {activeModel === model.id && <span className="model-active">Active</span>}
            </div>
          </div>
        ))}
      </div>
      {loading && <div className="model-switcher-loading-overlay">Updating...</div>}
    </div>
  );

The rendering logic first handles loading and error states with simple messages. The main UI consists of:

  1. A title for the component
  2. Any error messages that need to be shown
  3. A list of model options rendered dynamically from our models array
  4. A loading overlay that appears during model switching

Each model option includes:

  • The model's icon (if available)
  • The model name and description
  • Status indicators for unavailable models and the currently active model
  • CSS classes to visually distinguish active, disabled, and regular options

The onClick handler only triggers model switching if the model is available, preventing users from selecting unavailable options.

This simple yet effective UI gives users a clear way to switch between AI models during their conversation, with appropriate visual feedback throughout the process. Combined with our backend system, it creates a seamless multi-model experience with minimal client-side code.
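Before running the client, make sure its environment variables are set; these are the names the React code above reads, with placeholder values:

# Client .env (placeholder values)
REACT_APP_STREAM_API_KEY=your_stream_api_key
REACT_APP_STREAM_USER_TOKEN=token_generated_on_the_backend
REACT_APP_API_BASE_URL=http://localhost:3000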

We can run this with:

npm run dev

Then, if we head to http://localhost:3001, we can chat with our different models. We default to Claude, so that is first up. From there, we can switch to make GPT active, and finally check in on Llama.

Bringing AI Together to Chat

With our multi-model AI chat system now complete, you've got a powerful tool that lets users switch between different AI personalities during conversations. The beauty of this approach is its flexibility – as new models emerge or your users' needs evolve, you can easily integrate additional LLMs without disrupting the user experience.

Stream's Chat API handles all the heavy lifting of real-time messaging, while our backend orchestration layer manages the complexities of model switching. The result is a chat experience that combines the best of multiple AI worlds – Claude's writing finesse, GPT's analytical prowess, and Llama's open-source flexibility – all within a single, cohesive interface.

Whether you're building a customer support system that needs different AI strengths for different queries, or a creative writing assistant that benefits from varied AI perspectives, this multi-model approach opens up exciting new possibilities for your applications.

So go ahead and give your users the power of choice – because sometimes, the right AI for the job depends entirely on the question being asked.
