Create LLM-powered Chatbot For Your Documentation


The capabilities of AI systems have transformed how we interact with technology. Chatting with systems and expecting meaningful, up-to-date, thoughtful answers has become more natural.

Stefan B.
Published December 1, 2023

Because many large language models are trained on general data, they can only answer general questions. Businesses are therefore left with the question of how to incorporate their own data into the knowledge corpus of these models.

Picture this: Your software's user guide isn't just a manual anymore; it's a conversation starter with your AI. In this article, we demonstrate building a chatbot that users can interact with, which is fed knowledge of Stream’s Chat and Video SDKs. We will work with Markdown files, but you can achieve the same result for PDFs, text files, and other formats.

There are different ways to approach this. The first would be to train a custom language model from scratch, which isn’t feasible for most teams. These models are called large language models (LLMs) for a reason: not only do they have billions of parameters that must be tuned in the training process, but they also require vast amounts of quality data. They first learn from generic text to develop a basic understanding, and then they are refined on custom text.

We neither have the required data nor the computing capacity to perform this training, so we will take a different route. Using the data we have, we will extract embeddings. We’ll explain them in more detail later on, but in short, they are mathematical representations of small chunks of our documentation text. When a user sends a chat message to our bot, we compare the message against these embeddings, find the most similar chunk, and hand it to an LLM as additional context for answering the question. This bonus knowledge, combined with the model’s pre-trained natural language capabilities, produces a detailed answer tailored to the question.
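To make this flow concrete, here is a toy sketch of the retrieval step. It uses simple word overlap as a stand-in for real embedding similarity (the actual project uses OpenAI embeddings and a vector database), and the documentation snippets are made up for illustration:

```python
def similarity(a, b):
    """Crude similarity score: the fraction of words two texts share."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / max(len(words_a | words_b), 1)

def most_similar_chunk(question, chunks):
    """Pick the documentation chunk closest to the user's question."""
    return max(chunks, key=lambda chunk: similarity(question, chunk))

# Made-up documentation chunks standing in for our parsed Markdown files.
docs = [
    "Install the chat SDK with Swift Package Manager.",
    "Video calls support screen sharing on iOS.",
    "Push notifications require an APNs certificate.",
]

# The best-matching chunk is handed to the LLM as extra context.
context = most_similar_chunk("How do I install the chat SDK?", docs)
print(context)  # the chunk about installing the chat SDK
```

Real embeddings capture meaning rather than just shared words, but the mechanics are the same: score every chunk against the question and keep the closest one.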

We’ll go into more detail on each step, but let’s look at the tech stack for now. We choose Python as a language because it gives us a lot of flexibility and great tooling for the things we want to achieve. The LangChain package lets us quickly integrate different LLMs and embedding providers. For this article, we chose to go with OpenAI, but we might explore other options in the future, so please let us know if you’re interested.

Let’s start setting up the project and the environment we need. You can either follow along or clone the final project here.

1. Project Setup

We must configure a Python environment with the necessary packages to run the project. Also, we need an OpenAI API key to access their embedding and LLM services. We will set up both in this chapter so that we’re up and running for the implementation of the project.

For this guide, we choose Python 3.9. It will probably work on lower versions, but we haven’t tested it, so use it at your own risk. Find out which version you have installed by running the python --version command.

First, we create a folder for our project (we go with the name llm-docs-chat but feel free to choose your own) and change into that directory by running:

mkdir llm-docs-chat
cd llm-docs-chat

Having many different projects on a machine with different package installations and versions can get messy quickly, so we will create a separate environment for our project. We will use venv for that.

Let’s create a venv environment for our project. Run the following command:

python -m venv llm-env

What actions does this command perform?

  • creates a directory in our project folder (called llm-env, but we can name it anything else)
  • places a pyvenv.cfg file inside that directory (which points to the Python installation from which the command was run)
  • creates a bin sub-directory containing the Python binaries, and another one for the packages that will be installed

💡 Note: we can add the llm-env folder to the .gitignore file.

Next, we want to activate the virtual environment with a blank slate and install packages only in our newly created space. Run the appropriate command:

# macOS, Linux
source llm-env/bin/activate

# Windows
llm-env\Scripts\activate
We have a working separate Python environment now and are ready to install the required packages for the project.

First, install all the packages with this command:

pip install langchain openai tiktoken python-dotenv beautifulsoup4 markdown streamlit faiss-cpu

What do these packages do?

langchain: We interact with the LLM and create embeddings using this package
openai and tiktoken: the OpenAI client and tokenizer that LangChain uses under the hood
python-dotenv: We load the OpenAI key from a .env file to not commit it directly inside of our code
beautifulsoup4: parses the HTML we generate from Markdown so that we can easily split it
markdown: converts our Markdown files into HTML
faiss-cpu: the vector store we will save our embeddings in
streamlit: a framework to easily create web apps with Python

The last thing we must create is a .env file in the root of our project that we fill with the OpenAI API key. If you don’t have a key yet, create one here. Copy this into the file:

OPENAI_API_KEY=<your-api-key>
The LangChain package will be smart enough to look for that key in the environment when communicating with the OpenAI API.

Now that we have everything set up, we begin the implementation.

2. Loading Documentation Content

We want to enrich our request to the LLM with the most relevant snippet from our documentation. To achieve this, we must first split it into digestible pieces. There are many ways to do this, and it depends on our data structure.

In our case, it consists of nested folders with distinct Markdown files, each containing relevant info for a specific topic. To achieve our goal, we follow these steps:

  1. Recursively get a list of all files in the relevant directory and filter it to only contain Markdown files (we also accept mdx files, a Markdown flavor with some React functionality mixed in).
  2. Convert each file into HTML (using the markdown package) and split it by the headlines (<h1> - <h5>) with BeautifulSoup.
  3. Inside each of these chunks, convert the HTML elements (for example, div, p, span) into text and concatenate them.

Python makes handling lists of data easy, so here is the code to load the text from the documentation:

import os
from glob import iglob
import markdown
from bs4 import BeautifulSoup

def load_text_from_docs():
    rootdir = "./data/iOS/**/*"
    file_types = ["md", "mdx"]
    files = [f for f in iglob(rootdir, recursive=True) if os.path.isfile(f)]
    filtered_files = list(
        filter(lambda x: x.split(".")[-1] in file_types, files)
    )

    text_elements = []
    for file in filtered_files:
        with open(file) as f:
            # Convert the Markdown source to HTML so we can split it by headlines.
            html = markdown.markdown(f.read())
        soup = BeautifulSoup(html, "html.parser")

        # Collect the text of each section that starts with a headline (h1-h5).
        for header in soup.find_all(["h1", "h2", "h3", "h4", "h5"]):
            chunk = header.get_text()
            for sibling in header.find_next_siblings():
                if sibling.name in ["h1", "h2", "h3", "h4", "h5"]:
                    break
                chunk += " " + sibling.get_text()
            text_elements.append(chunk)

    return text_elements
We achieved our goal of splitting the entire body of our documentation files into a single array of text chunks. This process can look different depending on our data, how it’s structured, and its format. But while this part of the article may vary per use case, the rest is primarily generic and can be applied to many problems similarly.

3. Embedding Documentation Content

The data is now prepared in pure text format. Why is that not enough? Remember that we want to create a chatbot that can answer questions about the documentation. To find the relevant piece of text related to a query, we must be able to measure the similarity between texts.


Now, embeddings come into play. We will not cover the mathematics behind them here (see our machine learning engineer Chiara Caratelli’s talk about sentence embeddings to learn more), but they are fascinating. We only need to understand what happens on the surface. Finding objective similarity in texts is hard, but it can be modeled as a mathematical problem. Numbers close to each other are more similar than those far apart. Embeddings take this concept into a multi-dimensional space.

Each chunk of text transforms into a vector. Think of it as an array of numbers. These numbers are not random, but they are modeled so that the more similar texts are, the closer the mathematical distance between them is.
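To build intuition, here is how closeness between two such vectors is commonly measured using cosine similarity. The tiny three-dimensional vectors below are made up for illustration (real text-embedding-ada-002 vectors have 1,536 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" for two documentation chunks and a query.
chat_docs = [0.9, 0.1, 0.2]   # "How to send a chat message"
video_docs = [0.1, 0.9, 0.3]  # "How to start a video call"
query = [0.8, 0.2, 0.1]       # "sending messages in chat"

print(cosine_similarity(query, chat_docs))   # high: similar topics
print(cosine_similarity(query, video_docs))  # lower: different topics
```

Vector databases perform exactly this kind of comparison (often with optimized approximations) across thousands of stored vectors at once.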

Achieving this is a complex problem, which is why different providers offer solutions to do that work for us. In our case, we chose the embeddings from OpenAI (specifically, the text-embedding-ada-002 model). We can send our chunks of texts to their service, giving us the vectors we need to compute similarity.

The second problem we must solve is storing these vectors, a perfect case for vector databases. They not only store the data efficiently but also optimize to find and compute similarity in a fast and reliable way. There are many providers out there. We will use FAISS, an open-source solution by Meta.

Amazingly, this implementation will only take us two lines of code. Here it is:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

def put_data_into_db(text_chunks):
    embeddings = OpenAIEmbeddings()
    vector_store = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vector_store

Remember that this only works when we have created an environment (.env) file and put our OpenAI API key inside it. Additionally, we need to load the environment using load_dotenv(), which we will do in a later chapter.

We have transformed the data into chunks of text and created vector representations of them. Now, we are ready to receive queries and search for similar pieces in our database.

4. Create a Chat Interface Using LangChain

The other building blocks are ready. We are only missing the chatbot itself. We can already present a text from our knowledge base similar to the user's query. However, we want to be able to present the answer in natural language so that the communication with our system feels more conversational.

The popular LangChain package will help us significantly. We will leverage three of its components; let’s quickly describe what they do:

  1. ChatOpenAI: This object is the LLM. We initialize it from OpenAI using this convenient wrapper.
  2. ConversationBufferMemory: We want to keep the information from previous requests so that the user can ask follow-up questions and the conversation feels more natural. This object buffers the previous conversation and keeps it in memory.
  3. ConversationalRetrievalChain: This ties the previous objects together and takes in the vector database to combine everything into an easy-to-use conversation.
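To build intuition for the second component, here is a toy stand-in (not the real LangChain class) that shows the idea behind buffering a conversation:

```python
class ToyBufferMemory:
    """A toy stand-in for ConversationBufferMemory: it simply keeps every message."""

    def __init__(self):
        self.chat_history = []

    def add_exchange(self, question, answer):
        # Store both sides of the exchange so follow-up questions have context.
        self.chat_history.append(("human", question))
        self.chat_history.append(("ai", answer))

memory = ToyBufferMemory()
memory.add_exchange("What SDKs does Stream offer?", "Chat and Video SDKs.")
memory.add_exchange("Do they support iOS?", "Yes, both have iOS SDKs.")

# The whole buffered history travels with each new question to the LLM.
print(len(memory.chat_history))  # 4
```

The real ConversationBufferMemory does more (it formats the messages for the prompt), but the core idea is the same: the full history accompanies every request, which is what makes follow-up questions work.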

The advantage of using LangChain is that everything ties together really well, and we can use many of the objects directly. Here is the code to create this conversation bot:

from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

def create_conversation_bot(vector_store):
    llm = ChatOpenAI()
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    conversation_bot = ConversationalRetrievalChain.from_llm(
        llm=llm, retriever=vector_store.as_retriever(), memory=memory
    )
    return conversation_bot

To get an answer, we will have to hand in a query, which we will do in the next chapter. To have a nice interface, we will create a web application that lets us conveniently chat with our bot.

5. Create a Basic Streamlit Application

We have created all the necessary tools but have yet to interact with them. We will now build an interface using Streamlit because it integrates well with our setup.

In the end, the tool will offer this functionality:

  1. Upon startup, load the data from our documentation files, parse them into embeddings, and create the conversation bot.
  2. Allow for user input to post a question.
  3. Take that input and ask our conversation bot for an answer.
  4. Display these chat messages on the page.

Due to the nature of this blog post, we’ll not go into too much detail with the Streamlit specifics (read more on its ideas here). It allows us to add user elements with simple Python code (all components can be found here). We’ll also use the session_state to store data for as long as the browser tab runs.

Specifically, we’ll store the conversation_bot and the chat_history inside of it so that we don’t have to re-create these from scratch again and again. Especially the conversation_bot would require parsing all documentation data and creating the embeddings again, which we don’t want.

Here’s the code:

import streamlit as st
from dotenv import load_dotenv

if __name__ == "__main__":
    # load the OpenAI API key from our .env file
    load_dotenv()

    # setup the Streamlit app
    st.set_page_config(page_title="Chat with Stream docs", page_icon=":boat:")
    st.write(css, unsafe_allow_html=True)
    st.header("Chat with Stream docs :boat:")

    if "conversation_bot" not in st.session_state:
        with st.spinner("Loading data..."):
            # load docs data into texts
            texts = load_text_from_docs()

            # load embeddings
            vector_store = put_data_into_db(texts)

            # create the conversation bot and cache it in the session state
            st.session_state.conversation_bot = create_conversation_bot(vector_store)
            st.session_state.chat_history = []

    # take the user's question and hand it to the bot
    question = st.text_input("Ask a question about the documentation:")
    if question:
        handle_question(question)

The remaining function for us to implement is handle_question, which receives a string.

Here’s the code, and we’ll explain what happens afterward:

def handle_question(question):
    response = st.session_state.conversation_bot({"question": question})
    st.session_state.chat_history = response["chat_history"]
    for i, message in enumerate(reversed(st.session_state.chat_history)):
        if i % 2 == 0:
            # even indices (counting from the end) are bot answers
            st.write(bot_template.replace("{{MSG}}", message.content), unsafe_allow_html=True)
        else:
            # odd indices are the user's questions
            st.write(user_template.replace("{{MSG}}", message.content), unsafe_allow_html=True)
In essence, we ask the conversation_bot a question. We’ll get a response containing a chat_history of all the messages we have exchanged with the bot. This one we’ll save to the session_state. After that, we’ll write the text of each message on the screen using Streamlit’s st.write() component.

For that, we define HTML components (with some basic styling) and replace the {{MSG}} template with the message content.

Add a file called htmlTemplates.py to the project root and fill it with this code:

css = """
<style>
.chat-message {
  padding: 1rem;
  border-radius: 0.5rem;
  border: 1px solid #e5e5e5;
  margin-top: 1rem;
}

.chat-message h3 {
  font-size: medium;
  font-weight: bold;
  color: #919191;
  margin-bottom: 0;
  padding-bottom: 0;
}
</style>
"""

# Simple HTML templates; {{MSG}} is replaced with each message's content.
bot_template = '<div class="chat-message"><h3>Bot</h3><p>{{MSG}}</p></div>'
user_template = '<div class="chat-message"><h3>You</h3><p>{{MSG}}</p></div>'
We can now import these templates like this:

from htmlTemplates import css, bot_template, user_template

The app is now ready, and we can run it with streamlit run followed by the name of our main script file. We can now chat with our documentation and get detailed responses drawing on the wisdom of our knowledge base.


In this post, we’ve achieved a lot of things. We have parsed our knowledge base into chunks of wisdom. These were Markdown files in our case, but they can be anything from PDFs to plain text or websites.

From these chunks, we have created embedding vectors and saved them in a vector database so that we can find similar chunks to pair with a question the user asks.

We combined this with a powerful Large Language Model (LLM) to receive impressive answers enriched with our custom domain knowledge.

Through these steps alone, we’ve figured out how to add specific data for our business and feed it into state-of-the-art AI models. This can significantly improve the performance and value of the systems that we have in place and that our customers can interact with.

And while doing that, we didn’t need expensive re-training of these models. We have created custom embeddings that we can easily re-create once our knowledge base is updated, and the results will account for that.

Stay tuned for more content, such as exploring other options for creating embeddings using open-source tooling and experimenting with other language models outside the OpenAI world. Feel free to chat with us on Twitter about your ideas for improving this setup. We’re very excited about the possibilities.
