This is a multi-episode series of blog posts discussing one of Arthur’s internal use cases for language models. It is our intention for this post to be general enough to be adapted with minimal effort in your company/lab.
A new generation of applications is emerging. These applications use Large Language Models (LLMs) via APIs from OpenAI and Cohere to accomplish tasks via a “natural language interface” (NLI). With an NLI, a question or procedure is posed in ordinary written language, and an LLM or a series of LLMs carries out steps to respond. NLIs are not new—Google Search is a familiar example of an NLI that has been around for years. However, many use cases and toolkits are now emerging to use LLMs to wrap more common business tasks in an NLI.
At Arthur, we were looking for a project to explore emerging use cases and toolkits for LLMs and NLIs, which brought us to LangChain. This toolkit is a fast-growing open source library for building LLM-powered applications with much of the prompting, routing, and other intermediate steps handled by the components of the LangChain library itself.
In addition, we felt that enabling easy sharing, interactivity, and feedback collection would be important for us to study the human experience of using LLMs, which brought us to Gradio. This easy-to-use library, acquired by HuggingFace, enables the creation of user interfaces to interact directly with machine learning models. If you make a Gradio app for your project, you can get a shareable link that will allow you to send your custom UI to people for testing and feedback collection—without them having to have Python or any code on their computer.
In this first episode of Ask Arthur, we are going to walk through the construction of our prototype for Ask Arthur using only LangChain, Gradio, and native Python packages—no other machine learning libraries like PyTorch are needed to directly use when writing our code!
In future episodes, we will dive into the many design choices to consider when creating an LLM-powered application: evaluating hyperparameter choices for embeddings, testing new LLM prompts, balancing performance with cost when choosing between different APIs, and more. This is only the beginning!
“Ask Arthur” is a chatbot for answering questions and citing sources from documentation. We can define all the components and steps of the chatbot with LangChain.
The flow of the application displayed in the image above is inspired by this LangChain blog post about their chatbot design for their own documentation.
The chatbot processes the user’s question in the context of the chat so far, finds a relevant page from the Arthur docs, and then uses the docs page to construct a response. In order to find a relevant page from the documentation, we need to have preprocessed the docs into embeddings, which LangChain supports with document parsers and integrations with databases and VectorStores to save/load/search embeddings.
An LLM from OpenAI’s API (wrapped by LangChain) is called twice when a user types in a new message into the chat, each with its own prompt and purpose. The user will only ever directly interact with the output of the second call to an LLM, the “Chat response generator,” which creates the written response that gets output into the chat box. This chat response will typically be a summary/rephrasing of information contained on a docs page.
In contrast, the first call to an LLM will not output text to a user. Instead, it synthesizes the chat history with the user’s most recent message and the chat history to generate an intermediate question which is used to a) find the most relevant chunk of text from the docs, and b) provide the context for the chat response generator.
The prompt you want for the LLM will change depending on the exact task at hand. In this particular case, the prompt we use to get an LLM to synthesize new messages with the chat history is different from the prompt we use to construct a written response for the user.
Here we define the two prompts we use with a PromptTemplate. The first is the default provided by LangChain, and the second is one we have customized with a relevant example.
Preprocessing Docs into Embeddings
Define LangChain Agent
Next, we define a function that takes an API key and returns our LangChain agent (which will make it easy to integrate this function into our Gradio interface). We temporarily store the API key as an environment variable, which the LangChain uses to connect to the OpenAI endpoint. We use the ChatVectorDBChain, which takes as input our docs vectorstore, the chat synthesis LLM, the chat response LLM, callbacks to handle streaming text, and the parameter to enable returning source documentation.
To chat with an agent returned by this function, pass in the user’s input and the chat history into the agent’s input dict:
For testing our chatbot, we create an interactive shareable user interface with Gradio.
Once a user registers their API key and enters messages into the chat, the LangChain agent will reply with a written message and its source from the documentation. The LLM-generated response to the user’s question is in a chat window on the left, and the source page from the documentation is displayed on the right, along with a URL to that same page on the Arthur website. Additionally, in the bottom left, we provide a set of options for a user to give feedback.
UI Components and Layout
Gradio layout is defined by organizing components into rows and columns. In our left column are markdown components for the title and instructions, a chat window, a Send button, example user inputs, and a feedback button. In our right column is a textbox to enter the user’s API key, a Register API Key button, and markdown components for the source doc URL and markdown text.
Here is the function to launch the Gradio demo, which you can modify to include your own title, instructions, and example inputs. The first section places the components in their proper place in the layout, and the second section attaches functionality to components. Below this function, we define the helper functions it calls.
Streaming the Agent Output
This is the function that calls our LangChain agent with a new message from the user (and chat history). We get both the chat response and source document name from agent result. We then convert the source document name into a valid URL to our docs and its corresponding markdown text with a helper function. We then yield the chat response (as well as the source doc and link) as a generator for streaming text.
Helper function for parsing our LangChain agent’s output into its source text and corresponding URL:
Save Inferences and Feedback
This format may change in the future, but for now we simply record two columns: the chat history (string) and the feedback (integer). We parse the chat history into a single string of alternating input<>, output<>, input<>, output<>, etc.
For now, we save this data to a local file, with one feedback submission getting entered one at a time as a new row in a CSV. In a future episode, we will integrate it with Arthur for proper model monitoring!
Sharing the Gradio Demo
Episode 2 and Beyond...
Want to give Ask Arthur a whirl? Check it out here.
In the next Ask Arthur episode, we will dive deep into evaluating different options for our LLM prompts—stay tuned and keep looking out for changes in this quickly evolving space!