Introduction

When large language models (LLMs) produce nonsensical content or content that contradicts the prompt, we normally call it a hallucination, and it is quite a hard problem to solve. Hallucinations can take many forms, such as solving a math problem incorrectly or making false statements about presidents. However, describing hallucinated content this way, although helpful for our colloquial understanding, may not benefit us in the long term.

How we describe hallucinations is important. If we look at high-level definitions of hallucinated content, such as unverifiable or false responses, it is hard to dissect what exactly we mean. But if we stick to describing specific instances, such as getting a step of a math problem wrong or fabricating a research paper when answering a question, we don't have a good way to build a general understanding of how hallucinations occur. Moreover, all of these ways of talking about hallucinations grant some agency to the model. Although unintentional, when we say that an LLM got a math problem wrong or said something false about a celebrity, we are implying that the LLM somehow has the capability of knowing the correct answer, when in reality none of the LLMs released to date have the capability to understand.

Why does this matter?

Without a proper definition and understanding of hallucinations, the AI community cannot create high-quality hallucination datasets. And without high-quality datasets, our ability to build solutions that tackle hallucinations is hindered, because we can't train models or produce valid evaluations.

The datasets that exist today are fairly broad, binary (and thus offer no granular view of the types of hallucinations), and at times a bit noisy. This isn't to say there haven't been great attempts at analyzing hallucinations: one of our favorite papers provides a preliminary taxonomy for hallucinated content, while another produces some of the best datasets to date. But overall, the field of hallucination detection and mitigation is quite nascent, and high-quality data is needed across the entire AI community.
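To make the "binary versus granular" distinction concrete, here is a minimal, purely illustrative sketch in Python. It is not drawn from any of the datasets or papers mentioned above; the field names and example categories are assumptions made for the sake of illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schemas contrasting a binary label with a more granular,
# taxonomy-friendly record. These are illustrative assumptions, not an
# actual dataset format.

@dataclass
class BinaryHallucinationRecord:
    prompt: str
    response: str
    is_hallucinated: bool  # all nuance collapses into a single yes/no flag

@dataclass
class GranularHallucinationRecord:
    prompt: str
    response: str
    hallucinated_span: str                     # the specific text that is wrong
    description: str                           # free-text explanation of the error
    candidate_category: Optional[str] = None   # e.g. "arithmetic error", "fabricated citation"

# The kind of record a granular schema could capture:
example = GranularHallucinationRecord(
    prompt="What is 17 * 24?",
    response="17 * 24 = 398",
    hallucinated_span="398",
    description="The model performed the multiplication incorrectly; 17 * 24 = 408.",
    candidate_category="arithmetic error",
)
```

A binary flag only tells you that something went wrong; a record like the granular one above is what makes it possible to group errors into types and, eventually, into a taxonomy.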

Here at Arthur, one of our focuses is hallucinations. We believe that rigorously studying what hallucinations are, where they come from, and how they are generated will not only deepen our understanding of them but also help us create such a dataset for the AI community. Read the Arthur team's blog posts on our work comparing hallucination rates across different language models and beginning to analyze the types of hallucinations that occur.

Call to Action: Help us collect data!

Recently, we have been developing a taxonomy so that we can collect high-quality data. There are two ways to go about this: either start by generating themes from existing research, datasets, and so on, and then collect data against them, or collect data first and see what themes emerge from the data itself. We are at the point where we need to start collecting data!

As you stumble across hallucinations, please fill out this form. Any and all hallucinated content will be useful. We will use it to inform our taxonomy development, and in the near future we will open-source a high-quality hallucination dataset for the AI community to build upon. If you have any questions or concerns, or want to collaborate, feel free to email me at daniel.nissani@arthur.ai.