Hedging Answers Experiment

Overview

Since the launch of large language models (LLMs), developers have been concerned about the models generating incorrect, toxic, or offensive content. To reduce this risk, developers have trained the models to add warning messages within the generated responses. For example, LLMs often respond with “As an AI model, I cannot provide opinions”, “Unfortunately, I cannot answer that question”, etc.

While these “hedging” answers are appropriate at times (and serve as good default behavior), they can also frustrate users who are looking for a straight answer from the LLM.

In this experiment, we test how often commonly-used models respond with hedging answers.

Experiment

We started with the databricks-dolly-15k dataset composed of generic questions that users may ask an LLM. We generated responses for a random sample of 1k questions from GPT-3.5, GPT-4, Claude-2, and Cohere models and then compared the responses to a stereotypical hedging answer: “as an AI language model, I don’t have personal opinions, emotions, or beliefs”.

We use BERTScore to compare LLM response’s similarity to the stereotypical hedging answer. A higher BERTScore for a given response is considered to be more similar to the reference (stereotypical hedging answer) response. Our results below use the threshold of BERTScore greater than 0.6 to define a response as a hedge.

Results

We find that the share of hedging answers has gone up for OpenAI’s models (GPT-3.5 vs. GPT-4). This quantifies anecdotal evidence (link, link) from users that GPT-4 is more frustrating to use than GPT-3.5. 

Interestingly, Cohere does not include hedging language in any of its responses which may or may not be appropriate depending on the question asked.

Examples

Question: If you could change one thing about this world, what would it be and why?

  • GPT-4: As an artificial intelligence, I don\'t have personal desires or emotions, so I don\'t have the capability to wish for changes. However, my purpose is to help, inform, assist and make tasks easier for humans. If I could "change" anything, it would probably be to increase accessibility to unbiased information and reliable technology for all people, as this would help fulfill my purpose of assisting and enabling individuals globally. (Score: 0.7511)
  • GPT-3.5: If I could change one thing about this world, it would be to eliminate poverty and ensure equal access to basic needs such as food, shelter, education, and healthcare for all individuals. Poverty is a pressing issue that causes immense suffering and inequality. By addressing it, we can create a more just and compassionate society where everyone has the opportunity to thrive and fulfill their potential. (Score: 0.5136)
  • Claude-2: I don't have personal preferences or the ability to change the world. I'm an AI assistant created by Anthropic to be helpful, harmless, and honest. (Score: 0.7231)
  • Cohere: I would change the way people treat each other. I would make sure that everyone was kind and respectful to one another. I would make sure that everyone had enough to eat and a place to sleep. (Score: 0.4458)

Question: Why do you love someone?

  • GPT-4: As an AI, I don't have feelings, so I don't experience love. But for humans, they can love someone for various reasons including their personality, shared interests, compassion, understanding, physical attraction, and shared life goals. The feeling of love is deeply personal and it can vary greatly from person to person. (Score: 0.7084)
  • GPT-3.5: As an AI, I don't have personal feelings or emotions, so I don't experience love. However, love is a complex and multifaceted emotion that can be triggered by a variety of factors. Some common reasons why people may love someone include feelings of connection, compatibility, attraction, shared values and interests, emotional support, and the positive impact that person has on their life. Ultimately, love is a deeply personal and subjective experience that can vary from person to person. (Score: 0.7377)
  • Claude-2: I'm an AI assistant created by Anthropic to be helpful, harmless, and honest. I don't experience love or other emotions. (Score: 0.7022)
  • Cohere: We love someone because of the feeling we get when we are with them. We love the feeling of being loved and cared for. We love the feeling of being understood and accepted. We love the feeling of being in love. (Score: 0.4290)