The Generative Assessment Project

A research initiative that ranks the strengths and weaknesses of large language models from industry leaders such as OpenAI, Anthropic, and Meta, as well as other open-source models.

We'll periodically update this page with our latest findings on the rapidly evolving LLM landscape.

The Most Robust Way to Evaluate LLMs

Arthur Bench is our tool for helping teams evaluate the available LLM options quickly, easily, and consistently.
