
Arthur in 2025: Building Trust and Governance for the Agentic AI Era

By: Arthur Team

December 22, 2025


2025 marked a defining year for Arthur.

As AI systems rapidly evolved from single-model deployments to multi-agent, autonomous workflows, the industry crossed a critical threshold. Scale was no longer the hard part. Trust, governance, and reliability were.

Throughout 2025, Arthur focused on one core mission: helping teams confidently build, deploy, and scale AI systems they can trust. We shipped foundational platform capabilities, expanded our evaluation and governance tooling for agentic AI, deepened partnerships across cloud ecosystems, and invested heavily in in-person community and education.

This year-in-review reflects how we built in the open with our customers, partners, and community.

Product Momentum: From Model Monitoring to Agentic Governance

In 2025, Arthur moved beyond traditional model observability to meet the demands of agentic AI in production. As autonomous systems scaled, Arthur safeguarded and monitored more than 1 billion tokens across real-world deployments, giving teams the visibility and control required to govern agents, not just models.

Across monthly releases, Arthur evolved into a platform purpose-built for the realities of getting agentic AI into production.

Major Product Milestones

We launched a new Agent Discovery and Governance (ADG) Platform

Automated discovery of agentic systems across environments
Centralized inventory of agent behavior, execution, and dependencies
Governance controls designed for autonomous and multi-agent workflows

We open sourced the Arthur Evals Engine

Launched at the Open Source AI Conference
Production-ready evaluation workflows spanning development through post-deployment monitoring, completely free and open source

We introduced the Agent Development Lifecycle (ADLC) Methodology

End-to-end lifecycle management for building reliable AI agents
Built-in evaluation and governance checkpoints at every stage of the agent lifecycle
A unified framework to manage autonomous agents across teams, environments, and vendors with consistent standards

Customer impact in action: See how Upsolve used Arthur to detect a critical GPT-5 regression before it reached users, enabling trusted agentic AI in a high-stakes financial environment.

We continued to improve our core capabilities

Custom metrics you can define once and reuse across all your AI projects
Expanded out-of-the-box PII detection and policy enforcement
Increased aggregation timeouts for enterprise-scale workloads
Continuous performance improvements driven by real customer usage

Together, these releases laid the foundation for continuous improvement loops across agentic systems, not just static model checks. Ultimately, they enabled teams to move faster without sacrificing visibility or control.

Advancing Responsible AI in Production

As AI agents became more autonomous in 2025, responsible deployment shifted from principle to practice.

Arthur invested deeply in:

Defining operational standards for “safe and useful” AI behavior
Enabling experimentation, human feedback loops, and evaluation at scale
Helping enterprises apply consistent governance across vendor-supplied and custom-built AI systems

This year, we also launched the Arthur Start Up Partner Program to support founders and technical teams building agentic AI from day one. The program is designed to help startups bring agents into production safely, reliably, and faster by providing best-in-class evals and governance tooling, along with hands-on support from our team. Apply here.

Our work this year reinforced a simple belief: governance should accelerate innovation, not slow it down.

Turning AI Ambition into Production Readiness

In 2025, we doubled down on something increasingly rare in AI: being in the same room.

Across community-hosted IRL events and major industry conferences, we met builders, researchers, and enterprise leaders where real agentic systems are being designed, deployed, and governed. These were not polished demo environments. They were working sessions grounded in production reality.

Community Events That Go Beyond Demos

In 2025, Arthur hosted and co-hosted in-person community gatherings across New York, San Francisco, Los Angeles, and Las Vegas, bringing together builders, hackers, and enterprise teams.

*Arthur team at Cloudflare HQ, San Francisco Tech Week, October 2025*

These sessions were intentionally designed for shared learning and candid discussion, not polished product demos. Each event focused on the real challenges teams face as AI systems move from prototypes into production.

Topics included:

What builders need next from AI tools as the developer experience evolves
How teams are operationalizing evaluation and scaling AI, including hands-on workshops with AWS
The tools and best practices teams use to move beyond prototypes and ship real-world AI agents
Lessons from real-world failures in agentic systems and how continuous evaluation enables safe scaling
Where governance breaks down in production and how to maintain trust as systems grow more autonomous

*Arthur, Vercel, and Deskree teams, NY Tech Week, June 2025*

The outcome was simple and powerful: stronger relationships, sharper product feedback, and a growing community aligned around making AI systems more reliable, accountable, and useful.

Conferences Driving Production Readiness

Alongside our own events, Arthur engaged deeply at leading industry conferences where AI ambition meets operational complexity.

AWS re:Invent | Las Vegas
Engaged with thousands of AI builders and cloud teams on agentic architectures, evaluation, and governance at scale. We also hosted an in-person Arthur Run Club along the Las Vegas Strip, blending community and conversation outside the expo floor.

*Arthur Booth at the Agentic AI Insurance Conference, NYC, Nov 2025*

Agentic AI Insurance Conference | New York
Explored real-world adoption challenges for agentic AI in regulated industries, with a focus on transparency, accountability, and system reliability.

Ai4 Conference | Las Vegas
Joined business executives and technical leaders shaping how AI is applied across industries, with clear signals that agentic systems are moving from experimentation into core operations.

Artificial Intelligence (AI) Global Leadership Summit | New York
Our CEO, Adam Wenchel, spoke on stage at the New York Stock Exchange, sharing best practices for innovating responsibly with senior leaders from business, government, academia, and civil society.

The Signal We Heard Everywhere

Across community rooms and conference halls alike, one theme was consistent: enterprises are moving fast into agentic AI, but governance tooling is struggling to keep up.

Arthur is helping close that gap.

Building Where Teams Already Work

From native integrations to open standards, we met AI teams inside their existing stacks.

*Arthur and AWS Team, Amazon JFK Frank, NYC, August 2025*

We continued to deepen partnerships across the AI ecosystem in 2025, including cloud platforms like AWS and Google Cloud, developer tooling, and open standards. We also made the Arthur Platform available in the newly launched AI Agents and Tools category of AWS Marketplace.

By supporting specifications like OpenInference and working closely with partners across infrastructure and application layers, we made it easier for teams to adopt Arthur without rearchitecting their stack. These partnerships and integrations allowed teams to move from reactive monitoring to proactive optimization without added complexity.

Customer impact in action: See how Expel cut ML monitoring time by 50 percent while improving coverage across production systems, proving that strong governance can increase velocity rather than slow it down.

Looking Ahead to 2026

If 2025 was about laying the groundwork for agentic governance, 2026 will be about acceleration.

We are entering the next year focused on:

Policy Agents - agents that operate within an AI system to oversee the agents carrying out core tasks, ensuring alignment and great outcomes
Automated Discovery & Governance - cataloging agents across complex enterprise environments and keeping them under control
The “Do Nothing” performance boost - we expect the major model makers to roll out new versions that are better than ever at following instructions, calling tools correctly, and coordinating with other agents, so your applications will work better even if you do nothing!

Thank you to our customers, partners, and community for building alongside us. We are just getting started.

If you joined us at an event, contributed feedback, opened a pull request, or pushed our platform to its limits, we are grateful to be building with you.

Here’s to making AI more reliable, more accountable, and more human in 2026.