AI Agent Compliance Tools for Banks: A Guide

June 12, 2026 · 6 min read

Banks are deploying AI agents across customer support, fraud detection, KYC and AML, credit decisioning, and back-office operations. These agents do not just answer questions. They reason, call tools and APIs, access sensitive systems, and take multi-step actions on their own. That autonomy is exactly what makes them useful, and exactly what makes them a regulatory problem.

No single tool makes a bank compliant. In practice, banks assemble a stack of governance, discovery, observability, evaluation, and guardrail tools that together satisfy a fast-moving set of authorities: the EU AI Act, the NIST AI Risk Management Framework, DORA, consumer protection and fair lending law, and data protection rules such as GLBA and GDPR. Notably, as of an April 2026 interagency rewrite, SR 11-7 model risk guidance no longer covers generative and agentic AI, which reshapes how banks have to frame agent compliance. This post breaks down the categories of tools banks use, maps them to the regulations that actually apply, and explains how to choose them.

Why AI agents create new regulatory risk in banking

Traditional model risk management was built around a clear question: is this model accurate, fair, and explainable? Agentic AI shifts the question. The concern is no longer only whether a model is safe, but whether the bank can discover, observe, govern, and reconstruct every action an agent took.

Three properties of agents drive the new risk:

Autonomy. Agents take actions, not just predictions. An agent that can move money, update a record, or send a customer message carries operational risk a static model never did.
Tool and data access. Agents call APIs, query databases, and touch internal systems. Over-broad permissions turn a single agent into a wide risk surface.
Proliferation. Agents enter the enterprise from every direction: in-house application teams, third-party software that quietly adds agentic features, and new vendor solutions. Most organizations quickly lose track of what is running where.

This is the "shadow agent" problem, and it is already here. According to McKinsey, 80% of organizations report risky behavior from AI agents. For a regulated bank, an unmanaged agent with access to customer data or core systems is a compliance failure waiting to be found in an audit.

The compliance tool stack for AI agents

Because agents span discovery, runtime control, and oversight, banks need several categories of tooling that work together. Here is how the stack breaks down.

1. Agent discovery and inventory

You cannot govern what you cannot see. Discovery tools automatically scan cloud and compute environments to find and catalog agents as they appear, instead of relying on manual spreadsheets. Common discovery techniques include:

Telemetry scanning: listening to OpenTelemetry (OTEL) streams to detect new agents, tools, and configuration changes.
MCP monitoring: detecting Model Context Protocol servers that expose agents and tools.
Network layer analysis: inspecting traffic for LLM and agent signatures.
API-driven discovery: querying platforms like Google Vertex AI and AWS Bedrock for what is running.

A complete inventory, with a named owner for every agent, is the foundation of any audit. An agent without an owner is an agent without accountability.
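
To make the inventory idea concrete, here is a minimal sketch of telemetry-based discovery feeding an ownership check. It assumes spans arrive as plain dictionaries and that agents and tools are tagged with the attribute keys `gen_ai.agent.name` and `gen_ai.tool.name`. Those keys are modeled on OpenTelemetry's GenAI semantic conventions, but treat them as assumptions and check what your own instrumentation emits. `AgentRecord`, `build_inventory`, and `unowned` are hypothetical names for illustration, not part of any product.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AgentRecord:
    """One inventory entry; field names are illustrative, not a standard schema."""
    name: str
    source: str                       # service where the agent was first seen
    owner: Optional[str] = None       # named accountable owner, if any
    tools: Set[str] = field(default_factory=set)


def build_inventory(spans, owners):
    """Fold telemetry-like span dicts into an inventory keyed by agent name."""
    inventory = {}
    for span in spans:
        name = span.get("gen_ai.agent.name")      # assumed attribute key
        if name is None:
            continue                              # not an agent span, skip it
        record = inventory.setdefault(
            name,
            AgentRecord(
                name=name,
                source=span.get("service.name", "unknown"),
                owner=owners.get(name),
            ),
        )
        tool = span.get("gen_ai.tool.name")       # assumed attribute key
        if tool:
            record.tools.add(tool)
    return inventory


def unowned(inventory):
    """Agents with no named owner: the accountability gap to close first."""
    return sorted(r.name for r in inventory.values() if r.owner is None)


spans = [
    {"service.name": "support-bot", "gen_ai.agent.name": "retail-support",
     "gen_ai.tool.name": "lookup_balance"},
    {"service.name": "support-bot", "gen_ai.agent.name": "retail-support",
     "gen_ai.tool.name": "send_message"},
    {"service.name": "vendor-crm", "gen_ai.agent.name": "crm-summarizer"},
    {"service.name": "payments-api"},             # ordinary service, no agent
]
owners = {"retail-support": "jane.doe"}

inventory = build_inventory(spans, owners)
print(unowned(inventory))
```

In this toy data the vendor-supplied `crm-summarizer` shows up in telemetry with no owner on file, which is the kind of shadow agent a manual spreadsheet tends to miss.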

2. AI governance and policy enforcement

Governance tools turn a raw inventory into a controlled, auditable operation. They provide three things: a unified policy framework across the enterprise; governance that is agnostic to cloud, framework, and model; and policies that can be customized per use case, because one size does not fit all. A customer support agent for retail banking needs different controls than an internal AML investigation agent.

The most important capability is enforcement that adapts to each use case: which data an agent can access, which tools it can invoke, and what behaviors are acceptable.
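
One way to picture per-use-case enforcement is a deny-by-default policy lookup that runs before every tool call. The sketch below is an assumption-laden illustration: the agent names, tool names, and data classes are invented, and `AgentPolicy` and `authorize` are hypothetical helpers rather than any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    """What one agent may do; both fields are illustrative."""
    allowed_tools: frozenset
    allowed_data: frozenset


# Hypothetical policies: the AML agent sees more data than the support agent.
POLICIES = {
    "retail-support": AgentPolicy(
        allowed_tools=frozenset({"lookup_balance", "send_message"}),
        allowed_data=frozenset({"account_summary"}),
    ),
    "aml-investigator": AgentPolicy(
        allowed_tools=frozenset({"search_transactions"}),
        allowed_data=frozenset({"account_summary", "transaction_history"}),
    ),
}


def authorize(agent, tool, data_classes):
    """Return (allowed, reason). Agents without a policy are denied."""
    policy = POLICIES.get(agent)
    if policy is None:
        return False, f"no policy registered for agent '{agent}'"
    if tool not in policy.allowed_tools:
        return False, f"tool '{tool}' not permitted for '{agent}'"
    blocked = set(data_classes) - policy.allowed_data
    if blocked:
        return False, f"data classes not permitted: {sorted(blocked)}"
    return True, "ok"


print(authorize("retail-support", "lookup_balance", ["account_summary"]))
print(authorize("retail-support", "search_transactions", ["transaction_history"]))
print(authorize("unregistered-agent", "lookup_balance", []))
```

Returning a reason alongside the decision is a deliberate choice here: the denial message can be logged as audit evidence rather than reconstructed later.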

3. Observability and audit trails

Regulators and auditors want to answer "what did the agent do, and why?" Observability tools trace every agent run end to end: prompts, completions, tool calls, retrievals, reasoning steps, token counts, and cost. Built on open standards like OpenTelemetry, this tracing produces the decision lineage and forensic replay that audits depend on. The teams that instrument early are the ones that can demonstrate control later.
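
As a rough illustration of decision lineage, the sketch below records each step of an agent run in order and replays it from stored JSON lines. It uses only the Python standard library and is a stand-in for real OpenTelemetry tracing, not a replacement. `RunTrace` and `replay` are hypothetical names, and the step kinds and fields are assumptions.

```python
import json
import time
import uuid


class RunTrace:
    """Ordered record of one agent run, serializable as JSON lines."""

    def __init__(self, agent):
        self.run_id = str(uuid.uuid4())
        self.agent = agent
        self.steps = []

    def record(self, kind, **detail):
        """Append one step (prompt, tool_call, completion, ...) with a sequence number."""
        self.steps.append({
            "run_id": self.run_id,
            "agent": self.agent,
            "seq": len(self.steps),
            "ts": time.time(),
            "kind": kind,
            "detail": detail,
        })

    def to_jsonl(self):
        return "\n".join(json.dumps(step, sort_keys=True) for step in self.steps)


def replay(jsonl):
    """Rebuild the ordered (kind, detail) sequence from stored lines."""
    steps = [json.loads(line) for line in jsonl.splitlines() if line]
    steps.sort(key=lambda step: step["seq"])
    return [(step["kind"], step["detail"]) for step in steps]


trace = RunTrace("retail-support")
trace.record("prompt", text="What is my balance?")
trace.record("tool_call", tool="lookup_balance", args={"account": "****1234"})
trace.record("completion", text="Here is your current balance.", tokens=42)

for kind, detail in replay(trace.to_jsonl()):
    print(kind, detail)
```

The sequence number, rather than the timestamp, drives the replay order, so the reconstruction does not depend on clock precision.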

4. Continuous evaluation and monitoring

Agents are non-deterministic. One that passes a test suite today can fail the same case tomorrow, and production inputs are far more varied than any handwritten test set. Continuous evaluations run automated checks against live traffic to catch issues like hallucination, incomplete answers, off-topic responses, and wrong tool use before customers or regulators do. Pairing evals with alerting means the compliance team is notified the moment behavior drifts, rather than after a complaint.
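
A simple way to pair evals with alerting is a sliding-window failure rate. The sketch below assumes each live interaction has already been scored pass or fail by some evaluator. `EvalMonitor`, the window size, and the threshold are illustrative choices, not recommended values.

```python
from collections import deque


class EvalMonitor:
    """Tracks recent eval results and signals when the failure rate drifts."""

    def __init__(self, window=100, threshold=0.05, min_samples=20):
        self.results = deque(maxlen=window)   # oldest results fall off automatically
        self.threshold = threshold
        self.min_samples = min_samples

    def failure_rate(self):
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def observe(self, passed):
        """Record one eval result; return True if an alert should fire."""
        self.results.append(bool(passed))
        if len(self.results) < self.min_samples:
            return False                      # too little data to judge drift
        return self.failure_rate() > self.threshold


monitor = EvalMonitor(window=50, threshold=0.10, min_samples=10)
for passed in [True] * 10 + [False, False]:
    if monitor.observe(passed):
        print(f"alert: failure rate {monitor.failure_rate():.0%} over last "
              f"{len(monitor.results)} runs")
```

The `min_samples` floor keeps a single early failure from paging the compliance team before there is enough traffic to call it drift.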

5. Guardrails for real-time policy enforcement

Discovery, observability, and evals are retrospective. Guardrails intercept agent behavior in real time. They fall into two types:

Pre-LLM guardrails run before input reaches the model: PII detection and redaction, sensitive data blocking (credentials, card numbers, proprietary data), and prompt injection defense. For a customer-facing banking agent, redacting PII before anything leaves the corporate environment is often non-negotiable for compliance.
Post-LLM guardrails run before a response reaches the user: hallucination detection, toxicity checks, tool and action validation, and output format compliance. The most powerful pattern feeds a failed check back to the agent in a self-correction loop, so the user only sees responses where every factual claim is grounded.
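
The two guardrail types can be sketched together: redact before the model sees the input, then check the draft and retry with feedback before anything reaches the user. This is a minimal sketch under stated assumptions. The regex patterns are deliberately simple and would miss plenty of real PII, and `generate` and `check` are hypothetical callables standing in for the model call and the grounding check.

```python
import re

# Illustrative patterns only; production PII detection needs broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}

FALLBACK = "I can't confirm that right now. Let me connect you with a banker."


def redact(text):
    """Pre-LLM guardrail: replace matched PII with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def guarded_reply(prompt, generate, check, max_attempts=3):
    """Post-LLM guardrail with a self-correction loop.

    `generate(prompt, feedback)` and `check(draft)` are hypothetical callables.
    Returns (reply, attempts_used).
    """
    safe_prompt = redact(prompt)           # only redacted text reaches the model
    feedback = None
    for attempt in range(1, max_attempts + 1):
        draft = generate(safe_prompt, feedback)
        ok, reason = check(draft)
        if ok:
            return draft, attempt
        feedback = reason                  # feed the failed check back to the agent
    return FALLBACK, max_attempts          # never surface an ungrounded draft


# Stub model: ungrounded on the first try, corrected once it receives feedback.
def fake_generate(prompt, feedback):
    return "Your balance is $1,250.00." if feedback else "Your balance is $9,999.00."


# Stub check: pretend $1,250.00 is the figure found in the retrieved record.
def fake_check(draft):
    return "$1,250.00" in draft, "balance does not match the retrieved account record"


print(redact("Email jane.doe@example.com about card 4111 1111 1111 1111, SSN 123-45-6789."))
print(guarded_reply("I'm jane@example.com, what's my balance?", fake_generate, fake_check))
```

If every attempt fails the check, the function returns a fixed fallback instead of the last draft, which matches the goal that users see only grounded responses.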

6. Model governance and explainability

Banks still need model inventory, validation, documentation, bias checks, and explainability for the agents they run. The newer agent tooling complements existing model governance platforms rather than replacing them, adding the action-level traceability that traditional model risk management was never designed to capture.

A word of caution on framing here. Many guides still cite SR 11-7 as the regulation that governs AI agents. As of the April 17, 2026 interagency rewrite (OCC Bulletin 2026-13), that is no longer accurate. The OCC, Federal Reserve, and FDIC formally placed generative and agentic AI outside the scope of SR 11-7, with a request for information (RFI) on AI-specific model risk expected to follow. SR 11-7 still applies to traditional quantitative models like credit scoring and VaR, but not to your loan-underwriting RAG pipeline or your KYC LLM classifier.

That carve-out does not create an unregulated zone. It removes one consolidated frame and leaves the agent governed by a different set of authorities.

What actually governs AI agents in banking now

The regulatory picture is more fragmented than most "AI compliance" content suggests. With SR 11-7 stepping back from GenAI, the controls banks build still have to satisfy a range of overlapping authorities. The tool categories above map onto them directly.

Consumer protection and fair lending (Reg B, FCRA adverse-action rules, ECOA): an agent that touches credit decisions needs explainability and decision lineage. Observability, audit trails, and governance views supply that evidence.
EU AI Act: risk classification, human oversight, transparency, and logging. Discovery, governance, continuous evals, and guardrails support these obligations.
NIST AI RMF: govern, map, measure, and manage AI risk. Discovery, governance, evals, and monitoring align to these functions.
DORA: operational resilience, incident detection, and third-party oversight. Observability, monitoring, and alerting are the relevant controls.
Data privacy (GLBA, GDPR, state laws like the Colorado AI Act and California DFPI activity): data protection, minimization, and access control. Pre-LLM guardrails for PII redaction, data lineage, and access policies apply here.
Internal governance commitments and third-party risk (FFIEC): the board-level and vendor controls a bank has already committed to, which apply regardless of which federal frame is in force.
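
The mapping above can be written down as data and used for a rough gap check. The sketch below is one loose reading of the list, not a legal determination: the category labels are shorthand for the tool categories in this post, monitoring and alerting are folded into "evaluation", data lineage is folded into "observability", and the FFIEC item is left out because the list names no specific tool category for it. `CONTROL_MAP` and `coverage_gaps` are hypothetical names.

```python
# Authority -> tool categories the list above associates with it (an interpretation).
CONTROL_MAP = {
    "Consumer protection and fair lending": {"observability", "governance"},
    "EU AI Act": {"discovery", "governance", "evaluation", "guardrails"},
    "NIST AI RMF": {"discovery", "governance", "evaluation"},
    "DORA": {"observability", "evaluation"},
    "Data privacy": {"guardrails", "observability", "governance"},
}


def coverage_gaps(deployed):
    """Return {authority: missing categories} for authorities not fully covered."""
    deployed = set(deployed)
    gaps = {}
    for authority, required in CONTROL_MAP.items():
        missing = required - deployed
        if missing:
            gaps[authority] = sorted(missing)
    return gaps


# Example: a bank that has inventory, policy, and tracing but nothing else yet.
for authority, missing in coverage_gaps({"discovery", "governance", "observability"}).items():
    print(f"{authority}: missing {', '.join(missing)}")
```

An empty result only means each listed category is present somewhere in the stack. It says nothing about whether the controls are configured well enough to satisfy an examiner.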

The throughline holds even as the specific pegs shift: banks need to demonstrate visibility, control, traceability, and accountability for every agent in production. The forthcoming RFI is also a reason to build defensible controls now, since the firms engaging with that process are the ones whose real-world architecture will shape whatever replaces SR 11-7's coverage of GenAI.

How Arthur helps banks govern AI agents

Arthur was built for exactly this challenge: giving teams the visibility and control to move agents from experimental pilots to governed production systems. It spans both the discovery and governance layer and the development lifecycle that produces a governable agent in the first place.

Agent Discovery and Governance (ADG). Arthur's ADG platform automatically discovers and catalogs agents across fragmented environments like Vertex AI, Bedrock, and others, then brings them under a single control plane. It provides a unified policy framework, agnostic governance across clouds and frameworks, and customizable policies so each agent gets controls that fit its use case. Compliance teams can see the tools, models, data sources, and subagents each agent uses, and assign a clear owner for accountability.

The Agent Development Lifecycle (ADLC). Beyond discovery, Arthur covers the practices that make an agent enterprise-ready:

Observability and tracing built on OpenTelemetry and OpenInference, so governance tooling can discover agents and auditors can reconstruct any action.
Continuous evaluations that run on live traffic to catch hallucinations, off-topic answers, and wrong tool use before users do.
Guardrails for PII redaction, sensitive data blocking, prompt injection, hallucination detection, and toxicity, applied in real time. A major airline Arthur works with uses pre-LLM guardrails to redact PII from customer support conversations before they ever leave the corporate environment, a pattern that maps directly to banking compliance needs.
Discovery and governance views that surface an agent's full risk surface for compliance review.

Because Arthur runs natively inside your own cloud, with a federated data plane and control plane, sensitive inference data (prompts, completions, retrieved documents, and PII) stays inside your environment. Only lightweight, anonymized metrics flow to the control plane. For regulated industries, keeping production data in-VPC is often the difference between an agent that clears compliance review and one that does not.

What to look for when choosing tools

When evaluating tools to help your bank comply with AI regulations for agents, the questions that matter most are:

Does it discover agents automatically? Manual inventories miss shadow agents. Look for telemetry, MCP, network, and API-based discovery.
Is governance agnostic and unified? Controls should work across every cloud, framework, and model, and roll up into one policy framework rather than fragmenting per team.
Are policies customizable per use case? A KYC agent and a customer chatbot need different guardrails and evaluators.
Does it produce audit-ready evidence? You should be able to show traces, eval results, running guardrails, and named ownership during a review.
Are guardrails real-time? PII redaction and prompt injection defense need to run in the hot path, not after the fact.
Does sensitive data stay in your environment? A federated architecture that keeps inference data in your VPC is often essential for regulated workloads.

TLDR

AI agents shift the compliance question for banks from "is the model safe?" to "can we discover, observe, govern, and reconstruct every action an agent took?"
No single tool makes a bank compliant. Banks assemble a stack: agent discovery, governance and policy enforcement, observability and audit trails, continuous evaluation, guardrails, and model governance and explainability.
As of the April 17, 2026 rewrite, SR 11-7 no longer covers generative or agentic AI. Agents are instead governed by consumer protection and fair lending law, the EU AI Act, NIST AI RMF, DORA, data privacy rules, and internal governance commitments, with an AI-specific RFI expected.
Arthur brings discovery, governance, observability, continuous evals, and real-time guardrails into one platform that runs inside your own cloud, keeping sensitive data in your environment.

Want to bring visibility and governance to your bank's agentic AI? Explore Arthur's Agent Discovery and Governance platform or book a demo with an AI expert.
