Sardis

Who Owns Accountability When an AI Agent Moves Money?

When an AI agent pays the wrong vendor, overspends, or gets exploited -- who is responsible? The operator, the model provider, the framework, or the agent itself?

An AI agent, acting on behalf of a company, pays $14,000 to a vendor that does not exist. The invoice was fabricated by a prompt injection attack embedded in an email the agent processed. The money is gone. Who is responsible?

The Accountability Gap

In traditional finance, accountability is clear. An employee makes a payment -- the employee is responsible. A payment processor moves funds -- the processor is responsible. The chain of custody is well-defined, regulated, and insured.

AI agents break this model. An agent is not an employee. It does not have legal personhood. It cannot be fired, sued, or held liable. Yet it can move real money at the speed of an API call.

This creates a gap that no one in the current stack is designed to fill:

  • The model provider (OpenAI, Anthropic, Google) -- They provide the reasoning engine. Their terms of service explicitly state they are not responsible for actions taken based on model outputs.
  • The agent framework (LangChain, CrewAI, AutoGPT) -- They provide orchestration. They do not govern what tools do with money.
  • The operator (you) -- You gave the agent access to a wallet. But did you define what it could and could not pay? Did you set limits?
  • The agent itself -- It has no legal standing. It cannot own assets, sign contracts, or be held liable.

The Real Question: Who Defined the Guardrails?

Accountability in the agent economy is not about blame -- it is about who defined the constraints under which the agent operated. If an agent overspends, the question is not "why did the agent do this?" but rather "why was the agent allowed to do this?"

  • An agent that could spend $14,000 because no one set a limit -- the operator is accountable.
  • An agent that was limited to $500/day but a bug bypassed the check -- the infrastructure provider is accountable.
  • An agent that was properly constrained but operated under an ambiguous policy -- accountability lies with whoever authored the policy, and the fix is better tooling.

The Infrastructure Layer Owns the Enforcement

Our position at Sardis is clear: accountability must be enforced at the infrastructure level, not at the application level.

If you enforce spending limits in your application code, the agent can potentially reason its way around them. If you enforce them in a policy engine that the agent cannot access, modify, or influence -- the rules hold.

The principle: The agent proposes. The infrastructure disposes. The human defines the rules. Every decision is recorded with cryptographic evidence. Accountability is distributed across these three layers -- not dumped on the operator after the fact.

What Good Accountability Looks Like

1. Policy-as-Code

Every spending rule is versioned, auditable, and enforced at the infrastructure level. When something goes wrong, you can point to the exact policy version that was active and the exact check that passed or failed.
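A minimal sketch of what this could look like. The field names and policy shape here are illustrative assumptions, not Sardis's actual schema -- the point is that the evaluation result names the exact policy version and the outcome of each check.

```python
from dataclasses import dataclass

# Hypothetical policy shape for illustration -- these names are assumptions,
# not a real Sardis API.

@dataclass(frozen=True)
class SpendingPolicy:
    version: str              # pinned, auditable identifier, e.g. "policy-v12"
    daily_limit_usd: float
    allowed_payees: frozenset

def evaluate(policy: SpendingPolicy, payee: str, amount: float,
             spent_today: float) -> dict:
    """Return a structured result recording the policy version and each check."""
    checks = {
        "payee_allowed": payee in policy.allowed_payees,
        "within_daily_limit": spent_today + amount <= policy.daily_limit_usd,
    }
    return {
        "policy_version": policy.version,
        "checks": checks,
        "approved": all(checks.values()),
    }

policy = SpendingPolicy("policy-v12", 500.0, frozenset({"vendor-acme"}))
result = evaluate(policy, payee="vendor-ghost", amount=14_000.0, spent_today=0.0)
print(result["approved"])        # False -- and the record shows which check failed
```

When the $14,000 payment from the opening scenario hits this gate, the stored result shows precisely which policy version was active and that both the payee check and the limit check failed.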

2. Cryptographic Evidence

Every transaction produces a signed attestation envelope -- a tamper-evident receipt that includes the policy snapshot, evaluation results, agent identity, and a Merkle proof. This is courtroom-grade evidence.
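To make the tamper-evidence concrete, here is an illustrative sketch: hash each envelope field into a Merkle root and sign it. A production envelope would use asymmetric signatures (e.g. Ed25519) and a hardened Merkle implementation; the names and structure below are assumptions for illustration only.

```python
import hashlib
import hmac
import json

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Fold hashed leaves pairwise up to a single root."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def attest(envelope: dict, signing_key: bytes) -> dict:
    """Attach a Merkle root over the envelope fields and an HMAC over the root."""
    leaves = [json.dumps({k: envelope[k]}, sort_keys=True).encode()
              for k in sorted(envelope)]
    root = merkle_root(leaves)
    sig = hmac.new(signing_key, root, hashlib.sha256).hexdigest()
    return {**envelope, "merkle_root": root.hex(), "signature": sig}

receipt = attest(
    {"agent_id": "agent-7", "policy_version": "policy-v12",
     "checks": {"within_daily_limit": False}, "tx_amount_usd": 14000},
    signing_key=b"demo-key",
)
# Changing any field changes the Merkle root and invalidates the signature.
```

Because every field is a leaf in the tree, altering the policy snapshot, the evaluation result, or the agent identity after the fact is detectable by anyone who can verify the signature.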

3. Separation of Concerns

The agent decides what to buy. The policy decides whether it is allowed. The infrastructure decides how the payment executes. The human defines the rules and reviews the evidence.
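The layering above can be sketched in a few lines. Function names here are illustrative, not an API; the essential property is that the agent's code can produce a proposal but cannot touch the rules or the gate.

```python
# Minimal sketch of the four-role split -- names are assumptions for illustration.

RULES = {"daily_limit_usd": 500.0}          # the human defines the rules

def agent_decides() -> dict:
    """The agent decides what to buy."""
    return {"payee": "vendor-acme", "amount_usd": 120.0}

def policy_allows(proposal: dict) -> bool:
    """The policy decides whether it is allowed."""
    return proposal["amount_usd"] <= RULES["daily_limit_usd"]

def infrastructure_executes(proposal: dict) -> str:
    """The infrastructure decides how the payment executes."""
    return f"paid {proposal['amount_usd']} to {proposal['payee']}"

proposal = agent_decides()
outcome = infrastructure_executes(proposal) if policy_allows(proposal) else "blocked"
print(outcome)  # paid 120.0 to vendor-acme
```

In a real deployment these roles live in separate trust domains: the agent runs in one process, the policy engine and payment rail in infrastructure the agent cannot modify.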

4. Kill Switch as a First-Class Primitive

If something goes wrong, you need to stop it immediately. One API call. Under 100ms. Five scopes of freeze: agent, wallet, rail, chain, or global.
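A client-side sketch of what a scoped freeze call could look like. The scope names follow the five scopes listed above, but the function and request shape are assumptions, not Sardis's actual API.

```python
from enum import Enum

# Hypothetical client sketch -- a real client would POST this request
# to the control plane; nothing here is a documented endpoint.

class FreezeScope(Enum):
    AGENT = "agent"
    WALLET = "wallet"
    RAIL = "rail"
    CHAIN = "chain"
    GLOBAL = "global"

def freeze(scope: FreezeScope, target_id: str = None) -> dict:
    """Build a single freeze request; every scope except GLOBAL needs a target."""
    if scope is not FreezeScope.GLOBAL and target_id is None:
        raise ValueError(f"{scope.value} freeze requires a target_id")
    return {"action": "freeze", "scope": scope.value, "target": target_id}

print(freeze(FreezeScope.AGENT, "agent-7"))
```

Making the kill switch one call with an explicit scope matters operationally: during an incident, the responder should not have to decide between precision and speed.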

The Regulatory Landscape

Regulators are watching. The EU AI Act classifies AI systems by risk level, and financial AI agents will almost certainly be classified as high-risk. The companies that survive regulatory scrutiny will be the ones that can demonstrate:

  • Clear policy definition (who set the rules?)
  • Deterministic enforcement (were the rules followed?)
  • Complete audit trail (can you prove it?)
  • Human oversight mechanisms (could a human intervene?)

The Bottom Line

When an AI agent moves money, accountability is shared across the stack. The model provider is accountable for reasoning quality. The operator is accountable for policy definition. The infrastructure is accountable for enforcement. And every layer needs cryptographic evidence to prove it did its job.

The companies that figure this out will build the financial backbone of the agent economy. The ones that do not will be the case studies in why they should have.


Written by Efe Baran Durmaz, Founder @ Sardis