
RFC-004

AI Governance & Trust Architecture

Governance Through Infrastructure, Not Policy Documents

A proposal for governing AI systems through infrastructure, not policy documents — covering decision classification, authorization boundaries, audit trails, override mechanisms, and the operating model that keeps humans accountable for what AI does.


Motivation

Every organization adopting AI will eventually face the same question: who is responsible when the AI makes a wrong decision? Not a hallucinated email — a wrong decision that costs money, violates a regulation, or harms a customer.

The instinct is to write a policy. An "AI Ethics Framework" goes on the intranet. A governance board meets quarterly. Principles are drafted: fairness, transparency, accountability. The document is thorough, well-intentioned, and completely unenforceable.

| Approach | What Happens |
|---|---|
| Policy document | PDF on the intranet. Nobody reads it. No mechanism to enforce it. Discovered during an audit — after the incident. |
| Ethics board | Meets quarterly. Reviews proposals in theory. Has no visibility into what's actually running in production. |
| Manual review | Works at low volume. Breaks when you have 50 agents processing 10,000 decisions per day. |
| No governance | Fast until the first incident. Then everything stops while legal and compliance figure out what happened. |

The problem is not a lack of principles. The problem is that governance without infrastructure is fiction. You can't audit what you don't log. You can't override what you don't control. You can't classify decisions you can't see.

This RFC proposes a governance architecture — the systems, APIs, and operational patterns that make AI governance enforceable, auditable, and real.

Governance is a system property, not a document. If your governance framework can't answer "what did the AI decide, why, and who approved it?" for any decision in the last 90 days within 5 minutes, it's not governance. It's theater.


The Governance Stack

AI governance operates in four layers. Each layer depends on the one below it — policy without controls is aspiration, controls without infrastructure are manual, infrastructure without observability is blind.

┌─────────────────────────────────────────────┐
│              POLICY LAYER                   │
│  Principles, risk appetite, decision rights │
├─────────────────────────────────────────────┤
│              CONTROL LAYER                  │
│  Decision classification, authorization,    │
│  approval workflows, escalation rules       │
├─────────────────────────────────────────────┤
│           INFRASTRUCTURE LAYER              │
│  Audit logs, kill switches, rate limits,    │
│  model gateway, prompt registry             │
├─────────────────────────────────────────────┤
│           OBSERVABILITY LAYER               │
│  Decision traces, drift detection,          │
│  compliance dashboards, anomaly alerts      │
└─────────────────────────────────────────────┘

What Each Layer Does

| Layer | Owns | Example |
|---|---|---|
| Policy | What AI is allowed to do and why | "AI may not make final hiring decisions" |
| Control | How the policy is enforced | "Hiring recommendations require human approval before the offer is sent" |
| Infrastructure | The systems that enforce controls | Authorization middleware rejects any AI action tagged hiring.offer without a human approval token |
| Observability | Evidence that controls are working | Dashboard showing 100% of hiring recommendations were human-approved, with audit trail |

The Policy-Infrastructure Gap

In practice, policy and infrastructure develop in parallel — and they pull in different directions.

| What Happens | Why |
|---|---|
| Policy writes rules the infrastructure can't enforce | Policy teams don't know what's technically feasible. They mandate "all AI decisions must be explainable" without asking whether the model supports it. |
| Infrastructure builds controls nobody asked for | Engineering teams anticipate requirements, build sophisticated monitoring — then discover the organization doesn't care about those metrics. |
| They drift apart over time | Policy gets updated after a board meeting. Infrastructure gets updated after an incident. Neither team tells the other. |
| Each side tries to steer the other | Policy says "we need X." Engineering says "we can only do Y." The compromise satisfies neither and governs nothing. |

This tension is permanent. It doesn't resolve with a kickoff meeting or a phased rollout. The stack diagram above is not a build sequence — it's a negotiation framework between two groups that will never fully agree.

What Actually Works

The organizations that make this work don't eliminate the tension. They manage it with a short feedback loop.

| Mechanism | Purpose |
|---|---|
| Shared governance backlog | Policy and infrastructure teams work from the same prioritized list — not separate roadmaps |
| Monthly alignment review | Policy states what it needs enforced. Infrastructure states what it can enforce. The gap is visible and tracked. |
| Enforceability tagging | Every policy rule is tagged: enforced, monitored, or aspirational. No pretending. |
| Infrastructure-informed policy | Before a new rule is written, engineering provides a feasibility assessment — what it costs, how long, what trade-offs |
| Policy-informed infrastructure | Before a new control is built, governance confirms it maps to an actual requirement — no speculative tooling |
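Enforceability tagging is simple enough to sketch. The rule texts and field names below are illustrative, assuming each policy rule is stored as structured data rather than prose in a PDF.

```python
# Sketch: every policy rule carries an enforceability status, so the gap
# between policy and infrastructure is counted and visible, not buried.
RULES = [
    {"rule": "T4 decisions require human approval",            "status": "enforced"},
    {"rule": "All T3 decisions require explainability",        "status": "monitored"},
    {"rule": "All AI decisions must be causally explainable",  "status": "aspirational"},
]


def enforceability_gap(rules: list[dict]) -> dict:
    """Count rules by status. The 'aspirational' bucket is the documented gap."""
    counts = {"enforced": 0, "monitored": 0, "aspirational": 0}
    for rule in rules:
        counts[rule["status"]] += 1
    return counts
```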

The goal is not alignment — that implies agreement. The goal is visibility into the gap. When policy says "all T3 decisions require explainability" and infrastructure says "we can provide reasoning traces but not causal explanations," that disagreement should be documented, not buried.

The gap between policy and infrastructure is not a bug — it's the permanent state of governance. Policy will always want more than infrastructure can deliver. Infrastructure will always know things policy doesn't. The governance framework's job is to make the gap visible, small, and shrinking — not to pretend it doesn't exist.


Decision Classification

Not all AI decisions carry the same risk. A chatbot suggesting a help article is not the same as an agent approving a $50,000 purchase order. Governance should be proportional — light for low-risk decisions, rigorous for high-risk ones.

The Four Tiers

| Tier | Risk Level | Description | Examples |
|---|---|---|---|
| T1 | Informational | AI generates content for human consumption — no action is taken automatically | Drafting emails, summarizing documents, generating reports |
| T2 | Operational | AI takes routine actions within well-defined boundaries | Categorizing support tickets, routing emails, extracting invoice data |
| T3 | Consequential | AI makes decisions with financial, legal, or customer impact | Approving expense reports, generating invoices, escalating compliance flags |
| T4 | Critical | AI makes decisions that are difficult or impossible to reverse | Terminating access, submitting regulatory filings, executing financial transactions |

Governance Requirements by Tier

| Requirement | T1 | T2 | T3 | T4 |
|---|---|---|---|---|
| Audit logging | Required (lightweight) | Required | Required | Required |
| Human approval | No | No | Configurable | Always |
| Override mechanism | No | Yes | Yes | Yes + kill switch |
| Explainability | Best effort | On request | Automatic | Automatic + review |
| Compliance review | Annual | Quarterly | Per-change | Per-change + legal sign-off |
| Incident response | Standard | Standard | Expedited | Immediate |

Classification in Practice

Every AI agent or system is assigned a tier at deployment. The tier determines which controls apply. The tier is stored in the agent's metadata and enforced by the infrastructure layer.

agent:
  name: "invoice-processor"
  tier: T3  # consequential — financial impact
  controls:
    audit: required
    human_approval: "above_threshold"
    approval_threshold: 10000  # human approval for invoices > $10,000
    override: enabled
    explainability: automatic
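A sketch of how the infrastructure layer might consume that config. The field names mirror the YAML above; the enforcement function itself is an assumption.

```python
# Controls loaded from the agent's metadata (values from the YAML above).
CONTROLS = {
    "human_approval": "above_threshold",
    "approval_threshold": 10_000,  # human approval for invoices > $10,000
}


def needs_human_approval(amount: float, controls: dict = CONTROLS) -> bool:
    """Decide whether this invoice must be routed to a human before execution."""
    mode = controls["human_approval"]
    if mode == "always":            # e.g. T4 agents
        return True
    if mode == "above_threshold":   # e.g. this T3 invoice processor
        return amount > controls["approval_threshold"]
    return False                    # mode == "never", e.g. T1/T2 agents
```

Because the threshold lives in metadata rather than in the prompt, changing it is a config change that can itself be audited and approved.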

Tier assignment is not self-service. It requires sign-off from a domain owner and a governance reviewer. Getting the tier wrong — classifying a T4 decision as T2 — is the most dangerous governance failure.

When in doubt, tier up. It's cheaper to over-govern a low-risk decision temporarily than to under-govern a high-risk one permanently. You can always reclassify downward after observation.

Autonomy Model

The tier classifies what is being decided. The autonomy model classifies how much human involvement exists in the decision. These are independent axes — a T3 decision with a human in the loop is a fundamentally different governance problem than a T3 decision made autonomously.

| Pattern | Definition | Governance Implication |
|---|---|---|
| Human-in-the-loop | AI recommends, human decides and executes | AI is advisory. Governance focuses on recommendation quality and whether humans can meaningfully evaluate the output — not just rubber-stamp it. |
| Human-on-the-loop | AI decides and executes, human monitors and can intervene | AI has authority. Governance requires real-time monitoring, override capability, and evidence that human oversight is substantive. |
| Human-out-of-the-loop | AI decides and executes autonomously, human reviews post-hoc | Full autonomy. Highest governance burden. Requires tight blast radius limits and high eval confidence. |

The combination of tier and autonomy model determines the actual governance posture:

| Tier | Human-in-the-loop | Human-on-the-loop | Human-out-of-the-loop |
|---|---|---|---|
| T1 | Minimal controls | Minimal controls | Minimal controls |
| T2 | Light controls | Standard controls | Standard controls |
| T3 | Standard controls | Elevated controls | Requires explicit governance approval |
| T4 | Elevated controls | Maximum controls | Not permitted |

T4 decisions are never human-out-of-the-loop. T3 decisions default to human-on-the-loop unless the governance lead explicitly approves full autonomy with documented justification.
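The posture matrix lends itself to a direct lookup. This is a sketch with illustrative names; the posture labels come from the matrix above, and the hard failure on T4 autonomy encodes the rule that it is never permitted.

```python
# Tier x autonomy posture matrix as a lookup table (values from the matrix above).
POSTURE = {
    ("T1", "in"): "minimal",   ("T1", "on"): "minimal",   ("T1", "out"): "minimal",
    ("T2", "in"): "light",     ("T2", "on"): "standard",  ("T2", "out"): "standard",
    ("T3", "in"): "standard",  ("T3", "on"): "elevated",  ("T3", "out"): "requires_governance_approval",
    ("T4", "in"): "elevated",  ("T4", "on"): "maximum",   ("T4", "out"): "not_permitted",
}


def governance_posture(tier: str, autonomy: str) -> str:
    """Resolve the controls posture; refuse the one forbidden combination."""
    posture = POSTURE[(tier, autonomy)]
    if posture == "not_permitted":
        raise ValueError("T4 decisions are never human-out-of-the-loop")
    return posture
```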


Authorization & Boundaries

Every AI system operates within an authorization boundary — what it can access, what actions it can take, and what decisions it can make. These boundaries are enforced by infrastructure, not by the prompt.

The Authorization Model

| Boundary | What It Controls | Enforcement |
|---|---|---|
| Data access | What data the AI can read | API scopes, database row-level security, network segmentation |
| Action scope | What the AI can do | Tool allowlists per agent, action-level permissions |
| Decision scope | What the AI can decide | Tier-based approval gates, threshold limits |
| Blast radius | How much damage a single decision can cause | Transaction limits, rate limits, budget ceilings |

Why Prompts Are Not Boundaries

A system prompt that says "do not access customer financial data" is not an access control. It's a suggestion. Prompts can be circumvented — by prompt injection, by model updates that change behavior, or by edge cases the prompt author didn't anticipate.

| Enforcement Method | Reliability | Use For |
|---|---|---|
| System prompt instruction | Low — advisory only | Guiding agent behavior within its authorized scope |
| Application-layer validation | Medium — can be bypassed by bugs | Input/output filtering, format validation |
| Infrastructure-layer enforcement | High — agent cannot circumvent | Data access controls, action permissions, budget limits |

The rule: anything that matters must be enforced at the infrastructure layer. The prompt shapes behavior. The infrastructure enforces boundaries. These are different things.

Treat AI authorization like service-to-service auth. The same way a microservice gets scoped IAM credentials — not root access and a note saying "please be careful" — an AI agent gets scoped tool access and data permissions. Zero trust applies to AI too.
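A per-agent tool allowlist is one concrete form of scoped access. The agent and tool names below are illustrative; the pattern is what matters, since the check lives in the tool dispatcher, outside anything the model generates.

```python
# Scoped tool access enforced at the dispatch layer, not in the prompt.
# Agent and tool names are illustrative.
TOOL_ALLOWLIST: dict[str, set[str]] = {
    "invoice-processor": {"vendor_lookup", "po_match", "approve_invoice"},
    "support-triager":   {"ticket_lookup", "assign_queue"},
}


def invoke_tool(agent_id: str, tool: str, *args) -> tuple:
    """Run a tool call only if this agent's allowlist permits it."""
    if tool not in TOOL_ALLOWLIST.get(agent_id, set()):
        raise PermissionError(f"{agent_id} is not authorized to call {tool}")
    return ("invoked", tool, args)
```

The default for an unknown agent is the empty set: no entry means no tools, mirroring the zero-trust posture the callout above describes.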


Audit & Traceability

Every AI decision must be traceable. When a regulator, auditor, or incident responder asks "what happened and why," the answer must be available within minutes — not reconstructed from logs over weeks.

The Audit Record

Every AI decision produces an immutable audit record with these fields:

| Field | Description | Example |
|---|---|---|
| decision_id | Unique identifier | dec-2026-02-14-a8f3c |
| agent_id | Which AI system made the decision | invoice-processor-v3 |
| tier | Decision classification | T3 |
| timestamp | When the decision was made | 2026-02-14T09:23:41Z |
| input_hash | Hash of the input data | sha256:e3b0c44... |
| input_summary | Human-readable summary of what was processed | "Invoice #4821 from Acme Corp, $12,400" |
| decision | What the AI decided | approved |
| reasoning | AI-generated explanation of why | "Amount matches PO #3921, vendor verified, within budget" |
| confidence | Model's confidence signal — treat as uncalibrated unless independently validated. Not a reliability metric. | 0.94 (uncalibrated) |
| human_review | Whether a human reviewed this decision | not_required (below threshold) |
| model_version | Exact model and prompt version used | claude-sonnet-4-5@prompt-v7 |
| tools_invoked | Which tools were called | [vendor_lookup, po_match, approve_invoice] |
| outcome | Final outcome after any overrides | approved |
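The schema above maps naturally onto an immutable record type. A minimal sketch: the field names follow the table, while the ID scheme and constructor are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)  # frozen: the record cannot be mutated after creation
class AuditRecord:
    decision_id: str
    agent_id: str
    tier: str
    timestamp: str
    input_hash: str
    input_summary: str
    decision: str
    reasoning: str
    confidence: float   # uncalibrated signal, not a reliability metric
    human_review: str
    model_version: str
    tools_invoked: list
    outcome: str


def make_record(agent_id: str, tier: str, raw_input: bytes, **fields) -> AuditRecord:
    """Build a record with a derived ID and input hash (scheme is illustrative)."""
    digest = hashlib.sha256(raw_input).hexdigest()
    return AuditRecord(
        decision_id=f"dec-{digest[:8]}",
        agent_id=agent_id,
        tier=tier,
        timestamp=datetime.now(timezone.utc).isoformat(),
        input_hash=f"sha256:{digest}",
        **fields,
    )
```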

Retention & Access

| Tier | Retention | Access |
|---|---|---|
| T1 | 30 days | Ops team |
| T2 | 90 days | Ops + compliance |
| T3 | 5 years (or regulatory requirement) | Ops + compliance + legal |
| T4 | 7 years (or regulatory requirement) | Ops + compliance + legal + auditors |

Audit as a Product

The audit trail is not a log dump. It's a queryable, searchable system that answers questions like:

  • "Show me all T3 decisions made by the invoice processor in January where the amount exceeded $10,000"
  • "How many decisions were overridden by humans last quarter, and why?"
  • "Which agent has the highest override rate?"
  • "Show me the full decision chain for transaction #4821 — every agent that touched it, every decision made"

If you can't query it, it's not an audit trail — it's a log file. Structured, queryable audit records are the foundation of everything else in governance. This is the first infrastructure component to build once the policy mandate is in place.
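To make "queryable" concrete: if records land in a structured store, the first example question above becomes one query. This sketch uses SQLite purely for illustration; the schema and sample rows are assumptions.

```python
import sqlite3

# Audit records in a queryable store, not a flat log file.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE audit (
    decision_id TEXT, agent_id TEXT, tier TEXT,
    amount REAL, human_review TEXT, outcome TEXT)""")
db.executemany("INSERT INTO audit VALUES (?,?,?,?,?,?)", [
    ("dec-001", "invoice-processor", "T3", 12400, "approved_by:cfo", "approved"),
    ("dec-002", "invoice-processor", "T3",   800, "not_required",    "approved"),
    ("dec-003", "invoice-processor", "T3", 15100, "approved_by:cfo", "rejected"),
])

# "Show me all T3 decisions by the invoice processor where the amount exceeded $10,000"
rows = db.execute(
    "SELECT decision_id FROM audit "
    "WHERE tier = 'T3' AND agent_id = 'invoice-processor' AND amount > 10000"
).fetchall()
```

The same store answers the override-rate and decision-chain questions with similarly small queries, which is the difference between an audit trail and a log dump.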


Override & Kill Switches

Every AI system must have a mechanism for humans to intervene — from correcting a single decision to shutting down an entire agent fleet. The override architecture has three levels.

Override Levels

| Level | Scope | Mechanism | Who Can Trigger | Response Time |
|---|---|---|---|---|
| Decision override | Single decision | Reject or modify a specific AI output before it takes effect | Process owner, reviewer | Real-time |
| Agent pause | One agent | Halt processing for a specific agent, queue messages for later | Ops team, on-call | < 5 minutes |
| Fleet kill switch | All agents | Shut down all AI decision-making, fall back to manual processes | Incident commander | < 1 minute |
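The agent-pause and fleet-kill levels can be sketched as state the dispatch layer consults before every decision, so the agent itself never gets a say. Names and return values are illustrative.

```python
# Override state lives in infrastructure, outside the agent's control.
PAUSED_AGENTS: set[str] = set()
FLEET_KILLED = False


def dispatch(agent_id: str, work_item: str) -> str:
    """Checked before every decision; a misbehaving agent cannot opt out."""
    if FLEET_KILLED:
        return "queued_for_manual_fallback"  # fleet kill switch: manual process takes over
    if agent_id in PAUSED_AGENTS:
        return "queued"                      # agent pause: hold work for later
    return "processed"


def pause_agent(agent_id: str) -> None:
    PAUSED_AGENTS.add(agent_id)


def kill_fleet() -> None:
    global FLEET_KILLED
    FLEET_KILLED = True
```

In a real deployment the flags would live in shared infrastructure (a control-plane store, container orchestrator, or queue configuration), and "queued" would mean actual message queues draining to a fallback process.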

Design Requirements

| Requirement | Why |
|---|---|
| Overrides are always available | No scenario where "the system doesn't let me stop it" |
| Overrides are logged | Every override becomes an audit record — who, when, why |
| Overrides don't require the AI system to cooperate | Kill switches operate at infrastructure level (stop containers, drain queues) — not by asking the agent to stop |
| Fallback procedures exist | When agents are stopped, the business process must continue manually — runbooks must be maintained |
| Override authority is pre-assigned | Don't figure out who can pull the kill switch during an incident — define it in advance |

The Kill Switch Problem

The most dangerous failure mode in AI governance is not an AI making a bad decision. It's an organization that can't stop the AI from continuing to make bad decisions because:

  1. Nobody knows who has authority to shut it down
  2. The shutdown mechanism requires access that's not available at 2 AM
  3. Shutting down one agent breaks downstream agents with no fallback
  4. The business has become so dependent on AI that stopping it causes more damage than the original problem

All four must be addressed before any T3 or T4 system goes live.

Practice the kill switch. Run a quarterly "AI fire drill" — trigger the fleet kill switch, verify agents stop, verify fallback processes activate, verify the business continues to operate. If you've never tested it, it doesn't work.


Drift Detection & Continuous Compliance

AI systems drift. Models get updated, prompts get tuned, input distributions shift, edge cases accumulate. A system that was compliant at deployment may not be compliant six months later — not because someone changed the rules, but because the world changed under it.

What Drifts

| Drift Type | What Changes | How to Detect |
|---|---|---|
| Model drift | Provider updates the model — behavior shifts subtly | Eval suite regression (run golden set weekly) |
| Prompt drift | Prompts are edited without re-evaluation | Prompt version tracking, mandatory eval on change |
| Input drift | The distribution of real-world inputs changes over time | Statistical monitoring on input features |
| Accuracy drift | Decision quality degrades gradually | Human spot-check sampling, downstream error rates |
| Scope drift | The agent starts being used for tasks it wasn't designed for | Action and tool usage pattern monitoring |
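One common way to detect input drift is the Population Stability Index over binned input features. A minimal sketch: the 0.2 alert threshold is a widely used rule of thumb, not a universal standard, and the binning step is assumed to happen upstream.

```python
import math

def psi(baseline: list[float], current: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions (proportions)."""
    total = 0.0
    for p, q in zip(baseline, current):
        p, q = max(p, eps), max(q, eps)  # avoid log(0) on empty bins
        total += (q - p) * math.log(q / p)
    return total


def input_drift_alert(baseline: list[float], current: list[float],
                      threshold: float = 0.2) -> bool:
    """Alert when the shift exceeds the (rule-of-thumb) threshold."""
    return psi(baseline, current) > threshold
```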

Continuous Compliance

| Control | Frequency | Action on Failure |
|---|---|---|
| Eval suite (golden set) | Weekly automated run | Alert if accuracy drops > 2% from baseline |
| Human spot-check | Ongoing (sample rate based on tier) | Review failures, retrain or adjust prompt |
| Input distribution check | Daily automated | Alert if distribution shifts beyond threshold |
| Authorization audit | Monthly | Verify agent permissions match current policy |
| Override rate monitoring | Continuous | Investigate if override rate exceeds tier threshold (T2: 10%, T3: 3%, T4: 1%) |
| Cost anomaly detection | Daily | Alert if cost exceeds 120% of 30-day rolling average (tighter for T3/T4) |
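The override-rate control reduces to a small check against the tier thresholds in the table. A sketch, with thresholds taken from the table above:

```python
# Tier thresholds from the continuous-compliance table (T2: 10%, T3: 3%, T4: 1%).
OVERRIDE_THRESHOLDS = {"T2": 0.10, "T3": 0.03, "T4": 0.01}


def override_alert(tier: str, overridden: int, total: int) -> bool:
    """True when the human override rate exceeds this tier's threshold."""
    if total == 0:
        return False
    return overridden / total > OVERRIDE_THRESHOLDS[tier]
```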

Compliance is not a point-in-time assessment. It's continuous monitoring. A system that passed review in January may be non-compliant by March — not because someone broke a rule, but because the inputs changed, the model was updated, or the business rules evolved.


Vendor & Third-Party Model Governance

Most organizations don't train their own models. They consume third-party APIs where model updates happen without notice, pricing changes overnight, and the provider's safety policies may shift in ways that affect your compliance posture. The governance framework must account for the parts of the system you don't control.

What You Don't Control

| Risk | What Happens | Why It Matters |
|---|---|---|
| Silent model updates | Provider deploys a new model version — behavior changes subtly | Your eval baseline is invalidated. Decisions that were compliant yesterday may not be today. |
| Provider policy changes | Safety filters, content policies, or rate limits change | Workflows that relied on specific model behavior break or produce different outputs. |
| Provider outage | API goes down during business hours | If you have no fallback, your AI-dependent business processes stop. |
| Data handling changes | Provider changes how they store, log, or use your prompt data | Your data governance posture changes without your knowledge. |
| Concentration risk | 90% of decisions flow through one provider | A single vendor incident becomes an organizational incident. |

Vendor Governance Requirements

| Requirement | What to Specify |
|---|---|
| Model change notification | Contractual SLA for advance notice of model updates — minimum 30 days for major versions, 7 days for minor. Evaluate whether your contracts actually include this. |
| Eval-gated rollover | When a vendor updates a model, production traffic stays on the previous version until the eval suite passes on the new one. No automatic rollover. |
| Fallback strategy | Define per tier: T1/T2 may tolerate degraded service. T3/T4 need a fallback — a second provider, a cached model, or a manual process with defined activation criteria. |
| Data processing agreement | Where prompts are processed, how they're stored, whether they're used for training, and deletion SLAs. Reviewed annually or on contract renewal. |
| Vendor risk assessment | Annual review of provider's security certifications (SOC 2, ISO 27001), incident history, and regulatory compliance posture. |
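Eval-gated rollover is mechanical enough to sketch. This illustration assumes a `run_eval` callable that scores a model version against the golden set; the model names, the 2% regression tolerance, and the function signatures are all assumptions.

```python
def eval_passes(candidate: str, golden_set, run_eval, baseline_accuracy: float,
                max_regression: float = 0.02) -> bool:
    """Candidate passes if it scores within the allowed regression of baseline."""
    return run_eval(candidate, golden_set) >= baseline_accuracy - max_regression


def rollover(current: str, candidate: str, golden_set, run_eval,
             baseline_accuracy: float) -> str:
    """No automatic rollover: the candidate must earn production traffic."""
    if eval_passes(candidate, golden_set, run_eval, baseline_accuracy):
        return candidate
    return current  # traffic stays pinned to the known-good version
```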

You are accountable for decisions made by models you don't own. A regulator will not accept "the vendor updated the model" as an explanation for a compliance failure. Vendor governance is not optional — it's the gap between the infrastructure you control and the infrastructure you depend on.


Incident Response

When an AI system makes a consequential error — a wrong financial decision, a compliance violation, a customer harm — the response must be fast, structured, and different from a traditional software incident.

What's Different About AI Incidents

| Aspect | Traditional Software | AI System |
|---|---|---|
| Root cause | Bug in code — deterministic, reproducible | Probabilistic — the same input might produce a different output tomorrow |
| Blast radius | Usually bounded by the bug's scope | Potentially unbounded — if the model is wrong on one case, it may be wrong on similar cases |
| Fix | Deploy a code fix | May require prompt change, model rollback, or retraining — none of which are instant |
| Recurrence | Fix the bug, it doesn't come back | Fix one failure mode, a new one may emerge |

AI Incident Playbook

| Step | Action | Owner |
|---|---|---|
| 1. Detect | Alert fires from observability layer — anomaly, override spike, compliance failure | Automated |
| 2. Classify | Determine tier and blast radius — how many decisions are affected? | On-call + governance lead |
| 3. Contain | Pause the agent or trigger kill switch — stop further damage | On-call (pre-authorized) |
| 4. Assess | Query audit trail — what decisions were made, how many, what's the impact? | Incident team |
| 5. Remediate | Correct affected decisions (reverse transactions, notify customers, file amended reports) | Domain team + legal |
| 6. Root cause | Analyze why — model behavior, prompt gap, input drift, missing validation? | Engineering + prompt team |
| 7. Prevent | Update controls — add validation rule, tighten tier, increase human review rate | Governance team |
| 8. Report | Document the incident, update the governance framework, share lessons | Governance lead |

AI incidents are not just engineering incidents. They may involve legal, compliance, and customer-facing teams. The incident playbook must include these stakeholders from the start — not as an afterthought when someone realizes there's a regulatory dimension.


The Governance Operating Model

Governance is not a one-time setup. It's an ongoing operating model with defined roles, cadences, and decision rights.

Roles

| Role | Responsibility | Scope |
|---|---|---|
| AI Governance Lead | Owns the governance framework, chairs the review board, reports to leadership | Organization-wide |
| Domain Owners | Accountable for AI decisions in their domain — they own the outcomes | Per business function |
| Platform Team | Builds and maintains governance infrastructure (audit, authorization, kill switches) | Technical |
| Compliance & Legal | Regulatory alignment, incident escalation, audit readiness | Advisory + approval |
| On-Call / Ops | Day-to-day monitoring, first responder for incidents, override authority | Operational |

Cadences

| Activity | Frequency | Participants | Output |
|---|---|---|---|
| Governance review board | Monthly | Governance lead, domain owners, compliance | Tier reviews, policy updates, incident retrospectives |
| Eval suite review | Weekly | Platform team, prompt engineers | Accuracy trends, drift detection results |
| AI fire drill | Quarterly | All roles | Kill switch test, fallback activation, response time measurement |
| Compliance audit | Annually (or per regulation) | Governance lead, legal, external auditors | Formal compliance assessment |
| Incident retrospective | After every T3/T4 incident | Incident team + governance lead | Root cause analysis, control updates |

Decision Rights

| Decision | Who Decides | Who Approves |
|---|---|---|
| Deploy a new AI agent | Platform team | Domain owner + governance lead |
| Assign a decision tier | Domain owner | Governance lead |
| Change a prompt for T3/T4 agent | Prompt engineer | Domain owner + eval suite pass |
| Override a single AI decision | Process owner | Self (logged) |
| Pause an agent | On-call | Self (logged, notify governance lead) |
| Fleet kill switch | Incident commander | Self (logged, immediate notification to leadership) |
| Reclassify a tier downward | Domain owner | Governance lead + compliance |

Governance that nobody owns is governance that nobody follows. Assign a named human to the governance lead role. Give them authority to pause deployments, require tier changes, and block agents that don't meet governance requirements. Without authority, the role is decoration.
