A reference architecture for accountable AI agents in UK government services.
A Turborepo monorepo with 2 Next.js apps and 7 shared packages, demonstrating how AI agents could interact with government services while maintaining transparency, accountability, and citizen control. This is a prototype — not a live government tool.
When AI agents start acting on behalf of citizens with government services, who is accountable? If an agent applies for benefits incorrectly, submits the wrong data, or misunderstands eligibility rules — what happened, who authorized it, and how does the citizen get redress?
Current web-form-based services have implicit accountability: the citizen fills in the form themselves. Agentic services need explicit accountability infrastructure: formal service definitions, consent records, evidence trails, and human escalation paths.
Consent models track what data was shared, with whom, and for what purpose. Every data share is recorded.
The Evidence Plane records every agent action in an append-only trace store. Nothing is silently edited or deleted.
Agents hand off to humans with full context. Redress pathways and complaint routes are part of every service definition.
The stack enforces two invariants: all service calls go through a single CapabilityInvoker, and all LLM calls go through the AnthropicAdapter. These choke points are where policy, consent, and tracing are enforced.
A citizen talks to an agent (DOT or MAX). The agent calls Claude via the AnthropicAdapter. When it needs a government service, the call goes through the CapabilityInvoker, which evaluates policy rules, checks consent, and produces a receipt. The Evidence Plane sits underneath everything, recording all activity.
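The choke-point idea can be sketched in a few lines. This is a minimal illustration, not the actual `@als/runtime` code: `SketchInvoker`, its dependency shape, and the stubbed handler are all hypothetical names. It assumes the invoker evaluates policy, then checks consent, then calls the service, and that every path ends with a receipt.

```typescript
// Hypothetical shapes for illustration; not the real @als/runtime API.
type SketchTrace = { type: string; detail: Record<string, unknown> };
type SketchRequest = { serviceId: string; sessionId: string; input: Record<string, unknown> };
type SketchReceipt = {
  serviceId: string;
  sessionId: string;
  outcome: "completed" | "rejected";
  reason: string | null;
};
type SketchDeps = {
  // Returns null when eligible, otherwise the reason for failure.
  checkPolicy: (req: SketchRequest) => string | null;
  hasConsent: (req: SketchRequest) => boolean;
  handlers: Record<string, (input: Record<string, unknown>) => unknown>;
};

class SketchInvoker {
  readonly trace: SketchTrace[] = [];
  private deps: SketchDeps;

  constructor(deps: SketchDeps) {
    this.deps = deps;
  }

  invoke(req: SketchRequest): SketchReceipt {
    const failure = this.deps.checkPolicy(req);
    this.emit("policy.evaluated", { serviceId: req.serviceId, eligible: failure === null });
    if (failure !== null) return this.issue(req, "rejected", failure);

    if (!this.deps.hasConsent(req)) return this.issue(req, "rejected", "consent not granted");

    const handler = this.deps.handlers[req.serviceId];
    if (handler === undefined) return this.issue(req, "rejected", "unknown service");

    this.emit("capability.invoked", { serviceId: req.serviceId });
    const result = handler(req.input);
    this.emit("capability.result", { serviceId: req.serviceId, result });
    return this.issue(req, "completed", null);
  }

  private emit(type: string, detail: Record<string, unknown>): void {
    this.trace.push({ type, detail });
  }

  // Every path out of invoke() ends here, so no call finishes without a receipt.
  private issue(
    req: SketchRequest,
    outcome: SketchReceipt["outcome"],
    reason: string | null,
  ): SketchReceipt {
    const receipt: SketchReceipt = {
      serviceId: req.serviceId,
      sessionId: req.sessionId,
      outcome,
      reason,
    };
    this.emit("receipt.issued", receipt);
    return receipt;
  }
}

// Usage with stubbed checks and a stubbed handler (all values are made up).
const invoker = new SketchInvoker({
  checkPolicy: (req) => {
    const age = req.input["age"];
    return typeof age === "number" && age >= 18 ? null : "Must be 18+";
  },
  hasConsent: () => true,
  handlers: { "dvla.renew-driving-licence": () => ({ reference: "DEMO-1" }) },
});
const okReceipt = invoker.invoke({
  serviceId: "dvla.renew-driving-licence",
  sessionId: "s1",
  input: { age: 34 },
});
const tracedTypes = invoker.trace.map((e) => e.type);
```

Because the agent has no other route to a service, whatever the invoker records is the complete record of service calls for that session.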
Every government service is formally defined by four machine-readable JSON files. Together they answer: what does this service do, who is eligible, how does it progress, and what data is shared.
The first file defines what the service does, its inputs and outputs, SLAs, fees, redress pathways, and audit requirements.
{
  "id": "dvla.renew-driving-licence",
  "name": "Renew Driving Licence",
  "department": "DVLA",
  "constraints": {
    "fee": "14 GBP",
    "sla": "10 working days"
  }
}
The second file holds structured rules that determine eligibility, with conditions, failure messages, and alternative services.
{
  "rules": [{
    "id": "age-check",
    "condition": {
      "field": "age",
      "operator": ">=",
      "value": 18
    },
    "reason_if_failed": "Must be 18+"
  }]
}
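A rule set in this shape could be evaluated along the following lines. This is a sketch under assumptions, not the PolicyEvaluator from `@als/legibility`: only the `>=` operator appears in the excerpt, so the other operators, the numeric-only comparison, and the `evaluateRules` name are hypothetical.

```typescript
// Only ">=" appears in the excerpt above; the other operators are assumptions.
type SketchOperator = ">=" | ">" | "<=" | "<" | "==" | "!=";
type SketchRule = {
  id: string;
  condition: { field: string; operator: SketchOperator; value: number };
  reason_if_failed: string;
};
type SketchEvaluation = {
  eligible: boolean;
  passed: string[];
  failed: { id: string; reason: string }[];
};

function compareValues(actual: number, operator: SketchOperator, expected: number): boolean {
  switch (operator) {
    case ">=": return actual >= expected;
    case ">": return actual > expected;
    case "<=": return actual <= expected;
    case "<": return actual < expected;
    case "==": return actual === expected;
    case "!=": return actual !== expected;
  }
}

function evaluateRules(rules: SketchRule[], facts: Record<string, unknown>): SketchEvaluation {
  const passed: string[] = [];
  const failed: { id: string; reason: string }[] = [];
  for (const rule of rules) {
    const actual = facts[rule.condition.field];
    // A missing or non-numeric fact fails the rule rather than passing silently.
    const ok =
      typeof actual === "number" &&
      compareValues(actual, rule.condition.operator, rule.condition.value);
    if (ok) {
      passed.push(rule.id);
    } else {
      failed.push({ id: rule.id, reason: rule.reason_if_failed });
    }
  }
  return { eligible: failed.length === 0, passed, failed };
}

// The age-check rule from the excerpt, applied to two made-up citizens.
const agePolicy: SketchRule[] = [
  {
    id: "age-check",
    condition: { field: "age", operator: ">=", value: 18 },
    reason_if_failed: "Must be 18+",
  },
];
const adultResult = evaluateRules(agePolicy, { age: 34 });
const minorResult = evaluateRules(agePolicy, { age: 16 });
```

Returning the failed rules together with their `reason_if_failed` text, instead of a bare boolean, gives the agent something it can explain to the citizen and gives the trace something it can record.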
The third file defines valid states and transitions. Terminal states (completed, rejected, handed-off) require receipts.
{
  "states": [
    "not-started",
    "identity-verified",
    "application-submitted",
    "completed"
  ],
  "transitions": [
    { "from": "not-started", "to": "identity-verified" }
  ]
}
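To show how a state model like this can constrain an agent, here is a minimal sketch of a state machine that refuses any transition the model does not declare. The `SketchStateMachine` class is hypothetical and is not the StateMachine shipped in `@als/legibility`. It uses only the states and the single transition shown in the excerpt.

```typescript
type SketchStateModel = {
  states: string[];
  transitions: { from: string; to: string }[];
};

class SketchStateMachine {
  readonly history: string[];
  private model: SketchStateModel;
  private current: string;

  constructor(model: SketchStateModel, initial: string) {
    if (!model.states.some((s) => s === initial)) {
      throw new Error(`Unknown initial state: ${initial}`);
    }
    this.model = model;
    this.current = initial;
    this.history = [initial];
  }

  get state(): string {
    return this.current;
  }

  // A move is legal only if the model declares it from the current state.
  canMove(to: string): boolean {
    return this.model.transitions.some((t) => t.from === this.current && t.to === to);
  }

  move(to: string): void {
    if (!this.canMove(to)) {
      throw new Error(`Illegal transition: ${this.current} -> ${to}`);
    }
    this.current = to;
    this.history.push(to);
  }
}

// The model from the excerpt above, with its one declared transition.
const renewalModel: SketchStateModel = {
  states: ["not-started", "identity-verified", "application-submitted", "completed"],
  transitions: [{ from: "not-started", to: "identity-verified" }],
};
const journey = new SketchStateMachine(renewalModel, "not-started");
journey.move("identity-verified");
const canSkipToCompleted = journey.canMove("completed");
```

In this sketch the agent cannot jump from `identity-verified` straight to `completed`, because the model declares no such transition.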
The fourth file declares exactly what data is shared, from where, with whom, and for what purpose. Consent is per-session and revocable.
{
  "grants": [{
    "id": "identity-verification",
    "data_shared": [
      "full_name",
      "date_of_birth",
      "ni_number"
    ],
    "purpose": "Confirm identity"
  }]
}
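Per-session, revocable consent over grants like the one above could be tracked as follows. This is an illustrative sketch, not the ConsentManager from `@als/legibility`: the `SketchConsentLedger` class and its method names are hypothetical. It assumes that the most recent grant or revoke entry for a session decides what may be shared, and that entries are only ever appended.

```typescript
type SketchGrant = { id: string; data_shared: string[]; purpose: string };
type SketchConsentEntry = {
  sessionId: string;
  grantId: string;
  action: "granted" | "revoked";
  at: string;
};

class SketchConsentLedger {
  // Entries are appended, never rewritten, so revocation leaves a visible history.
  readonly entries: SketchConsentEntry[] = [];
  private grants: SketchGrant[];

  constructor(grants: SketchGrant[]) {
    this.grants = grants;
  }

  grant(sessionId: string, grantId: string): void {
    this.record(sessionId, grantId, "granted");
  }

  revoke(sessionId: string, grantId: string): void {
    this.record(sessionId, grantId, "revoked");
  }

  // Fields the agent may share for this session right now.
  allowedFields(sessionId: string): string[] {
    const fields: string[] = [];
    for (const grant of this.grants) {
      if (this.latestAction(sessionId, grant.id) === "granted") {
        for (const field of grant.data_shared) fields.push(field);
      }
    }
    return fields;
  }

  private latestAction(sessionId: string, grantId: string): "granted" | "revoked" | null {
    for (let i = this.entries.length - 1; i >= 0; i--) {
      const entry = this.entries[i];
      if (entry !== undefined && entry.sessionId === sessionId && entry.grantId === grantId) {
        return entry.action;
      }
    }
    return null;
  }

  private record(sessionId: string, grantId: string, action: "granted" | "revoked"): void {
    if (!this.grants.some((g) => g.id === grantId)) {
      throw new Error(`Unknown grant: ${grantId}`);
    }
    this.entries.push({ sessionId, grantId, action, at: new Date().toISOString() });
  }
}

// The identity-verification grant from the excerpt, granted and then revoked.
const consentLedger = new SketchConsentLedger([
  {
    id: "identity-verification",
    data_shared: ["full_name", "date_of_birth", "ni_number"],
    purpose: "Confirm identity",
  },
]);
consentLedger.grant("session-1", "identity-verification");
const fieldsWhileGranted = consentLedger.allowedFields("session-1");
consentLedger.revoke("session-1", "identity-verification");
const fieldsAfterRevoke = consentLedger.allowedFields("session-1");
```

After the revoke, no fields may be shared for that session, yet both the grant and the revoke remain in `entries`.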
The four files live in data/services/{service-name}/. They are the contract between the government service and any AI agent that wants to interact with it.
The Evidence Plane is an append-only SQLite trace store that records every action the system takes. Nothing is silently edited or deleted. It enables full replay and audit of any agent session.
| Event Type | What It Records |
|---|---|
| llm.request / llm.response | What was sent to and received from Claude |
| policy.evaluated | Eligibility check results (rules passed, failed, edge cases) |
| consent.granted | What data was shared, with whom, and why |
| capability.invoked / capability.result | Service invocation with timing and outcome |
| receipt.issued | Citizen-facing proof of what the agent did |
| state.transition | Movement through the service state model |
| handoff.initiated | When and why escalation to a human was triggered |
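The append-only and replay properties can be illustrated without a database. The sketch below swaps the SQLite store for an in-memory array so that it stays self-contained. `SketchTraceStore` and its field names are hypothetical; only the event type names come from the table above.

```typescript
type SketchEventType =
  | "llm.request" | "llm.response"
  | "policy.evaluated" | "consent.granted"
  | "capability.invoked" | "capability.result"
  | "receipt.issued" | "state.transition" | "handoff.initiated";

type SketchStoredEvent = {
  readonly seq: number;
  readonly sessionId: string;
  readonly type: SketchEventType;
  readonly payload: Record<string, unknown>;
  readonly recordedAt: string;
};

class SketchTraceStore {
  // The only write path is append(); there is no update or delete method.
  private events: SketchStoredEvent[] = [];

  append(
    sessionId: string,
    type: SketchEventType,
    payload: Record<string, unknown>,
  ): SketchStoredEvent {
    const stored: SketchStoredEvent = {
      seq: this.events.length + 1,
      sessionId,
      type,
      payload: { ...payload }, // copy so later caller mutations do not leak in
      recordedAt: new Date().toISOString(),
    };
    Object.freeze(stored.payload);
    Object.freeze(stored);
    this.events.push(stored);
    return stored;
  }

  // Replay returns one session's events in the order they were recorded.
  replay(sessionId: string): SketchStoredEvent[] {
    return this.events.filter((e) => e.sessionId === sessionId);
  }
}

// Two interleaved sessions (payloads are made up); replay isolates one of them.
const traceStore = new SketchTraceStore();
traceStore.append("s1", "policy.evaluated", { passed: ["age-check"] });
traceStore.append("s2", "llm.request", { messageCount: 1 });
traceStore.append("s1", "receipt.issued", { outcome: "completed" });
const replayedS1 = traceStore.replay("s1").map((e) => `${e.seq}:${e.type}`);
```

In a SQLite-backed store the same guarantee would have to come from the schema or from the access layer. This sketch only shows the interface shape: an append method, a replay method, and nothing else.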
Citizens see their own receipts, consent history, and a timeline of what the agent did on their behalf.
Administrators see the full trace explorer, replay engine, service definitions, gap analysis, and operational dashboards.
The Service Ledger extends the Evidence Plane with case-level operational tracking. It provides a dashboard view of all active cases across services, with KPI metrics, bottleneck analysis, state progress flows, and case review workflows.
Every citizen journey becomes a trackable case with state history, timeline views, and review capabilities.
Completion rates, average processing times, bottleneck identification, and sparkline trend charts per service.
Cases can be reviewed, approved, reset, or escalated. Full audit trail maintained for every action.
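Two of the KPIs named above, completion rate and average processing time, reduce to simple arithmetic over case records. The sketch below assumes a hypothetical case shape with start and completion timestamps; the real ledger schema is not shown here. It measures time in calendar days, not working days, and the sample cases are made up.

```typescript
// Hypothetical case shape; the real ledger schema is not shown in this document.
type SketchCase = { serviceId: string; startedAt: string; completedAt: string | null };
type SketchKpi = {
  total: number;
  completed: number;
  completionRate: number;
  avgProcessingDays: number | null;
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function summariseService(cases: SketchCase[], serviceId: string): SketchKpi {
  const own = cases.filter((c) => c.serviceId === serviceId);
  const durations: number[] = [];
  for (const c of own) {
    if (c.completedAt !== null) {
      // Calendar days between start and completion.
      durations.push((Date.parse(c.completedAt) - Date.parse(c.startedAt)) / MS_PER_DAY);
    }
  }
  const completed = durations.length;
  return {
    total: own.length,
    completed,
    completionRate: own.length === 0 ? 0 : completed / own.length,
    avgProcessingDays:
      completed === 0 ? null : durations.reduce((sum, d) => sum + d, 0) / completed,
  };
}

// Made-up sample data: two completed cases (4 and 8 days) and one still open.
const sampleCases: SketchCase[] = [
  {
    serviceId: "dvla.renew-driving-licence",
    startedAt: "2025-01-06T09:00:00Z",
    completedAt: "2025-01-10T09:00:00Z",
  },
  {
    serviceId: "dvla.renew-driving-licence",
    startedAt: "2025-01-06T09:00:00Z",
    completedAt: "2025-01-14T09:00:00Z",
  },
  {
    serviceId: "dvla.renew-driving-licence",
    startedAt: "2025-01-07T09:00:00Z",
    completedAt: null,
  },
];
const dvlaKpi = summariseService(sampleCases, "dvla.renew-driving-licence");
```

For the sample data this gives two completed cases out of three and an average of six days. Open cases count towards the completion rate but not towards the average.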
The same architecture supports different interaction styles. DOT and MAX demonstrate the fundamental trade-off between transparency and efficiency.
Four services and four citizen personas, each designed to exercise different parts of the architecture.
Fee: £14. SLA: 10 working days. Tests payment flow, photo submission, and Wallet credential sharing.
Complex eligibility with household income assessment, savings thresholds, and edge cases. Tests the policy engine thoroughly.
Read-only forecast lookup. No payment, no application. Tests information retrieval and credential sharing.
A mock demonstration service used for testing the architecture with intentionally absurd scenarios.
Young expecting couple, first baby. Good for benefits, family services, and driving scenarios.
Self-employed IT consultant, two children. Tests tax, child benefit, and self-employment edge cases.
Retired, 71, managing health conditions. Tests pension, over-pension-age eligibility, and driving renewal for over-70s.
Tests interactive task card flows, form-based inputs, and the full Universal Credit journey with consent and state tracking.
The monorepo is organized as 2 Next.js applications and 7 shared packages. Each package has a single responsibility.
| Package | Responsibility |
|---|---|
| @als/schemas | Shared TypeScript types and interfaces for all data structures |
| @als/runtime | CapabilityInvoker, ServiceRegistry, HandoffManager: the orchestration layer |
| @als/evidence | SQLite trace store, trace emitter, receipt generator: the Evidence Plane |
| @als/legibility | PolicyEvaluator, StateMachine, ConsentManager: artefact interpretation |
| @als/identity | GOV.UK One Login and Wallet simulators |
| @als/personal-data | Two-tier data model: verified credentials + incidental data |
| @als/adapters | AnthropicAdapter, GOV.UK Content adapter, MCP adapter: external connectors |
| App | Port | Purpose |
|---|---|---|
| citizen-experience | 3100 | Citizen-facing chat with persona picker, agent selection, interactive task cards, consent flows, and journey tracking |
| legibility-studio | 3101 | Admin dashboard: service designer, evidence explorer, gap analysis, service ledger, and case management |
Prerequisites: Node.js 18+ and an Anthropic API key.
git clone https://github.com/datadowns/agentic-legibility-stack.git
cd agentic-legibility-stack
npm install
Add your API key to apps/citizen-experience/.env.local:
ANTHROPIC_API_KEY=sk-ant-...your-key...
npm run seed
npm run dev
See the full Getting Started Guide for a detailed walkthrough of personas, agent personalities, and features to try.