Reference Architecture

Agentic Legibility Stack

A reference architecture for accountable AI agents in UK government services.

A Turborepo monorepo with 2 Next.js apps and 7 shared packages, demonstrating how AI agents could interact with government services while maintaining transparency, accountability, and citizen control. This is a prototype — not a live government tool.

TypeScript • Next.js 15 • Turborepo • SQLite • Claude API • MCP • React 19

The Problem

When AI agents start acting on behalf of citizens with government services, who is accountable? If an agent applies for benefits incorrectly, submits the wrong data, or misunderstands eligibility rules — what happened, who authorized it, and how does the citizen get redress?

Current web-form-based services have implicit accountability: the citizen fills in the form themselves. Agentic services need explicit accountability infrastructure: formal service definitions, consent records, evidence trails, and human escalation paths.

Who authorized this?

Consent models track what data was shared, with whom, and for what purpose. Every data share is recorded.

What actually happened?

The Evidence Plane records every agent action in an append-only trace store. Nothing is silently edited or deleted.

What if something goes wrong?

Handoff to humans with full context. Redress pathways and complaint routes are part of every service definition.

Architecture

The stack enforces two invariants: all service calls go through a single CapabilityInvoker, and all LLM calls go through the AnthropicAdapter. These choke points are where policy, consent, and tracing are enforced.

Citizen: picks a persona and an agent
  • DOT Agent (Cautious), or
  • MAX Agent (Proactive)

Single LLM choke point: AnthropicAdapter (Claude claude-sonnet-4-5 + thinking)

Single service choke point: CapabilityInvoker (Policy • Consent • Trace • Receipt)
  • Renew Driving Licence (DVLA)
  • Apply Universal Credit (DWP)
  • Check State Pension (DWP)
  • Become a Robot (Mock)

Evidence Plane: append-only SQLite store that records every LLM call, policy evaluation, consent grant, service invocation, receipt, and handoff

A citizen talks to an agent (DOT or MAX). The agent calls Claude via the AnthropicAdapter. When it needs a government service, the call goes through the CapabilityInvoker, which evaluates policy rules, checks consent, and produces a receipt. The Evidence Plane sits underneath everything, recording all activity.
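
The flow described above can be sketched as one function that every service call passes through. This is a minimal illustration under stated assumptions, not the code in @als/runtime: the invokeCapability function, the InvocationRequest and Receipt shapes, and the in-memory trace array are all hypothetical names introduced for this example. Only the event type strings are taken from the Evidence Plane section of this document.

```typescript
// Hypothetical shapes for illustration only; the project's real types may differ.
type TraceEvent = { type: string; at: string; detail: Record<string, unknown> };
type Receipt = { serviceId: string; outcome: "completed" | "rejected"; reason?: string };

interface InvocationRequest {
  serviceId: string;
  input: Record<string, unknown>;
  consentedFields: string[]; // fields the citizen has agreed to share
  requiredFields: string[];  // fields the service needs
  isEligible: (input: Record<string, unknown>) => boolean; // stand-in for policy evaluation
  call: (input: Record<string, unknown>) => unknown;       // stand-in for the service itself
}

// Stand-in for the Evidence Plane: events are only ever appended.
const trace: TraceEvent[] = [];

function emit(type: string, detail: Record<string, unknown>): void {
  trace.push({ type, at: new Date().toISOString(), detail });
}

// Every exit path ends in a receipt, and every receipt is traced.
function issueReceipt(receipt: Receipt): Receipt {
  emit("receipt.issued", { ...receipt });
  return receipt;
}

// Single choke point: policy, consent, trace, and receipt all happen here.
function invokeCapability(req: InvocationRequest): Receipt {
  const eligible = req.isEligible(req.input);
  emit("policy.evaluated", { serviceId: req.serviceId, eligible });
  if (!eligible) {
    return issueReceipt({ serviceId: req.serviceId, outcome: "rejected", reason: "policy" });
  }

  // Refuse to call the service if any required field lacks consent.
  const missing = req.requiredFields.filter((f) => req.consentedFields.indexOf(f) === -1);
  if (missing.length > 0) {
    const reason = `no consent for: ${missing.join(", ")}`;
    return issueReceipt({ serviceId: req.serviceId, outcome: "rejected", reason });
  }

  emit("capability.invoked", { serviceId: req.serviceId });
  const result = req.call(req.input);
  emit("capability.result", { serviceId: req.serviceId, result });
  return issueReceipt({ serviceId: req.serviceId, outcome: "completed" });
}
```

Because the agent has no other route to a service in this sketch, a rejected call still leaves a policy event and a receipt behind, which is the property the choke point is meant to guarantee.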

The 4-Artefact Pattern

Every government service is formally defined by four machine-readable JSON files. Together they answer four questions: what does this service do, who is eligible, how does it progress, and what data is shared and why.

Capability Definition

manifest.json — WHAT

Defines what the service does, its inputs and outputs, SLAs, fees, redress pathways, and audit requirements.

{
  "id": "dvla.renew-driving-licence",
  "name": "Renew Driving Licence",
  "department": "DVLA",
  "constraints": {
    "fee": "14 GBP",
    "sla": "10 working days"
  }
}

Eligibility Rules

policy.json — WHO

Structured rules that determine eligibility, with conditions, failure messages, and alternative services.

{
  "rules": [{
    "id": "age-check",
    "condition": {
      "field": "age",
      "operator": ">=",
      "value": 18
    },
    "reason_if_failed": "Must be 18+"
  }]
}
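
One way rules of this shape could be interpreted is sketched below. It is an illustration only, not the project's PolicyEvaluator: the evaluatePolicy and compare functions and the PolicyResult type are hypothetical, and every operator other than ">=" is an assumption, since the excerpt above shows only that one.

```typescript
// Hypothetical types mirroring the policy.json excerpt; only ">=" appears in the source.
type Operator = ">=" | "<=" | ">" | "<" | "==" | "!=";

interface Rule {
  id: string;
  condition: { field: string; operator: Operator; value: number };
  reason_if_failed: string;
}

interface PolicyResult {
  eligible: boolean;
  failures: { ruleId: string; reason: string }[];
}

function compare(actual: number, operator: Operator, expected: number): boolean {
  switch (operator) {
    case ">=": return actual >= expected;
    case "<=": return actual <= expected;
    case ">": return actual > expected;
    case "<": return actual < expected;
    case "==": return actual === expected;
    case "!=": return actual !== expected;
  }
}

// Evaluates every rule and collects all failures, so the citizen sees each reason at once.
function evaluatePolicy(rules: Rule[], facts: Record<string, unknown>): PolicyResult {
  const failures: { ruleId: string; reason: string }[] = [];
  for (const rule of rules) {
    const actual = facts[rule.condition.field];
    // A missing or non-numeric field fails the rule rather than passing silently.
    const passed =
      typeof actual === "number" &&
      compare(actual, rule.condition.operator, rule.condition.value);
    if (!passed) failures.push({ ruleId: rule.id, reason: rule.reason_if_failed });
  }
  return { eligible: failures.length === 0, failures };
}

// The rule from the excerpt above.
const rules: Rule[] = [{
  id: "age-check",
  condition: { field: "age", operator: ">=", value: 18 },
  reason_if_failed: "Must be 18+",
}];
```

Calling evaluatePolicy(rules, { age: 16 }) in this sketch yields an ineligible result carrying the "Must be 18+" reason, which an agent could relay to the citizen verbatim.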

State Transitions

state-model.json — HOW

Defines valid states and transitions. Terminal states (completed, rejected, handed-off) require receipts.

{
  "states": [
    "not-started",
    "identity-verified",
    "application-submitted",
    "completed"
  ],
  "transitions": [
    { "from": "not-started",
      "to": "identity-verified" }
  ]
}
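
A state model like this can be enforced with a small guard that rejects any move not listed in transitions. The sketch below is an assumption about how such a check might look, not the project's StateMachine. The canTransition and transition helpers are hypothetical, and the second and third transitions are invented for illustration because the excerpt shows only the first.

```typescript
interface StateModel {
  states: string[];
  transitions: { from: string; to: string }[];
}

// Only the first transition appears in the excerpt; the other two are assumed.
const model: StateModel = {
  states: ["not-started", "identity-verified", "application-submitted", "completed"],
  transitions: [
    { from: "not-started", to: "identity-verified" },
    { from: "identity-verified", to: "application-submitted" },
    { from: "application-submitted", to: "completed" },
  ],
};

// True only when the model explicitly lists the move.
function canTransition(m: StateModel, from: string, to: string): boolean {
  return m.transitions.some((t) => t.from === from && t.to === to);
}

// Returns the new state, or throws if the move is not allowed by the model.
function transition(m: StateModel, current: string, next: string): string {
  if (m.states.indexOf(next) === -1) {
    throw new Error(`Unknown state: ${next}`);
  }
  if (!canTransition(m, current, next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next;
}
```

With this guard, an agent cannot jump from not-started straight to completed; it has to pass through each intermediate state the model declares.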

Data Sharing

consent.json — WHY

Declares exactly what data is shared, from where, with whom, and for what purpose. Consent is per-session and revocable.

{
  "grants": [{
    "id": "identity-verification",
    "data_shared": [
      "full_name",
      "date_of_birth",
      "ni_number"
    ],
    "purpose": "Confirm identity"
  }]
}
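
Per-session, revocable consent could be tracked along the following lines. This is a sketch under assumptions rather than the project's ConsentManager: the SessionConsent class and its accept, revoke, and mayShare methods are hypothetical names, and one instance is assumed to exist per citizen session.

```typescript
// Mirrors one entry of the "grants" array in the excerpt above.
interface Grant {
  id: string;
  data_shared: string[];
  purpose: string;
}

// Hypothetical per-session consent tracker; create one instance per session.
class SessionConsent {
  private grants: Grant[];
  private accepted: string[] = []; // ids of grants the citizen has accepted

  constructor(grants: Grant[]) {
    this.grants = grants;
  }

  accept(grantId: string): void {
    if (!this.grants.some((g) => g.id === grantId)) {
      throw new Error(`Unknown grant: ${grantId}`);
    }
    if (this.accepted.indexOf(grantId) === -1) this.accepted.push(grantId);
  }

  // Revoking removes the grant, so later checks for its fields fail.
  revoke(grantId: string): void {
    this.accepted = this.accepted.filter((id) => id !== grantId);
  }

  // A field may be shared only if a currently accepted grant lists it.
  mayShare(field: string): boolean {
    return this.grants.some(
      (g) => this.accepted.indexOf(g.id) !== -1 && g.data_shared.indexOf(field) !== -1,
    );
  }
}

const consent = new SessionConsent([{
  id: "identity-verification",
  data_shared: ["full_name", "date_of_birth", "ni_number"],
  purpose: "Confirm identity",
}]);
```

Checking consent at the moment of sharing, rather than caching an earlier answer, is what lets a revocation take effect immediately in this sketch.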

These artefacts live in data/services/{service-name}/. They are the contract between the government service and any AI agent that wants to interact with it.

Evidence Plane

An append-only SQLite trace store that records every action the system takes. Nothing is silently edited or deleted. It enables full replay and audit of any agent session.

Event Type                              What It Records
llm.request / llm.response              What was sent to and received from Claude
policy.evaluated                        Eligibility check results (rules passed, failed, edge cases)
consent.granted                         What data was shared, with whom, and why
capability.invoked / capability.result  Service invocation with timing and outcome
receipt.issued                          Citizen-facing proof of what the agent did
state.transition                        Movement through the service state model
handoff.initiated                       When and why escalation to a human was triggered
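
The append-only property can be illustrated with a store that exposes only append and replay. The sketch below keeps events in memory as a stand-in for the SQLite store, so it shows the interface idea rather than the real persistence layer. The TraceStore class, its method names, and the TraceEvent fields are assumptions; the event type strings come from the table above.

```typescript
type EventType =
  | "llm.request" | "llm.response"
  | "policy.evaluated" | "consent.granted"
  | "capability.invoked" | "capability.result"
  | "receipt.issued" | "state.transition" | "handoff.initiated";

// Hypothetical event record; field names are assumptions for this example.
interface TraceEvent {
  readonly seq: number;
  readonly sessionId: string;
  readonly type: EventType;
  readonly timestamp: string;
  readonly payload: Readonly<Record<string, unknown>>;
}

// In-memory stand-in for the SQLite trace store.
// There is deliberately no update or delete method.
class TraceStore {
  private events: TraceEvent[] = [];

  append(sessionId: string, type: EventType, payload: Record<string, unknown>): TraceEvent {
    const event: TraceEvent = Object.freeze({
      seq: this.events.length + 1, // monotonically increasing order of arrival
      sessionId,
      type,
      timestamp: new Date().toISOString(),
      payload: Object.freeze({ ...payload }), // copy so later caller mutations do not leak in
    });
    this.events.push(event);
    return event;
  }

  // Returns a session's events in the order they were recorded.
  replay(sessionId: string): TraceEvent[] {
    return this.events.filter((e) => e.sessionId === sessionId);
  }
}
```

Replaying a session is then just reading its events back in sequence order, which is the basis for the audit and replay features described above.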

Citizen Experience

Citizens see their own receipts, consent history, and a timeline of what the agent did on their behalf.

Legibility Studio

Administrators see the full trace explorer, replay engine, service definitions, gap analysis, and operational dashboards.

Service Ledger

The Service Ledger extends the Evidence Plane with case-level operational tracking. It provides a dashboard view of all active cases across services, with KPI metrics, bottleneck analysis, state progress flows, and case review workflows.

Case Tracking

Every citizen journey becomes a trackable case with state history, timeline views, and review capabilities.

Operational KPIs

Completion rates, average processing times, bottleneck identification, and sparkline trend charts per service.

Review Workflows

Cases can be reviewed, approved, reset, or escalated. Full audit trail maintained for every action.
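
As a rough illustration of the operational KPIs, completion rate and average processing time can be derived from case records as shown below. The CaseRecord shape, its field names, and the computeKpis function are hypothetical; the Service Ledger's actual data model is not shown in this document.

```typescript
// Hypothetical case record; field names are assumptions for this example.
interface CaseRecord {
  id: string;
  serviceId: string;
  state: string;
  startedAt: number;    // epoch milliseconds
  completedAt?: number; // set once the case reaches "completed"
}

interface ServiceKpis {
  total: number;
  completionRate: number;         // completed cases divided by all cases, 0 when there are none
  avgProcessingMs: number | null; // null when no case has completed yet
}

function computeKpis(cases: CaseRecord[], serviceId: string): ServiceKpis {
  const own = cases.filter((c) => c.serviceId === serviceId);
  let completed = 0;
  let totalMs = 0;
  for (const c of own) {
    if (c.state === "completed" && c.completedAt !== undefined) {
      completed += 1;
      totalMs += c.completedAt - c.startedAt;
    }
  }
  return {
    total: own.length,
    completionRate: own.length === 0 ? 0 : completed / own.length,
    avgProcessingMs: completed === 0 ? null : totalMs / completed,
  };
}
```

Returning null for the average when nothing has completed keeps a dashboard from displaying a misleading zero processing time.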

Two Agent Personalities

The same architecture supports different interaction styles. DOT and MAX demonstrate the fundamental trade-off between transparency and efficiency.

DOT — The Cautious Agent

  • Asks permission before using personal data
  • Explains reasoning step by step
  • Requests explicit consent for each data share
  • Best for understanding the accountability architecture

MAX — The Proactive Agent

  • Auto-fills data without asking
  • Gets things done quickly and efficiently
  • Still records everything in the Evidence Plane
  • Best for seeing the efficiency vs. transparency trade-off

Both agents run on identical infrastructure. The only difference is the personality defined in each agent's system prompt. The Evidence Plane captures the same events for both; what changes is whether the citizen sees consent requests before or after the fact.

Demo Services & Personas

Four services and four citizen personas, each designed to exercise different parts of the architecture.

Services

DVLA

Renew Driving Licence

Fee: £14. SLA: 10 working days. Tests payment flow, photo submission, and Wallet credential sharing.

DWP

Apply Universal Credit

Complex eligibility with household income assessment, savings thresholds, and edge cases. Tests the policy engine thoroughly.

DWP

Check State Pension

Read-only forecast lookup. No payment, no application. Tests information retrieval and credential sharing.

Mock

Become a Robot

A mock demonstration service used for testing the architecture with intentionally absurd scenarios.

Personas

Emma & Liam

Young expecting couple, first baby. Good for benefits, family services, and driving scenarios.

Rajesh

Self-employed IT consultant, two children. Tests tax, child benefit, and self-employment edge cases.

Margaret

Retired, 71, managing health conditions. Tests pension, over-pension-age eligibility, and driving renewal for over-70s.

Priya

Tests interactive task card flows, form-based inputs, and the full Universal Credit journey with consent and state tracking.

Package Architecture

The monorepo is organized as 2 Next.js applications and 7 shared packages. Each package has a single responsibility.

Package             Responsibility
@als/schemas        Shared TypeScript types and interfaces for all data structures
@als/runtime        CapabilityInvoker, ServiceRegistry, HandoffManager (the orchestration layer)
@als/evidence       SQLite trace store, trace emitter, receipt generator (the Evidence Plane)
@als/legibility     PolicyEvaluator, StateMachine, ConsentManager (artefact interpretation)
@als/identity       GOV.UK One Login and Wallet simulators
@als/personal-data  Two-tier data model: verified credentials + incidental data
@als/adapters       AnthropicAdapter, GOV.UK Content adapter, MCP adapter (external connectors)

Applications

App                 Port  Purpose
citizen-experience  3100  Citizen-facing chat with persona picker, agent selection, interactive task cards, consent flows, and journey tracking
legibility-studio   3101  Admin dashboard: service designer, evidence explorer, gap analysis, service ledger, and case management

The Legibility Studio fetches evidence data via HTTP from the citizen-experience API rather than importing the evidence package directly. This avoids native SQLite module bundling issues.

Getting Started

Prerequisites: Node.js 18+ and an Anthropic API key.

  1. Clone the repository
    git clone https://github.com/datadowns/agentic-legibility-stack.git
  2. Install dependencies
    npm install
  3. Set your API key — create apps/citizen-experience/.env.local
    ANTHROPIC_API_KEY=sk-ant-...your-key...
  4. Seed demo trace data (optional)
    npm run seed
  5. Start both apps
    npm run dev
  6. Open the apps
    localhost:3100 (Citizen Experience) • localhost:3101 (Legibility Studio)

See the full Getting Started Guide for a detailed walkthrough of personas, agent personalities, and features to try.