Agentic Legibility Stack

The Problem

When AI agents act on behalf of citizens in their dealings with government services, who is accountable? If an agent applies for benefits incorrectly, submits the wrong data, or misunderstands eligibility rules, three questions follow: what happened, who authorized it, and how does the citizen get redress?

Current web-form-based services have implicit accountability: the citizen fills in the form themselves. Agentic services need explicit accountability infrastructure: formal service definitions, consent records, evidence trails, and human escalation paths.

Who authorized this?

Consent models track what data was shared, with whom, and for what purpose. Every data share is recorded.

What actually happened?

The Evidence Plane records every agent action in an append-only trace store. Nothing is silently edited or deleted.

What if something goes wrong?

The agent hands off to a human with full context. Redress pathways and complaint routes are part of every service definition.

Architecture

The stack enforces two invariants: all service calls go through a single CapabilityInvoker, and all LLM calls go through the AnthropicAdapter. These choke points are where policy, consent, and tracing are enforced.

Citizen: picks a persona and an agent
Agent: DOT (cautious) or MAX (proactive)
Single LLM choke point: AnthropicAdapter (Claude claude-sonnet-4-5 + thinking)
Single service choke point: CapabilityInvoker (policy, consent, trace, receipt)
Services: Renew Driving Licence (DVLA), Apply Universal Credit (DWP), Check State Pension (DWP), Become a Robot (Mock)
Evidence Plane: append-only SQLite; records every LLM call, policy evaluation, consent grant, service invocation, receipt, and handoff

A citizen talks to an agent (DOT or MAX). The agent calls Claude via the AnthropicAdapter. When it needs a government service, the call goes through the CapabilityInvoker, which evaluates policy rules, checks consent, and produces a receipt. The Evidence Plane sits underneath everything, recording all activity.
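The order of checks at the service choke point can be sketched as follows. This is a minimal illustration, not the repository's CapabilityInvoker: the interfaces, the `invokeCapability` function, the injected dependencies, and the receipt id format are all assumptions made for the example. Only the event names are taken from the Evidence Plane event table.

```typescript
// Hypothetical sketch: these names and signatures are illustrative, not the repository's API.
interface CapabilityRequest {
  serviceId: string;
  sessionId: string;
  input: { [field: string]: unknown };
}

interface TraceEntry {
  type: string; // event names follow the Evidence Plane table, e.g. "policy.evaluated"
  detail: string;
}

interface InvokerDeps {
  evaluatePolicy: (req: CapabilityRequest) => { eligible: boolean; reason?: string };
  hasConsent: (req: CapabilityRequest) => boolean;
  callService: (req: CapabilityRequest) => string; // returns a service-side reference
  trace: TraceEntry[]; // in-memory stand-in for the append-only store
}

interface CapabilityOutcome {
  ok: boolean;
  receiptId?: string;
  reason?: string;
}

function invokeCapability(req: CapabilityRequest, deps: InvokerDeps): CapabilityOutcome {
  // 1. Policy: stop before any service call if the citizen is not eligible.
  const verdict = deps.evaluatePolicy(req);
  const failureReason = verdict.reason ?? "not eligible";
  deps.trace.push({ type: "policy.evaluated", detail: verdict.eligible ? "passed" : failureReason });
  if (!verdict.eligible) {
    return { ok: false, reason: failureReason };
  }

  // 2. Consent: refuse to share data the citizen has not agreed to share.
  if (!deps.hasConsent(req)) {
    return { ok: false, reason: "consent missing" };
  }

  // 3. Invoke the service and record both the call and its result.
  deps.trace.push({ type: "capability.invoked", detail: req.serviceId });
  const serviceRef = deps.callService(req);
  deps.trace.push({ type: "capability.result", detail: serviceRef });

  // 4. Receipt: citizen-facing proof of what was done (id format is made up here).
  const receiptId = "receipt-" + serviceRef;
  deps.trace.push({ type: "receipt.issued", detail: receiptId });
  return { ok: true, receiptId: receiptId };
}
```

Because every dependency is injected, the same function can be exercised with stubs in tests, while a real implementation would plug in the policy evaluator, consent manager, and trace store.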

The 4-Artefact Pattern

Every government service is formally defined by four machine-readable JSON files. Together they answer four questions: what does this service do, who is eligible, how does it progress, and what data is shared and why.

Capability Definition

manifest.json — WHAT

Defines what the service does, its inputs and outputs, SLAs, fees, redress pathways, and audit requirements.

{
  "id": "dvla.renew-driving-licence",
  "name": "Renew Driving Licence",
  "department": "DVLA",
  "constraints": {
    "fee": "14 GBP",
    "sla": "10 working days"
  }
}
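A consumer of the manifest might validate it before use. The sketch below models only the fields visible in the excerpt above; the `ServiceManifest` type and `parseManifest` function are hypothetical, and the real manifest carries more fields (inputs, outputs, redress pathways, audit requirements) whose shape is not shown here.

```typescript
// Hypothetical type: only the fields from the excerpt are modelled.
interface ServiceManifest {
  id: string;
  name: string;
  department: string;
  constraints?: { fee?: string; sla?: string };
}

function parseManifest(raw: string): ServiceManifest {
  const data = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("manifest must be a JSON object");
  }
  // Treat the three identifying fields as required (an assumption of this sketch).
  const required = ["id", "name", "department"];
  for (const key of required) {
    if (typeof data[key] !== "string" || data[key].length === 0) {
      throw new Error("manifest is missing required string field: " + key);
    }
  }
  return data as ServiceManifest;
}
```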

Eligibility Rules

policy.json — WHO

Structured rules that determine eligibility, with conditions, failure messages, and alternative services.

{
  "rules": [{
    "id": "age-check",
    "condition": {
      "field": "age",
      "operator": ">=",
      "value": 18
    },
    "reason_if_failed": "Must be 18+"
  }]
}
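A rule of this shape can be evaluated mechanically against a set of facts about the citizen. The following is a minimal sketch, not the project's PolicyEvaluator: the function names are hypothetical, and every operator other than `>=` is an assumption, since the excerpt shows only that one.

```typescript
// Operators beyond ">=" are assumed for illustration.
type Operator = ">=" | ">" | "<=" | "<" | "==" | "!=";

interface PolicyRule {
  id: string;
  condition: { field: string; operator: Operator; value: number | string | boolean };
  reason_if_failed: string;
}

interface PolicyResult {
  eligible: boolean;
  failures: { ruleId: string; reason: string }[];
}

function compare(actual: unknown, operator: Operator, expected: number | string | boolean): boolean {
  if (operator === "==") return actual === expected;
  if (operator === "!=") return actual !== expected;
  // Ordering operators are only defined for numbers in this sketch;
  // a missing or non-numeric fact therefore fails the rule.
  if (typeof actual !== "number" || typeof expected !== "number") return false;
  switch (operator) {
    case ">=": return actual >= expected;
    case ">": return actual > expected;
    case "<=": return actual <= expected;
    default: return actual < expected;
  }
}

function evaluatePolicy(rules: PolicyRule[], facts: { [field: string]: unknown }): PolicyResult {
  const failures = rules
    .filter(rule => !compare(facts[rule.condition.field], rule.condition.operator, rule.condition.value))
    .map(rule => ({ ruleId: rule.id, reason: rule.reason_if_failed }));
  return { eligible: failures.length === 0, failures: failures };
}
```

Returning every failed rule with its `reason_if_failed`, rather than stopping at the first failure, lets an agent explain all the reasons a citizen is ineligible in one response.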

State Transitions

state-model.json — HOW

Defines valid states and transitions. Terminal states (completed, rejected, handed-off) require receipts.

{
  "states": [
    "not-started",
    "identity-verified",
    "application-submitted",
    "completed"
  ],
  "transitions": [
    { "from": "not-started",
      "to": "identity-verified" }
  ]
}
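A state model like this can be enforced with a small guard that rejects any transition not listed. The sketch below is illustrative only: `canTransition`, `applyTransition`, and `requiresReceipt` are hypothetical names, and the list of terminal states is taken from the prose above rather than from a field in the JSON.

```typescript
interface StateModel {
  states: string[];
  transitions: { from: string; to: string }[];
}

// Terminal states named in the text; how the real file marks them is not shown.
const TERMINAL_STATES = ["completed", "rejected", "handed-off"];

function canTransition(model: StateModel, from: string, to: string): boolean {
  return model.transitions.some(t => t.from === from && t.to === to);
}

function applyTransition(model: StateModel, current: string, next: string): string {
  if (model.states.indexOf(next) === -1) {
    throw new Error("Unknown state: " + next);
  }
  if (!canTransition(model, current, next)) {
    throw new Error("Illegal transition: " + current + " -> " + next);
  }
  return next;
}

function requiresReceipt(state: string): boolean {
  return TERMINAL_STATES.indexOf(state) !== -1;
}
```

With the excerpt's model, `applyTransition(model, "not-started", "identity-verified")` returns the new state, while jumping straight from `not-started` to `completed` throws.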

Data Sharing

consent.json — WHY

Declares exactly what data is shared, from where, with whom, and for what purpose. Consent is per-session and revocable.

{
  "grants": [{
    "id": "identity-verification",
    "data_shared": [
      "full_name",
      "date_of_birth",
      "ni_number"
    ],
    "purpose": "Confirm identity"
  }]
}
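Per-session, revocable consent could be tracked along the following lines. This is a sketch under stated assumptions: the `SessionConsent` class and its method names are hypothetical and are not the project's ConsentManager. One instance is assumed to live for one session.

```typescript
interface ConsentGrant {
  id: string;
  data_shared: string[];
  purpose: string;
}

class SessionConsent {
  private declared: ConsentGrant[];
  private active: { [grantId: string]: boolean } = {};

  constructor(declared: ConsentGrant[]) {
    this.declared = declared;
  }

  // The citizen agrees to one declared grant for this session.
  grant(grantId: string): void {
    const known = this.declared.some(g => g.id === grantId);
    if (!known) throw new Error("Unknown consent grant: " + grantId);
    this.active[grantId] = true;
  }

  // Consent is revocable: later shares of the covered fields are refused.
  revoke(grantId: string): void {
    delete this.active[grantId];
  }

  // A field may be shared only if some currently active grant lists it.
  mayShare(field: string): boolean {
    return this.declared.some(
      g => this.active[g.id] === true && g.data_shared.indexOf(field) !== -1
    );
  }
}
```

Fields that no grant declares, such as an address in the excerpt above, can never be shared, regardless of what the citizen has agreed to.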

These artefacts live in data/services/{service-name}/. They are the contract between the government service and any AI agent that wants to interact with it.

Evidence Plane

The Evidence Plane is an append-only SQLite trace store that records every action the system takes. Nothing is silently edited or deleted, which makes it possible to replay and audit any agent session in full.

Event Type	What It Records
`llm.request` / `llm.response`	What was sent to and received from Claude
`policy.evaluated`	Eligibility check results (rules passed, failed, edge cases)
`consent.granted`	What data was shared, with whom, and why
`capability.invoked` / `capability.result`	Service invocation with timing and outcome
`receipt.issued`	Citizen-facing proof of what the agent did
`state.transition`	Movement through the service state model
`handoff.initiated`	When and why escalation to a human was triggered
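The append-only and replay properties can be illustrated with an in-memory stand-in. The real store is SQLite; the `AppendOnlyTrace` class, its field names, and the sequence-number scheme below are assumptions made for this sketch. The event type names are the ones listed in the table.

```typescript
type TraceEventType =
  | "llm.request" | "llm.response"
  | "policy.evaluated" | "consent.granted"
  | "capability.invoked" | "capability.result"
  | "receipt.issued" | "state.transition" | "handoff.initiated";

interface TraceEvent {
  seq: number; // position in the global log, starting at 1
  sessionId: string;
  type: TraceEventType;
  payload: { [key: string]: unknown };
  recordedAt: string; // ISO 8601 timestamp
}

class AppendOnlyTrace {
  private events: TraceEvent[] = [];

  // The only write operation: there is deliberately no update or delete.
  append(sessionId: string, type: TraceEventType, payload: { [key: string]: unknown }): TraceEvent {
    const entry: TraceEvent = Object.freeze({
      seq: this.events.length + 1,
      sessionId: sessionId,
      type: type,
      payload: payload,
      recordedAt: new Date().toISOString(),
    });
    this.events.push(entry);
    return entry;
  }

  // Replay returns a session's events in the order they were recorded.
  replay(sessionId: string): TraceEvent[] {
    return this.events.filter(e => e.sessionId === sessionId);
  }
}
```

`Object.freeze` is shallow, so it protects the event record itself but not nested payload objects; a database-backed store would enforce immutability at the storage layer instead.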

Citizen Experience

Citizens see their own receipts, consent history, and a timeline of what the agent did on their behalf.

Legibility Studio

Administrators see the full trace explorer, replay engine, service definitions, gap analysis, and operational dashboards.

Service Ledger

The Service Ledger extends the Evidence Plane with case-level operational tracking. It provides a dashboard view of all active cases across services, with KPI metrics, bottleneck analysis, state progress flows, and case review workflows.

Case Tracking

Every citizen journey becomes a trackable case with state history, timeline views, and review capabilities.

Operational KPIs

Completion rates, average processing times, bottleneck identification, and sparkline trend charts per service.

Review Workflows

Cases can be reviewed, approved, reset, or escalated. Full audit trail maintained for every action.
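The KPIs named above can be derived from case records with a few small functions. This is a hedged sketch: the `LedgerCase` shape and the function names are hypothetical, and "bottleneck" is interpreted here as the state holding the most open cases, which may differ from the dashboard's actual definition.

```typescript
interface LedgerCase {
  caseId: string;
  serviceId: string;
  currentState: string;
  openedAtMs: number;
  closedAtMs?: number; // set once the case reaches a terminal state
}

// Share of all cases that have reached the "completed" state.
function completionRate(cases: LedgerCase[]): number {
  if (cases.length === 0) return 0;
  const completed = cases.filter(c => c.currentState === "completed").length;
  return completed / cases.length;
}

// Mean time from open to close, over closed cases only.
function averageProcessingMs(cases: LedgerCase[]): number | null {
  const durations: number[] = [];
  for (const c of cases) {
    if (c.closedAtMs !== undefined) durations.push(c.closedAtMs - c.openedAtMs);
  }
  if (durations.length === 0) return null;
  return durations.reduce((sum, d) => sum + d, 0) / durations.length;
}

// The state in which the most open cases are currently waiting.
function bottleneckState(cases: LedgerCase[]): string | null {
  const counts: { [state: string]: number } = {};
  let worst: string | null = null;
  for (const c of cases) {
    if (c.closedAtMs !== undefined) continue;
    counts[c.currentState] = (counts[c.currentState] || 0) + 1;
    if (worst === null || counts[c.currentState] > counts[worst]) worst = c.currentState;
  }
  return worst;
}
```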

Two Agent Personalities

The same architecture supports different interaction styles. DOT and MAX demonstrate the fundamental trade-off between transparency and efficiency.

DOT — The Cautious Agent

Asks permission before using personal data
Explains reasoning step by step
Requests explicit consent for each data share
Best for understanding the accountability architecture

MAX — The Proactive Agent

Auto-fills data without asking
Gets things done quickly and efficiently
Still records everything in the Evidence Plane
Best for seeing the efficiency vs. transparency trade-off

Both agents run on identical infrastructure. The only difference is the personality defined in each agent's system prompt. The Evidence Plane captures the same events for both; what changes is whether the citizen sees consent requests before or after the fact.
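One way to picture "same infrastructure, different system prompt" is a pair of profiles feeding a single request builder. Everything in this sketch is hypothetical: the `AgentProfile` shape, the prompt wording, and the `buildLlmRequest` function are invented for illustration, and the returned object is a plain value rather than an actual Anthropic API payload. Only the model name comes from the architecture description.

```typescript
// Hypothetical profiles: the prompt wording is invented for this example.
interface AgentProfile {
  id: "dot" | "max";
  label: string;
  personalityPrompt: string;
}

const AGENT_PROFILES: AgentProfile[] = [
  {
    id: "dot",
    label: "DOT",
    personalityPrompt: "Ask for explicit consent before each data share and explain your reasoning step by step.",
  },
  {
    id: "max",
    label: "MAX",
    personalityPrompt: "Auto-fill known data and finish the task quickly, then summarise what was shared.",
  },
];

interface LlmRequestSketch {
  model: string;
  system: string;
  userMessage: string;
}

// Both agents share the model and the message handling; only `system` varies.
function buildLlmRequest(profile: AgentProfile, userMessage: string): LlmRequestSketch {
  return {
    model: "claude-sonnet-4-5",
    system: profile.personalityPrompt,
    userMessage: userMessage,
  };
}
```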

Demo Services & Personas

Four services and four citizen personas, each designed to exercise different parts of the architecture.

Services

DVLA

Renew Driving Licence

Fee: £14. SLA: 10 working days. Tests payment flow, photo submission, and Wallet credential sharing.

DWP

Apply Universal Credit

Complex eligibility with household income assessment, savings thresholds, and edge cases. Tests the policy engine thoroughly.

DWP

Check State Pension

Read-only forecast lookup. No payment, no application. Tests information retrieval and credential sharing.

Mock

Become a Robot

A mock demonstration service used for testing the architecture with intentionally absurd scenarios.

Personas

Emma & Liam

Young expecting couple, first baby. Good for benefits, family services, and driving scenarios.

Rajesh

Self-employed IT consultant, two children. Tests tax, child benefit, and self-employment edge cases.

Margaret

Retired, 71, managing health conditions. Tests pension, over-pension-age eligibility, and driving renewal for over-70s.

Priya

Tests interactive task card flows, form-based inputs, and the full Universal Credit journey with consent and state tracking.

Package Architecture

The monorepo is organized as 2 Next.js applications and 7 shared packages. Each package has a single responsibility.

Package	Responsibility
`@als/schemas`	Shared TypeScript types and interfaces for all data structures
`@als/runtime`	CapabilityInvoker, ServiceRegistry, HandoffManager — the orchestration layer
`@als/evidence`	SQLite trace store, trace emitter, receipt generator — the Evidence Plane
`@als/legibility`	PolicyEvaluator, StateMachine, ConsentManager — artefact interpretation
`@als/identity`	GOV.UK One Login and Wallet simulators
`@als/personal-data`	Two-tier data model: verified credentials + incidental data
`@als/adapters`	AnthropicAdapter, GOV.UK Content adapter, MCP adapter — external connectors

Applications

App	Port	Purpose
`citizen-experience`	3100	Citizen-facing chat with persona picker, agent selection, interactive task cards, consent flows, and journey tracking
`legibility-studio`	3101	Admin dashboard: service designer, evidence explorer, gap analysis, service ledger, and case management

The Legibility Studio fetches evidence data via HTTP from the citizen-experience API — it does not directly import the evidence package. This avoids native SQLite module bundling issues.

Getting Started

Prerequisites: Node.js 18+ and an Anthropic API key.

Clone the repository and change into it
git clone https://github.com/datadowns/agentic-legibility-stack.git
cd agentic-legibility-stack
Install dependencies
npm install
Set your API key: create apps/citizen-experience/.env.local containing
ANTHROPIC_API_KEY=sk-ant-...your-key...
Seed demo trace data (optional)
npm run seed
Start both apps
npm run dev
Open the apps
http://localhost:3100 (Citizen Experience) • http://localhost:3101 (Legibility Studio)

See the full Getting Started Guide for a detailed walkthrough of personas, agent personalities, and features to try.