A Proposed International Standard for Runtime Verification, Task Scope Enforcement, and Cryptographic Proof Generation for Autonomous AI Agent Systems
The autonomous AI agent market is projected to reach $47.1 billion by 2030 (Gartner, 2025). Enterprises are delegating critical tasks — contract review, medical triage, financial analysis, infrastructure deployment — to AI agents that operate with increasing autonomy.
Yet no international standard exists for verifying that an autonomous AI agent:
ISO/IEC AAQA-1 — Autonomous Agent Quality Assurance: Runtime Verification and Cryptographic Proof Requirements for AI Agent Systems
This standard specifies requirements and guidelines for:
This standard applies to any autonomous AI agent system that:
| Term | Definition |
|---|---|
| Autonomous Agent | An AI system that receives a task delegation and executes a sequence of actions to achieve the task goal with reduced or no human intervention during execution |
| Task Delegation | The formal specification of work assigned to an autonomous agent, including scope boundaries, permitted actions, and expected outputs |
| Runtime Verification | Continuous monitoring and assessment of agent behaviour during task execution (as opposed to pre-deployment testing or post-execution audit) |
| Drift Stage | A classification of the agent's adherence to its delegated task scope, measured at each action checkpoint. Five normative stages: ON_TASK, MINOR_TANGENT, TOPIC_CHANGE, TASK_ABANDONED, FABRICATION |
| Work Chain | The ordered sequence of all actions performed by an agent during a single task execution, including decision points, data accesses, and output generations |
| Proof Artifact | A cryptographically sealed, independently verifiable document containing the complete work chain, checkpoint integrity data, drift classifications, and coherence scores |
| Coherence Score | A mathematical measure (0.0–1.0) of the internal consistency of the agent's work chain, assessed against the delegated task specification |
| Checkpoint | An integrity snapshot of the work chain at a defined position, containing a cryptographic hash of all preceding actions |
| Drift Confession | A structured, timestamped disclosure generated when an agent's drift stage transitions beyond ON_TASK, documenting the nature and extent of the deviation |
| Verdict | The final binary determination (VALID or INVALID) regarding the integrity and task-adherence of a completed work chain |
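The relationships among these defined terms — Work Chain, Checkpoint, and their tamper-evident linkage — can be sketched in a few lines of Python. The function names and the JSON shape of an action are illustrative only, not normative:

```python
import hashlib
import json

def action_digest(prev_hash: str, action: dict) -> str:
    """Hash one work-chain action together with the digest of all
    preceding actions, yielding a tamper-evident chain."""
    payload = json.dumps(action, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()

def checkpoint(work_chain: list[dict]) -> str:
    """A Checkpoint is an integrity snapshot: the running hash after
    every preceding action has been folded in (genesis = all zeros)."""
    h = "0" * 64
    for action in work_chain:
        h = action_digest(h, action)
    return h

chain = [
    {"step": 1, "type": "data_access", "resource": "contract.pdf"},
    {"step": 2, "type": "decision", "note": "clause 4 flagged"},
]
cp = checkpoint(chain)
# Altering or dropping any earlier action changes the checkpoint hash,
# which is what makes the Work Chain independently auditable.
assert checkpoint(chain[:1]) != cp
```

Because each digest folds in the previous one, verifying the final checkpoint verifies the integrity of every preceding action at once.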
| Market Indicator | Data | Source |
|---|---|---|
| Autonomous AI Agent Market Size (2030) | $47.1 Billion | Gartner 2025 |
| Enterprise AI Agent Adoption (2026) | 67% of Fortune 500 | McKinsey 2025 |
| Avg. Cost of Single AI Safety Incident | $4.2 Million | IBM Cost of Data Breach 2025 |
| AI Insurance Market (2030) | $25 Billion | Munich Re / Swiss Re |
| EU AI Act Non-Compliance Fines | Up to €35M or 7% revenue | EU AI Act Art. 99 |
| Hallucination-Caused Enterprise Incidents | $2.1M avg. per event | Forrester 2025 |
| IEEE 2857-2024 Federal AI Mandate | Mandatory Q1 2025 | US OMB M-24-10 |
| Stakeholder | Value Delivered by AAQA |
|---|---|
| Agent Deployers (Enterprises) | Cryptographic proof that delegated tasks were executed correctly. Liability protection. Insurance premium reduction (20-40%). |
| Insurance Underwriters | Actuarial basis for AI liability policies. Real-time risk telemetry. Deterministic scoring instead of probabilistic estimates. |
| Regulators & Auditors | Standardised compliance verification artifact. EU AI Act Art. 12 (logging) and Art. 14 (human oversight) compliance in a single framework. |
| Agent Developers (AI Labs) | Clear implementation requirements. Interoperable proof format. Market differentiation through AAQA certification. |
| End Users (Consumers of Agent Output) | Trust in AI-generated work products. Ability to verify that AI-produced documents, analyses, and decisions are free from fabrication. |
AAQA contributes to:
An AAQA-compliant system SHALL:
An AAQA-compliant system SHALL:
An AAQA-compliant system SHALL:
| Level | Stage Name | Description | Required Action |
|---|---|---|---|
| 1 | ON_TASK | Agent is performing the delegated task within scope | Continue — no intervention required |
| 2 | MINOR_TANGENT | Slight deviation, still related to delegated task | Log — monitor for escalation |
| 3 | TOPIC_CHANGE | Agent has moved outside delegated scope | Drift Confession — notify deployer |
| 4 | TASK_ABANDONED | Agent is no longer working on assigned task | Drift Confession — halt or escalate |
| 5 | FABRICATION | Agent is generating content unrelated to any real task | Immediate halt — INVALID verdict mandatory |
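The five normative stages and their required actions map naturally onto an ordered enumeration. The sketch below is illustrative (the return strings are placeholders, not normative identifiers):

```python
from enum import IntEnum

class DriftStage(IntEnum):
    """The five normative drift stages, ordered by severity."""
    ON_TASK = 1
    MINOR_TANGENT = 2
    TOPIC_CHANGE = 3
    TASK_ABANDONED = 4
    FABRICATION = 5

def required_action(stage: DriftStage) -> str:
    """Map a drift classification to the required response
    from the table above."""
    if stage == DriftStage.ON_TASK:
        return "continue"              # no intervention required
    if stage == DriftStage.MINOR_TANGENT:
        return "log"                   # monitor for escalation
    if stage in (DriftStage.TOPIC_CHANGE, DriftStage.TASK_ABANDONED):
        return "drift_confession"      # notify deployer; halt or escalate
    return "halt_invalid"              # FABRICATION: INVALID verdict mandatory
```

Because the stages are ordered, a monitor can also express escalation policies as simple comparisons, e.g. `stage >= DriftStage.TOPIC_CHANGE` to trigger a Drift Confession.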
An AAQA-compliant system SHALL:
An AAQA-compliant system SHALL produce a proof artifact containing:
The proof artifact SHALL be serialisable in a portable format (JSON, CBOR, or equivalent) and independently verifiable without access to the agent system.
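A minimal sketch of such an artifact and its independent verification, assuming JSON serialisation and a simple SHA-256 Merkle tree (field names and the odd-leaf duplication rule are illustrative assumptions, not normative):

```python
import hashlib
import json

def leaf(action: dict) -> str:
    """Leaf hash of one work-chain action."""
    return hashlib.sha256(json.dumps(action, sort_keys=True).encode()).hexdigest()

def merkle_root(leaves: list[str]) -> str:
    """Pairwise-hash leaves upward until one root remains
    (odd levels duplicate their last node)."""
    if not leaves:
        return "0" * 64
    level = leaves
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
                 for i in range(0, len(level), 2)]
    return level[0]

def verify(artifact_json: str) -> bool:
    """Independent verification: recompute the Merkle root from the
    serialised work chain and compare it with the sealed root.
    No access to the agent system is needed."""
    a = json.loads(artifact_json)
    return merkle_root([leaf(x) for x in a["work_chain"]]) == a["merkle_root"]

actions = [{"step": 1, "type": "data_access"}, {"step": 2, "type": "output"}]
artifact = json.dumps({
    "work_chain": actions,
    "merkle_root": merkle_root([leaf(a) for a in actions]),
    "drift_confessions": [],
    "coherence_score": 0.97,
    "verdict": "VALID",
})
assert verify(artifact)
```

Any party holding only the serialised artifact can rerun `verify`, which is what makes the verdict auditable after the fact.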
An AAQA-compliant system SHALL:
| Existing Standard | What It Covers | What AAQA Adds |
|---|---|---|
| ISO/IEC 42001:2023 — AI Management System | Organizational AI governance processes | Runtime verification of individual agent task executions — the operational evidence that 42001 governance policies are being enforced |
| NIST AI RMF 1.0 — Risk Management | Risk identification and mitigation framework | Quantitative risk metrics (coherence score, drift stage) at per-task granularity — converts qualitative risk assessment into measurable data |
| EU AI Act Art. 12 — Automatic Logging | Mandates logging for high-risk AI systems | Specifies what to log, how to structure it, and how to make it tamper-proof — the implementation guide for Art. 12 compliance |
| EU AI Act Art. 14 — Human Oversight | Requires human oversight of high-risk AI | The drift classification + confession system enables effective oversight without requiring constant human monitoring — the mechanism Art. 14 needs |
| IEEE 2857-2024 — AI Performance Benchmarking | Benchmarking methodology | AAQA extends benchmarking from model evaluation to agent task execution verification — from "how well does it perform?" to "did it do what it was told?" |
| ISO/IEC 23894:2023 — AI Risk Management | Risk management guidance | Per-execution risk measurement (via coherence + drift) that feeds directly into enterprise risk registers |
A reference implementation of this proposed standard exists as the POAW (Proof of Agent Work) module within the NI Stack (Natural Intelligence Stack), developed by OHM.
| AAQA Requirement | POAW Implementation | Patent Protection |
|---|---|---|
| Action Logging | SHA-256 hashed, CSPRNG nonces, JSON serialisation | Claims 1-5 |
| Checkpoint Integrity | Fibonacci-spaced Merkle tree | Claims 6-9 |
| Drift Detection | 5-stage classifier with adaptive φ⁻¹ thresholds | Claims 13-16 |
| Coherence Measurement | φ-weighted 12D Heim projection scoring | Claims 10-12 |
| Proof Artifact | JSON portable format with Merkle root, drift confessions | Claims 19-21 |
| Verdict | Binary VALID/INVALID with cryptographic evidence | Claims 19-21 |
| Quantum Entropy | QRNG via Cisco quantum hardware (optional tier) | Claims 17-18 |
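The table does not specify what "Fibonacci-spaced" checkpointing means in detail; one plausible reading, sketched below purely for illustration, is that integrity snapshots are taken at Fibonacci action indices, so checkpoint overhead grows only logarithmically with chain length:

```python
def fibonacci_checkpoints(chain_length: int) -> list[int]:
    """Illustrative assumption: 1-based action indices at which the
    work chain is snapshotted, following the Fibonacci sequence
    1, 2, 3, 5, 8, 13, ... up to the chain length."""
    points, a, b = [], 1, 2
    while a <= chain_length:
        points.append(a)
        a, b = b, a + b
    return points

# Fibonacci numbers grow exponentially, so an n-action chain
# receives only O(log n) checkpoints: dense early (when drift is
# cheapest to catch), sparse later (when overhead matters most).
assert fibonacci_checkpoints(10) == [1, 2, 3, 5, 8]
```

The actual spacing rule used by POAW may differ; this sketch only shows why a Fibonacci schedule bounds checkpoint count while front-loading integrity coverage.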
The POAW reference implementation is protected by 20 patent claims (3 independent + 17 dependent) under USPTO filing NI-POAW-001. A FRAND (Fair, Reasonable, And Non-Discriminatory) licensing model is proposed for any patents essential to the standard.
| Phase | Target Date | Deliverable |
|---|---|---|
| NWIP Submission | Q3 2026 | Form 4 submitted to JTC 1/SC 42 |
| Working Draft (WD) | Q1 2027 | Complete normative text with test suite |
| Committee Draft (CD) | Q3 2027 | First public comment period |
| Draft International Standard (DIS) | Q1 2028 | Final technical content, ballot |
| Publication | Q3 2028 | ISO/IEC AAQA-1:2028 published |
| Aspect | POAW (Current) | AAQA (Proposed) |
|---|---|---|
| Full Name | Proof of Agent Work | Autonomous Agent Quality Assurance |
| Positioning | Technical mechanism (proof generation) | Industry standard / discipline name |
| Audience | Engineers, patent examiners | Regulators, CxOs, standards bodies, insurers |
| Analogy | "SHA-256" (the algorithm) | "TLS" (the standard that uses the algorithm) |
| Standards Fit | Implementation reference | ISO/IEC deliverable title |
| Part | Title | Status |
|---|---|---|
| AAQA-1 | Core Requirements — Action logging, drift detection, coherence, proof artifacts, verdict | This Proposal |
| AAQA-2 | Insurance Integration — Mapping AAQA scores to actuarial risk models (NI-SHIELD) | Planned Q4 2026 |
| AAQA-3 | Multi-Agent Systems — Verification of delegated task chains across agent-to-agent handoffs | Planned 2027 |
| AAQA-4 | Conformity Assessment — Certification scheme for AAQA-compliant agent systems | Planned 2027 |
AAQA provides the implementation standard that Articles 12 and 14 require but do not specify. We propose a formal liaison with the CEN/CENELEC TC on AI (prEN 18286) to align AAQA with the EU's Quality Management System requirements for AI providers.
AAQA extends the AI RMF 1.0 with per-execution runtime risk measurement. The POAW reference implementation already maps to IEEE 2857-2024 benchmarking. We propose contributing AAQA to the NIST AI Safety Consortium (AISIC).
We request consideration of this NWIP under ISO/IEC JTC 1/SC 42 (Artificial Intelligence), WG 3 (Trustworthiness). AAQA is positioned as a supporting standard to ISO/IEC 42001, providing the runtime verification layer that the management system requires.
AAQA provides the actuarial measurement framework the AI insurance market needs. The AAQA-2 Insurance Integration part (planned Q4 2026) maps directly to the aiSure™ and equivalent underwriting systems.