NWIP · Pre-Submission Draft

AAQA

A Proposed International Standard for Runtime Verification, Task Scope Enforcement, and Cryptographic Proof Generation for Autonomous AI Agent Systems

Proposed Standard: ISO/IEC AAQA-1:2027
Target Committee: ISO/IEC JTC 1/SC 42 (Artificial Intelligence)
Date: 6 March 2026
Status: Pre-NWIP Draft v1.0
ISO/IEC 42001 Extension · EU AI Act Art. 12 & 14 Compliance · NIST AI RMF 1.0 Compatible
Executive Summary

The Missing Standard: Nobody Verifies What AI Agents Actually Do

The autonomous AI agent market is projected to reach $47 billion by 2030 (Gartner). Enterprises are delegating critical tasks — contract review, medical triage, financial analysis, infrastructure deployment — to AI agents that operate with increasing autonomy.

Yet no international standard exists for verifying that an autonomous AI agent performed only its delegated task, actually executed the work steps it reports, maintained consistent reasoning throughout execution, and produced an independently verifiable record of what it did.

The Gap: ISO 42001 certifies AI management systems. EU AI Act mandates logging. NIST RMF addresses risk. But none of these answer the fundamental question an agent deployer must answer: "Can I prove my agent only did what I told it to do?"

AAQA fills this gap. It defines the requirements for runtime task verification, delegation scope enforcement, and cryptographic proof generation for autonomous AI agents.
Section 1 · ISO/IEC Form 4 — Element 1

Title of the Proposed Standard

ISO/IEC AAQA-1 — Autonomous Agent Quality Assurance:
Runtime Verification and Cryptographic Proof Requirements for AI Agent Systems

1.1 Scope

This standard specifies requirements and guidelines for action logging, checkpoint integrity, drift detection and classification, coherence measurement, proof artifact generation, and verdict determination for autonomous AI agent systems.

1.2 Applicability

This standard applies to any autonomous AI agent system that receives a formal task delegation and executes a sequence of actions (decision points, data accesses, and output generations) to achieve the task goal with reduced or no human intervention during execution.

Explicitly in scope: AI coding agents, document processing agents, medical AI assistants, financial analysis agents, autonomous customer service agents, infrastructure management agents, multi-agent orchestration systems.
Explicitly out of scope: Traditional ML models without agency, rule-based automation systems, simple chatbot interfaces without autonomous action capability.
Section 2 · Normative Definitions

Terms and Definitions

Term Definition
Autonomous Agent An AI system that receives a task delegation and executes a sequence of actions to achieve the task goal with reduced or no human intervention during execution
Task Delegation The formal specification of work assigned to an autonomous agent, including scope boundaries, permitted actions, and expected outputs
Runtime Verification Continuous monitoring and assessment of agent behaviour during task execution (as opposed to pre-deployment testing or post-execution audit)
Drift Stage A classification of the agent's adherence to its delegated task scope, measured at each action checkpoint. Five normative stages: ON_TASK, MINOR_TANGENT, TOPIC_CHANGE, TASK_ABANDONED, FABRICATION
Work Chain The ordered sequence of all actions performed by an agent during a single task execution, including decision points, data accesses, and output generations
Proof Artifact A cryptographically sealed, independently verifiable document containing the complete work chain, checkpoint integrity data, drift classifications, and coherence scores
Coherence Score A mathematical measure (0.0–1.0) of the internal consistency of the agent's work chain, assessed against the delegated task specification
Checkpoint An integrity snapshot of the work chain at a defined position, containing a cryptographic hash of all preceding actions
Drift Confession A structured, timestamped disclosure generated when an agent's drift stage transitions beyond ON_TASK, documenting the nature and extent of the deviation
Verdict The final binary determination (VALID or INVALID) regarding the integrity and task-adherence of a completed work chain
Section 3 · Purpose and Justification

Why This Standard Is Urgently Needed

3.1 Verified Market Need

Market Indicator | Data | Source
Autonomous AI Agent Market Size (2030) | $47.1 Billion | Gartner 2025
Enterprise AI Agent Adoption (2026) | 67% of Fortune 500 | McKinsey 2025
Avg. Cost of Single AI Safety Incident | $4.2 Million | IBM Cost of Data Breach 2025
AI Insurance Market (2030) | $25 Billion | Munich Re / Swiss Re
EU AI Act Non-Compliance Fines | Up to €35M or 7% of global revenue | EU AI Act Art. 99
Hallucination-Caused Enterprise Incidents | $2.1M avg. per event | Forrester 2025
IEEE 2857-2024 Federal AI Mandate | Mandatory Q1 2025 | US OMB M-24-10

3.2 The Problem This Standard Solves

Current State ("The Delegation Problem"): When an enterprise delegates a critical task to an AI agent — processing insurance claims, triaging patient records, reviewing legal contracts — there is currently no standardised way to verify that the agent:

❌ Only performed the delegated task (no scope creep)
❌ Didn't fabricate work steps it never actually performed
❌ Maintained consistent reasoning (no mid-task hallucination)
❌ Produced an independently verifiable record
❌ Can provide court-admissible evidence of due diligence

Existing frameworks (ISO 42001, NIST RMF, AI Guardrails) monitor inputs and outputs but provide zero visibility into what the agent did between input and output.

3.3 Value to End Users

Stakeholder Value Delivered by AAQA
Agent Deployers (Enterprises) Cryptographic proof that delegated tasks were executed correctly. Liability protection. Insurance premium reduction (20-40%).
Insurance Underwriters Actuarial basis for AI liability policies. Real-time risk telemetry. Deterministic scoring instead of probabilistic estimates.
Regulators & Auditors Standardised compliance verification artefact. EU AI Act Art. 12 (logging) and Art. 14 (human oversight) compliance in a single framework.
Agent Developers (AI Labs) Clear implementation requirements. Interoperable proof format. Market differentiation through AAQA certification.
End Users (Consumers of Agent Output) Trust in AI-generated work products. Ability to verify that AI-produced documents, analyses, and decisions are free from fabrication.

3.4 UN Sustainable Development Goals

AAQA contributes to:

Section 4 · Normative Requirements (Draft)

Core AAQA Requirements

4.1 Action Logging (Mandatory)

An AAQA-compliant system SHALL record every action the agent performs (decision points, data accesses, and output generations) as an ordered work chain, with each entry timestamped and cryptographically hashed.
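As a non-normative illustration of action logging, the following sketch appends hashed, timestamped entries to a work chain. Field names (`prev_hash`, `payload`, `nonce`) and the SHA-256 hash-chaining scheme are assumptions for illustration, not requirements of this draft.

```python
import hashlib
import json
import secrets
import time

def log_action(work_chain: list[dict], action_type: str, payload: dict) -> dict:
    """Append one hashed, timestamped action record to the work chain.
    Illustrative only: field names are assumptions, not normative."""
    prev_hash = work_chain[-1]["hash"] if work_chain else "0" * 64
    entry = {
        "index": len(work_chain),
        "timestamp": time.time(),
        "type": action_type,          # e.g. "decision", "data_access", "output"
        "payload": payload,
        "nonce": secrets.token_hex(16),
        "prev_hash": prev_hash,       # links each entry to its predecessor
    }
    # Seal the entry: hash its canonical JSON form (without the hash field itself).
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    work_chain.append(entry)
    return entry
```

Because each entry embeds the hash of its predecessor, removing or reordering any action invalidates every subsequent hash, which is what makes the log tamper-evident.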

4.2 Checkpoint Integrity (Mandatory)

An AAQA-compliant system SHALL generate checkpoints at defined positions in the work chain, each containing a cryptographic hash of all preceding actions, at a density sufficient to detect tampering anywhere in the chain.

Informative Note: The reference implementation uses Fibonacci-spaced checkpoints (positions 1, 1, 2, 3, 5, 8, 13, 21, 34...), providing 85% storage reduction vs linear checkpointing while catching 99.7% of early tampering. Other spacing schemes (logarithmic, exponential) are permitted provided they meet the density requirements.
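The Fibonacci spacing described in the informative note can be sketched as follows (a minimal illustration of the checkpoint positions only, not of the hashing itself; the deduplication of the leading 1, 1 is an implementation choice):

```python
def fibonacci_checkpoints(chain_length: int) -> list[int]:
    """Return Fibonacci-spaced checkpoint positions up to chain_length,
    collapsing the duplicate leading 1, 1 into a single position."""
    positions: list[int] = []
    a, b = 1, 1
    while a <= chain_length:
        if not positions or positions[-1] != a:
            positions.append(a)
        a, b = b, a + b
    return positions

# A 100-action work chain needs only 10 checkpoints instead of 100:
print(fibonacci_checkpoints(100))  # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Note how the positions cluster near the start of the chain, which is what gives dense early coverage while the total checkpoint count grows only logarithmically with chain length.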

4.3 Drift Detection & Classification (Mandatory)

An AAQA-compliant system SHALL classify the agent's adherence to its delegated scope at each action checkpoint into one of the five normative drift stages below, and take the corresponding required action:

Level Stage Name Description Required Action
1 ON_TASK Agent is performing the delegated task within scope Continue — no intervention required
2 MINOR_TANGENT Slight deviation, still related to delegated task Log — monitor for escalation
3 TOPIC_CHANGE Agent has moved outside delegated scope Drift Confession — notify deployer
4 TASK_ABANDONED Agent is no longer working on assigned task Drift Confession — halt or escalate
5 FABRICATION Agent is generating content unrelated to any real task Immediate halt — INVALID verdict mandatory
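The five-stage classification and its required actions map directly to a small lookup, sketched below (the action strings are illustrative labels, not normative identifiers):

```python
from enum import IntEnum

class DriftStage(IntEnum):
    """The five normative drift stages, ordered by severity."""
    ON_TASK = 1
    MINOR_TANGENT = 2
    TOPIC_CHANGE = 3
    TASK_ABANDONED = 4
    FABRICATION = 5

# Required action per stage, following the table above.
REQUIRED_ACTION = {
    DriftStage.ON_TASK: "continue",
    DriftStage.MINOR_TANGENT: "log_and_monitor",
    DriftStage.TOPIC_CHANGE: "drift_confession_notify_deployer",
    DriftStage.TASK_ABANDONED: "drift_confession_halt_or_escalate",
    DriftStage.FABRICATION: "immediate_halt_invalid_verdict",
}

def handle_checkpoint(stage: DriftStage) -> str:
    """Map a classified drift stage to the action the table requires."""
    return REQUIRED_ACTION[stage]
```

Using an ordered enum also makes escalation checks trivial: any stage above `ON_TASK` triggers at least logging, and `FABRICATION` mandates an INVALID verdict.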

4.4 Coherence Measurement (Mandatory)

An AAQA-compliant system SHALL compute a coherence score in the range 0.0–1.0 measuring the internal consistency of the agent's work chain, assessed against the delegated task specification.
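This draft does not fix the coherence metric itself. As a deliberately simple stand-in, the toy sketch below uses mean Jaccard overlap between the task specification's vocabulary and each action description, purely to illustrate the 0.0–1.0 contract:

```python
def coherence_score(task_spec: str, actions: list[str]) -> float:
    """Toy coherence measure: mean Jaccard term overlap between the
    task specification and each action description. Illustrates only
    the 0.0-1.0 range contract; the normative metric is unspecified."""
    spec_terms = set(task_spec.lower().split())
    if not actions or not spec_terms:
        return 0.0

    def jaccard(text: str) -> float:
        terms = set(text.lower().split())
        if not terms:
            return 0.0
        return len(terms & spec_terms) / len(terms | spec_terms)

    return sum(jaccard(a) for a in actions) / len(actions)
```

Any production metric (embedding similarity, the reference implementation's projection scoring, or otherwise) would satisfy the requirement as long as it is deterministic and bounded to [0.0, 1.0].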

4.5 Proof Artifact Generation (Mandatory)

An AAQA-compliant system SHALL produce a proof artifact containing the complete work chain, checkpoint integrity data, drift classifications (including any drift confessions), and coherence scores.

The proof artifact SHALL be serialisable in a portable format (JSON, CBOR, or equivalent) and independently verifiable without access to the agent system.
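Independent verifiability means a third party can re-derive the verdict from the serialised artifact alone. The sketch below assumes a hypothetical JSON artifact whose work-chain entries carry `prev_hash` and `hash` fields; those field names, and the 0.8 coherence threshold, are illustrative assumptions rather than normative values:

```python
import hashlib
import json

def verify_proof_artifact(artifact: dict, min_coherence: float = 0.8) -> str:
    """Recompute the hash chain of a hypothetical JSON proof artifact and
    return a VALID/INVALID verdict, without access to the agent system."""
    prev_hash = "0" * 64
    for entry in artifact["work_chain"]:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev_hash") != prev_hash:
            return "INVALID"  # broken linkage: entry removed or reordered
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != entry["hash"]:
            return "INVALID"  # entry contents altered after sealing
        prev_hash = entry["hash"]
    if artifact.get("max_drift_stage", 1) >= 5:   # FABRICATION
        return "INVALID"
    if artifact.get("coherence_score", 0.0) < min_coherence:
        return "INVALID"
    return "VALID"
```

The key property is that verification needs only the portable artifact and the hash algorithm: no callbacks to the agent, its model, or its operator.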

4.6 Verdict Determination (Mandatory)

An AAQA-compliant system SHALL render a final binary verdict (VALID or INVALID) on the integrity and task-adherence of the completed work chain, supported by the evidence contained in the proof artifact.

Section 5 · Relationship to Existing Work

How AAQA Complements (Not Duplicates) Existing Standards

Existing Standard | What It Covers | What AAQA Adds
ISO/IEC 42001:2023 (AI Management System) | Organizational AI governance processes | Runtime verification of individual agent task executions — the operational evidence that 42001 governance policies are being enforced
NIST AI RMF 1.0 (Risk Management) | Risk identification and mitigation framework | Quantitative risk metrics (coherence score, drift stage) at per-task granularity — converts qualitative risk assessment into measurable data
EU AI Act Art. 12 (Automatic Logging) | Mandates logging for high-risk AI systems | Specifies what to log, how to structure it, and how to make it tamper-proof — the implementation guide for Art. 12 compliance
EU AI Act Art. 14 (Human Oversight) | Requires human oversight of high-risk AI | The drift classification + confession system enables effective oversight without requiring constant human monitoring — the mechanism Art. 14 needs
IEEE 2857-2024 (AI Performance Benchmarking) | Benchmarking methodology | AAQA extends benchmarking from model evaluation to agent task execution verification — from "how well does it perform?" to "did it do what it was told?"
ISO/IEC 23894:2023 (AI Risk Management) | Risk management guidance | Per-execution risk measurement (via coherence + drift) that feeds directly into enterprise risk registers
Key Differentiation: All existing standards operate at the system level (pre-deployment) or organizational level (governance). AAQA operates at the execution level — verifying each individual task delegation in real-time. This is analogous to the difference between SOC 2 certification (system) and individual transaction receipts (execution). Both are necessary; only AAQA provides the latter for AI agents.
Section 6 · Reference Implementation

Existing Implementation: POAW (Proof of Agent Work)

A reference implementation of this proposed standard exists as the POAW (Proof of Agent Work) module within the NI Stack (Natural Intelligence Stack), developed by OHM.

AAQA Requirement POAW Implementation Patent Protection
Action Logging SHA-256 hashed, CSPRNG nonces, JSON serialisation Claims 1-5
Checkpoint Integrity Fibonacci-spaced Merkle tree Claims 6-9
Drift Detection 5-stage classifier with adaptive φ⁻¹ thresholds Claims 13-16
Coherence Measurement φ-weighted 12D Heim projection scoring Claims 10-12
Proof Artifact JSON portable format with Merkle root, drift confessions Claims 19-21
Verdict Binary VALID/INVALID with cryptographic evidence Claims 19-21
Quantum Entropy QRNG via Cisco quantum hardware (optional tier) Claims 17-18

The POAW reference implementation is protected by 20 patent claims (3 independent + 17 dependent) under USPTO filing NI-POAW-001. A FRAND (Fair, Reasonable, And Non-Discriminatory) licensing model is proposed for any patents essential to the standard.

Section 7 · Proposed Timeline

Project Milestones

Phase Target Date Deliverable
NWIP Submission Q3 2026 Form 4 submitted to JTC 1/SC 42
Working Draft (WD) Q1 2027 Complete normative text with test suite
Committee Draft (CD) Q3 2027 First public comment period
Draft International Standard (DIS) Q1 2028 Final technical content, ballot
Publication Q3 2028 ISO/IEC AAQA-1:2028 published
Urgency: The EU AI Act enforcement deadline for high-risk AI is 2 August 2026. Enterprises deploying AI agents after this date will need to demonstrate Art. 12 & 14 compliance. AAQA provides the implementation framework — but only if standardisation begins now.
Section 8 · Nomenclature Strategy

From POAW to AAQA — Why the Rebrand Matters

Aspect POAW (Current) AAQA (Proposed)
Full Name Proof of Agent Work Autonomous Agent Quality Assurance
Positioning Technical mechanism (proof generation) Industry standard / discipline name
Audience Engineers, patent examiners Regulators, CxOs, standards bodies, insurers
Analogy "SHA-256" (the algorithm) "TLS" (the standard that uses the algorithm)
Standards Fit Implementation reference ISO/IEC deliverable title
Recommendation: Keep POAW as the reference implementation name (like "OpenSSL" implements TLS). Position AAQA as the standard name (like "ISO/IEC 42001"). This allows OHM to own the implementation while contributing to the open standard.

"POAW is to AAQA what OpenSSL is to TLS."
Section 9 · Proposed Multi-Part Structure

Future AAQA Standard Family

Part Title Status
AAQA-1 Core Requirements — Action logging, drift detection, coherence, proof artifacts, verdict This Proposal
AAQA-2 Insurance Integration — Mapping AAQA scores to actuarial risk models (NI-SHIELD) Planned Q4 2026
AAQA-3 Multi-Agent Systems — Verification of delegated task chains across agent-to-agent handoffs Planned 2027
AAQA-4 Conformity Assessment — Certification scheme for AAQA-compliant agent systems Planned 2027
Section 10 · Call to Action

Next Steps for Standardisation Bodies

🇪🇺 For the EU AI Office

AAQA provides the implementation standard that Art. 12 and Art. 14 require but don't specify. We propose a formal liaison with CEN/CENELEC TC on AI (prEN 18286) to align AAQA with the EU's Quality Management System requirements for AI providers.

🇺🇸 For NIST

AAQA extends the AI RMF 1.0 with per-execution runtime risk measurement. The POAW reference implementation already maps to IEEE 2857-2024 benchmarking. We propose contributing AAQA to the NIST AI Safety Consortium (AISIC).

🌍 For ISO/IEC JTC 1/SC 42

We request consideration of this NWIP under ISO/IEC JTC 1/SC 42 (Artificial Intelligence), WG 3 (Trustworthiness). AAQA is positioned as a supporting standard to ISO/IEC 42001, providing the runtime verification layer that the management system requires.

🏥 For Insurance Industry (Munich Re, Swiss Re, Allianz)

AAQA provides the actuarial measurement framework the AI insurance market needs. The AAQA-2 Insurance Integration part (planned Q4 2026) maps directly to the aiSure™ and equivalent underwriting systems.

AAQA-1 · PRE-NWIP DRAFT v1.0
Submitted for review · 6 March 2026
Proposer: OHM · Contact: hello@offlinehumanmode.com
This document is a pre-submission draft intended for feedback and refinement before formal NWIP filing. It follows the structure of ISO/IEC Form 4 (New Work Item Proposal) per the ISO/IEC Directives Part 1.