Open Standard · FAIR Protocol · v1.0

CDPS — The 8-Axis Score That Tells You How Clean Your AI Training Data Really Is

Because "we scraped the internet" isn't a compliance strategy anymore. Introducing the Clean Data Provenance Score: an open, multi-dimensional certification standard for the AI data economy.

[Figure: CDPS Clean Data Provenance Score infographic showing uncertified vs. diamond-certified datasets flowing into AI models]
⚡ Executive Summary

The Clean Data Provenance Score (CDPS) is a free, open standard (Apache 2.0) that answers a question every AI company will face before regulators: "How clean is this training dataset?" CDPS scores datasets on 8 orthogonal axes, from creator consent to adversarial poisoning resistance, and produces one of four grades: 💎 Diamond, ✅ Clean, ⚠️ Partial, or ❌ Uncertified. It maps directly to the EU AI Act, ISO 42001, NIST AI RMF, and 10 more jurisdictions. The standard is open. The engine that writes scores steganographically into media? That's FORTRESS.

🔴 The Problem: A $170 Billion Question Nobody Can Answer

"Every generative AI model is one class-action lawsuit away from having its entire training pipeline questioned. The question isn't if they'll ask 'Where did you get this data?' but when."
(The reality facing OpenAI, Google, Anthropic, and Meta in 2026)

The New York Times sued OpenAI. Getty Images sued Stability AI. The Authors Guild sued OpenAI, and a group of authors sued Meta. Every major AI company now faces a fundamental question it cannot answer with confidence: "How clean is our training data?"

There is no universally accepted answer because there is no universally accepted scoring system. The landscape is fragmented:

| Approach | What It Does | What It Misses |
| --- | --- | --- |
| C2PA | Proves where content came from | Doesn't score training data quality |
| Fairly Trained | Certifies companies as ethical | Binary pass/fail; no dataset-level granularity |
| D&TA Standards | Defines metadata fields to document | No scoring rubric, just a vocabulary |
| Spawning.ai | Provides opt-out signals for creators | Opt-out only; no quality measurement |
| HuggingFace Data Cards | Self-reported dataset documentation | No verification; self-reported by data providers |

Each addresses a piece of the puzzle. None provides a quantitative, multi-dimensional, machine-verifiable certification that a regulator, insurer, or acquirer can trust.

CDPS does.

🟢 The Solution: 8 Axes, 1 Score, Zero Ambiguity

CDPS evaluates any AI training dataset across 8 independent axes, each scored 0–10, weighted, and aggregated into a single composite score and human-readable grade.

1. Consent Provenance: Was creator consent obtained for AI training? Weight: 20%
2. Copyright Compliance: Legal in EU, US, Japan, China, Korea? Weight: 15%
3. Bias & Representation: Balanced across demographics? Weight: 10%
4. Freshness & Versioning: Timestamped and version-controlled? Weight: 10%
5. Privacy & PII: PII detected and removed? Weight: 15%
6. Poisoning Resistance: Screened for adversarial attacks? Weight: 10%
7. Source Traceability: Cryptographic provenance chain? Weight: 10%
8. Settlement Readiness: Automated royalty rails configured? Weight: 10%

The 4 Grades

| Grade | Score | What It Means |
| --- | --- | --- |
| 💎 Diamond | 9.0 – 10.0 | Full consent, full traceability, royalty rails active, zero PII, zero poisoning risk |
| ✅ Clean | 7.0 – 8.99 | Strong provenance, minor gaps documented and acknowledged |
| ⚠️ Partial | 4.0 – 6.99 | Mixed provenance, consent gaps present, PII risk likely |
| ❌ Uncertified | 0.0 – 3.99 | No provenance, unknown consent, high legal liability |
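To make the aggregation concrete, here is a minimal sketch of how a composite and grade could be computed from the 8 axis scores. It assumes the composite is a plain weighted arithmetic mean and that grade bands are decided by their lower bounds (9.0, 7.0, 4.0); the article does not spell out either detail, and the identifier names are illustrative rather than taken from the spec.

```python
from typing import Dict

# Axis weights as listed above (they sum to 100%).
# The snake_case keys are hypothetical names chosen for this sketch.
AXIS_WEIGHTS: Dict[str, float] = {
    "consent_provenance": 0.20,
    "copyright_compliance": 0.15,
    "bias_representation": 0.10,
    "freshness_versioning": 0.10,
    "privacy_pii": 0.15,
    "poisoning_resistance": 0.10,
    "source_traceability": 0.10,
    "settlement_readiness": 0.10,
}


def composite_score(axis_scores: Dict[str, float]) -> float:
    """Weighted arithmetic mean of the 8 axis scores (assumed aggregation)."""
    if set(axis_scores) != set(AXIS_WEIGHTS):
        raise ValueError("exactly the 8 CDPS axes must be scored")
    for name, score in axis_scores.items():
        if not 0.0 <= score <= 10.0:
            raise ValueError(f"{name} must be within 0-10, got {score}")
    return sum(AXIS_WEIGHTS[name] * score for name, score in axis_scores.items())


def grade(score: float) -> str:
    """Map a composite score to a grade using the lower bound of each band."""
    if score >= 9.0:
        return "Diamond"
    if score >= 7.0:
        return "Clean"
    if score >= 4.0:
        return "Partial"
    return "Uncertified"


# Made-up example scores, not measurements of any real dataset.
example = {
    "consent_provenance": 9.0,
    "copyright_compliance": 8.0,
    "bias_representation": 6.0,
    "freshness_versioning": 7.0,
    "privacy_pii": 8.0,
    "poisoning_resistance": 5.0,
    "source_traceability": 7.0,
    "settlement_readiness": 4.0,
}
total = composite_score(example)
print(f"{total:.2f} -> {grade(total)}")  # prints "7.10 -> Clean"
```

Using lower bounds avoids the small gaps between the published bands (for example between 8.99 and 9.0), so every composite value maps to exactly one grade.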

🔧 How CDPS Fits Into the AI Data Economy

"We built the immune system. And we open-sourced the language to read it."
(Hagen Schmidt, Founder, DESTILL.ai)

CDPS is the open rubric — the scoring methodology. Anyone can evaluate their datasets using the 8 axes for free. But the truly transformative integration happens when CDPS connects to the FAIR Protocol ecosystem:

| Layer | License | What It Does |
| --- | --- | --- |
| CDPS Standard | Apache 2.0 (Open) | Defines the 8-axis scoring rubric |
| FAIR Reader SDK | Apache 2.0 (Open) | Reads scores from watermarked media |
| FORTRESS DWT Engine | Proprietary | Writes scores steganographically into media; survives 12 AI generations |
| Clearinghouse | Proprietary | Validates, stores, and settles CDPS attestations on-chain |

Think of it like SSL: the cryptographic standard is open (TLS). The certificates are issued by trusted authorities (Verisign, Let's Encrypt). The commerce infrastructure built on that trust (Stripe, Visa) generates billions. CDPS is the standard. FORTRESS is the certificate authority. The Clearinghouse is the payment rail.

📜 Global Regulatory Compliance — 13 Jurisdictions, 1 Standard

CDPS was designed from day one to map directly to the regulatory requirements AI companies face worldwide:

| Regulation | Jurisdiction | CDPS Axes Covered |
| --- | --- | --- |
| EU AI Act Art. 10, 50, 53 | EU/EEA | Axes 1, 2, 3, 5, 7, 8 |
| GDPR Art. 5, 6, 9 | EU/EEA | Axis 5 |
| ISO/IEC 42001 | Global | All 8 axes |
| NIST AI RMF | USA | Axes 3, 5, 6 |
| California SB 942 | USA (CA) | Axes 1, 2, 7 |
| China Cybersecurity Law (2026) | China | Axes 5, 6, 7 |
| Japan IP Code (emerging) | Japan | Axes 1, 2 |
| Korea AI Basic Act | South Korea | Axes 3, 5, 7 |

No other data provenance standard currently maps to all of these jurisdictions. This is by design — CDPS was built for sovereign, global deployment.
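The regulation-to-axis mapping above lends itself to a simple lookup. The sketch below transcribes the mapping into a dictionary and answers two questions: which regulations touch a given axis, and which regulations are exposed by low-scoring axes. The function names and the 7.0 threshold (the lower bound of the Clean band) are assumptions for illustration, not part of the standard.

```python
from typing import Dict, List, Set

# Regulation -> CDPS axis numbers, transcribed from the mapping above.
# ISO/IEC 42001 is listed as covering all 8 axes.
REGULATION_AXES: Dict[str, Set[int]] = {
    "EU AI Act Art. 10, 50, 53": {1, 2, 3, 5, 7, 8},
    "GDPR Art. 5, 6, 9": {5},
    "ISO/IEC 42001": set(range(1, 9)),
    "NIST AI RMF": {3, 5, 6},
    "California SB 942": {1, 2, 7},
    "China Cybersecurity Law (2026)": {5, 6, 7},
    "Japan IP Code (emerging)": {1, 2},
    "Korea AI Basic Act": {3, 5, 7},
}


def regulations_for_axis(axis: int) -> List[str]:
    """Return the regulations whose mapping includes the given axis."""
    return [name for name, axes in REGULATION_AXES.items() if axis in axes]


def exposure(axis_scores: Dict[int, float], threshold: float = 7.0) -> Dict[str, List[int]]:
    """For each regulation, list mapped axes scoring below the threshold.

    The threshold is a hypothetical cut-off chosen for this sketch.
    """
    report: Dict[str, List[int]] = {}
    for name, axes in REGULATION_AXES.items():
        weak = sorted(a for a in axes if axis_scores.get(a, 0.0) < threshold)
        if weak:
            report[name] = weak
    return report


# Made-up scores keyed by axis number (1 = Consent ... 8 = Settlement).
scores = {1: 9.0, 2: 8.0, 3: 6.0, 4: 7.0, 5: 8.0, 6: 5.0, 7: 7.0, 8: 4.0}
print(regulations_for_axis(6))
for regulation, weak_axes in exposure(scores).items():
    print(f"{regulation}: weak axes {weak_axes}")
```

A lookup like this can help prioritize remediation: an axis mapped by many regulations is worth fixing before one mapped by a single regulation.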

🕸️ Deep XPollination — 7 Main BPC, 27 Sub-BPC, 8 Standards Evaluated

Scope: This comparison evaluates AI Training Data Provenance & Governance Standards as of April 2026.

Disclosure: CDPS (FAIR Protocol) is our own solution. We scored it first (before competitors) to avoid anchor bias. All scores reflect publicly verifiable information.

⚖️ BPC 1: Creator Rights & Consent (Weight: 20%) · CDPS: 9.3

Measures: Does the standard enable explicit creator consent, opt-out enforcement, and granular permission signaling for AI training use?

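| Sub-BPC | What it measures | CDPS | C2PA | D&TA | Fairly | Spawn. | HF | IPTC | MIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Explicit opt-in signals | Machine-readable consent declaration per asset | 10 | 5 | 6 | 7 | 8 | 3 | 4 | 5 |
| Opt-out enforcement | Respects robots.txt, ai.txt, DNTR registries | 9 | 4 | 5 | 8 | 10 | 4 | 3 | 6 |
| Granular permissions | Per-use-case: training, RAG, display, research | 10 | 6 | 7 | 3 | 5 | 2 | 5 | 3 |
| Consent verification | Cryptographic proof of consent (DID, signature) | 8 | 7 | 4 | 5 | 4 | 1 | 3 | 4 |
| Average | | 9.3 | 5.5 | 5.5 | 5.8 | 6.8 | 2.5 | 3.8 | 4.5 |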
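The Average row appears to be the plain mean of the Sub-BPC scores for each standard, rounded half-up to one decimal. The sketch below reproduces it for BPC 1, assuming each score row splits into eight 0–10 values in the column order CDPS, C2PA, D&TA, Fairly, Spawn., HF, IPTC, MIT. It uses `decimal` because Python's built-in `round` rounds exact halves to the nearest even digit, which would turn 9.25 into 9.2 rather than 9.3.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List

STANDARDS = ["CDPS", "C2PA", "D&TA", "Fairly", "Spawn.", "HF", "IPTC", "MIT"]

# BPC 1 Sub-BPC rows, one score per standard in STANDARDS order.
BPC1_ROWS: Dict[str, List[int]] = {
    "Explicit opt-in signals": [10, 5, 6, 7, 8, 3, 4, 5],
    "Opt-out enforcement": [9, 4, 5, 8, 10, 4, 3, 6],
    "Granular permissions": [10, 6, 7, 3, 5, 2, 5, 3],
    "Consent verification": [8, 7, 4, 5, 4, 1, 3, 4],
}


def column_averages(rows: Dict[str, List[int]]) -> Dict[str, Decimal]:
    """Mean score per standard, rounded half-up to one decimal place."""
    count = Decimal(len(rows))
    averages: Dict[str, Decimal] = {}
    for index, standard in enumerate(STANDARDS):
        total = sum(scores[index] for scores in rows.values())
        mean = Decimal(total) / count
        averages[standard] = mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return averages


averages = column_averages(BPC1_ROWS)
for standard, value in averages.items():
    print(f"{standard}: {value}")
```

The same function applies to the other BPC groups by swapping in their rows; BPC 5 has three Sub-BPC rows instead of four, which the `len(rows)` divisor handles.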
📜 BPC 2: Regulatory Compliance Coverage (Weight: 18%) · CDPS: 9.5

Measures: Number of jurisdictions and regulatory frameworks the standard explicitly maps to, with documented compliance controls.

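| Sub-BPC | What it measures | CDPS | C2PA | D&TA | Fairly | Spawn. | HF | IPTC | MIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EU AI Act Art. 10/50/53 | Data governance, GPAI transparency, output labeling | 10 | 8 | 7 | 5 | 4 | 4 | 6 | 5 |
| GDPR & Privacy regs (Art. 5/6/9) | Lawful basis, special categories, data minimization | 10 | 7 | 7 | 6 | 6 | 3 | 5 | 4 |
| ISO/IEC 42001 & 5338 alignment | AI management system controls, lifecycle readiness | 10 | 8 | 7 | 5 | 3 | 3 | 6 | 5 |
| Non-EU jurisdictions (US/CN/JP/KR/BR) | NIST, China CyberSec, Japan Art.30-4, Korea AI Act | 8 | 7 | 5 | 4 | 3 | 3 | 4 | 4 |
| Average | | 9.5 | 7.5 | 6.5 | 5.0 | 4.0 | 3.3 | 5.3 | 4.5 |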
🔬 BPC 3: Technical Rigor & Scoring Granularity (Weight: 15%) · CDPS: 9.5

Measures: Quantitative scoring precision, multi-dimensional evaluation, cryptographic verification, and adversarial testing capabilities.

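| Sub-BPC | What it measures | CDPS | C2PA | D&TA | Fairly | Spawn. | HF | IPTC | MIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Scoring granularity | Continuous 0–10 vs binary pass/fail vs none | 10 | 5 | 4 | 2 | 3 | 3 | 4 | 7 |
| Multi-dimensional evaluation | Number of independent measurable axes (≥5 = excellent) | 10 | 5 | 7 | 2 | 2 | 4 | 6 | 5 |
| Cryptographic verification | PQC signatures, hash chains, tamper-proof attestations | 9 | 10 | 3 | 2 | 3 | 2 | 3 | 5 |
| Adversarial/poisoning evaluation | Explicit axis for data poisoning detection & defense | 9 | 3 | 2 | 1 | 1 | 2 | 1 | 6 |
| Average | | 9.5 | 5.8 | 4.0 | 1.8 | 2.3 | 2.8 | 3.5 | 5.8 |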
🔓 BPC 4: Openness & Freedom to Operate (Weight: 15%) · CDPS: 8.5

Measures: Open-source availability, license permissiveness, vendor lock-in risk, and interoperability with other standards.

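| Sub-BPC | What it measures | CDPS | C2PA | D&TA | Fairly | Spawn. | HF | IPTC | MIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Open-source codebase | SDK, schemas, and tools publicly available | 9 | 8 | 4 | 5 | 8 | 10 | 6 | 9 |
| No vendor lock-in | Can switch providers without data loss | 8 | 7 | 6 | 8 | 8 | 10 | 7 | 10 |
| Interoperability | Can embed inside or be read by other standards | 9 | 8 | 7 | 6 | 7 | 8 | 7 | 7 |
| Community governance | Open contribution model, RFCs, working groups | 8 | 9 | 7 | 7 | 6 | 9 | 8 | 8 |
| Average | | 8.5 | 8.0 | 6.0 | 6.5 | 7.3 | 9.3 | 7.0 | 8.5 |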
💰 BPC 5: Settlement & Creator Monetization (Weight: 12%) · CDPS: 10

Measures: Automated royalty payment infrastructure, per-asset pricing metadata, and zero-friction settlement capabilities.

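| Sub-BPC | What it measures | CDPS | C2PA | D&TA | Fairly | Spawn. | HF | IPTC | MIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Automated royalty rails | Machine-triggered USDC/HBAR settlement | 10 | 2 | 2 | 3 | 1 | 1 | 1 | 1 |
| Per-asset pricing metadata | scrape_fee, training_royalty embeddable per image | 10 | 3 | 2 | 2 | 2 | 1 | 2 | 1 |
| Clearinghouse integration | Centralized or decentralized settlement API | 10 | 2 | 1 | 3 | 1 | 0 | 0 | 0 |
| Average | | 10.0 | 2.3 | 1.7 | 2.7 | 1.3 | 0.7 | 1.0 | 0.7 |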
🛡️ BPC 6: Resilience & Survivability (Weight: 10%) · CDPS: 8.8

Measures: Does provenance data survive metadata stripping, re-encoding, AI diffusion, and deliberate tampering?

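| Sub-BPC | What it measures | CDPS | C2PA | D&TA | Fairly | Spawn. | HF | IPTC | MIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Metadata stripping survival | Provenance survives social media upload/re-share | 10 | 3 | 2 | 0 | 1 | 0 | 2 | 1 |
| AI generational survival | Watermark readable after N encode-decode-retrain cycles | 9 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Post-quantum readiness | ML-DSA/ML-KEM signatures vs RSA/ECDSA only | 9 | 5 | 2 | 1 | 1 | 1 | 2 | 2 |
| Tamper detection | Active alert on manipulation attempt | 7 | 8 | 3 | 2 | 3 | 1 | 3 | 3 |
| Average | | 8.8 | 4.3 | 1.8 | 0.8 | 1.3 | 0.5 | 1.8 | 1.5 |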
🌍 BPC 7: Adoption & Ecosystem Maturity (Weight: 10%) · CDPS: 2.8

Measures: Current market adoption, number of integrations, industry backing, and community size. (Our honest weakness.)

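| Sub-BPC | What it measures | CDPS | C2PA | D&TA | Fairly | Spawn. | HF | IPTC | MIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Industry backing | Fortune 500 / FAANG / government endorsements | 3 | 10 | 8 | 5 | 6 | 9 | 8 | 5 |
| SDK downloads / integrations | npm/pip downloads, tool integrations | 4 | 8 | 5 | 3 | 6 | 10 | 6 | 5 |
| Hardware integration | Camera chipset, browser native, device-level | 2 | 9 | 2 | 1 | 1 | 2 | 4 | 1 |
| Academic citations | Peer-reviewed papers referencing the standard | 2 | 8 | 5 | 4 | 5 | 8 | 7 | 9 |
| Average | | 2.8 | 8.8 | 5.0 | 3.3 | 4.5 | 7.3 | 6.3 | 5.0 |

We're honest: CDPS is new. C2PA and HuggingFace lead on adoption today.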

📊 Weighted Composite Results

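| Rank | Standard | BPC1 (20%) | BPC2 (18%) | BPC3 (15%) | BPC4 (15%) | BPC5 (12%) | BPC6 (10%) | BPC7 (10%) | Weighted | Label |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🥇 | CDPS (FAIR) | 9.3 | 9.5 | 9.5 | 8.5 | 10 | 8.8 | 2.8 | 8.5 | 🥈 Silver |
| 🥈 | C2PA v2.1 | 5.5 | 7.5 | 5.8 | 8.0 | 2.3 | 4.3 | 8.8 | 6.1 | |
| 🥉 | D&TA | 5.5 | 6.5 | 4.0 | 6.0 | 1.7 | 1.8 | 5.0 | 4.7 | |
| 4 | MIT DPI | 4.5 | 4.5 | 5.8 | 8.5 | 0.7 | 1.5 | 5.0 | 4.5 | |
| 5 | Spawning.ai | 6.8 | 4.0 | 2.3 | 7.3 | 1.3 | 1.3 | 4.5 | 4.2 | |
| 6 | IPTC 2025.1 | 3.8 | 5.3 | 3.5 | 7.0 | 1.0 | 1.8 | 6.3 | 4.1 | |
| 7 | Fairly Trained | 5.8 | 5.0 | 1.8 | 6.5 | 2.7 | 0.8 | 3.3 | 3.9 | |
| 8 | HuggingFace | 2.5 | 3.3 | 2.8 | 9.3 | 0.7 | 0.5 | 7.3 | 3.8 | |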

Note: CDPS receives 🥈 Silver (not Gold) because BPC 7 (Adoption) is 2.8/10 — we need community traction to pass the "no BPC below 4.0" Gold threshold. This is honest. We're building.
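The weighted composite and the Gold check described in the note can be reproduced with a few lines. This is a sketch under two assumptions: the composite is the weight-by-score sum over the seven BPC averages, and Gold eligibility is only the "no BPC below 4.0" floor (any further label rules are not stated in the article). Recomputing from the rounded BPC averages lands within roughly 0.1 to 0.15 of the published Weighted column, presumably because the published figures were derived from different intermediate rounding.

```python
from typing import Dict, List

# BPC weights in order BPC1..BPC7 (they sum to 100%).
BPC_WEIGHTS: List[float] = [0.20, 0.18, 0.15, 0.15, 0.12, 0.10, 0.10]

# Rounded BPC averages per standard, in order BPC1..BPC7.
BPC_SCORES: Dict[str, List[float]] = {
    "CDPS (FAIR)": [9.3, 9.5, 9.5, 8.5, 10.0, 8.8, 2.8],
    "C2PA v2.1": [5.5, 7.5, 5.8, 8.0, 2.3, 4.3, 8.8],
    "MIT DPI": [4.5, 4.5, 5.8, 8.5, 0.7, 1.5, 5.0],
    "Spawning.ai": [6.8, 4.0, 2.3, 7.3, 1.3, 1.3, 4.5],
    "D&TA": [5.5, 6.5, 4.0, 6.0, 1.7, 1.8, 5.0],
    "HuggingFace": [2.5, 3.3, 2.8, 9.3, 0.7, 0.5, 7.3],
    "Fairly Trained": [5.8, 5.0, 1.8, 6.5, 2.7, 0.8, 3.3],
    "IPTC 2025.1": [3.8, 5.3, 3.5, 7.0, 1.0, 1.8, 6.3],
}


def weighted_composite(scores: List[float]) -> float:
    """Sum of weight * score across the seven BPC groups (assumed method)."""
    if len(scores) != len(BPC_WEIGHTS):
        raise ValueError("expected one score per BPC group")
    return sum(w * s for w, s in zip(BPC_WEIGHTS, scores))


def gold_eligible(scores: List[float], floor: float = 4.0) -> bool:
    """Apply the 'no BPC below 4.0' floor mentioned in the note."""
    return min(scores) >= floor


composites = {name: weighted_composite(s) for name, s in BPC_SCORES.items()}
ranking = sorted(composites, key=composites.get, reverse=True)
for position, name in enumerate(ranking, start=1):
    flag = "meets floor" if gold_eligible(BPC_SCORES[name]) else "below floor"
    print(f"{position}. {name}: {composites[name]:.2f} ({flag})")
```

Keeping the floor check separate from the weighted sum makes the Silver outcome easy to see: a high composite alone does not satisfy the floor when one BPC group sits at 2.8.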

⚡ Trade-Off Analysis — 2 Codependent BPC Pairs Detected

These BPC dimensions have structural tensions that cannot both reach 10/10 without deliberate architectural dissolution.

🔴 Critical BPC 3 (Technical Rigor) ↔ BPC 7 (Adoption)

Why They Conflict: High technical rigor (PQC signatures, adversarial poisoning axes, 8-dimensional scoring) increases implementation complexity, which slows adoption. Simple standards (like HuggingFace Data Cards) get adopted faster precisely because they demand less from implementers.

Current: Rigor = 9.5, Adoption = 2.8. Sacrifice = -6.7 on Adoption.

💡 TRIZ Principle #1 — Segmentation: Create 3 tiers: CDPS Lite (3 mandatory axes, Consent + Copyright + Traceability — 5-minute implementation), CDPS Standard (all 8 axes), CDPS Diamond (8 axes + PQC + Clearinghouse settlement). This lets HuggingFace-level implementers start with Lite and upgrade. Like SSL → TLS 1.0 → 1.3.

Post-Dissolution: Rigor = 9.5 (unchanged), Adoption = 6.0 (+3.2). Effort: M | Innovation: Architectural
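The proposed three-tier split can be expressed as a small eligibility check. The sketch below is hypothetical: the tiers are a proposal in this article, and the axis identifiers, flag names, and function name are invented for illustration. It returns the highest tier a dataset could claim given which axes have been scored and whether PQC signing and Clearinghouse settlement are in place.

```python
from typing import Optional, Set

# The 3 mandatory axes proposed for CDPS Lite.
LITE_AXES: Set[str] = {
    "consent_provenance",
    "copyright_compliance",
    "source_traceability",
}

# All 8 axes, required for CDPS Standard and CDPS Diamond.
ALL_AXES: Set[str] = LITE_AXES | {
    "bias_representation",
    "freshness_versioning",
    "privacy_pii",
    "poisoning_resistance",
    "settlement_readiness",
}


def highest_tier(
    scored_axes: Set[str],
    pqc_signed: bool = False,
    clearinghouse_settlement: bool = False,
) -> Optional[str]:
    """Return the highest proposed tier the inputs qualify for, or None."""
    if not LITE_AXES <= scored_axes:
        return None  # missing at least one mandatory Lite axis
    if not ALL_AXES <= scored_axes:
        return "CDPS Lite"
    if pqc_signed and clearinghouse_settlement:
        return "CDPS Diamond"
    return "CDPS Standard"


print(highest_tier(LITE_AXES))
print(highest_tier(ALL_AXES))
print(highest_tier(ALL_AXES, pqc_signed=True, clearinghouse_settlement=True))
```

Because each tier is a superset of the previous one, an implementer who starts with Lite can upgrade by scoring more axes without redoing earlier work.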

🟠 Significant BPC 5 (Settlement) ↔ BPC 4 (Openness)

Why They Conflict: The settlement Clearinghouse is proprietary — which reduces the "Freedom to Operate" score. Open-source purists will resist a standard that requires a proprietary payment rail for full functionality.

Current: Settlement = 10, Openness = 8.5. Mild tension (-1.5).

💡 TRIZ Principle #13 — The Other Way Round: Publish the Clearinghouse API specification (OpenAPI 3.0) as open-source. Anyone can build a compatible settlement endpoint. FORTRESS runs the reference implementation. Like how SMTP is open but Gmail is a proprietary implementation.

Post-Dissolution: Settlement = 10 (unchanged), Openness = 9.5 (+1.0). Effort: S | Innovation: Incremental

🚀 Get Started — Score Your Dataset in 3 Minutes

The CDPS standard is open. The attestation schema is open. You can generate a score today:

1. Read the spec: schema/CDPS_STANDARD.md
2. Score each of the 8 axes for your dataset (0–10)
3. Generate a JSON attestation: cdps-attestation-v1.json
4. Publish at /.well-known/fair.json with a "cdps" block
5. (Optional) Embed steganographically via FORTRESS
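Steps 2 to 4 can be sketched in a short script. Everything about the JSON layout here is an assumption: the authoritative field names live in schema/CDPS_STANDARD.md, which this article does not reproduce, so keys such as `dataset_id`, `axes`, `composite`, and `issued_at` are placeholders. The sketch writes into a temporary directory standing in for a web root.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict

# Axis weights from the article; key names are hypothetical.
WEIGHTS: Dict[str, float] = {
    "consent_provenance": 0.20, "copyright_compliance": 0.15,
    "bias_representation": 0.10, "freshness_versioning": 0.10,
    "privacy_pii": 0.15, "poisoning_resistance": 0.10,
    "source_traceability": 0.10, "settlement_readiness": 0.10,
}


def build_attestation(dataset_id: str, axis_scores: Dict[str, float]) -> dict:
    """Assemble an attestation dict; the field layout is illustrative only."""
    composite = round(sum(WEIGHTS[a] * axis_scores[a] for a in WEIGHTS), 2)
    if composite >= 9.0:
        grade = "Diamond"
    elif composite >= 7.0:
        grade = "Clean"
    elif composite >= 4.0:
        grade = "Partial"
    else:
        grade = "Uncertified"
    return {
        "cdps_version": "1.0",
        "dataset_id": dataset_id,
        "axes": axis_scores,
        "composite": composite,
        "grade": grade,
        "issued_at": datetime.now(timezone.utc).isoformat(),
    }


def publish(attestation: dict, site_root: Path) -> Path:
    """Write cdps-attestation-v1.json and add a "cdps" block to fair.json."""
    body = json.dumps(attestation, indent=2)
    (site_root / "cdps-attestation-v1.json").write_text(body, encoding="utf-8")
    well_known = site_root / ".well-known"
    well_known.mkdir(parents=True, exist_ok=True)
    fair_path = well_known / "fair.json"
    # Preserve any existing fair.json content and only set the "cdps" block.
    fair = json.loads(fair_path.read_text(encoding="utf-8")) if fair_path.exists() else {}
    fair["cdps"] = attestation
    fair_path.write_text(json.dumps(fair, indent=2), encoding="utf-8")
    return fair_path


# Made-up scores; a temporary directory stands in for the web root.
site_root = Path(tempfile.mkdtemp())
scores = {
    "consent_provenance": 9.0, "copyright_compliance": 8.0,
    "bias_representation": 6.0, "freshness_versioning": 7.0,
    "privacy_pii": 8.0, "poisoning_resistance": 5.0,
    "source_traceability": 7.0, "settlement_readiness": 4.0,
}
fair_path = publish(build_attestation("example-dataset-v1", scores), site_root)
print(fair_path.read_text(encoding="utf-8"))
```

Merging into an existing fair.json rather than overwriting it keeps any other blocks already published at that path intact.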

❓ Questions This Research Opens

For AI Companies:

If the EU AI Act requires you to document training data provenance by 2027, and your datasets currently score "Uncertified" — what's your remediation timeline?

For Regulators:

Should CDPS grades be required on AI model cards, the way nutrition labels are required on food?

For Creators:

If your images are inside a Diamond-certified dataset with active settlement rails, do you actually get paid? CDPS Axis 8 says yes — automatically, via USDC.


📚 Related Reading

🌱 Open Source & Sovereign: How FAIR Protocol Gives Back to the Community
The open-source architecture behind CDPS: Apache 2.0 licensed SDK, sovereign infrastructure, and the community flywheel that makes provenance scoring accessible to everyone.
🛡️ 98 Days: The Invisible Watermark That Could Own the $18.9B Content Provenance Market
How FORTRESS DWT steganographic watermarking survives 12 generations of AI re-processing: the technical backbone behind CDPS BPC 6 (Resilience).
🔬 Your Images Survive 12 Generations of AI — Here's the Math
The DWT benchmark data that powers the CDPS 12-Gen AI Survival sub-axis: mathematical proof of steganographic resilience across diffusion model pipelines.
Hagen Schmidt — Founder & Creator, DESTILL.ai / FORTRESS. Building the immune system for the AI data economy. 3,030+ USPTO patent claims. Post-quantum cryptography. Sovereign infrastructure. Vienna, Austria.

🔗 LinkedIn · 🌐 destill.ai · 📧 IP@destill.ai