CDPS — The 8-Axis Clean Data Provenance Score

CDPS Clean Data Provenance Score — Infographic showing uncertified vs. diamond-certified datasets flowing into AI models

⚡ Executive Summary

The Clean Data Provenance Score (CDPS) is a free, open standard (Apache 2.0) that answers a question every AI company will face before regulators: "How clean is this training dataset?" CDPS scores datasets on 8 orthogonal axes — from creator consent to adversarial poisoning resistance — producing a grade: 💎 Diamond ✅ Clean ⚠️ Partial ❌ Uncertified. It maps directly to the EU AI Act, ISO 42001, NIST AI RMF, and 10 more jurisdictions. The standard is open. The engine that writes scores steganographically into media? That's FORTRESS.

🔴 The Problem: A $170 Billion Question Nobody Can Answer

"Every generative AI model is one class-action lawsuit away from having its entire training pipeline questioned. The question isn't if they'll ask 'Where did you get this data?' — it's when."— The reality facing OpenAI, Google, Anthropic, and Meta in 2026

The New York Times sued OpenAI. Getty sued Stability AI. The Authors Guild sued Meta. Every major AI company now faces a fundamental question they cannot answer with confidence: "How clean is our training data?"

There is no universally accepted answer because there is no universally accepted scoring system. The landscape is fragmented:

Approach	What It Does	What It Misses
C2PA	Proves where content came from	Doesn't score training data quality
Fairly Trained	Certifies companies as ethical	Binary pass/fail — no dataset-level granularity
D&TA Standards	Defines metadata fields to document	No scoring rubric — just a vocabulary
Spawning.ai	Provides opt-out signals for creators	Opt-out only — no quality measurement
HuggingFace Data Cards	Self-reported dataset documentation	No verification — self-reported by data providers

Each addresses a piece of the puzzle. None provides a quantitative, multi-dimensional, machine-verifiable certification that a regulator, insurer, or acquirer can trust.

CDPS does.

🟢 The Solution: 8 Axes, 1 Score, Zero Ambiguity

CDPS evaluates any AI training dataset across 8 independent axes, each scored 0–10, weighted, and aggregated into a single composite score and human-readable grade.

Consent Provenance

Was creator consent obtained for AI training? Weight: 20%

Legal in EU, US, Japan, China, Korea? Weight: 15%

Bias & Representation

Balanced across demographics? Weight: 10%

Freshness & Versioning

Timestamped and version-controlled? Weight: 10%

Privacy & PII

PII detected and removed? Weight: 15%

Poisoning Resistance

Screened for adversarial attacks? Weight: 10%

Source Traceability

Cryptographic provenance chain? Weight: 10%

Settlement Readiness

Automated royalty rails configured? Weight: 10%

The 4 Grades

Grade	Score	What It Means
💎 Diamond	9.0 – 10.0	Full consent, full traceability, royalty rails active, zero PII, zero poisoning risk
✅ Clean	7.0 – 8.99	Strong provenance, minor gaps documented and acknowledged
⚠️ Partial	4.0 – 6.99	Mixed provenance, consent gaps present, PII risk likely
❌ Uncertified	0.0 – 3.99	No provenance, unknown consent, high legal liability

🔧 How CDPS Fits Into the AI Data Economy

"We built the immune system. And we open-sourced the language to read it."— Hagen Schmidt, Founder — DESTILL.ai

CDPS is the open rubric — the scoring methodology. Anyone can evaluate their datasets using the 8 axes for free. But the truly transformative integration happens when CDPS connects to the FAIR Protocol ecosystem:

Layer	License	What It Does
CDPS Standard	Apache 2.0 (Open)	Defines the 8-axis scoring rubric
FAIR Reader SDK	Apache 2.0 (Open)	Reads scores from watermarked media
FORTRESS DWT Engine	Proprietary	Writes scores steganographically into media — survives 12 AI generations
Clearinghouse	Proprietary	Validates, stores, and settles CDPS attestations on-chain

Think of it like SSL: the cryptographic standard is open (TLS). The certificates are issued by trusted authorities (Verisign, Let's Encrypt). The commerce infrastructure built on that trust (Stripe, Visa) generates billions. CDPS is the standard. FORTRESS is the certificate authority. The Clearinghouse is the payment rail.

📜 Global Regulatory Compliance — 13 Jurisdictions, 1 Standard

CDPS was designed from day one to map directly to the regulatory requirements AI companies face worldwide:

Regulation	Jurisdiction	CDPS Axes Covered
EU AI Act Art. 10, 50, 53	EU/EEA	Axes 1, 2, 3, 5, 7, 8
GDPR Art. 5, 6, 9	EU/EEA	Axis 5
ISO/IEC 42001	Global	All 8 axes
NIST AI RMF	USA	Axes 3, 5, 6
California SB 942	USA (CA)	Axes 1, 2, 7
China Cybersecurity Law (2026)	China	Axes 5, 6, 7
Japan IP Code (emerging)	Japan	Axes 1, 2
Korea AI Basic Act	South Korea	Axes 3, 5, 7

No other data provenance standard currently maps to all of these jurisdictions. This is by design — CDPS was built for sovereign, global deployment.

🕸️ Deep XPollination — 7 Main BPC, 27 Sub-BPC, 8 Standards Evaluated

Scope: This comparison evaluates AI Training Data Provenance & Governance Standards as of April 2026.

Disclosure: CDPS (FAIR Protocol) is our own solution. We scored it first (before competitors) to avoid anchor bias. All scores reflect publicly verifiable information.

👆 Click each BPC group below to expand the full Sub-BPC evaluation with individual scores for all 8 standards

⚖️ BPC 1 — Creator Rights & Consent (Weight: 20%) CDPS: 9.3 Click to expand ▼

Measures: Does the standard enable explicit creator consent, opt-out enforcement, and granular permission signaling for AI training use?

Sub-BPC	CDPS	C2PA	D&TA	Fairly	Spawn.	HF	IPTC	MIT
Explicit opt-in signals Machine-readable consent declaration per asset	10	5	6	7	8	3	4	5
Opt-out enforcement Respects robots.txt, ai.txt, DNTR registries	9	4	5	8	10	4	3	6
Granular permissions Per-use-case: training, RAG, display, research	10	6	7	3	5	2	5	3
Consent verification Cryptographic proof of consent (DID, signature)	8	7	4	5	4	1	3	4
Average	9.3	5.5	5.5	5.8	6.8	2.5	3.8	4.5

📜 BPC 2 — Regulatory Compliance Coverage (Weight: 18%) CDPS: 9.5 Click to expand ▼

Measures: Number of jurisdictions and regulatory frameworks the standard explicitly maps to, with documented compliance controls.

Sub-BPC	CDPS	C2PA	D&TA	Fairly	Spawn.	HF	IPTC	MIT
EU AI Act Art. 10/50/53 Data governance, GPAI transparency, output labeling	10	8	7	5	4	4	6	5
GDPR & Privacy regs (Art. 5/6/9) Lawful basis, special categories, data minimization	10	7	7	6	6	3	5	4
ISO/IEC 42001 & 5338 alignment AI management system controls, lifecycle readiness	10	8	7	5	3	3	6	5
Non-EU jurisdictions (US/CN/JP/KR/BR) NIST, China CyberSec, Japan Art.30-4, Korea AI Act	8	7	5	4	3	3	4	4
Average	9.5	7.5	6.5	5.0	4.0	3.3	5.3	4.5

🔬 BPC 3 — Technical Rigor & Scoring Granularity (Weight: 15%) CDPS: 9.5 Click to expand ▼

Measures: Quantitative scoring precision, multi-dimensional evaluation, cryptographic verification, and adversarial testing capabilities.

Sub-BPC	CDPS	C2PA	D&TA	Fairly	Spawn.	HF	IPTC	MIT
Scoring granularity Continuous 0–10 vs binary pass/fail vs none	10	5	4	2	3	3	4	7
Multi-dimensional evaluation Number of independent measurable axes (≥5 = excellent)	10	5	7	2	2	4	6	5
Cryptographic verification PQC signatures, hash chains, tamper-proof attestations	9	10	3	2	3	2	3	5
Adversarial/poisoning evaluation Explicit axis for data poisoning detection & defense	9	3	2	1	1	2	1	6
Average	9.5	5.8	4.0	1.8	2.3	2.8	3.5	5.8

🔓 BPC 4 — Openness & Freedom to Operate (Weight: 15%) CDPS: 8.5 Click to expand ▼

Measures: Open-source availability, license permissiveness, vendor lock-in risk, and interoperability with other standards.

Sub-BPC	CDPS	C2PA	D&TA	Fairly	Spawn.	HF	IPTC	MIT
Open-source codebase SDK, schemas, and tools publicly available	9	8	4	5	8	10	6	9
No vendor lock-in Can switch providers without data loss	8	7	6	8	8	10	7	10
Interoperability Can embed inside or be read by other standards	9	8	7	6	7	8	7	7
Community governance Open contribution model, RFCs, working groups	8	9	7	7	6	9	8	8
Average	8.5	8.0	6.0	6.5	7.3	9.3	7.0	8.5

💰 BPC 5 — Settlement & Creator Monetization (Weight: 12%) CDPS: 10 Click to expand ▼

Measures: Automated royalty payment infrastructure, per-asset pricing metadata, and zero-friction settlement capabilities.

Sub-BPC	CDPS	C2PA	D&TA	Fairly	Spawn.	HF	IPTC	MIT
Automated royalty rails Machine-triggered USDC/HBAR settlement	10	2	2	3	1	1	1	1
Per-asset pricing metadata scrape_fee, training_royalty embeddable per image	10	3	2	2	2	1	2	1
Clearinghouse integration Centralized or decentralized settlement API	10	2	1	3	1	0	0	0
Average	10.0	2.3	1.7	2.7	1.3	0.7	1.0	0.7

🛡️ BPC 6 — Resilience & Survivability (Weight: 10%) CDPS: 8.8 Click to expand ▼

Measures: Does provenance data survive metadata stripping, re-encoding, AI diffusion, and deliberate tampering?

Sub-BPC	CDPS	C2PA	D&TA	Fairly	Spawn.	HF	IPTC	MIT
Metadata stripping survival Provenance survives social media upload/re-share	10	3	2	0	1	0	2	1
AI generational survival Watermark readable after N encode-decode-retrain cycles	9	1	0	0	0	0	0	0
Post-quantum readiness ML-DSA/ML-KEM signatures vs RSA/ECDSA only	9	5	2	1	1	1	2	2
Tamper detection Active alert on manipulation attempt	7	8	3	2	3	1	3	3
Average	8.8	4.3	1.8	0.8	1.3	0.5	1.8	1.5

🌍 BPC 7 — Adoption & Ecosystem Maturity (Weight: 10%) CDPS: 5.3 Click to expand ▼

Measures: Current market adoption, number of integrations, industry backing, and community size. (Our honest weakness.)

Sub-BPC	CDPS	C2PA	D&TA	Fairly	Spawn.	HF	IPTC	MIT
Industry backing Fortune 500 / FAANG / government endorsements	3	10	8	5	6	9	8	5
SDK downloads / integrations npm/pip downloads, tool integrations	4	8	5	3	6	10	6	5
Hardware integration Camera chipset, browser native, device-level	2	9	2	1	1	2	4	1
Academic citations Peer-reviewed papers referencing the standard	2	8	5	4	5	8	7	9
We're honest: CDPS is new. C2PA and HuggingFace lead on adoption today.
Average	2.8	8.8	5.0	3.3	4.5	7.3	6.3	5.0

📊 Weighted Composite Results

Rank	Standard	BPC1 20%	BPC2 18%	BPC3 15%	BPC4 15%	BPC5 12%	BPC6 10%	BPC7 10%	Weighted	Label
🥇	CDPS (FAIR)	9.3	9.5	9.5	8.5	10	8.8	2.8	8.5	🥈 Silver
🥈	C2PA v2.1	5.5	7.5	5.8	8.0	2.3	4.3	8.8	6.1	—
🥉	MIT DPI	4.5	4.5	5.8	8.5	0.7	1.5	5.0	4.5	—
4	Spawning.ai	6.8	4.0	2.3	7.3	1.3	1.3	4.5	4.2	—
5	D&TA	5.5	6.5	4.0	6.0	1.7	1.8	5.0	4.7	—
6	HuggingFace	2.5	3.3	2.8	9.3	0.7	0.5	7.3	3.8	—
7	Fairly Trained	5.8	5.0	1.8	6.5	2.7	0.8	3.3	3.9	—
8	IPTC 2025.1	3.8	5.3	3.5	7.0	1.0	1.8	6.3	4.1	—

Note: CDPS receives 🥈 Silver (not Gold) because BPC 7 (Adoption) is 2.8/10 — we need community traction to pass the "no BPC below 4.0" Gold threshold. This is honest. We're building.

⚡ Trade-Off Analysis — 2 Codependent BPC Pairs Detected

These BPC dimensions have structural tensions that cannot both reach 10/10 without deliberate architectural dissolution.

🔴 Critical BPC 3 (Technical Rigor) ↔ BPC 7 (Adoption)

Why They Conflict: High technical rigor (PQC signatures, adversarial poisoning axes, 8-dimensional scoring) increases implementation complexity, which slows adoption. Simple standards (like HuggingFace Data Cards) get adopted faster precisely because they demand less from implementers.

Current: Rigor = 9.5, Adoption = 2.8. Sacrifice = -6.7 on Adoption.

💡 TRIZ Principle #1 — Segmentation: Create 3 tiers: CDPS Lite (3 mandatory axes, Consent + Copyright + Traceability — 5-minute implementation), CDPS Standard (all 8 axes), CDPS Diamond (8 axes + PQC + Clearinghouse settlement). This lets HuggingFace-level implementers start with Lite and upgrade. Like SSL → TLS 1.0 → 1.3.

Post-Dissolution: Rigor = 9.5 (unchanged), Adoption = 6.0 (+3.2). Effort: M | Innovation: Architectural

🟠 Significant BPC 5 (Settlement) ↔ BPC 4 (Openness)

Why They Conflict: The settlement Clearinghouse is proprietary — which reduces the "Freedom to Operate" score. Open-source purists will resist a standard that requires a proprietary payment rail for full functionality.

Current: Settlement = 10, Openness = 8.5. Mild tension (-1.5).

💡 TRIZ Principle #13 — The Other Way Round: Publish the Clearinghouse API specification (OpenAPI 3.0) as open-source. Anyone can build a compatible settlement endpoint. FORTRESS runs the reference implementation. Like how SMTP is open but Gmail is a proprietary implementation.

Post-Dissolution: Settlement = 10 (unchanged), Openness = 9.5 (+1.0). Effort: S | Innovation: Incremental

🚀 Get Started — Score Your Dataset in 3 Minutes

The CDPS standard is open. The attestation schema is open. You can generate a score today:

Read the spec: schema/CDPS_STANDARD.md

Score each of the 8 axes for your dataset (0–10)

Generate a JSON attestation: cdps-attestation-v1.json

Publish at /.well-known/fair.json with a "cdps" block

(Optional) Embed steganographically via FORTRESS

❓ Questions This Research Opens

For AI Companies:

If the EU AI Act requires you to document training data provenance by 2027, and your datasets currently score "Uncertified" — what's your remediation timeline?

For Regulators:

Should CDPS grades be required on AI model cards, the way nutrition labels are required on food?

For Creators:

If your images are inside a Diamond-certified dataset with active settlement rails, do you actually get paid? CDPS Axis 8 says yes — automatically, via USDC.

📚 Related Reading

🌱

Open Source & Sovereign: How FAIR Protocol Gives Back to the Community The open-source architecture behind CDPS — Apache 2.0 licensed SDK, sovereign infrastructure, and the community flywheel that makes provenance scoring accessible to everyone.

→ 🛡️

98 Days: The Invisible Watermark That Could Own the $18.9B Content Provenance Market How FORTRESS DWT steganographic watermarking survives 12 generations of AI re-processing — the technical backbone behind CDPS Axis 6 (Resilience).

→ 🔬

Your Images Survive 12 Generations of AI — Here's the Math The DWT benchmark data that powers the CDPS 12-Gen AI Survival sub-axis — mathematical proof of steganographic resilience across diffusion model pipelines.

→

⚖️ Methodology & Legal Notice

This comparison evaluates AI Training Data Provenance & Governance Standards using the XPollination Holistic BPC Framework. All scores reflect publicly available information as of April 27, 2026. Competitor names are used solely for identification per Art. 4(f) Directive 2006/114/EC. No competitor logos, slogans, or brand colors are displayed.

Correction mechanism: Vendors may submit corrections with evidence URLs. Scores will be updated within 30 business days of verified submission.

📧 Request Data Update

Hagen Schmidt — Founder & Creator, DESTILL.ai / FORTRESS. Building the immune system for the AI data economy. 3,030+ USPTO patent claims. Post-quantum cryptography. Sovereign infrastructure. Vienna, Austria.

🔗 LinkedIn · 🌐 destill.ai · 📧 IP@destill.ai