The Accountability Gap / ads.txt × sellers.json audit · Corpus 2026-05-29 · 86,357 publishers · 33.39M triples · 5,454 SSPs (1,548 with registry) · cycles 458–488 · H184–H199
30%
of ads.txt claims contradict sellers.json. Of 6.43M DIRECT credentials in the corpus, only 11.6% cleanly validate (valid + disclosed). The other 88.4% is distributed across the phantom, contradicted, impersonated, confidential-unverifiable, and no-registry verdicts; see the cascade panel below.
DIRECT credentials audited: 6,431,328
Phantom (seller_id not in registry): 1,962,203 · 30.5%
Contradicted (registry says other): 1,751,669 · 27.2%
Impersonated (DIRECT but reg says INTERMEDIARY): 1,601,139 · 24.9%
Confidential (registry hides seller): 132,026 · 2.1%
No registry available: 239,310 · 3.7%
Validates cleanly (valid + disclosed): 744,568 · 11.6%
Publishers · SSPs claimed: 86,357 · 5,454
Per-cell cascade (H187 resolution): 2.33M cells · 27× finer
Exhibit A · one phantom claim, traced
1 of 2,219,472
ads.txt themoneytizer.com
# Themoneytizer ads.txt - auto-generated
google.com, pub-1234567890, DIRECT, f08c47fec0942fa0
rubiconproject.com, 17960, DIRECT, 0bfd66d529a55807
smartadserver.com, 1097, DIRECT
openx.com, 540123456, DIRECT, 6a698e2ec38604c6
pubmatic.com, 158127, DIRECT, 5d62403b186f2ace
sellers.json smartadserver.com
"sellers": [
  { "seller_id": "1095", "name": "Le Monde" },
  { "seller_id": "1096", "name": "L'Équipe" },
  { "seller_id": "1097", "type": "INTERMEDIARY" },
  { "seller_id": "1098", "name": "Auchan" },
  { "seller_id": "1099", "name": "Carrefour" },
Same ID 1097. Opposite role. Both files public. Verdict: contradiction.
ads.txt: publisher's public list of authorized ad sellers
sellers.json: ad exchange's public list of their sellers and roles
DIRECT: no middleman, advertiser pays a premium
INTERMEDIARY: middleman / reseller, takes a cut, lower pricing
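The trace in Exhibit A can be sketched in a few lines of Python. This is a minimal illustration, not the page's verify.py: the names `AdsTxtRecord`, `parse_ads_txt_line`, and `trace_claim` are hypothetical, and the sellers list is a hand-copied excerpt of the exhibit rather than a live fetch. The exhibit shows the role under the key "type"; the sketch reads either "seller_type" or "type".

```python
from typing import NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    ssp_domain: str
    seller_id: str
    relationship: str  # "DIRECT" or "RESELLER" in an ads.txt file


def parse_ads_txt_line(line: str) -> Optional[AdsTxtRecord]:
    """Parse one ads.txt data record; None for comments, blanks, variables."""
    line = line.split("#", 1)[0].strip()  # drop comments
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 3 or "=" in fields[0]:
        return None
    return AdsTxtRecord(fields[0].lower(), fields[1], fields[2].upper())


def trace_claim(record: AdsTxtRecord, sellers: list) -> str:
    """Compare one ads.txt claim with the SSP's sellers.json entries."""
    by_id = {s["seller_id"]: s for s in sellers}
    entry = by_id.get(record.seller_id)
    if entry is None:
        return "phantom"  # seller_id absent from the registry
    role = (entry.get("seller_type") or entry.get("type") or "").upper()
    if record.relationship == "DIRECT" and role == "INTERMEDIARY":
        return "contradiction"
    return "no contradiction found"


# Excerpt of the Exhibit A registry (illustrative, not a live fetch).
SELLERS = [
    {"seller_id": "1096", "name": "L'Équipe"},
    {"seller_id": "1097", "type": "INTERMEDIARY"},
]

claim = parse_ads_txt_line("smartadserver.com, 1097, DIRECT")
print(trace_claim(claim, SELLERS))
```

Keying the registry by seller_id keeps each lookup constant-time, which matters once the same check runs over millions of credentials.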
Live trace

How a single page-load propagates your identity.

Each packet below is one identity-share event sampled at the rate observed in the audit. Red = the seller chain is contradicted (F1). Tan = sent without valid consent (F2). Green = both declared and confirmed.

Resolution upgrade · H187 · per-cell cascade

The 88% the headline rate doesn't show.

2,327,455 cells · 27× finer than per-publisher
~100ms primary-key lookup · 1,381 cells z > 3σ · 25 z > 20σ

Earlier audits collapsed each DIRECT claim to one verdict (real / phantom). The v6 cascade unpacks the failure modes. Of 6,431,328 DIRECT credentials in the May 29 corpus, only 11.6% cleanly validate. The other 88% breaks into five qualitatively different failures, each pointing at a different actor in the chain:

verdict | description | count | %
valid | registry confirms DIRECT | 93,666 | 1.46%
disclosed | registry confirms named seller | 650,902 | 10.12%
phantom | seller_id missing from registry entirely | 1,962,203 | 30.51%
contradicted | registry has different seller_type | 1,751,669 | 27.24%
impersonated | DIRECT claimed, registry says INTERMEDIARY | 1,601,139 | 24.90%
confidential | registry hides seller, unverifiable | 132,026 | 2.05%
no_registry | SSP never published sellers.json | 239,310 | 3.72%
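One way to read the verdict table is as an ordered decision cascade. The sketch below is an interpretation, not the audit's code: the function name `cascade_verdict` is hypothetical, and two rules are assumptions the page does not spell out. It treats any seller_type other than PUBLISHER or INTERMEDIARY (for example BOTH) as "contradicted", and it separates "valid" from "disclosed" by whether the registry's domain matches the publisher.

```python
from collections import Counter
from typing import Optional


def cascade_verdict(publisher: str, seller_id: str,
                    registry: Optional[dict]) -> str:
    """Classify one DIRECT credential against an SSP registry.

    registry maps seller_id -> sellers.json entry, or is None when the
    SSP publishes no sellers.json at all.
    """
    if registry is None:
        return "no_registry"
    entry = registry.get(seller_id)
    if entry is None:
        return "phantom"
    if entry.get("is_confidential"):
        return "confidential"
    seller_type = str(entry.get("seller_type", "")).upper()
    if seller_type == "INTERMEDIARY":
        return "impersonated"
    if seller_type != "PUBLISHER":
        return "contradicted"  # assumption: BOTH, missing, or unknown type
    # Assumption: "valid" means the registry domain matches the publisher;
    # otherwise the seller is merely named ("disclosed").
    if str(entry.get("domain", "")).lower() == publisher.lower():
        return "valid"
    return "disclosed"


# Toy registry with hypothetical entries.
REGISTRY = {
    "1": {"seller_type": "PUBLISHER", "domain": "pub.example", "name": "Pub"},
    "2": {"seller_type": "PUBLISHER", "domain": "other.example", "name": "X"},
    "3": {"seller_type": "INTERMEDIARY", "name": "Reseller"},
    "4": {"seller_type": "BOTH", "name": "Hybrid"},
    "5": {"is_confidential": 1, "seller_type": "PUBLISHER"},
}

tally = Counter(cascade_verdict("pub.example", sid, REGISTRY)
                for sid in ["1", "2", "3", "4", "5", "6"])
print(dict(tally))
```

The order of the checks matters: a confidential entry is reported as unverifiable before its seller_type is consulted, so hidden sellers are never counted as impersonation.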
Exhibit B · the apex cell · z = 29σ above population baseline

Yahoo's own HuffPost properties cascade as impersonation against Yahoo's own sellers.json registry.

Yahoo acquired HuffPost from BuzzFeed in 2024 and operates it as a wholly-owned subsidiary. Six HuffPost regional domains list yahoo.com 96 times each in their ads.txt as DIRECT. Yahoo's own sellers.json classifies ~70% of those seller_ids as INTERMEDIARY. The cascade verdict for each cell: impersonation. This is the cascade defect at maximum embarrassment — the parent company's subsidiary fails the audit against the parent's own registry.

huffpost domain | cell ssp | n_direct | imp | rate | z-score
huffpost.com | yahoo.com | 96 | 67 | 69.8% | +29
huffpost.gr | yahoo.com | 96 | 69 | 71.9% | +29
huffingtonpost.com | yahoo.com | 96 | 67 | 69.8% | +29
huffingtonpost.gr | yahoo.com | 96 | 69 | 71.9% | +29
huffingtonpost.jp | yahoo.com | 96 | 67 | 69.8% | +29
huffingtonpost.in | yahoo.com | 96 | 67 | 69.8% | +29
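The per-cell view behind this table can be sketched as a group-by followed by a z-score. The page does not state how its population baseline is computed, so the version below is an assumption: it scores each (publisher, SSP) cell's impersonation rate against the mean and standard deviation of all cell rates. The function names and the background cells are hypothetical, and the toy data will not reproduce the +29 figures.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cell_rates(credentials):
    """credentials: iterable of (publisher, ssp, verdict) per DIRECT claim."""
    totals = defaultdict(int)
    impersonated = defaultdict(int)
    for publisher, ssp, verdict in credentials:
        totals[(publisher, ssp)] += 1
        if verdict == "impersonated":
            impersonated[(publisher, ssp)] += 1
    return {cell: impersonated[cell] / n for cell, n in totals.items()}


def z_scores(rates):
    """Z-score of each cell rate against the population of cell rates."""
    mu = mean(rates.values())
    sigma = pstdev(rates.values())
    return {cell: (r - mu) / sigma if sigma else 0.0
            for cell, r in rates.items()}


# One cell shaped like the first table row (67 of 96 impersonated),
# plus 20 hypothetical background cells with no impersonation.
creds = [("huffpost.com", "yahoo.com", "impersonated")] * 67
creds += [("huffpost.com", "yahoo.com", "valid")] * 29
for i in range(20):
    creds += [(f"pub{i}.example", "ssp.example", "disclosed")] * 10

rates = cell_rates(creds)
z = z_scores(rates)
cell = ("huffpost.com", "yahoo.com")
print(f"rate={rates[cell]:.3f}")
```

Grouping at the cell level rather than the publisher level is what lets a single aberrant publisher-SSP pair stand out instead of being averaged away across the publisher's other SSPs.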
Tension: H185 had inferred that publishers declaring IAB v1.1 OWNERDOMAIN / INVENTORYPARTNERDOMAIN would resolve cleaner. H186 refuted that — the cascade outcome correlates with SSP-count regime (bimodal), not with directive adoption. H187's per-cell resolution then surfaces this HuffPost-Yahoo case as direct counter-evidence: the publisher is a wholly-owned subsidiary of the SSP, the ownership relation is unambiguous, no directive is needed to declare it — and the cascade still fails. The break isn't in publishers neglecting to declare ownership. It's in the registries not reflecting the ownership relations the operators themselves already know.
Exhibit C · H188 · the second generator class

8 BuySellAds customer files. Same MD5 hash. 524 of 528 lines are impersonation under directive cover.

H187's apex z-score table surfaced a second cluster: 8 cells against buysellads.com at z=25.7–25.9σ, n_direct=528, ~99% impersonation. Primary-source fetch shows the BuySellAds managed-publisher service template-pastes the entire 532-seller PUBLISHER roster into every customer's ads.txt, then declares MANAGERDOMAIN=buysellads.com at the end, so the IAB v1.1 directive is correctly present. 1stwebdesigner.com and html.com are byte-identical (same MD5 6e1e0a18cfe9c274deeb3c978eb1c851); gameinfo.io differs only by one prepended line and shares the same 1,316-line template tail. So the cascade's impersonation_undisclosed verdict has at least two distinct generators that need separate remediation:

class | directive declared? | mechanism | apex example | apex cells (rate≥95%, n≥100)
A | none | operator's registry doesn't reflect ownership | HuffPost × Yahoo + yieldlove + stroeer + 60 others | 298 (50.3%)
B | MANAGERDOMAIN=$ssp | manager template-pastes full customer roster | bloxdigital 286 + BSA 8 | 294 (49.7%)
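The byte-identity check described for the BuySellAds files can be sketched with the standard library's hashlib. The file bodies and the .example domains below are placeholders, not the fetched files, and the helper names `group_identical` and `same_tail` are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_identical(files: dict) -> list:
    """Group domains whose ads.txt bodies share the same MD5 digest."""
    groups = defaultdict(list)
    for domain, body in files.items():
        groups[hashlib.md5(body).hexdigest()].append(domain)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def same_tail(a: bytes, b: bytes, skip_a: int = 0, skip_b: int = 0) -> bool:
    """True when two files match line for line after skipping leading lines."""
    return a.splitlines()[skip_a:] == b.splitlines()[skip_b:]


# Placeholder template; real files would be fetched from each publisher.
TEMPLATE = (b"ssp.example, 1001, DIRECT\n"
            b"ssp.example, 1002, DIRECT\n"
            b"MANAGERDOMAIN=manager.example\n")
FILES = {
    "site-a.example": TEMPLATE,
    "site-b.example": TEMPLATE,
    "site-c.example": b"# generated for site-c\n" + TEMPLATE,
}

print(group_identical(FILES))
print(same_tail(FILES["site-a.example"], FILES["site-c.example"], skip_b=1))
```

A digest match only proves byte identity; the tail comparison covers the case of a file that differs by a prepended line but otherwise carries the same template.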
H189 update: when H188 first identified the BuySellAds Class-B pattern, it looked operator-specific. H189 cross-tabbed the apex tier (592 cells with n_direct ≥ 100 and impersonation rate ≥ 95%) against the 17,952 publishers who declare MANAGERDOMAIN, and found Class B (narrow) is 49.7% of the apex tier. bloxdigital.com (BLOX Digital, a Lee Enterprises subsidiary; NASDAQ: LEE) accounts for 286 of the 294 Class B-narrow cells. Three sampled BLOX customer ads.txt files (thegazette.com / fox21online.com / wdel.com) have byte-identical bloxdigital line blocks (Jaccard = 1.0000, 236 lines each); 195 lines appear across all 5 sampled BLOX customers.
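The Jaccard figure quoted above compares the sets of lines in two files: the size of the intersection divided by the size of the union. A minimal sketch follows; the normalization (lowercasing, dropping blank and comment lines) is an assumption, since the page does not say how lines were canonicalized, and the sample lines are placeholders.

```python
def line_set(text: str) -> set:
    """Non-blank, non-comment lines of an ads.txt body, lowercased."""
    return {ln.strip().lower() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lines_in_all(texts: list) -> set:
    """Lines that appear in every file of the sample."""
    return set.intersection(*(line_set(t) for t in texts))


# Placeholder ads.txt bodies.
A = "ssp.example, 1, DIRECT\nssp.example, 2, DIRECT\nssp.example, 3, DIRECT\n"
B = "# other header\nssp.example, 1, DIRECT\nssp.example, 2, DIRECT\nssp.example, 4, DIRECT\n"

print(jaccard(line_set(A), line_set(A)))
print(jaccard(line_set(A), line_set(B)))
print(len(lines_in_all([A, B])))
```

Here the two files share 2 of 4 distinct lines, so the second score is 0.5, while a file compared with itself scores 1.0.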

H190 broader Class B: if the publisher declares any MANAGERDOMAIN (not just one matching the cell SSP), Class B rises to 63.5% of apex (376/592). Freestar accounts for 330 cells, all targeting bloxdigital.com: Freestar propagates the BLOX template into the BLOX-network publishers it co-manages. The remaining 36.5% Class A residual is the registry-mapping problem (HuffPost-Yahoo + yieldlove + stroeer + tail).

Registry reciprocity gap: per IAB v1.1, when publisher P declares MANAGERDOMAIN=M, M's sellers.json should list P as a seller. Of the 8 BSA-victim publishers, 5 declare MANAGERDOMAIN=buysellads.com while BuySellAds' sellers.json does not list them. The directive doesn't validate the template's content — and the manager's registry doesn't reciprocate the declaration.
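The reciprocity check described above can be sketched as two small steps: read the MANAGERDOMAIN declarations out of a publisher's ads.txt, then look for the publisher's domain in each declared manager's sellers.json. The helper names and the sample data are hypothetical, and matching on the sellers.json "domain" field is an assumption about how "lists P as a seller" is tested.

```python
def declared_managers(ads_txt: str) -> set:
    """Domains declared via MANAGERDOMAIN=... lines in an ads.txt body."""
    managers = set()
    for raw in ads_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition("=")
        if sep and key.strip().upper() == "MANAGERDOMAIN":
            # A country code may follow the domain after a comma.
            managers.add(value.split(",")[0].strip().lower())
    return managers


def lists_publisher(sellers_json: dict, publisher: str) -> bool:
    """True when any seller entry carries the publisher's domain."""
    return any(str(s.get("domain", "")).lower() == publisher.lower()
               for s in sellers_json.get("sellers", []))


def reciprocity_gaps(publisher: str, ads_txt: str, registries: dict) -> list:
    """Declared managers whose registry does not list the publisher."""
    return sorted(m for m in declared_managers(ads_txt)
                  if not lists_publisher(registries.get(m, {}), publisher))


# Placeholder data: one manager reciprocates, the other does not.
ADS_TXT = """ssp.example, 1001, DIRECT
MANAGERDOMAIN=good-manager.example
MANAGERDOMAIN=silent-manager.example, US
"""
REGISTRIES = {
    "good-manager.example": {"sellers": [{"seller_id": "7", "domain": "pub.example"}]},
    "silent-manager.example": {"sellers": [{"seller_id": "9", "domain": "else.example"}]},
}

print(reciprocity_gaps("pub.example", ADS_TXT, REGISTRIES))
```

A manager with no retrievable registry is treated as a gap here, which matches the observation that a declaration alone proves nothing until the other side confirms it.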
Exhibit D · H190 · the bimodal split among managers

The IAB v1.1 spec scales. Some managers use it. CafeMedia and Mediavine are clean across 3,327 publishers.

Surveying the 6 largest managers reveals the pattern; H192 extends to all 85 managers with ≥ 50 publishers. H193 closes the question: the trimodal split is actually bimodal under a stricter threshold. Lowering the apex filter from n_direct ≥ 100 to n_direct ≥ 10 reveals that the "intermediate" tier was just smaller-template paste: themoneytizer 0→212 apex, pubfuture 0→149, pubrev 0→107, yieldmonk 0→97. CafeMedia + Mediavine remain at 0 apex at every threshold. The real split is 2 of 85 hygienic vs 83 of 85 template-paste at some scale.

manager | pubs | imp% | phantom% | disclosed% | apex cells | verdict
cafemedia.com | 1,846 | 3.61 | 4.02 | 65.16 | 0 | hygienic
mediavine.com | 1,481 | 4.23 | 4.90 | 58.60 | 0 | hygienic
ezoic.ai | 893 | 8.33 | 11.98 | 28.15 | 0 | intermediate
publift.com | 616 | 15.92 | 24.04 | 13.34 | 0 | intermediate
themoneytizer.com | 1,013 | 24.34 | 35.27 | 2.37 | 0 | intermediate
freestar.com | 960 | 42.55 | 13.80 | 12.47 | 330 | template (BLOX)
CafeMedia + Mediavine = 3,327 publishers · 0 apex cells at every threshold · ~60–70% cleanly disclosed. These are the counter-evidence that the IAB v1.1 spec scales. The cascade headline (30.51% phantom, 88% non-validating DIRECT) is the average; the hygienic-vs-template-paste manager split shows the population is not uniformly broken. H193 finding: 83 of 85 managers template-paste at some scale — apex visibility just depends on template size. The 2 hygienic managers prove the practice is possible. The other 83 prove it's not common. The break is not in the protocol; it's in operator practice.
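The threshold effect behind the H193 finding can be sketched as a filter applied at two values of n_direct. The cell records, manager names, and the function `apex_cells_by_manager` are hypothetical; only the filter definition (impersonation rate ≥ 95% with a minimum n_direct of 100 or 10) comes from the text.

```python
from collections import Counter


def apex_cells_by_manager(cells, min_n_direct, min_rate=0.95):
    """Count cells per manager that pass the apex filter."""
    counts = Counter()
    for cell in cells:
        rate = cell["imp"] / cell["n_direct"]
        if cell["n_direct"] >= min_n_direct and rate >= min_rate:
            counts[cell["manager"]] += 1
    return counts


# Hypothetical cells: a large template, a small template, a hygienic cohort.
CELLS = [
    {"manager": "big-template.example", "n_direct": 528, "imp": 524},
    {"manager": "small-template.example", "n_direct": 40, "imp": 39},
    {"manager": "hygienic.example", "n_direct": 70, "imp": 3},
]

strict = apex_cells_by_manager(CELLS, min_n_direct=100)
loose = apex_cells_by_manager(CELLS, min_n_direct=10)
print(dict(strict), dict(loose))
```

The small template only becomes visible under the looser threshold, while the hygienic cohort stays at zero under both, which is the pattern the paragraph above describes.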

H194 mechanism correction: the hygienic mechanism is not "IPD per partner" as earlier exhibits implied. Wayback traces show CafeMedia adopted OWNERDOMAIN+MANAGERDOMAIN in 2022–2024; Mediavine experimented with INVENTORYPARTNERDOMAIN from 2023-09 to 2024-06 then abandoned it. Neither uses IPD today. The actual mechanism: upstream SSPs (triplelift contributes 31K disclosed cells to the CafeMedia cohort, sharethrough 6K, indexexchange 4K, …) register seller_ids with reg_domain pointing at the manager (raptive.com/cafemedia.com) or the publisher domain, AND the publisher's MANAGERDOMAIN/OWNERDOMAIN directive matches. Two-side reciprocity at the upstream registry level. CafeMedia's own SSP entry contributes only 447 of 100,502 (0.4%) disclosed credentials in its cohort.

H195 primary-source confirmation: fetched triplelift.com/sellers.json (DUNS 063519038, 7,872 sellers). 23 entries explicitly named "CMI Marketing, Inc. d/b/a Raptive" with domain=cafemedia.com. Single seller_id 4800 is cited by 1,843 of 1,856 publishers (99.3% reciprocity). Each of the 8 sampled CMI sids reaches ≥ 90% of CafeMedia's managed-publisher cohort.

H196 generalization: probed 5 more upstream SSPs to test if the mechanism generalizes. All 5 reciprocate for both managers:
upstream SSP | CM sids | MV sids | attribution name (sample)
triplelift.com | 23 | - | "CMI Marketing, Inc. d/b/a Raptive"
themediagrid.com | 1 | 4 | "CMI Marketing d/b/a Raptive" / "MediaVine"
sharethrough.com | 11 | 2 | "CMI Marketing Inc dba Raptive CoinDesk"
indexexchange.com | 8 | 5 | "CMI Marketing, Inc. d/b/a Raptive" / "Mediavine, Inc."
openx.com | 10 | 5 | "CMI Media Group dba CMI Marketing, Inc. d.b.a Raptive dba CMI Media LLC"
media.net | 2 | 3 | "CMI Marketing, Inc. d/b/a Raptive" / "Mediavine, Inc"
55+ CafeMedia + 19+ Mediavine named-entity sids across 6 upstream SSPs. sharethrough goes further with per-property differentiation: each top managed property (CoinDesk, Daily Hive, …) gets its own CMI-attributed sid. openx has the longest legal-entity chain ("CMI Media Group dba CMI Marketing, Inc. d.b.a Raptive dba CMI Media LLC").

H197 refutation (negative control): the H195/H196 framing was incomplete. Probing 8 template-paste managers found ALL of them have substantial named-entity reciprocity too — Freestar has 922 named sids across 147 SSPs, MORE than CafeMedia's 266. The differentiator is not reciprocity per se. The actual mechanism is the fraction of cohort DIRECT credentials whose seller_id reg_domain equals the manager domain: CafeMedia 67.86%, Mediavine 62.55%, vs Freestar 15.94%, TheMoneytizer 6.06%, BLOX 5.34%. Two-factor decomposition: (a) template size — CafeMedia/Mediavine publishers carry ~69-84 DIRECT/pub; Freestar 301, BLOX 593, TheMoneytizer 745 — and (b) manager-attributed-sid usage ratio: hygienic templates predominantly cite the manager's named sids; template-paste templates dilute them with hundreds of other-publisher-attributed sids.
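The differentiating metric named above, the fraction of a cohort's DIRECT credentials whose seller_id reg_domain equals the manager domain, reduces to a simple ratio. The sketch below uses hypothetical cohorts sized to echo the template-size contrast in the text; the function name and the data are illustrative only.

```python
def manager_attributed_ratio(reg_domains, manager_domain: str) -> float:
    """Share of DIRECT credentials attributed to the manager itself.

    reg_domains holds one registry domain (or None) per DIRECT credential.
    """
    reg_domains = list(reg_domains)
    if not reg_domains:
        return 0.0
    hits = sum(1 for d in reg_domains
               if d and d.lower() == manager_domain.lower())
    return hits / len(reg_domains)


# Hypothetical cohorts: same number of manager-attributed sids,
# very different template sizes.
hygienic = ["manager.example"] * 48 + ["partner.example"] * 22
template_paste = (["manager.example"] * 48
                  + [f"other{i}.example" for i in range(252)])

print(round(manager_attributed_ratio(hygienic, "manager.example"), 4))
print(round(manager_attributed_ratio(template_paste, "manager.example"), 4))
```

With the numerator held fixed, the ratio falls purely because the template grows, which illustrates why the text treats template size and manager-attributed usage as two factors of one decomposition.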

H198 victim distribution: if hygienic templates cite manager-attributed sids, what do template-paste templates cite? Enumerated the 109,254 victim mentions in 592 apex cells: 1,558 distinct reg_domains, long-tailed (top 200 = 64.5% of mentions; no concentrated cabal of "stolen identities"). Partitioned: 73.6% of distinct victims (1,146 of 1,558) never publish their own ads.txt. They can't reciprocate even if they wanted to. The cascade flags impersonation correctly, but the structural mechanism is untraceable attribution — upstream SSPs' sellers.json files attribute seller_ids to many small domains that don't participate in the spec, and template-paste managers route customer ads.txt through those untraceable attributions.

Final fix recommendation (three actors, two-side fix):
  1. Upstream SSPs: stop attributing seller_ids to domains that have never published ads.txt — mark them confidential instead. ~62% of apex "impersonation" comes from these untraceable attributions.
  2. Template-paste managers: reduce template size to what each publisher actually monetizes; keep the template dominated by manager-attributed sids (CafeMedia + Mediavine prove this scales).
  3. IAB v1.1 cascade: consider a verdict subclass unverifiable_attribution for reg_domain that doesn't itself publish ads.txt — distinguishing structural attribution gaps from genuine impersonation.
Exhibit E · H191 · refined taxonomy

Class A and Class B share the same mechanism. The A/B label measures whether the directive was declared, not what generates the cell.

The H190 framing implicitly treated A (no directive) and B (MGRDOM declared) as different generators. H191 primary-source-traced the 113 German publishers behind 212 of 216 Class A cells: all carry the #ads.txtfileStroeer2026_05_18 template-generation marker; 4 of 5 sampled files have byte-identical bodies (differing only in glomex timestamp comment); none declare any directive. Same template-paste mechanism as BLOX/BSA, but without the IAB v1.1 declaration. The actual ownership-uncovered residual (HuffPost-Yahoo class) is small — perhaps 16 cells.

sub-class | mechanism | directive declared? | apex example | apex cells (est.)
B-narrow | template paste | MGRDOM = cell SSP | BLOX + BSA | 294
B-broad | template paste | MGRDOM = other manager | Freestar → BLOX | 82
A-template (NEW) | template paste | none declared | Stroeer + yieldlove German cluster | ~200
A-ownership | corporate ownership uncovered | none declared | HuffPost × Yahoo | ~16
Template paste = ~97% of apex cascade tier. Under H191's refined taxonomy, ~576 of 592 apex cells (B-narrow + B-broad + A-template) share the template-paste mechanism. The remaining ~16 are genuine ownership-mapping issues (HuffPost-Yahoo class). The A/B partition the cascade computes is real, but the difference is "did the publisher declare MANAGERDOMAIN" rather than "what generated the impersonation." The fix recommendation collapses to one: either managers stop pasting non-customer seller_ids, or publishers declare MANAGERDOMAIN per spec (which would surface the operator-side practice for IPD-per-partner remediation).
Tripwires test_h187_huffpost_yahoo_cell_aberration.py (Class A apex), test_h188_buysellads_template_paste.py (Class B apex specimens), and test_h189_class_b_apex_prevalence.py (Class B share ≥ 30% of the 592-cell apex tier) watch the 6 + 8 + 294 cells respectively. Drops below thresholds surface INFO-level structural shifts — corrective remediation is the desired outcome, not a regression.
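A tripwire of the kind described can be sketched as a share computation plus a threshold check that reports rather than fails. This is not the content of test_h189_class_b_apex_prevalence.py: the function names, the cell layout, and the message text are hypothetical, and only the 30% threshold and the 294-of-592 split come from the text.

```python
CLASS_B_MIN_SHARE = 0.30  # threshold quoted for the H189 tripwire


def class_b_share(apex_cells: list) -> float:
    """Fraction of apex cells whose declared manager equals the cell SSP."""
    if not apex_cells:
        return 0.0
    class_b = sum(1 for c in apex_cells if c["manager_domain"] == c["ssp"])
    return class_b / len(apex_cells)


def tripwire(share: float, minimum: float = CLASS_B_MIN_SHARE) -> str:
    """Report a structural shift at INFO level instead of raising."""
    if share >= minimum:
        return "OK"
    return "INFO: Class B share fell below threshold (structural shift)"


# Synthetic apex tier matching the reported split: 294 Class B, 298 Class A.
cells = ([{"ssp": "bloxdigital.com", "manager_domain": "bloxdigital.com"}] * 294
         + [{"ssp": "yahoo.com", "manager_domain": None}] * 298)

share = class_b_share(cells)
print(f"{share:.3f} {tripwire(share)}")
```

Returning a status string instead of asserting reflects the stated intent: a drop below the threshold signals remediation, not a regression.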

Four findings.

F1 · false DIRECT claims
29%
ads.txt says DIRECT (publisher's claim)
sellers.json says INTERMEDIARY (middleman)

97.5% is framework decay — dead SSPs, schema migrations, orphan registries. Only ~2.5% has the shape of misconduct.

see full proof in Exhibit A
F2 · valid consent rate
0.012%
Timeline (0–4 s): tracking ID sent vs. consent banner shown. 2–3 s of data is sent before consent is asked.

87 of 721,129 first-visit shares carried valid GDPR consent.

F3 · trackers per page
5 → 294
Typical (global avg): 5
Worst (single page): 294

3rd-party companies your browser sends your tracking ID to, per page load.

F4 · verifiable share
~5%
5% verified (declared & confirmed)
10% partial (declared, fails check)
85% none (no public file; no ads.txt at all)

Of 100 ad-tech sites: only 5 publish a file and pass the check.

F1 expanded · named offenders

Wrapper provider scorecard.

195 header-bidding providers graded by the phantom-DIRECT rate of their managed publishers. The wrapper, not the individual publisher, is where templates propagate.

top 8 by publishers managed · all 195 in dataset · CC0
Provider | Publishers | False-DIRECT rate | Grade
CafeMedia / Raptive (cafemedia.com) | 530 | 23.3% | B
Mediavine (mediavine.com) | 306 | 30.1% | B
Playwire (playwire.com) | 201 | 48.5% | C
Freestar (freestar.com) | 455 | 55.4% | D
Ezoic (ezoic.com) | 267 | 56.9% | D
The Moneytizer (themoneytizer.com) | 429 | 66.2% | F
Publift (publift.com) | 291 | 69.5% | F
Pubnation (pubnation.com) | 73 | 70.4% | F
grade scale: A <20% · B 20–35% · C 35–50% · D 50–60% · F >60%
detection via ads.txt 1.1 MANAGERDOMAIN field
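The grade scale maps directly onto a small function. The published bands share their endpoints, so the boundary handling below is an assumption: each lower bound is treated as inclusive, and exactly 60% is graded D because F is stated as strictly above 60%.

```python
def grade(false_direct_rate_pct: float) -> str:
    """Map a false-DIRECT rate (in percent) to the scorecard letter grade."""
    if false_direct_rate_pct < 20:
        return "A"
    if false_direct_rate_pct < 35:
        return "B"
    if false_direct_rate_pct < 50:
        return "C"
    if false_direct_rate_pct <= 60:  # assumption: F starts strictly above 60
        return "D"
    return "F"


# Rates taken from the scorecard rows above.
for provider, rate in [("cafemedia.com", 23.3), ("playwire.com", 48.5),
                       ("freestar.com", 55.4), ("publift.com", 69.5)]:
    print(provider, grade(rate))
```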

Audit any publisher in ≤ 10 seconds.

resolves ads.txt × sellers.json · returns grade /
Method & reproduce
F1 · cross-reference: ads.txt and sellers.json fetched at the current corpus snapshot: 86,357 publishers · 33.39M (publisher, SSP, seller_id) triples · 6.43M DIRECT credentials · 5,454 SSPs cited / 1,548 with retrievable registries · 3.52M registered sellers (2026-05-29, post-dedup: a 10.1M-row duplicate-removal pass shrank the triple count from the prior 43.52M). Phantom seller IDs (the cited seller_id appears in no registry under any alias) measured at 31.05% at triple level / 30.51% at DIRECT-credential level. The v6 cascade further decomposes: contradicted 27.24%, impersonated 24.90%, confidential-unverifiable 2.05%, no-registry-available 3.72%. Only 11.58% (valid + disclosed) cleanly validates. H187 added per-(publisher, SSP) cell resolution (2,327,455 cells × ~100ms primary-key lookup), surfacing aberrations invisible at the publisher-aggregated level (z > 20σ on 25 cells; HuffPost-Yahoo at z=29σ).
F2 · consent capture: Headless browser, 110,610 sites, every HTTP request logged. Filtered for known sync patterns. TCF v2 validated by version prefix, length, base64. First-visit only. 0 of 2,000 captured Prebid.js instances configure a consent management module.
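The three TCF v2 checks named above (version prefix, length, base64) can be sketched as follows. This is a shape check under stated assumptions, not the audit's validator: the function name is hypothetical, the minimum length of 20 characters is a placeholder rather than a value from the page, and the check does not decode purposes or vendor consents. It relies on the version occupying the first 6 bits of the core segment, which is why v2 strings begin with "C".

```python
import base64
import re

_B64URL = re.compile(r"^[A-Za-z0-9_-]+$")
MIN_CORE_LENGTH = 20  # hypothetical floor; the audit's cutoff is not stated


def looks_like_tcf_v2(tc_string: str) -> bool:
    """Shape check: base64url alphabet, minimum length, version field == 2."""
    core = tc_string.split(".", 1)[0]  # core segment precedes optional ones
    if len(core) < MIN_CORE_LENGTH or not _B64URL.match(core):
        return False
    padded = core + "=" * (-len(core) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
    except ValueError:  # binascii.Error is a ValueError subclass
        return False
    version = raw[0] >> 2  # top 6 bits of the first byte
    return version == 2


# Synthetic strings, not real consent records.
print(looks_like_tcf_v2("C" + "A" * 27))
print(looks_like_tcf_v2("B" + "A" * 27))
```

A string that passes this check is only well-formed; whether it actually grants consent for a given vendor requires decoding the remaining fields.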
F3 / F4 · identity & structure: Playwright headless, 4 s standard scan, 142,630 sites. 603 known ad-tech domains tracked. F4 = 15% × 47% × 76% ≈ 5.4% (true value 4–7%). Cycles 458–488 of structural analysis · H-series H126–H199 (56 production tripwires). Page framing is structural, a description of the equilibrium, not an allegation of fraud. Bugs get fixed. Equilibria persist.
verify.py · 43 lines · CC0 · zero dependencies
same logic that produced the 1,157,099-contradiction headline figure
run: python verify.py themoneytizer.com