The Accountability Gap

Programmatic advertising supply chain integrity, consent failure, and identity proliferation.
Evidence from 142,000 websites, 84 SSP registries, 422,000 identity-sharing requests.
Data collected March 14–23, 2026.

Verify in 10 seconds: The ad authorization system contradicts itself.

1. Fetch: ads.themoneytizer.com/ads_txt.php → "smartadserver.com, 1097, DIRECT"
2. Check: smartadserver.com/sellers.json → seller 1097 → seller_type: "INTERMEDIARY", name: "Themoneytizer"
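Same entity. Opposite claims. Both public. We found 962,891 DIRECT claims that fail this check: 503,387 contradicted by the SSP's registry and 459,504 pointing at seller IDs the registry does not list.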

Finding 1: Authorization Is Forged

Core claim

29% of "DIRECT" authorization claims are explicitly contradicted by the SSPs' own registries. A further 26% reference seller IDs that don't exist. Cross-reference of 1,757,362 triples across 21,397 publishers against 84 SSP registries.

When a publisher lists a company as "DIRECT" in their ads.txt file, it means: "I control this seller account. I authorized this directly." Advertisers pay more for DIRECT inventory because it's supposed to mean fewer intermediaries and less fraud risk.

We checked 1,757,362 unique (publisher, SSP, seller_id) triples against the SSPs' own registries:

Category	Count	How verified
Contradicted	503,387 (29%)	SSP says INTERMEDIARY, not PUBLISHER
Phantom	459,504 (26%)	Seller ID doesn't exist in registry
Plausible	793,727 (45%)	Registry confirms PUBLISHER or BOTH type, or no registry available

Reading this table: "Contradicted" is unambiguous: the SSP explicitly classifies the account as INTERMEDIARY, which contradicts the DIRECT claim. "Phantom" is ambiguous: the seller ID may be stale, fabricated, or hidden behind Google's confidentiality flag (set on 71% of its sellers.json entries). "Plausible" includes confirmed PUBLISHER entries, BOTH-type entries (which may act as either), and claims against SSPs with no registry. The strict rate (contradicted only, no interpretation) is 29%. The inclusive rate (contradicted plus phantom) is 55%. If intermediaries also exploit the BOTH classification as a loophole, the false rate is between 55% and 65%. The strict and inclusive rates are both stable across successive SSP expansions and across both curated and crawled publisher datasets.
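As a quick arithmetic check, the strict and inclusive rates can be recomputed from the counts in the table. This is a minimal sketch; the variable and function names are illustrative and not taken from the study's tooling.

```python
# Counts copied from the table above.
contradicted = 503_387
phantom = 459_504
total_triples = 1_757_362  # unique (publisher, SSP, seller_id) triples


def pct(count: int, denominator: int) -> float:
    """Return count/denominator as a percentage rounded to one decimal."""
    return round(100 * count / denominator, 1)


strict_rate = pct(contradicted, total_triples)               # contradicted only
inclusive_rate = pct(contradicted + phantom, total_triples)  # contradicted + phantom

print(f"strict: {strict_rate}%  inclusive: {inclusive_rate}%")
```

Rounded to whole percentages, these give the 29% and 55% figures quoted in the text.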

Why it happens

Intermediaries (AnyManager, PubFuture, Waardex, YieldMonk) distribute template ads.txt blocks to publishers with DIRECT instead of RESELLER. DIRECT commands higher CPMs and lower SSP fees. The intermediary's margin becomes invisible to the buyer.

Counter-argument preempted: Could "DIRECT" legitimately mean "direct business relationship" rather than the spec's "controls the account"? If so, some CONTRADICTED claims might be legitimate. Testing this: 96.6% of contradicted claims involve seller IDs shared by 6+ publishers (59.5% by 51–500 publishers, 22.5% by 500+). At most 3.4% could represent individual direct relationships. Under any interpretation, the remaining 96.6% is industrial-scale template injection.
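The sharing test above can be sketched as follows, assuming the contradicted claims are available as (publisher, SSP, seller_id) tuples. The function name and the exact bucket boundaries are assumptions made for illustration; the brief only names the 6+, 51–500 and 500+ groups.

```python
from collections import Counter, defaultdict


def sharing_buckets(claims):
    """Share of contradicted claims by how many publishers reuse the seller ID.

    claims: iterable of (publisher, ssp, seller_id) tuples.
    Returns {bucket_label: fraction_of_claims}.
    """
    claims = list(claims)
    publishers_per_seller = defaultdict(set)
    for publisher, ssp, seller_id in claims:
        publishers_per_seller[(ssp, seller_id)].add(publisher)

    def bucket(n: int) -> str:
        # Boundaries are an assumption; adjust to match the real analysis.
        if n <= 5:
            return "1-5"
        if n <= 50:
            return "6-50"
        if n <= 500:
            return "51-500"
        return "500+"

    counts = Counter(
        bucket(len(publishers_per_seller[(ssp, seller_id)]))
        for _, ssp, seller_id in claims
    )
    return {label: n / len(claims) for label, n in counts.items()}


# Tiny synthetic example: one seller ID reused by 6 publishers, one used once.
example = [(f"site{i}.example", "ssp.example", "111") for i in range(6)]
example.append(("solo.example", "ssp.example", "222"))
print(sharing_buckets(example))
```

A seller ID that sits in the "1-5" bucket could plausibly be an individual relationship; one reused by hundreds of publishers cannot be controlled by each of them.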

SSPs most affected by false DIRECT claims

SSP Domain	False Claims	False Rate
lijit.com (Sovrn)	68,785	83%
google.com	64,743	45% (all phantom — Google uses confidential flag)
rubiconproject.com (Magnite)	62,495	87%
taboola.com	53,869	62%
onetag.com	50,841	82%
pubmatic.com	41,912	60%
indexexchange.com	38,915	51%
openx.com	36,608	76%
triplelift.com	33,767	62%
appnexus.com (Xandr)	29,192	60%

Google's sellers.json has 71% of entries marked confidential (is_confidential: true). Every other SSP: 0%. Google's 64,743 false claims are all PHANTOM (seller ID not found) — they could be hidden behind the confidentiality flag. Excluding Google, the strict contradicted rate across the remaining SSPs is 38%.

Method: Downloaded ads.txt from 21,397 publishers (Tranco top-1M + 75,000-domain crawler harvest). Fetched sellers.json from 84 SSPs (1.56M total seller entries, including Google's 650K). For each DIRECT claim, looked up the seller_id in the SSP's registry. If the registry says INTERMEDIARY → contradicted. If the seller_id doesn't exist → phantom. If it says PUBLISHER → plausible. Deduplicated by (publisher, SSP, seller_id) triple. Malformed seller_ids (containing spaces, filenames, etc.) filtered.

Verify any publisher now:
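A minimal sketch of the check in Python, assuming you have already downloaded the publisher's ads.txt as text and the SSP's sellers.json as a parsed dict (fetching is left out so the example runs offline). The function names are hypothetical; the seller_id and seller_type fields are the ones sellers.json defines, and the verdict rules follow the Method paragraph above.

```python
def parse_ads_txt(text: str):
    """Return (ssp_domain, seller_id, relationship) tuples from ads.txt text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()          # drop comments
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue                                  # blank or variable lines
        relationship = fields[2].upper()
        if relationship in ("DIRECT", "RESELLER"):
            entries.append((fields[0].lower(), fields[1], relationship))
    return entries


def verdict(seller_id: str, registry) -> str:
    """Classify one DIRECT claim against a parsed sellers.json (or None)."""
    if registry is None:
        return "plausible"                            # SSP publishes no registry
    by_id = {str(s.get("seller_id")): s for s in registry.get("sellers", [])}
    seller = by_id.get(seller_id)
    if seller is None:
        return "phantom"                              # ID not in the registry
    seller_type = str(seller.get("seller_type", "")).upper()
    return "contradicted" if seller_type == "INTERMEDIARY" else "plausible"


# The example from the top of this brief, reproduced as inline data.
ads_txt = "smartadserver.com, 1097, DIRECT  # template line"
sellers_json = {
    "sellers": [
        {"seller_id": "1097", "seller_type": "INTERMEDIARY", "name": "Themoneytizer"}
    ]
}

for domain, seller_id, relationship in parse_ads_txt(ads_txt):
    if relationship == "DIRECT":
        print(domain, seller_id, verdict(seller_id, sellers_json))
```

The last line prints the claim with the verdict "contradicted". Swapping in any other publisher's ads.txt and the matching SSP registry repeats the check.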

Machine-readable data: false_direct_claims.jsonl — 962,891 records, one per triple, with publisher, SSP, seller_id, registry_type, and verdict.

Finding 2: Consent Is Absent

Core claim

0.012% of identity-sharing requests carry valid consent on first visit. ~87 genuine consent strings in 721,129 sync requests. Not broken — never built.

Under GDPR, companies must obtain consent before sharing a user's identity with third parties. In programmatic advertising, "cookie syncing" is the mechanism that shares identity — Company A tells Company B "your user #X is my user #Y."

We captured 721,129 cookie sync requests across 186,000 websites and checked whether they carried the required TCF consent string:

Consent Status	Requests	Distribution
No consent parameter at all	210,980	77.3%
Parameter present but empty	60,109	22.0%
Unresolved template macro	1,794	0.66%
Valid TCF consent string	34	0.012%

What the user experiences

0.0 seconds — Page load begins

Ad scripts start executing. Cookie sync requests fire immediately.

0.1–0.5 seconds — Identity propagation

User IDs shared across companies. No consent obtained yet.

1–3 seconds — Full graph assembled

Dozens to hundreds of companies now know this user visited this page. The worst observed: 294 in one load.

2–5 seconds — Consent banner appears

User sees "Do you accept cookies?" The data has already been shared.

0 of 2,000 captured Prebid.js instances configure the consent management module. The Prebid Mobile SDK (decompiled from production binary) defaults to "full device access" when no consent management platform is present — including broadcasting the device's permanent Advertising ID to all bidders.

Per-company consent rates

Company	Syncs Captured	Has Consent Field	Has Valid Value
Google Ads	27,129	0%	0.01%
ID5	10,988	19%	0.01%
Trade Desk	10,682	43%	0.01%
Xandr	8,365	23%	0.05%
Magnite	8,141	18%	0.04%
Criteo	6,617	27%	0.14%

Method: Headless browser crawled 110,610 unique sites (the dataset at the time of consent analysis), capturing all HTTP requests. Filtered for known sync URL patterns (cookie_sync, usersync, setuid, mapuid, getuid, etc.). Parsed consent parameters (gdpr_consent, euconsent, consent) from query strings. TCF v2 strings validated by checking version prefix, length, and base64 encoding. This measures first-visit behavior — before the user has interacted with any consent banner. The crawl has since expanded to 142,630 sites; consent analysis covers the 110,610-site subset.
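The classification step described in the method can be approximated as below. This is a sketch under stated assumptions, not the crawler's actual code: the macro pattern, the minimum length, and the extra "other" label for values that are neither macros nor plausible TCF strings are all illustrative choices. The version check relies on the fact that a TCF v2 string begins with a 6-bit version field equal to 2, which encodes as the base64url character "C".

```python
import re
from urllib.parse import parse_qs, urlsplit

CONSENT_PARAMS = ("gdpr_consent", "euconsent", "consent")  # from the method above
B64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
MACRO = re.compile(r"[${}\[\]]|%%")  # assumed markers of an unfilled template


def looks_like_tcf_v2(value: str, min_length: int = 20) -> bool:
    """Heuristic check: base64url alphabet, minimum length, version field == 2."""
    core = value.split(".", 1)[0]
    if len(core) < min_length:  # min_length is an assumed threshold
        return False
    if any(ch not in B64URL and ch != "." for ch in value):
        return False
    return B64URL.index(core[0]) == 2  # first 6 bits carry the version


def classify_sync_url(url: str) -> str:
    """Return one of: no_parameter, empty, unresolved_macro, valid_tcf, other."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = [v.strip() for name in CONSENT_PARAMS for v in query.get(name, [])]
    if not values:
        return "no_parameter"
    filled = [v for v in values if v]
    if not filled:
        return "empty"
    value = filled[0]
    if MACRO.search(value):
        return "unresolved_macro"
    return "valid_tcf" if looks_like_tcf_v2(value) else "other"


print(classify_sync_url("https://sync.example/setuid?gdpr=1&gdpr_consent="))
```

Note that keep_blank_values=True matters here: without it, parse_qs silently drops empty parameters and the "present but empty" category would be miscounted as "no parameter".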

Finding 3: Identity Proliferates

Core claim

The average ad-tech-enabled website shares user identity with 5 companies. The worst shares with 294 in a single page load.

Metric	Value
Sites scanned	142,630
Total requests captured	2,586,662
Total sync requests	422,308
Average companies per site (with adtech)	5.1
Maximum (single page load)	294

Geographic differential

Region	Sites	Avg Entities	Avg Syncs/Site
.jp (Japan)	1,605	8.1	4.6
.com (US-dominated)	53,194	5.7	3.4
.br (Brazil)	1,660	5.6	2.7
.co.uk (United Kingdom)	1,947	5.0	5.7
.ru (Russia)	2,990	3.5	2.2
.fr (France)	1,265	3.7	1.3
.de (Germany)	2,429	2.7	0.6

EU sites average 23% fewer tracking entities than non-EU sites (3.7 vs 4.8 across 135,000 sites). UK and German sites operate under near-identical rules: Germany under the EU GDPR, the UK under the UK GDPR, which retained the EU text after Brexit. UK sites average 5.7 syncs per page; German sites average 0.6. The roughly 10x difference is publisher configuration, not regulation. The law is effectively the same. Compliance is not.

Top identity graph hubs

Company	Sync Partners	Role
Trade Desk	15	DSP — buys audiences
Xandr (Microsoft)	14	SSP/DSP hybrid
Magnite	12	SSP — sells inventory
Criteo	11	Retargeting
PubMatic	10	SSP
ID5	10	Identity resolution
BidSwitch	9	Exchange connector

Method: Playwright-based headless browser, 25 parallel instances, 4-second page loads (8s for deep scans with Prebid hook). 142,630 unique sites from Tranco 1M, tiered scheduling (top 10K every 4h, mid 100K every 24h, tail 1M every 72h). Every HTTP request matched against 603 known ad-tech domains (240 companies). Sync requests identified by URL pattern matching. Identity graph edges defined by co-occurrence.
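The matching and graph-building steps can be sketched as follows. The domain map is a three-entry illustrative subset, not the 603-domain list used in the crawl, and the function names are hypothetical. An edge here means two companies both issued a sync request on the same page load, which is co-occurrence rather than proof of a direct exchange.

```python
from collections import Counter
from itertools import combinations
from urllib.parse import urlsplit

# URL fragments named in the Finding 2 method; the real list is longer.
SYNC_PATTERNS = ("cookie_sync", "usersync", "setuid", "mapuid", "getuid")

# Illustrative subset of a domain-to-company map (assumption).
DOMAIN_TO_COMPANY = {
    "adsrvr.org": "Trade Desk",
    "id5-sync.com": "ID5",
    "criteo.com": "Criteo",
}


def company_for(url: str):
    """Match the request host, or any parent domain of it, against the map."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    for i in range(len(labels) - 1):
        company = DOMAIN_TO_COMPANY.get(".".join(labels[i:]))
        if company:
            return company
    return None


def sync_companies(page_urls):
    """Companies that issued at least one sync-pattern request on a page."""
    found = set()
    for url in page_urls:
        if any(pattern in url.lower() for pattern in SYNC_PATTERNS):
            company = company_for(url)
            if company:
                found.add(company)
    return found


def cooccurrence_edges(pages):
    """Count, per company pair, the page loads on which both were syncing."""
    edges = Counter()
    for page_urls in pages:
        for pair in combinations(sorted(sync_companies(page_urls)), 2):
            edges[pair] += 1
    return edges


pages = [
    ["https://match.adsrvr.org/usersync?x=1",
     "https://id5-sync.com/cookie_sync?y=2",
     "https://cdn.example.com/app.js"],
    ["https://gum.criteo.com/setuid?z=3"],
]
print(cooccurrence_edges(pages))
```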

Finding 4: The Structure

Core claim

Approximately 5% of ad-tech activity on the web falls within a functioning authorization framework. The three findings above are not three separate failures. They are one system at its operating temperature.

The authorization framework (ads.txt) is opt-in. Most of the web opted out:

Layer	Coverage	How measured
Sites with ads.txt	15% of ad-tech-enabled sites	Of 44,176 crawler scans showing ad-tech activity, 6,749 have a valid ads.txt. 37,427 do not.
DIRECT claims that are plausible	45%	Of 1,757,362 verified triples, 793,727 are plausible (Finding 1).
Companies covered by ads.txt	76%	Across 1,170 sites with both scan data and ads.txt, 24% of observed companies have no ads.txt entry.
Net authorization	~5%	0.15 × 0.45 × 0.76 ≈ 0.051
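The product in the last row can be reproduced from the underlying counts. This sketch uses the counts from the table where they are given and the rounded 76% for company coverage, since the brief reports that layer only as a percentage; variable names are illustrative.

```python
# Layer 1: ads.txt adoption among ad-tech-enabled sites.
adoption = 6_749 / 44_176

# Layer 2: share of DIRECT triples that are plausible (Finding 1).
plausibility = 793_727 / 1_757_362

# Layer 3: share of observed companies with an ads.txt entry (reported as 76%).
coverage = 0.76

net_from_counts = adoption * plausibility * coverage
net_from_rounded = 0.15 * 0.45 * 0.76

print(f"from counts:  {net_from_counts:.3f}")
print(f"from rounded: {net_from_rounded:.3f}")
```

Both routes land at roughly 5%, so the headline figure does not depend on how the intermediate rates are rounded.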

The 24% of companies that arrive without authorization are not random. They are the identity infrastructure — companies that build cross-site profiles:

Company	Sites observed	Syncs (48h)	Authorized in ads.txt?
Trade Desk	2,700	10,075	No — arrives through creatives
LinkedIn	2,655	6,882	No
ID5	2,095	10,243	No
Microsoft Clarity	1,918	2,505	No
Lotame	1,551	2,554	No
LiveRamp	1,042	2,120	No
Tapad	1,030	2,762	No

Two delivery mechanisms. Trade Desk, ID5, LiveRamp, Lotame, and Tapad appear almost exclusively on sites with heavy ad-tech (0–0.6% presence on low-adtech sites). They arrive through ad creatives; the publisher never installed them. LinkedIn and Microsoft Clarity also appear on low-adtech sites (4–6%), indicating publisher installation. The brief's "24% unauthorized" figure includes both groups. The narrower claim, companies arriving uninvited through creatives, applies to the first group. Ads.txt covers SSP authorization only; none of these 7 companies are SSPs, so their absence from ads.txt is expected. The gap is not that they lack authorization. It is that no authorization framework covers them at all.

Why it persists

16 intermediary accounts appear in more than half of the ads.txt files we analyzed (8,095 normalized files). The most ubiquitous (Rubicon seller 17960) is in 61% of files. These entries arrive through templates distributed by intermediaries to thousands of publishers. The publisher didn't choose them; a template chose them.

The result is a Nash equilibrium. Every participant profits from every check not working:

Actor	Incentive to not check
Publishers	False DIRECT claims → higher CPMs (DIRECT commands premium pricing)
Intermediaries	Template injection → invisible margin (buyer doesn't know they exist)
SSPs	Larger supply pool (enforcing registries would shrink inventory)
DSPs	Richer identity data (surveillance chain provides targeting signals)

The only parties harmed are advertisers (paying DIRECT prices for intermediary inventory) and users (identity shared without authorization or consent). Neither party has visibility into the system that harms them.

This is a model of incentives, not a proof of intent. But ads.txt was introduced in 2017. Nine years later, the false DIRECT rate has not converged toward zero. Bugs get fixed. Equilibria persist.

Method: The 5% figure is the product of three independently measured rates: (1) ads.txt adoption among ad-tech-enabled sites (15%, from 44,176 crawler scans with ad-tech activity), (2) DIRECT claim plausibility (45%, from 1,757,362-triple cross-reference), (3) authorized company coverage (76%, from 1,170 sites with both scan data and ads.txt). The multiplication assumes approximate independence, which we checked by confirming that plausible rates don't vary significantly between high-surveillance and low-surveillance sites. The 24% unauthorized figure counts companies identified by domain matching against 603 known ad-tech domains. Template convergence measured across 8,095 normalized ads.txt files.
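The template-convergence measurement can be sketched as a prevalence count: for each (SSP domain, seller ID) pair, the fraction of ads.txt files that contain it. The normalization shown (comment stripping, lowercasing the domain, deduplicating within a file) is an assumption about what "normalized" means here, and the function names are hypothetical.

```python
from collections import Counter


def normalized_entries(ads_txt: str) -> set:
    """Unique (ssp_domain, seller_id) pairs in one ads.txt file."""
    entries = set()
    for raw in ads_txt.splitlines():
        fields = [f.strip() for f in raw.split("#", 1)[0].split(",")]
        if len(fields) >= 3 and fields[0] and fields[1]:
            entries.add((fields[0].lower(), fields[1]))
    return entries


def entry_prevalence(files) -> dict:
    """Map each entry to the fraction of files in which it appears."""
    files = list(files)
    counts = Counter()
    for text in files:
        counts.update(normalized_entries(text))  # one count per file
    return {entry: n / len(files) for entry, n in counts.items()}


# Synthetic files: the same template line appears in two of three.
files = [
    "rubiconproject.com, 17960, RESELLER\nexample-ssp.com, 1, DIRECT",
    "RubiconProject.com, 17960, DIRECT  # pasted from template",
    "example-ssp.com, 2, DIRECT",
]
prevalence = entry_prevalence(files)
print(prevalence[("rubiconproject.com", "17960")])
```

Entries whose prevalence exceeds 0.5 are the ones the text describes as appearing in more than half of files.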

Known Weaknesses

1. Sample bias. 21,397 publishers from Tranco top-1M + 75,000-domain crawler harvest. Biased toward popular Western commercial sites. This is the relevant population for programmatic authorization, but not the internet.

2. sellers.json freshness. Snapshots from March 17–23, 2026. SSPs can reclassify sellers — an INTERMEDIARY today could become PUBLISHER tomorrow. Phantom claims could resolve if SSPs add entries. Both verdicts are point-in-time. The strict contradicted rate (29%) is the most defensible number; it requires the SSP to have actively classified the seller as INTERMEDIARY at snapshot time.

3. First-visit consent. Returning users with consent cookies likely show higher consent rates. The first-visit case is the privacy-critical one — data flows before consent is possible — but should not be generalized to all visits.

4. Quick crawl undercounting. Standard scans use 4-second page loads. Deep scans (8+ seconds with Prebid hook) see 2.4x more tracking entities. The average of 5.1 entities per ad-tech-enabled site is from standard scans; fully-loaded pages likely average 7–10.

5. Two rates, one dataset. The strict rate (29%) counts only claims where the SSP explicitly says INTERMEDIARY. The inclusive rate (55%) also counts phantom seller IDs. Both are stable across successive SSP expansions and across both curated (top-1000) and independently crawled (long-tail) publisher populations. The strict rate is the floor no one can argue with. The inclusive rate is the ceiling that depends on whether phantom IDs are fabricated or merely stale. Google's 71% confidentiality flag makes their phantom claims genuinely ambiguous.

6. The 5% estimate. The net authorization figure multiplies three rates that are measured independently and assumed to be approximately independent. The individual measurements are solid; the multiplication is approximate. The true figure could be 4–7% depending on correlation structure. The point is the order of magnitude, not the decimal.

7. Bot detection. The crawler uses headless Chromium with anti-detection flags but no mouse movement or scrolling simulation. Sites with advanced bot detection (Cloudflare Bot Management, DataDome, PerimeterX) may serve reduced ad content to detected bots. This biases our entity counts downward — the real tracking load on human visitors is likely higher than what we measured.

Data Files

All data is machine-readable, standalone (no database required), and independently verifiable:

File	Records	Description
`false_direct_claims.jsonl`	962,891	Every (publisher, SSP, seller_id) triple with verdict. Deduplicated.
`supply_chain_summary.json`	1	Aggregate totals matching the JSONL exactly.
`publisher_profiles.jsonl`	8,734	Per-publisher ads.txt depth and crawl traffic.
`identity_graph.json`	5,816 edges	Sync co-occurrence graph across 201 companies.
`consent_measurement.json`	1	Per-company consent field presence rates.
`crawl_summary.json`	1	Site distribution and geographic breakdown.
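To check the JSONL file against the summary totals, a reader could tally verdicts as in the sketch below. The key name "verdict" follows the field list given in Finding 1, but the exact key spelling and the verdict values in the published file are assumptions; the sample records are synthetic.

```python
import json
from collections import Counter


def tally_verdicts(lines) -> Counter:
    """Count records per verdict from an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get("verdict", "unknown")] += 1
    return counts


# Synthetic records standing in for false_direct_claims.jsonl.
sample = [
    '{"publisher": "a.example", "ssp": "ssp.example", "seller_id": "1", "verdict": "contradicted"}',
    '{"publisher": "b.example", "ssp": "ssp.example", "seller_id": "2", "verdict": "phantom"}',
    '{"publisher": "c.example", "ssp": "ssp.example", "seller_id": "1", "verdict": "contradicted"}',
    "",
]
print(tally_verdicts(sample))

# With the real file, pass the open file handle instead:
# with open("false_direct_claims.jsonl", encoding="utf-8") as handle:
#     print(tally_verdicts(handle))
```

If the file matches the brief, the per-verdict totals should equal the counts reported in Finding 1 and sum to 962,891.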