The Accountability Gap

Programmatic advertising supply chain integrity, consent failure, and identity proliferation.
Evidence from 142,000 websites, 84 SSP registries, 422,000 identity-sharing requests.
Data collected March 14–23, 2026.

Verify in 10 seconds: The ad authorization system contradicts itself.

1. Fetch: ads.themoneytizer.com/ads_txt.php → "smartadserver.com, 1097, DIRECT"
2. Check: smartadserver.com/sellers.json → seller 1097 → seller_type: "INTERMEDIARY", name: "Themoneytizer"
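Same entity. Opposite claims. Both public. Across the dataset, 962,891 DIRECT claims are either contradicted like this one (503,387) or point to seller IDs missing from the SSP's registry (459,504).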
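
The two-step check above can be sketched in a few lines. This is a minimal illustration, not the crawler used for the study: the helper names (`fetch_text`, `parse_ads_txt`, `registry_type`) are hypothetical, and the inline sample data simply mirrors the two public records quoted above so the sketch runs offline.

```python
import json
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download a URL as text. Not called below, so the sketch runs offline."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_ads_txt(text):
    """Yield (ssp_domain, seller_id, relationship) for each record line."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()        # drop comments
        if not line or "=" in line.split(",")[0]:  # skip blanks and variables (contact=...)
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 3:
            yield fields[0].lower(), fields[1], fields[2].upper()


def registry_type(sellers_json, seller_id):
    """Return the seller_type the SSP publishes for seller_id, or None if absent."""
    for seller in sellers_json.get("sellers", []):
        if str(seller.get("seller_id")) == seller_id:
            return str(seller.get("seller_type", "")).upper() or None
    return None


# Inline sample mirroring the records quoted above.
ADS_TXT = "# sample\nsmartadserver.com, 1097, DIRECT\ncontact=ads@example.com\n"
SELLERS = json.loads(
    '{"sellers": [{"seller_id": "1097", "name": "Themoneytizer",'
    ' "seller_type": "INTERMEDIARY"}]}'
)

contradictions = [
    (ssp, sid)
    for ssp, sid, rel in parse_ads_txt(ADS_TXT)
    if rel == "DIRECT" and registry_type(SELLERS, sid) == "INTERMEDIARY"
]
print(contradictions)
```

To run it against live data, pass `fetch_text(...)` output to `parse_ads_txt` and `json.loads`; the source gives the two addresses without a scheme, so prefixing `https://` is an assumption.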

Finding 1: Authorization Is Forged

Core claim

29% of "DIRECT" authorization claims are explicitly contradicted by the SSPs' own registries. A further 26% reference seller IDs that don't exist. Cross-reference of 1,757,362 triples across 21,397 publishers against 84 SSP registries.

When a publisher lists a company as "DIRECT" in their ads.txt file, it means: "I control this seller account. I authorized this directly." Advertisers pay more for DIRECT inventory because it's supposed to mean fewer intermediaries and less fraud risk.

We checked 1,757,362 unique (publisher, SSP, seller_id) triples against the SSPs' own registries:

| Category | Count (share) | How verified |
|---|---|---|
| Contradicted | 503,387 (29%) | SSP says INTERMEDIARY, not PUBLISHER |
| Phantom | 459,504 (26%) | Seller ID doesn't exist in registry |
| Plausible | 793,727 (45%) | Registry confirms PUBLISHER or BOTH type, or no registry available |

Reading this table: "Contradicted" is unambiguous: the SSP explicitly classifies the account as INTERMEDIARY, contradicting the DIRECT claim. "Phantom" is ambiguous: the seller ID may be stale, fabricated, or hidden behind the confidentiality flag that Google sets on 71% of its entries. "Plausible" includes confirmed PUBLISHER entries, BOTH-type entries (which may act as either), and claims against SSPs with no registry. The strict rate (contradicted only, no interpretation) is 29%. Adding phantom claims gives an inclusive rate of 55%, and if intermediaries exploit the BOTH classification as a loophole, the true rate of false claims is between 55% and 65%. Both the strict and the inclusive rate are stable across successive SSP expansions and across both curated and crawled publisher datasets.
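
As a quick check, the strict and inclusive rates follow directly from the counts in the table. The variable names below are illustrative only.

```python
# Counts taken from the table above.
TOTAL_TRIPLES = 1_757_362
CONTRADICTED = 503_387
PHANTOM = 459_504

strict_rate = CONTRADICTED / TOTAL_TRIPLES                 # SSP says INTERMEDIARY
inclusive_rate = (CONTRADICTED + PHANTOM) / TOTAL_TRIPLES  # plus missing seller IDs

print(f"strict: {strict_rate:.1%}  inclusive: {inclusive_rate:.1%}")
```

The upper bound of 65% depends on how many BOTH-type entries are counted as false, which the table does not break out, so it is not reproduced here.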

Why it happens

Intermediaries (AnyManager, PubFuture, Waardex, YieldMonk) distribute template ads.txt blocks to publishers with DIRECT instead of RESELLER. DIRECT commands higher CPMs and lower SSP fees. The intermediary's margin becomes invisible to the buyer.

Counter-argument preempted: Could "DIRECT" legitimately mean "direct business relationship" rather than the spec's "controls the account"? If so, some CONTRADICTED claims might be legitimate. Testing this: 96.6% of contradicted claims involve seller IDs shared by 6+ publishers (59.5% by 51–500 publishers, 22.5% by 500+). At most 3.4% could represent individual direct relationships. Under any interpretation, the remaining 96.6% is industrial-scale template injection.
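
The sharing test can be sketched as follows: count how many distinct publishers list each (SSP, seller ID) account, then measure what share of contradicted claims sit on accounts listed by six or more publishers. The function name and the sample triples are hypothetical; only the threshold of 6 comes from the text.

```python
from collections import defaultdict


def share_on_shared_accounts(claims, min_publishers=6):
    """claims: list of (publisher, ssp, seller_id) contradicted triples.

    Returns the fraction of claims whose (ssp, seller_id) account is listed
    by at least min_publishers distinct publishers.
    """
    publishers_by_account = defaultdict(set)
    for publisher, ssp, seller_id in claims:
        publishers_by_account[(ssp, seller_id)].add(publisher)
    if not claims:
        return 0.0
    shared = sum(
        1
        for _, ssp, seller_id in claims
        if len(publishers_by_account[(ssp, seller_id)]) >= min_publishers
    )
    return shared / len(claims)


# Six publishers share one account; a seventh claim uses an account of its own.
sample_claims = [(f"pub{i}.example", "ssp.example", "1") for i in range(6)]
sample_claims.append(("solo.example", "ssp.example", "2"))
shared_fraction = share_on_shared_accounts(sample_claims)
print(f"{shared_fraction:.1%} of claims sit on accounts shared by 6+ publishers")
```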

SSPs most affected by false DIRECT claims

| SSP Domain | False Claims | False Rate |
|---|---|---|
| lijit.com (Sovrn) | 68,785 | 83% |
| google.com | 64,743 | 45% (all phantom; Google uses confidential flag) |
| rubiconproject.com (Magnite) | 62,495 | 87% |
| taboola.com | 53,869 | 62% |
| onetag.com | 50,841 | 82% |
| pubmatic.com | 41,912 | 60% |
| indexexchange.com | 38,915 | 51% |
| openx.com | 36,608 | 76% |
| triplelift.com | 33,767 | 62% |
| appnexus.com (Xandr) | 29,192 | 60% |

Google's sellers.json has 71% of entries marked confidential (is_confidential: true). Every other SSP: 0%. Google's 64,743 false claims are all PHANTOM (seller ID not found) — they could be hidden behind the confidentiality flag. Excluding Google, the strict contradicted rate across the remaining SSPs is 38%.

Method: Downloaded ads.txt from 21,397 publishers (Tranco top-1M + 75,000-domain crawler harvest). Fetched sellers.json from 84 SSPs (1.56M total seller entries, including Google's 650K). For each DIRECT claim, looked up the seller_id in the SSP's registry. If the registry says INTERMEDIARY → contradicted. If the seller_id doesn't exist → phantom. If it says PUBLISHER → plausible. Deduplicated by (publisher, SSP, seller_id) triple. Malformed seller_ids (containing spaces, filenames, etc.) filtered.
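
The classification rules in the method can be sketched as a small function. This is an illustration under stated assumptions rather than the pipeline actually used: the registry is modelled as a nested dict, the verdict labels are lowercase strings, and the `WELL_FORMED` pattern is a guess at what "malformed" excludes (spaces, filenames).

```python
import re

# Assumption: a well-formed seller ID has no spaces, dots, or slashes.
WELL_FORMED = re.compile(r"^[A-Za-z0-9_-]+$")


def verdict(registry, ssp, seller_id):
    """registry: {ssp_domain: {seller_id: seller_type}} built from sellers.json."""
    sellers = registry.get(ssp)
    if sellers is None:
        return "plausible"      # no registry available for this SSP
    seller_type = sellers.get(seller_id)
    if seller_type is None:
        return "phantom"        # seller ID not in the registry
    if seller_type == "INTERMEDIARY":
        return "contradicted"   # registry contradicts the DIRECT claim
    return "plausible"          # PUBLISHER or BOTH


def classify(direct_claims, registry):
    """Deduplicate (publisher, ssp, seller_id) triples and attach a verdict."""
    results = {}
    for triple in direct_claims:
        _, ssp, seller_id = triple
        if triple in results or not WELL_FORMED.match(seller_id):
            continue            # skip duplicates and malformed IDs
        results[triple] = verdict(registry, ssp, seller_id)
    return results


registry = {"ssp-a.example": {"1": "INTERMEDIARY", "2": "PUBLISHER", "3": "BOTH"}}
claims = [
    ("pub.example", "ssp-a.example", "1"),
    ("pub.example", "ssp-a.example", "1"),        # duplicate
    ("pub.example", "ssp-a.example", "2"),
    ("pub.example", "ssp-a.example", "99"),       # not in registry
    ("pub.example", "ssp-b.example", "7"),        # SSP without registry
    ("pub.example", "ssp-a.example", "ads.txt"),  # malformed
]
results = classify(claims, registry)
print(results)
```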

Machine-readable data: false_direct_claims.jsonl — 962,891 records, one per triple, with publisher, SSP, seller_id, registry_type, and verdict.
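
A reader could tally the verdicts in the JSONL file along these lines. The exact key spelling (`verdict`) and the verdict values are assumptions based on the field list above; the sample records are invented for illustration.

```python
import io
import json
from collections import Counter


def tally_verdicts(lines):
    """Count records per verdict in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["verdict"]] += 1  # key name is an assumption
    return counts


# Invented sample records using the field names listed above.
sample = io.StringIO(
    '{"publisher": "a.example", "ssp": "ssp.example", "seller_id": "1",'
    ' "registry_type": "INTERMEDIARY", "verdict": "contradicted"}\n'
    '{"publisher": "b.example", "ssp": "ssp.example", "seller_id": "9",'
    ' "registry_type": null, "verdict": "phantom"}\n'
    '{"publisher": "c.example", "ssp": "ssp.example", "seller_id": "1",'
    ' "registry_type": "INTERMEDIARY", "verdict": "contradicted"}\n'
)
verdict_counts = tally_verdicts(sample)
print(verdict_counts)
```

For the real file, pass an open file handle instead of the `StringIO` sample, for example `with open("false_direct_claims.jsonl") as f: tally_verdicts(f)`.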

Finding 2: Consent Is Absent

Core claim

0.012% of identity-sharing requests carry valid consent on first visit. ~87 genuine consent strings in 721,129 sync requests. Not broken — never built.

Under GDPR, companies must obtain consent before sharing a user's identity with third parties. In programmatic advertising, "cookie syncing" is the mechanism that shares identity — Company A tells Company B "your user #X is my user #Y."

We captured 721,129 cookie sync requests across 186,000 websites and checked whether they carried the required TCF consent string:

| Consent Status | Requests (share) |
|---|---|
| No consent parameter at all | 210,980 (77.3%) |
| Parameter present but empty | 60,109 (22.0%) |
| Unresolved template macro | 1,794 (0.66%) |
| Valid TCF consent string | 34 (0.012%) |

What the user experiences

0.0 seconds — Page load begins

Ad scripts start executing. Cookie sync requests fire immediately.

0.1–0.5 seconds — Identity propagation

User IDs shared across companies. No consent obtained yet.

1–3 seconds — Full graph assembled

Dozens to hundreds of companies now know this user visited this page. The worst observed: 294 in one load.

2–5 seconds — Consent banner appears

User sees "Do you accept cookies?" The data has already been shared.

0 of 2,000 captured Prebid.js instances configure the consent management module. The Prebid Mobile SDK (decompiled from production binary) defaults to "full device access" when no consent management platform is present — including broadcasting the device's permanent Advertising ID to all bidders.

Per-company consent rates

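| Company | Syncs Captured | Has Consent Field | Has Valid Value |
|---|---|---|---|
| Google Ads | 27,129 | 0% | 0.01% |
| ID5 | 10,988 | 19% | 0.01% |
| Trade Desk | 10,682 | 43% | 0.01% |
| Xandr | 8,365 | 23% | 0.05% |
| Magnite | 8,141 | 18% | 0.04% |
| Criteo | 6,617 | 27% | 0.14% |
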
Method: Headless browser crawled 110,610 unique sites (the dataset at the time of consent analysis), capturing all HTTP requests. Filtered for known sync URL patterns (cookie_sync, usersync, setuid, mapuid, getuid, etc.). Parsed consent parameters (gdpr_consent, euconsent, consent) from query strings. TCF v2 strings validated by checking version prefix, length, and base64 encoding. This measures first-visit behavior — before the user has interacted with any consent banner. The crawl has since expanded to 142,630 sites; consent analysis covers the 110,610-site subset.
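
The consent classification described in the method might look like the sketch below. The parameter names come from the method; everything else is an assumption: the macro shapes in `MACRO`, the `min_length` threshold, the status labels, and the extra `invalid` bucket for values that are neither macros nor plausible TCF strings. The version check relies on the TCF v2 layout, where the first 6 bits of the decoded core segment hold the version number.

```python
import base64
import re
from urllib.parse import parse_qs, urlsplit

CONSENT_PARAMS = ("gdpr_consent", "euconsent", "consent")
# Assumption: typical shapes of a placeholder the ad server never filled in.
MACRO = re.compile(r"^(\$\{.*\}|\{\{.*\}\}|\[.*\])$")
BASE64URL = re.compile(r"^[A-Za-z0-9_-]+$")


def looks_like_tcf_v2(value, min_length=20):
    """Check base64url alphabet, a minimum length, and version bits equal to 2."""
    core = value.split(".", 1)[0]  # core segment precedes optional segments
    if len(core) < min_length or not BASE64URL.match(core):
        return False
    try:
        raw = base64.urlsafe_b64decode(core + "=" * (-len(core) % 4))
    except ValueError:
        return False
    return len(raw) > 0 and raw[0] >> 2 == 2  # first 6 bits are the version


def consent_status(url):
    """Classify a sync URL as absent, empty, unresolved_macro, valid, or invalid."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = [v for name in CONSENT_PARAMS for v in query.get(name, [])]
    if not values:
        return "absent"
    value = next((v for v in values if v), "")
    if not value:
        return "empty"
    if MACRO.match(value):
        return "unresolved_macro"
    return "valid" if looks_like_tcf_v2(value) else "invalid"


SAMPLE_TC = "CP" + "A" * 38  # synthetic string, not a real user's consent
for url in (
    "https://sync.example/setuid?uid=1",
    "https://sync.example/setuid?gdpr=1&gdpr_consent=",
    "https://sync.example/setuid?gdpr_consent=${GDPR_CONSENT_755}",
    "https://sync.example/setuid?gdpr_consent=" + SAMPLE_TC,
):
    print(consent_status(url), url)
```

A structural check like this only shows that a value has the shape of a TCF v2 string; it does not confirm that the encoded purposes or vendors actually grant consent.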

Finding 3: Identity Proliferates

Core claim

The average ad-tech-enabled website shares user identity with 5 companies. The worst shares with 294 in a single page load.

| Metric | Value |
|---|---|
| Sites scanned | 142,630 |
| Total requests captured | 2,586,662 |
| Total sync requests | 422,308 |
| Average companies per site (with adtech) | 5.1 |
| Maximum (single page load) | 294 |

Geographic differential

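| Region | Sites | Avg Entities | Avg Syncs/Site |
|---|---|---|---|
| .jp (Japan) | 1,605 | 8.1 | 4.6 |
| .com (US-dominated) | 53,194 | 5.7 | 3.4 |
| .br (Brazil) | 1,660 | 5.6 | 2.7 |
| .co.uk (United Kingdom) | 1,947 | 5.0 | 5.7 |
| .ru (Russia) | 2,990 | 3.5 | 2.2 |
| .fr (France) | 1,265 | 3.7 | 1.3 |
| .de (Germany) | 2,429 | 2.7 | 0.6 |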

EU sites average 23% fewer tracking entities than non-EU sites (3.7 vs 4.8 across 135,000 sites). UK and German sites both operate under GDPR (the UK under its retained UK GDPR, Germany under the EU regulation). UK sites average 5.7 syncs per page; German sites average 0.6. The roughly 10x difference is publisher configuration, not regulation. The law is substantively the same. Compliance is not.
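
The two comparisons in this paragraph reduce to simple arithmetic on the figures quoted above, shown here as a worked check.

```python
# Figures quoted in the paragraph and table above.
eu_entities, non_eu_entities = 3.7, 4.8
uk_syncs, de_syncs = 5.7, 0.6

entity_reduction = (non_eu_entities - eu_entities) / non_eu_entities
sync_ratio = uk_syncs / de_syncs

print(f"EU sites: {entity_reduction:.0%} fewer entities")
print(f"UK vs DE syncs per page: {sync_ratio:.1f}x")
```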

Top identity graph hubs

| Company | Sync Partners | Role |
|---|---|---|
| Trade Desk | 15 | DSP (buys audiences) |
| Xandr (Microsoft) | 14 | SSP/DSP hybrid |
| Magnite | 12 | SSP (sells inventory) |
| Criteo | 11 | Retargeting |
| PubMatic | 10 | SSP |
| ID5 | 10 | Identity resolution |
| BidSwitch | 9 | Exchange connector |

Method: Playwright-based headless browser, 25 parallel instances, 4-second page loads (8s for deep scans with Prebid hook). 142,630 unique sites from Tranco 1M, tiered scheduling (top 10K every 4h, mid 100K every 24h, tail 1M every 72h). Every HTTP request matched against 603 known ad-tech domains (240 companies). Sync requests identified by URL pattern matching. Identity graph edges defined by co-occurrence.
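
The matching and graph-building steps in the method could be sketched as below. The domain-to-company map stands in for the 603-domain list and uses placeholder `.example` domains; the sync patterns are the ones named in the consent method, and treating every pair of companies that sync on the same page as an edge is one plausible reading of "co-occurrence".

```python
from collections import Counter
from itertools import combinations
from urllib.parse import urlsplit

# Hypothetical stand-in for the list of known ad-tech domains.
DOMAIN_TO_COMPANY = {
    "dsp.example": "DSP A",
    "idgraph.example": "ID Vendor B",
    "retarget.example": "Retargeter C",
}
SYNC_PATTERNS = ("cookie_sync", "usersync", "setuid", "mapuid", "getuid")


def company_for(url):
    """Match the request host, or any parent domain of it, against the list."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    for i in range(len(parts) - 1):
        company = DOMAIN_TO_COMPANY.get(".".join(parts[i:]))
        if company:
            return company
    return None


def is_sync(url):
    """Flag URLs containing a known sync pattern."""
    return any(pattern in url.lower() for pattern in SYNC_PATTERNS)


def cooccurrence_edges(page_requests):
    """page_requests: {site: [url, ...]} -> Counter of company pairs per page."""
    edges = Counter()
    for urls in page_requests.values():
        companies = {company_for(u) for u in urls if is_sync(u)}
        companies.discard(None)
        edges.update(combinations(sorted(companies), 2))
    return edges


pages = {
    "news.example": [
        "https://match.dsp.example/usersync?partner=1",
        "https://idgraph.example/cookie_sync?id=2",
        "https://static.retarget.example/banner.js",  # not a sync request
    ]
}
edges = cooccurrence_edges(pages)
print(edges)
```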

Finding 4: The Structure

Core claim

Approximately 5% of ad-tech activity on the web falls within a functioning authorization framework. The three findings above are not three separate failures. They are one system at its operating temperature.

The authorization framework (ads.txt) is opt-in. Most of the web opted out:

| Layer | Coverage | How measured |
|---|---|---|
| Sites with ads.txt | 15% of ad-tech-enabled sites | Of 44,176 crawler scans showing ad-tech activity, 6,749 have a valid ads.txt. 37,427 do not. |
| DIRECT claims that are genuine | 45% | Of 1,757,362 verified triples, 793,727 are plausible (Finding 1). |
| Companies covered by ads.txt | 76% | Across 1,170 sites with both scan data and ads.txt, 24% of observed companies have no ads.txt entry. |
| Net authorization | ~5% | 0.15 × 0.45 × 0.76 ≈ 0.051 |

The 24% of companies that arrive without authorization are not random. They are the identity infrastructure — companies that build cross-site profiles:

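| Company | Sites observed | Syncs (48h) | Authorized in ads.txt? |
|---|---|---|---|
| Trade Desk | 2,700 | 10,075 | No (arrives through creatives) |
| LinkedIn | 2,655 | 6,882 | No |
| ID5 | 2,095 | 10,243 | No |
| Microsoft Clarity | 1,918 | 2,505 | No |
| Lotame | 1,551 | 2,554 | No |
| LiveRamp | 1,042 | 2,120 | No |
| Tapad | 1,030 | 2,762 | No |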

Two delivery mechanisms. Trade Desk, ID5, LiveRamp, Lotame, and Tapad appear almost exclusively on sites with heavy ad-tech (0–0.6% presence on low-adtech sites). They arrive through ad creatives; the publisher never installed them. LinkedIn and Microsoft Clarity also appear on low-adtech sites (4–6%), indicating publisher installation. The brief's "24% unauthorized" figure includes both groups. The narrower claim, companies arriving uninvited through creatives, applies to the first group. Ads.txt covers SSP authorization only; none of these 7 companies are SSPs, so their absence from ads.txt is expected. The gap is not that they lack authorization. It is that no authorization framework covers them at all.

Why it persists

16 intermediary accounts appear in more than half of the 8,095 normalized ads.txt files we analyzed. The most ubiquitous (Rubicon seller 17960) is in 61% of files. These entries arrive through templates distributed by intermediaries to thousands of publishers. The publisher didn't choose them; a template chose them.
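
Template convergence of this kind can be measured by counting, for each (SSP, seller ID) entry, the share of ads.txt files that contain it. The sketch below assumes each file has already been normalized into a set of entries; the function name and sample files are hypothetical, with the Rubicon account from the text used as the repeated entry.

```python
from collections import Counter


def entry_prevalence(files):
    """files: {publisher: set of (ssp, seller_id)} -> {entry: share of files}."""
    counts = Counter()
    for entries in files.values():
        counts.update(set(entries))  # count each entry once per file
    total = len(files)
    return {entry: n / total for entry, n in counts.items()}


# Hypothetical normalized files.
sample_files = {
    "a.example": {("rubiconproject.com", "17960"), ("ssp.example", "1")},
    "b.example": {("rubiconproject.com", "17960")},
    "c.example": {("ssp.example", "2")},
}
prevalence = entry_prevalence(sample_files)
majority_entries = sorted(e for e, share in prevalence.items() if share > 0.5)
print(majority_entries)
```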

The result is a Nash equilibrium. Every participant profits from every check not working:

| Actor | Incentive to not check |
|---|---|
| Publishers | False DIRECT claims → higher CPMs (DIRECT commands premium pricing) |
| Intermediaries | Template injection → invisible margin (buyer doesn't know they exist) |
| SSPs | Larger supply pool (enforcing registries would shrink inventory) |
| DSPs | Richer identity data (surveillance chain provides targeting signals) |

The only parties harmed are advertisers (paying DIRECT prices for intermediary inventory) and users (identity shared without authorization or consent). Neither party has visibility into the system that harms them.

This is a model of incentives, not a proof of intent. But ads.txt was introduced in 2017. Nine years later, the false DIRECT rate has not converged toward zero. Bugs get fixed. Equilibria persist.

Method: The 5% figure is the product of three independently measured rates: (1) ads.txt adoption among ad-tech-enabled sites (15%, from 44,176 crawler scans with ad-tech activity), (2) DIRECT claim plausibility (45%, from the 1,757,362-triple cross-reference), (3) authorized company coverage (76%, from 1,170 sites with both scan data and ads.txt). The multiplication assumes approximate independence, which was checked by confirming that plausible rates don't vary significantly between high-surveillance and low-surveillance sites. The 24% unauthorized figure counts companies identified by domain matching against 603 known ad-tech domains. Template convergence was measured across 8,095 normalized ads.txt files.
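
The product can be reproduced from the underlying counts given in the table for this finding. Using unrounded rates gives a value slightly above the 0.051 obtained from the rounded percentages; the variable names are illustrative, and the coverage rate is used as the reported 76% because its raw counts are not given.

```python
# Counts from the "How measured" column of the authorization table.
adoption = 6_749 / 44_176         # sites with a valid ads.txt
plausible = 793_727 / 1_757_362   # DIRECT claims not contradicted or phantom
coverage = 0.76                   # reported rate; raw counts not given

net_authorization = adoption * plausible * coverage
print(f"adoption={adoption:.3f} plausible={plausible:.3f} net={net_authorization:.3f}")
```

The result depends on the three rates being approximately independent, as the method notes.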

Known Weaknesses

1. Sample bias. 21,397 publishers from Tranco top-1M + 75,000-domain crawler harvest. Biased toward popular Western commercial sites. This is the relevant population for programmatic authorization, but not the internet.

2. sellers.json freshness. Snapshots from March 17–23, 2026. SSPs can reclassify sellers — an INTERMEDIARY today could become PUBLISHER tomorrow. Phantom claims could resolve if SSPs add entries. Both verdicts are point-in-time. The strict contradicted rate (29%) is the most defensible number; it requires the SSP to have actively classified the seller as INTERMEDIARY at snapshot time.

3. First-visit consent. Returning users with consent cookies likely show higher consent rates. The first-visit case is the privacy-critical one — data flows before consent is possible — but should not be generalized to all visits.

4. Quick crawl undercounting. Standard scans use 4-second page loads. Deep scans (8+ seconds with Prebid hook) see 2.4x more tracking entities. The average of 5.1 entities per ad-tech-enabled site is from standard scans; fully-loaded pages likely average 7–10.

5. Two rates, one dataset. The strict rate (29%) counts only claims where the SSP explicitly says INTERMEDIARY. The inclusive rate (55%) also counts phantom seller IDs. Both are stable across successive SSP expansions and across both curated (top-1000) and independently crawled (long-tail) publisher populations. The strict rate is the floor no one can argue with. The inclusive rate is the ceiling that depends on whether phantom IDs are fabricated or merely stale. Google's 71% confidentiality flag makes their phantom claims genuinely ambiguous.

6. The 5% estimate. The net authorization figure multiplies three rates that are measured independently and assumed to be approximately independent. The individual measurements are solid; the multiplication is approximate. The true figure could be 4–7% depending on correlation structure. The point is the order of magnitude, not the decimal.

7. Bot detection. The crawler uses headless Chromium with anti-detection flags but no mouse movement or scrolling simulation. Sites with advanced bot detection (Cloudflare Bot Management, DataDome, PerimeterX) may serve reduced ad content to detected bots. This biases our entity counts downward — the real tracking load on human visitors is likely higher than what we measured.

Data Files

All data is machine-readable, standalone (no database required), and independently verifiable:

| File | Records | Description |
|---|---|---|
| false_direct_claims.jsonl | 962,891 | Every (publisher, SSP, seller_id) triple with verdict. Deduplicated. |
| supply_chain_summary.json | 1 | Aggregate totals matching the JSONL exactly. |
| publisher_profiles.jsonl | 8,734 | Per-publisher ads.txt depth and crawl traffic. |
| identity_graph.json | 5,816 edges | Sync co-occurrence graph across 201 companies. |
| consent_measurement.json | 1 | Per-company consent field presence rates. |
| crawl_summary.json | 1 | Site distribution and geographic breakdown. |