Programmatic advertising supply chain integrity, consent failure, and identity proliferation.
Evidence from 142,000 websites, 84 SSP registries, 422,000 identity-sharing requests.
Data collected March 14–23, 2026.
1. Fetch: ads.themoneytizer.com/ads_txt.php
→ "smartadserver.com, 1097, DIRECT"
2. Check: smartadserver.com/sellers.json → seller 1097
→ seller_type: "INTERMEDIARY", name: "Themoneytizer"
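The two-step check above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the study: it works on in-memory strings rather than live fetches, the helper names `parse_ads_txt` and `lookup_seller` are hypothetical, and the sample sellers.json document contains only the fields quoted above.

```python
import json

def parse_ads_txt(text):
    """Yield (ssp_domain, seller_id, relationship) for each record line."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue                             # blank lines, variables such as contact=...
        domain, seller_id, relationship = fields[:3]
        yield domain.lower(), seller_id, relationship.upper()

def lookup_seller(sellers_json_text, seller_id):
    """Return the sellers.json entry for seller_id, or None if it is absent."""
    registry = json.loads(sellers_json_text)
    for seller in registry.get("sellers", []):
        if str(seller.get("seller_id")) == seller_id:
            return seller
    return None

# Illustrative inputs built from the example above
ADS_TXT = "# sample ads.txt\nsmartadserver.com, 1097, DIRECT\n"
SELLERS_JSON = json.dumps({"sellers": [
    {"seller_id": "1097", "name": "Themoneytizer", "seller_type": "INTERMEDIARY"}
]})

for domain, seller_id, relationship in parse_ads_txt(ADS_TXT):
    entry = lookup_seller(SELLERS_JSON, seller_id)
    seller_type = entry["seller_type"].upper() if entry else None
    print(domain, seller_id, relationship, "->", seller_type)
```

A DIRECT claim that resolves to `INTERMEDIARY` is the contradiction this finding measures; a lookup that returns `None` corresponds to a seller ID missing from the registry.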
29% of "DIRECT" authorization claims are explicitly contradicted by the SSPs' own registries. A further 26% reference seller IDs that don't exist. Cross-reference of 1,757,362 triples across 21,397 publishers against 84 SSP registries.
When a publisher lists a company as "DIRECT" in their ads.txt file, it means: "I control this seller account. I authorized this directly." Advertisers pay more for DIRECT inventory because it's supposed to mean fewer intermediaries and less fraud risk.
We checked 1,757,362 unique (publisher, SSP, seller_id) triples against the SSPs' own registries:
| Category | Count | Share | How verified |
|---|---|---|---|
| Contradicted | 503,387 | 29% | SSP says INTERMEDIARY, not PUBLISHER |
| Phantom | 459,504 | 26% | Seller ID doesn't exist in registry |
| Plausible | 793,727 | 45% | Registry confirms PUBLISHER or BOTH type, or no registry available |
Reading this table: "Contradicted" is unambiguous, because the SSP explicitly classifies the account as INTERMEDIARY, contradicting the DIRECT claim. "Phantom" is ambiguous, because the seller ID may be stale, fabricated, or hidden behind Google's confidentiality flag (set on 71% of its sellers.json entries). "Plausible" includes confirmed PUBLISHER entries, BOTH-type entries (which may act as either), and claims against SSPs with no registry. The strict false rate (contradicted only, no interpretation) is 29%. Counting phantom claims as well gives 55%, and if intermediaries also exploit the BOTH classification as a loophole, the true rate lies between 55% and 65%. Both the strict and inclusive rates are stable across successive SSP expansions and across both curated and crawled publisher datasets.
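The categories and the two headline rates can be expressed as a short sketch. The `verdict` function is a hypothetical restatement of the rules described above, not the study's code; the counts are the ones reported in the table.

```python
# Counts as reported in the table above
COUNTS = {"contradicted": 503_387, "phantom": 459_504, "plausible": 793_727}
# Total as stated in the text (the three rows themselves sum to 1,756,618)
TOTAL_TRIPLES = 1_757_362

def verdict(seller_type, registry_available=True):
    """Classify one DIRECT claim from the SSP registry's view of the seller."""
    if not registry_available:
        return "plausible"       # no registry to check the claim against
    if seller_type is None:
        return "phantom"         # seller ID not found in the registry
    if seller_type.upper() == "INTERMEDIARY":
        return "contradicted"    # registry explicitly disagrees with DIRECT
    return "plausible"           # PUBLISHER or BOTH

strict = COUNTS["contradicted"] / TOTAL_TRIPLES
inclusive = (COUNTS["contradicted"] + COUNTS["phantom"]) / TOTAL_TRIPLES
print(f"strict {strict:.1%}, inclusive {inclusive:.1%}")
```

The strict rate depends only on the contradicted count, so it does not move with any decision about how to treat phantom or BOTH-type entries.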
Intermediaries (AnyManager, PubFuture, Waardex, YieldMonk) distribute template ads.txt blocks to publishers with DIRECT instead of RESELLER. DIRECT commands higher CPMs and lower SSP fees. The intermediary's margin becomes invisible to the buyer.
Counter-argument preempted: Could "DIRECT" legitimately mean "direct business relationship" rather than the spec's "controls the account"? If so, some CONTRADICTED claims might be legitimate. Testing this: 96.6% of contradicted claims involve seller IDs shared by 6+ publishers (59.5% by 51–500 publishers, 22.5% by 500+). At most 3.4% could represent individual direct relationships. Under any interpretation, the remaining 96.6% is industrial-scale template injection.
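The sharing test described above amounts to counting distinct publishers per seller ID. A minimal sketch on toy data follows; the function name, the tuple layout, and the sample domains are all illustrative assumptions.

```python
from collections import defaultdict

def shared_id_share(claims, min_publishers=6):
    """Share of claims whose (ssp, seller_id) is used by >= min_publishers publishers."""
    publishers = defaultdict(set)
    for publisher, ssp, seller_id in claims:
        publishers[(ssp, seller_id)].add(publisher)
    shared = sum(
        1 for _, ssp, seller_id in claims
        if len(publishers[(ssp, seller_id)]) >= min_publishers
    )
    return shared / len(claims) if claims else 0.0

# Toy data: one seller ID pasted into 8 sites, plus 2 one-off claims
claims = [(f"site{i}.example", "ssp.example", "100") for i in range(8)]
claims += [
    ("solo1.example", "ssp.example", "200"),
    ("solo2.example", "ssp.example", "300"),
]
print(shared_id_share(claims))  # 0.8
```

A seller ID that a single publisher genuinely controls should appear in one ads.txt file, so a high share here points to templates rather than individual relationships.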
| SSP Domain | False Claims | False Rate |
|---|---|---|
| lijit.com (Sovrn) | 68,785 | 83% |
| google.com | 64,743 | 45% (all phantom — Google uses confidential flag) |
| rubiconproject.com (Magnite) | 62,495 | 87% |
| taboola.com | 53,869 | 62% |
| onetag.com | 50,841 | 82% |
| pubmatic.com | 41,912 | 60% |
| indexexchange.com | 38,915 | 51% |
| openx.com | 36,608 | 76% |
| triplelift.com | 33,767 | 62% |
| appnexus.com (Xandr) | 29,192 | 60% |
Google's sellers.json has 71% of entries marked confidential (is_confidential: true). Every other SSP: 0%. Google's 64,743 false claims are all PHANTOM (seller ID not found) — they could be hidden behind the confidentiality flag. Excluding Google, the strict contradicted rate across the remaining SSPs is 38%.
Machine-readable data: false_direct_claims.jsonl — 962,891 records, one per triple, with publisher, SSP, seller_id, registry_type, and verdict.
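A reader could tally the file by verdict with a few lines of Python. The sketch below runs on two made-up records; the exact key names (`ssp`, `verdict`, and so on) and the verdict labels are assumptions based on the field list above, so check them against the real file before relying on them.

```python
import json
from collections import Counter

def tally_verdicts(lines):
    """Count records per verdict from an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["verdict"]] += 1
    return counts

# Two hypothetical records in the assumed shape
SAMPLE = "\n".join(json.dumps(record) for record in [
    {"publisher": "a.example", "ssp": "ssp.example", "seller_id": "1",
     "registry_type": "INTERMEDIARY", "verdict": "contradicted"},
    {"publisher": "b.example", "ssp": "ssp.example", "seller_id": "2",
     "registry_type": None, "verdict": "phantom"},
])

print(tally_verdicts(SAMPLE.splitlines()))
```

With the real file, passing an open file handle (`with open("false_direct_claims.jsonl") as f: tally_verdicts(f)`) would work the same way, since the function only iterates over lines.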
0.012% of identity-sharing requests carry valid consent on first visit. ~87 genuine consent strings in 721,129 sync requests. Not broken — never built.
Under GDPR, companies must obtain consent before sharing a user's identity with third parties. In programmatic advertising, "cookie syncing" is the mechanism that shares identity — Company A tells Company B "your user #X is my user #Y."
We captured 721,129 cookie sync requests across 186,000 websites and checked whether they carried the required TCF consent string:
| Consent Status | Requests | Share |
|---|---|---|
| No consent parameter at all | 210,980 | 77.3% |
| Parameter present but empty | 60,109 | 22.0% |
| Unresolved template macro | 1,794 | 0.66% |
| Valid TCF consent string | 34 | 0.012% |
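The four categories in the table can be approximated with a small URL classifier. This is a rough sketch under several assumptions: the consent string travels in a query parameter named `gdpr_consent`, unresolved macros look like `${...}`, `{{...}}`, or `%%...%%`, and "valid" is reduced to a loose structural check (base64url characters with a leading version field of 2) rather than a full TCF decode.

```python
import re
from urllib.parse import urlsplit, parse_qs

B64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
# Assumed macro styles; real ad servers may use others
MACRO = re.compile(r"\$\{[^}]*\}|\{\{[^}]*\}\}|%%\w+%%")

def consent_status(url, param="gdpr_consent"):
    """Bucket a sync URL by the state of its consent parameter."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if param not in query:
        return "missing"
    value = query[param][0]
    if not value:
        return "empty"
    if MACRO.search(value):
        return "unresolved_macro"
    core = value.split(".", 1)[0]            # first segment of the string
    looks_base64 = len(core) >= 20 and all(c in B64URL for c in core)
    # The first base64url character carries the 6-bit version field
    if looks_base64 and B64URL.index(core[0]) == 2:
        return "valid_looking"
    return "malformed"

examples = [
    "https://sync.example/match?uid=abc",
    "https://sync.example/match?uid=abc&gdpr_consent=",
    "https://sync.example/match?gdpr_consent=${GDPR_CONSENT_755}",
    "https://sync.example/match?gdpr_consent=" + "C" + "A" * 40,  # synthetic value
]
for url in examples:
    print(consent_status(url))
```

A production check would decode the string and inspect purposes and vendor consents; the heuristic here only separates the obviously empty cases from strings that at least have the right shape.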
1. Ad scripts start executing. Cookie sync requests fire immediately.
2. User IDs are shared across companies. No consent has been obtained yet.
3. Dozens to hundreds of companies now know this user visited this page. The worst observed: 294 in one load.
4. The user sees "Do you accept cookies?" The data has already been shared.
0 of 2,000 captured Prebid.js instances configure the consent management module. The Prebid Mobile SDK (decompiled from production binary) defaults to "full device access" when no consent management platform is present — including broadcasting the device's permanent Advertising ID to all bidders.
| Company | Syncs Captured | Has Consent Field | Has Valid Value |
|---|---|---|---|
| Google Ads | 27,129 | 0% | 0.01% |
| ID5 | 10,988 | 19% | 0.01% |
| Trade Desk | 10,682 | 43% | 0.01% |
| Xandr | 8,365 | 23% | 0.05% |
| Magnite | 8,141 | 18% | 0.04% |
| Criteo | 6,617 | 27% | 0.14% |
The average ad-tech-enabled website shares user identity with 5 companies. The worst shares with 294 in a single page load.
| Metric | Value |
|---|---|
| Sites scanned | 142,630 |
| Total requests captured | 2,586,662 |
| Total sync requests | 422,308 |
| Average companies per site (with adtech) | 5.1 |
| Maximum (single page load) | 294 |
| Region | Sites | Avg Entities | Avg Syncs/Site |
|---|---|---|---|
| .jp (Japan) | 1,605 | 8.1 | 4.6 |
| .com (US-dominated) | 53,194 | 5.7 | 3.4 |
| .br (Brazil) | 1,660 | 5.6 | 2.7 |
| .co.uk (United Kingdom) | 1,947 | 5.0 | 5.7 |
| .ru (Russia) | 2,990 | 3.5 | 2.2 |
| .fr (France) | 1,265 | 3.7 | 1.3 |
| .de (Germany) | 2,429 | 2.7 | 0.6 |
EU sites average 23% fewer tracking entities than non-EU sites (3.7 vs 4.8 across 135,000 sites). UK and German sites operate under substantively the same rules (UK GDPR and EU GDPR respectively), yet UK sites average 5.7 syncs per site and German sites average 0.6. The roughly 10x difference is publisher configuration, not regulation. The law is the same. Compliance is not.
| Company | Sync Partners | Role |
|---|---|---|
| Trade Desk | 15 | DSP — buys audiences |
| Xandr (Microsoft) | 14 | SSP/DSP hybrid |
| Magnite | 12 | SSP — sells inventory |
| Criteo | 11 | Retargeting |
| PubMatic | 10 | SSP |
| ID5 | 10 | Identity resolution |
| BidSwitch | 9 | Exchange connector |
Approximately 5% of ad-tech activity on the web falls within a functioning authorization framework. The three findings above are not three separate failures. They are one system at its operating temperature.
The authorization framework (ads.txt) is opt-in. Most of the web opted out:
| Layer | Coverage | How measured |
|---|---|---|
| Sites with ads.txt | 15% of ad-tech-enabled sites | Of 44,176 crawler scans showing ad-tech activity, 6,749 have a valid ads.txt. 37,427 do not. |
| DIRECT claims that are genuine | 45% | Of 1,757,362 verified triples, 793,727 are plausible (Finding 1). |
| Companies covered by ads.txt | 76% | Across 1,170 sites with both scan data and ads.txt, 24% of observed companies have no ads.txt entry. |
| Net authorization | ~5% | 0.15 × 0.45 × 0.76 ≈ 0.051 |
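The multiplication in the last row can be reproduced from the raw counts in the table. This is a worked check under the same independence assumption the table makes; the company coverage rate is only given as a percentage, so it enters as 0.76.

```python
# Counts taken from the table above
sites_with_adtech = 44_176
sites_with_ads_txt = 6_749
verified_triples = 1_757_362
plausible_triples = 793_727
company_coverage = 0.76      # reported only as a percentage

ads_txt_rate = sites_with_ads_txt / sites_with_adtech
plausible_rate = plausible_triples / verified_triples

# Treats the three layers as independent, as the table does
net = ads_txt_rate * plausible_rate * company_coverage
net_rounded = 0.15 * 0.45 * 0.76

print(f"{ads_txt_rate:.3f} x {plausible_rate:.3f} x {company_coverage:.2f} = {net:.3f}")
print(f"with rounded rates: {net_rounded:.3f}")
```

Using unrounded rates shifts the product slightly upward, but it stays at roughly 5%, which is the order of magnitude the table is reporting.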
The 24% of companies that arrive without authorization are not random. They are the identity infrastructure — companies that build cross-site profiles:
| Company | Sites observed | Syncs (48h) | Authorized in ads.txt? |
|---|---|---|---|
| Trade Desk | 2,700 | 10,075 | No — arrives through creatives |
| Meta | 2,655 | 6,882 | No |
| ID5 | 2,095 | 10,243 | No |
| Microsoft Clarity | 1,918 | 2,505 | No |
| Lotame | 1,551 | 2,554 | No |
| LiveRamp | 1,042 | 2,120 | No |
| Tapad | 1,030 | 2,762 | No |
Two delivery mechanisms. Trade Desk, ID5, LiveRamp, Lotame, and Tapad appear almost exclusively on sites with heavy ad-tech (0–0.6% presence on low-adtech sites). They arrive through ad creatives — the publisher never installed them. Meta and Microsoft Clarity also appear on low-adtech sites (4–6%), indicating publisher installation. The brief's "24% unauthorized" figure includes both groups. The narrower claim — companies arriving uninvited through creatives — applies to the first group. Ads.txt covers SSP authorization only; none of these 7 companies are SSPs, so their absence from ads.txt is expected. The gap is not that they lack authorization — it's that no authorization framework covers them at all.
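One way to compute an "observed but not in ads.txt" share like the 24% figure is a set difference per site. The sketch below is an assumption about the method, not a description of it: it pools counts across sites and treats each observed company as a single domain that either appears in the site's ads.txt or does not. The domains are placeholders.

```python
def uncovered_share(sites):
    """sites: iterable of (observed_domains, ads_txt_domains) set pairs."""
    observed_total = 0
    uncovered_total = 0
    for observed, authorized in sites:
        observed_total += len(observed)
        uncovered_total += len(observed - authorized)   # seen on page, absent from ads.txt
    return uncovered_total / observed_total if observed_total else 0.0

# Toy data: 4 observed companies in total, 1 with no ads.txt entry
sites = [
    ({"ssp-a.example", "ssp-b.example", "idgraph.example"},
     {"ssp-a.example", "ssp-b.example"}),
    ({"ssp-a.example"},
     {"ssp-a.example", "ssp-c.example"}),
]
print(uncovered_share(sites))  # 0.25
```

Averaging per-site shares instead of pooling would weight small sites more heavily and could give a different number, so the aggregation choice matters when comparing against the reported figure.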
16 intermediary accounts appear in more than half of all ads.txt files in the sample. The most ubiquitous (Rubicon seller 17960) is in 61% of files. These entries arrive through templates distributed by intermediaries to thousands of publishers. The publisher didn't choose them; a template chose them.
The result is a Nash equilibrium. Every participant profits from every check not working:
| Actor | Incentive to not check |
|---|---|
| Publishers | False DIRECT claims → higher CPMs (DIRECT commands premium pricing) |
| Intermediaries | Template injection → invisible margin (buyer doesn't know they exist) |
| SSPs | Larger supply pool (enforcing registries would shrink inventory) |
| DSPs | Richer identity data (surveillance chain provides targeting signals) |
The only parties harmed are advertisers (paying DIRECT prices for intermediary inventory) and users (identity shared without authorization or consent). Neither party has visibility into the system that harms them.
This is a model of incentives, not a proof of intent. But ads.txt was introduced in 2017. Nine years later, the false DIRECT rate has not converged toward zero. Bugs get fixed. Equilibria persist.
1. Sample bias. 21,397 publishers from Tranco top-1M + 75,000-domain crawler harvest. Biased toward popular Western commercial sites. This is the relevant population for programmatic authorization, but not the internet.
2. sellers.json freshness. Snapshots from March 17–23, 2026. SSPs can reclassify sellers — an INTERMEDIARY today could become PUBLISHER tomorrow. Phantom claims could resolve if SSPs add entries. Both verdicts are point-in-time. The strict contradicted rate (29%) is the most defensible number; it requires the SSP to have actively classified the seller as INTERMEDIARY at snapshot time.
3. First-visit consent. Returning users with consent cookies likely show higher consent rates. The first-visit case is the privacy-critical one — data flows before consent is possible — but should not be generalized to all visits.
4. Quick crawl undercounting. Standard scans use 4-second page loads. Deep scans (8+ seconds with Prebid hook) see 2.4x more tracking entities. The average of 5.1 entities per ad-tech-enabled site is from standard scans; fully-loaded pages likely average 7–10.
5. Two rates, one dataset. The strict rate (29%) counts only claims where the SSP explicitly says INTERMEDIARY. The inclusive rate (55%) also counts phantom seller IDs. Both are stable across successive SSP expansions and across both curated (top-1000) and independently crawled (long-tail) publisher populations. The strict rate is the floor no one can argue with. The inclusive rate is the ceiling that depends on whether phantom IDs are fabricated or merely stale. Google's 71% confidentiality flag makes their phantom claims genuinely ambiguous.
6. The 5% estimate. The net authorization figure multiplies three rates that are measured independently and assumed to be approximately independent. The individual measurements are solid; the multiplication is approximate. The true figure could be 4–7% depending on correlation structure. The point is the order of magnitude, not the decimal.
7. Bot detection. The crawler uses headless Chromium with anti-detection flags but no mouse movement or scrolling simulation. Sites with advanced bot detection (Cloudflare Bot Management, DataDome, PerimeterX) may serve reduced ad content to detected bots. This biases our entity counts downward — the real tracking load on human visitors is likely higher than what we measured.
All data is machine-readable, standalone (no database required), and independently verifiable:
| File | Records | Description |
|---|---|---|
| false_direct_claims.jsonl | 962,891 | Every (publisher, SSP, seller_id) triple with verdict. Deduplicated. |
| supply_chain_summary.json | 1 | Aggregate totals matching the JSONL exactly. |
| publisher_profiles.jsonl | 8,734 | Per-publisher ads.txt depth and crawl traffic. |
| identity_graph.json | 5,816 edges | Sync co-occurrence graph across 201 companies. |
| consent_measurement.json | 1 | Per-company consent field presence rates. |
| crawl_summary.json | 1 | Site distribution and geographic breakdown. |