Update Dissertation.md

SINS 2026-06-02 14:09:47 +00:00
parent 7e606b8a50
commit 000c5b73a4


@@ -8,7 +8,7 @@ This section is here to capture relevant historical and legal precedent setfort
In June 2024, Wired's engineering team watched Perplexity fetch articles it had been explicitly told not to fetch. The site's robots.txt disallowed PerplexityBot. Perplexity's declared crawler honored it. Then requests kept arriving from an undeclared user agent on an AWS IP range, pulling the same URLs and surfacing them, near-verbatim, in Perplexity answers minutes later. When confronted, the company called it a third-party contractor problem.
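
As a minimal sketch of why a declared crawler and an undeclared one end up treated differently, the snippet below evaluates a hypothetical robots.txt with Python's standard `urllib.robotparser`. The file contents, the `is_allowed` helper, the example URL, and the generic user-agent string are all assumptions made for illustration; they are not Wired's actual configuration or the headers Perplexity actually sent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file Wired served is not reproduced here.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL, not a real article.
URL = "https://example.com/story/some-article"

# The declared crawler matches the Disallow rule.
declared = is_allowed("PerplexityBot", URL)

# A client sending a generic, undeclared user agent matches no rule at all,
# so the file has nothing to say about it.
undeclared = is_allowed("Mozilla/5.0 (generic browser string)", URL)

print("declared crawler allowed:", declared)
print("undeclared agent allowed:", undeclared)
```

The rule only binds a client that both identifies itself with the listed token and chooses to consult the file. Nothing in the protocol enforces either step.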
### 1.2 -- iFixit v Anthropic
iFixit's CEO posted server logs showing Anthropic's ClaudeBot hitting his site close to a million times in twenty-four hours. Read the Docs disclosed five-figure monthly bandwidth bills driven almost entirely by AI scrapers. Wikimedia reported that roughly sixty-five percent of its most expensive traffic, the uncached, high-cost requests, now came from AI crawlers, while human readership had not grown nearly enough to account for the load. By early 2025, SourceHut's Drew DeVault was writing the same post every other month: "Please stop, we are a small team, we will go down."
### 1.3 -- Synopsis
None of those operators were asking for novel protections. They were asking that the existing ones, which were being willfully ignored, be honored. Every polite mechanism the web has shipped in the last thirty years, from robots.txt and ai.txt to IETF content-usage preferences and even the "email us to opt out" forms, has been treated as advisory at best and as a target list at worst. The only language scrapers have demonstrably responded to is cost and corruption. This document will help the masses make their own contribution to saving the planet by way of hacking the planet.