Update Dissertation.md

SINS 2026-06-02 15:19:07 +00:00
parent dd3cdd722e
commit 33342a3c6e

@@ -201,7 +201,7 @@ Stop negotiating licensing from weakness. The major deals signed in 2024-2025
Opt-out is begging or asking the powerful actors with no incentive to comply to *please* listen. Poisoning is bargaining; you impose a real cost they must either pay or work around. Polite mechanisms failed because they assumed good faith from actors whose entire business model depends on its absence. The next decade of the open web depends on operators realizing the bargaining power they've always had is sitting in their own server config, ready to be used.
## 8 -- References
-The intention of this section is to capture the references consumed and paraphrased in order to produce this publication to aid the reader with additional information and resources useful for the academic research and study oof the underlying topics discussed within this document.
+The intention of this section is to capture the references consumed and paraphrased in order to produce this publication to aid the reader with additional information and resources useful for the academic research and study of the underlying topics discussed within this document.
### 8.1 -- Section 1: Documented incidents
| Section | Claim | Source |
@@ -214,7 +214,7 @@ The intention of this section is to capture the references consumed and paraphrased i
| 1.2 | SourceHut / Drew DeVault: AI crawlers degrading small-team infrastructure | Drew DeVault, "Please stop externalizing your costs directly into my face," 17 Mar 2025 — https://drewdevault.com/blog/Stop-externalizing-your-costs-on-me/ ; The Register coverage, 18 Mar 2025 — https://www.theregister.com/2025/03/18/ai_crawlers_sourcehut/ |
### 8.2 -- Section 2: Polite mechanisms
-| # | Claim | Source |
+| Section | Claim | Source |
|---|-------|--------|
| 2.1 | robots.txt history (Martijn Koster, 1994) | "A Standard for Robot Exclusion," 1994 — https://www.robotstxt.org/orig.html ; RFC 9309 "Robots Exclusion Protocol" — https://www.rfc-editor.org/rfc/rfc9309.html |
| 2.1 | Bytespider / undeclared crawlers ignoring robots.txt and rotating UAs | Cloudflare Radar verified bots — https://radar.cloudflare.com/traffic/verified-bots ; Originality.AI, "AI Bot Robots.txt Compliance Study," 2024 — https://originality.ai/blog/ai-bot-robots-txt |
@@ -227,13 +227,13 @@ The intention of this section is to capture the references consumed and paraphrased i
| 2.5 | Common Crawl scope / persistence in training corpora | Common Crawl — https://commoncrawl.org/ ; Mozilla / 2024 study "Training Data for the Price of a Sandwich" — https://foundation.mozilla.org/en/research/library/generative-ai-training-data/common-crawl/ |
### 8.3 -- Section 3: Regulation and litigation
-| # | Claim | Source |
+| Section | Claim | Source |
|---|-------|--------|
| 3.3 | EU AI Act Article 53 (GPAI obligations re: TDM opt-out) | Regulation (EU) 2024/1689, Art. 53 — https://eur-lex.europa.eu/eli/reg/2024/1689/oj ; Commission GPAI Code of Practice — https://digital-strategy.ec.europa.eu/en/policies/ai-code-practice |
| 3.3 | NYT v. OpenAI / Microsoft | Complaint, S.D.N.Y. 1:23-cv-11195, 27 Dec 2023 — https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf |
### 8.4 -- Section 4: Active countermeasures
-| # | Claim | Source |
+| Section | Claim | Source |
|---|-------|--------|
| 4.1 | Anubis (PoW reverse proxy) — 90-95% bot drop reports | Project: https://github.com/TecharoHQ/anubis (19.7k stars, MIT) ; documentation: https://anubis.techaro.lol/ ; deployment write-ups: Xe Iaso, "Anubis works," 19 Jan 2025 — https://xeiaso.net/blog/2025/anubis/ ; UNESCO / GNOME GitLab adoption coverage: https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/ |
| 4.1 | go-away (alternative PoW / abuse detection) | https://git.gammaspectra.live/git/go-away (mirror: https://github.com/WeebDataHoarder/go-away) |
@@ -246,7 +246,7 @@ The intention of this section is to capture the references consumed and paraphrased i
| 4.4 | Decompression / zip bombs (background) | https://www.bamsoftware.com/hacks/zipbomb/ |
### 8.5 -- Section 6: Mitigations
-| # | Claim | Source |
+| Section | Claim | Source |
|---|-------|--------|
| 6.1 | Cloudflare Bot Fight Mode / AI scraper blocking (free tier, default July 2024) | Cloudflare blog, "Declaring your AIndependence: block AI bots, scrapers and crawlers with a single click," 3 Jul 2024 — https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/ ; Cloudflare "AI Audit," Sep 2024 — https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/ |
| 6.3.a | nginx limit_req_zone | nginx docs — https://nginx.org/en/docs/http/ngx_http_limit_req_module.html |