Add techniques/nepenthes/technical_nepenthes.md
Commit c299b69414 (parent 7b413f85a4)

techniques/nepenthes/technical_nepenthes.md (new file, 37 lines)

# Nepenthes: Infinite Tarpit for Content Protection

**Nepenthes** is a lightweight, self-hosted tarpit designed to trap non-compliant AI crawlers and waste their resources. Created by Aaron (zadzmo), it generates an effectively endless supply of nonsensical web pages filled with Markov chain text and links to further generated pages. Compliant crawlers that honor `robots.txt` never see it; aggressive scrapers that ignore `Disallow` rules become trapped in an ever-expanding maze of garbage content.
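
The page-generation idea described above can be illustrated with a minimal Python sketch. This is not the Nepenthes implementation: the corpus, the function names `build_chain` and `render_page`, and the link format are all assumptions made for illustration. It shows a word-level Markov chain producing babble and a path-seeded random generator producing links that always point deeper into the maze, so a given URL renders the same page on every visit.

```python
import hashlib
import random
from collections import defaultdict

# Tiny illustrative corpus (an assumption); a real tarpit would train on far more text.
CORPUS = (
    "the crawler follows the link and the link leads to another page "
    "and the page holds another link that the crawler follows again"
)


def build_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def render_page(path, chain, n_words=40, n_links=5):
    """Return (babble, links) for a path; the same path always yields the same page."""
    # Seed the generator from the path so each URL is stable across visits.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    for _ in range(n_words - 1):
        # Restart from a random known word if the current one has no successor.
        word = rng.choice(chain[word]) if chain.get(word) else rng.choice(starts)
        out.append(word)
    # Every link extends the current path, so the crawl never reaches a leaf.
    base = path.rstrip("/")
    links = [f"{base}/{rng.getrandbits(32):08x}/" for _ in range(n_links)]
    return " ".join(out), links
```

Seeding from the path means no page ever has to be stored: any URL under the tarpit prefix can be regenerated on demand, which is one plausible way to keep memory use low while still presenting a consistent site to the crawler.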

## Why Nepenthes Matters

Traditional polite mechanisms (`robots.txt`, `ai.txt`, opt-out forms) only work when a crawler chooses to honor them, and they have proven ineffective against frontier AI labs and their contractors. Nepenthes flips the economic model: instead of the content creator bearing bandwidth and compute costs, the scraper is forced to spend time, bandwidth, and storage on worthless data. A single persistent crawler can be held for hours or days, which raises the marginal cost of unauthorized ingestion.

This directly implements the "tarpit" layer described in Section 4.2 of the primary dissertation and complements the proof-of-work protection provided by Anubis.

## Key Features for Individuals

- **Minimal ongoing maintenance** - Once deployed behind a `Disallow` path, it runs without manual intervention.
- **Extremely low resource usage** - Designed to serve infinite content with minimal CPU and memory.
- **Seamless integration** - Works alongside the aggressive-bot UA list in `known-aggressive-bot-user-agents.md`.
- **Multiple deployment modes** - Docker or a direct install from the Lua source; static file generation is available through the separate Quixotic project.
- **Open source** - Transparent and auditable.

## How It Fits the Defense Stack

1. **Anubis** (`anubis.md`) - First filter (PoW challenge for suspicious clients).
2. **Nepenthes** (this document) - Second filter for any crawler that bypasses or ignores the PoW.
3. **Active denial techniques** (decompression bombs, malformed content, slowloris) - Third layer for persistent offenders.
4. **UA reference list** (`known-aggressive-bot-user-agents.md`) - Shared intelligence used by all layers.

Nepenthes is the natural next step after Anubis. It ensures that even if a sophisticated scraper eventually solves the proof-of-work, it still pays a heavy ongoing cost.

## Official Resources

- Project: https://zadzmo.org/code/nepenthes/
- Coverage: Ars Technica, "AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt" (28 Jan 2025)

## Recommended Starting Point

Place Nepenthes behind a path listed under `Disallow` in `robots.txt` (e.g., `/tarpit/`, `/garbage/`). Crawlers that honor `robots.txt` will never request that path, so only non-compliant user-agents reach the tarpit. When combined with Anubis, the two tools form a low-cost passive perimeter that returns control to the individual creator.

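
Before pointing a tarpit at a path, it is worth confirming that a compliant crawler would actually be told to stay out of it. The sketch below uses Python's standard `urllib.robotparser` to evaluate a sample `robots.txt`; the file contents, the helper name `is_allowed`, and the `ExampleBot` user-agent are assumptions for illustration, with the two paths taken from the examples above.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt (an assumption) that fences off the example tarpit paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tarpit/
Disallow: /garbage/
"""


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if a robots.txt-compliant crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/tarpit/page", "/garbage/x", "/blog/post"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```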

*Nepenthes is the cornerstone tarpit technology in the passive defense layer. All other techniques in this repository are designed to work alongside it.*