# Nepenthes - Infinite Tarpit for Content Protection
**Nepenthes** is a lightweight, self-hosted tarpit designed to trap non-compliant AI crawlers and waste their resources. Created by Aaron (zadzmo), it serves an effectively endless supply of procedurally generated, nonsensical pages filled with Markov chain text and links to further generated pages. Compliant crawlers that honor `robots.txt` never see it; aggressive scrapers that ignore `Disallow` rules become trapped in an ever-expanding maze of garbage content.
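
As a rough illustration of the generation idea described above, the following Python sketch builds a word-level Markov chain from a tiny corpus and renders a page whose text and outgoing links are derived entirely from the requested path. It is not Nepenthes's actual code; `CORPUS`, `build_chain`, and `render_page` are hypothetical names chosen for this example, and a real deployment would use a much larger corpus.

```python
import hashlib
import random
from collections import defaultdict

# Hypothetical toy corpus; any larger body of text would work the same way.
CORPUS = (
    "the pitcher plant waits in the shade and the crawler follows the link "
    "to the next page where the plant waits and the link leads to the shade"
)

def build_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)

def render_page(path, chain, n_words=40, n_links=5):
    """Return HTML for `path`; the same path always yields the same page."""
    # Seed the generator from the path so no page ever has to be stored.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    keys = sorted(chain)
    word = rng.choice(keys)
    words = [word]
    for _ in range(n_words - 1):
        # Restart from a random word if the chain reaches a dead end.
        word = rng.choice(chain.get(word) or keys)
        words.append(word)
    # Every link points one level deeper, so the maze never runs out.
    links = "".join(
        f'<a href="{path.rstrip("/")}/{rng.getrandbits(32):08x}/">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{' '.join(words)}</p>\n{links}</body></html>"
```

Because each page is a pure function of its URL, nothing needs to be stored, and every generated link leads to another page that can be produced the same way.
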
## Why Nepenthes Matters
Polite opt-out mechanisms (`robots.txt`, `ai.txt`, opt-out forms) are advisory only, and they have proven ineffective against frontier AI labs and contractors whose crawlers simply ignore them. Nepenthes flips the economic model: instead of the content creator bearing the bandwidth and compute costs of being scraped, the scraper is forced to spend time, bandwidth, and storage on worthless data. A single persistent crawler can be held for hours or days, which raises the marginal cost of unauthorized ingestion.

This directly implements the "tarpit" layer described in Section 4.2 of the primary dissertation and complements the proof-of-work protection provided by Anubis.
## Key Features for Individuals
- **Zero ongoing maintenance** - Once deployed behind a `Disallow` path, it runs autonomously.
- **Extremely low resource usage** - Designed to serve infinite content with minimal CPU and memory.
- **Seamless integration** - Works alongside the aggressive-bot UA list in `known-aggressive-bot-user-agents.md`.
- **Multiple deployment modes** - Docker, running from the Lua source (Nepenthes is written in Lua, not Python), or static file generation in the style of Quixotic.
- **Open source** - Transparent and auditable.
## How It Fits the Defense Stack
1. **Anubis** (`anubis.md`) - First filter (PoW challenge for suspicious clients).
2. **Nepenthes** (this document) - Second filter for any crawler that bypasses or ignores the PoW.
3. **Active denial techniques** (decompression bombs, malformed content, slowloris) - Third layer for persistent offenders.
4. **UA reference list** (`known-aggressive-bot-user-agents.md`) - Shared intelligence used by all layers.

Nepenthes is the natural next step after Anubis: even if a sophisticated scraper eventually solves the proof-of-work, it still pays an ongoing cost for every disallowed page it goes on to request.
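
The layering above can be sketched as a single routing decision. This is only an illustration of the ordering, under stated assumptions: the `Request` fields, the `choose_layer` function, the `escalate_after` threshold, and the placeholder user-agent substring are all hypothetical, and neither Anubis nor Nepenthes exposes an interface like this.

```python
from dataclasses import dataclass

# Hypothetical stand-ins; the real entries live in
# known-aggressive-bot-user-agents.md and are not reproduced here.
AGGRESSIVE_UA_SUBSTRINGS = ("examplescraper",)
TARPIT_PREFIXES = ("/tarpit/", "/garbage/")

@dataclass
class Request:
    path: str
    user_agent: str
    solved_pow: bool = False  # client has already passed the PoW challenge
    tarpit_hits: int = 0      # tarpit pages this client has fetched so far

def choose_layer(req, escalate_after=1000):
    """Return the name of the layer that should handle this request."""
    ua = req.user_agent.lower()
    suspicious = any(s in ua for s in AGGRESSIVE_UA_SUBSTRINGS)
    # Layer 1: suspicious clients must solve the proof-of-work first.
    if suspicious and not req.solved_pow:
        return "anubis"
    # Layers 2 and 3: anything requesting a disallowed path goes to the
    # tarpit, and is escalated once it has proven persistent.
    if req.path.startswith(TARPIT_PREFIXES):
        if req.tarpit_hits >= escalate_after:
            return "active-denial"
        return "nepenthes"
    # Everything else is served normally.
    return "origin"
```

In practice each of these decisions would live in a different component (reverse proxy rules, the PoW gateway, the tarpit itself); the function only makes the order of checks explicit.
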
## Official Resources
- Project: https://zadzmo.org/code/nepenthes/
- Coverage: Ars Technica, "AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt" (28 Jan 2025)
## Recommended Starting Point
Place Nepenthes behind any path listed under `Disallow` in `robots.txt` (e.g., `/tarpit/`, `/garbage/`). Crawlers that honor the rule never request that path, so the traffic that reaches the tarpit comes from clients that ignore `robots.txt`. When combined with Anubis, the two tools form a low-cost passive perimeter that returns control to the individual creator.
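
To see why a compliant crawler never reaches the tarpit, the sketch below feeds a minimal `robots.txt` to Python's standard `urllib.robotparser` and asks whether a given URL may be fetched. The `/tarpit/` path comes from the text above; the `ExampleBot` user-agent, the `example.com` host, and the `allowed` helper are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Minimal robots.txt that fences off the tarpit path for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tarpit/
"""

def allowed(user_agent, url):
    """Return True if a robots.txt-compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# A compliant crawler performs this check and skips the tarpit:
print(allowed("ExampleBot", "https://example.com/tarpit/page"))  # False
print(allowed("ExampleBot", "https://example.com/blog/post"))    # True
```

A scraper that skips this check and follows links into `/tarpit/` anyway is exactly the client Nepenthes is meant to hold.
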

*Nepenthes is the cornerstone tarpit technology in the passive defense layer. All other techniques in this repository are designed to work alongside it.*