docs(sbx): VirtualBox backend spec, crypto pay-gate, save/load PoC

Add the VirtualBox sandbox design spec (headless 4th backend + share-an-
appliance GUI mode with detect-first install), the crypto pay-to-join gate
design, and the save/load PoC writeup with its demo/film driver scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
leetcrypt 2026-06-03 10:10:44 -07:00
parent 07e9c30846
commit ca1666fbbb
5 changed files with 1238 additions and 0 deletions

docs/crypto-payment-gate.md Normal file

@ -0,0 +1,385 @@
# Design: crypto-currency "pay-to-join" gate (second gate after the password)
**Status:** research + plan (no code yet)
**Goal:** require a crypto payment *in addition to* the SRP password before a
client is admitted to a hack-house room, **without weakening** the existing
end-to-end-encryption / zero-knowledge-relay properties.
---
## 1. What the gate must respect (current architecture)
Grounded in the present code so the design slots in cleanly:
- **Two-step SRP handshake over HTTP, then a WebSocket upgrade.**
  - `POST /srp/init` → returns `user_id, B, salt, room_salt` and enforces a *soft*
    capacity / username check (`cmd_chat/server/views.py:30-63`, capacity at `:48`).
  - `POST /srp/verify` → verifies the SRP proof `M`, then **commits the session**
    and issues the `ws_token` (`cmd_chat/server/views.py:66-111`). The
    **authoritative capacity gate is here** (`:84-85`), the proof check at `:87`,
    and `session_store.add(session)` at `:97`.
  - `GET (ws) /ws/chat?user_id=…&ws_token=…` → HMAC-checks the token and joins the
    room (`cmd_chat/server/views.py:114-149`).
- **Rust client mirror:** `hh/src/net.rs:44-104` (`authenticate`) does init →
`process_challenge` → verify → checks server `H_AMK` (mutual auth, MITM guard)
→ derives the room Fernet → builds the ws URL. The Python client mirrors this
in `cmd_chat/client/client.py`.
- **Zero-knowledge relay.** The server never derives the room key
(`HKDF(password, room_salt, "cmd-chat-room-key")` is computed **client-side
only**) and only relays Fernet ciphertext. It *can* see: usernames, IPs,
timestamps, `user_id`s, roster, ws-token validity. It *cannot* see: message
plaintext, the room key, sandbox I/O, `_perm`/`_ft` control frames.
- **Owner is authoritative, not the server.** Chat rooms have no owner; the first
user to `/sbx launch` becomes sandbox owner and the **owner's client**
broadcasts the ACL (`_perm:acl`) encrypted — the server relays it blindly. This
is the precedent we lean on for the trust-minimized tier below.
- **One password per server instance** (`serve --password`); a "room" is just a
running server. No room-creation endpoint, no persistent accounts, no
allow-lists/bans. Capacity defaults to 4 (`CMD_CHAT_MAX_USERS`). Per-IP rate
limit 10/60s on both SRP endpoints.
**Key consequence:** payment is a *money* trust concern, orthogonal to the E2E
*message* trust concern. We must not route the room key or plaintext through any
payment logic, and we should keep the relay as close to "dumb" as possible.
---
## 2. Design principles / security requirements
1. **Non-custodial.** The app never holds user funds or spend-capable keys. Funds
go **directly to the owner's** node/wallet. We only *issue invoices* and
*verify receipt*. (Custody would add huge security + regulatory burden.)
2. **Two independent gates.** Password (SRP) and payment are separate; failing
either denies entry. Payment is checked **only after** SRP succeeds, so an
unpaid attacker who lacks the password learns nothing and never sees a
payment request.
3. **Keep the relay dumb where possible.** Prefer designs where the server checks
a *signature* or a *preimage* rather than running chain logic. Full chain
access lives in an **owner-operated** component.
4. **Atomic pay-to-join.** Never take money for a seat that isn't granted. Use a
capacity reservation + Lightning **hold invoice** so payment settles *iff* the
seat is actually granted; otherwise it auto-cancels (refund).
5. **Single-use, identity-bound proofs.** Every proof is bound to one `user_id`
+ room + nonce, expires quickly, and is burned on use (anti-replay / anti-share).
6. **No fiat price oracle in the hot path.** Price in the native unit (sats /
atomic units). Optional fiat display is cosmetic and never gates admission.
7. **Least-privilege keys, never in the repo.** Invoice/read-only node
credentials via env; no admin/spend keys. Testnet/regtest/signet by default;
mainnet is an explicit opt-in flag.
8. **Fail safe + DoS-resistant.** Payment-backend errors deny entry (don't
fail-open) but never hang the SRP handshake; invoice issuance is rate-limited.
9. **Privacy-aware.** Prefer rails that don't dox the payer's whole graph
(Lightning ≫ on-chain; Monero strongest).
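
Principle 8 (deny on backend error, never hang the handshake) can be sketched as a small wrapper around whatever gate is active. This is a minimal sketch under assumptions: `admit`, the 5-second default, and the demo callables are hypothetical names, not part of the existing code.

```python
import asyncio


async def admit(verify, user_id: str, proof: dict, timeout_s: float = 5.0) -> bool:
    """Return True only if verify() positively confirms payment in time.

    `verify` is any awaitable callable (user_id, proof) -> bool. Any exception
    or timeout counts as a denial (fail-closed), never as an admit.
    """
    try:
        return bool(await asyncio.wait_for(verify(user_id, proof), timeout_s))
    except Exception:  # covers timeouts and payment-backend errors alike
        return False


async def _demo() -> list:
    async def ok(user_id, proof):
        return True

    async def broken(user_id, proof):
        raise ConnectionError("payment backend unreachable")

    async def slow(user_id, proof):
        await asyncio.sleep(1.0)
        return True

    return [
        await admit(ok, "u1", {}),
        await admit(broken, "u1", {}),
        await admit(slow, "u1", {}, timeout_s=0.05),
    ]


results = asyncio.run(_demo())
```

Only the first call admits; the unreachable backend and the slow backend both end in a denial instead of an open door or a stalled handshake.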
---
## 3. Currency / rail choice
| Rail | Settlement | Fees | Privacy | Proof primitive | Verdict |
|------|-----------|------|---------|-----------------|---------|
| **Bitcoin Lightning** | ~instant | ~0 | good | **payment preimage** (sha256(P)=hash) | **Recommended** |
| Monero | ~2 min | low | **best** | tx proof (tx key + addr) | strong privacy alt |
| On-chain L2 stablecoin (USDC on Base/Arbitrum) | seconds to minutes | low | poor | tx receipt + log | fiat-stable alt |
| On-chain BTC / ETH L1 | 10 min / minutes | high | poor | tx receipt | bad UX, avoid |
**Recommendation: Bitcoin Lightning.** It matches the ephemeral, instant,
micro-fee nature of "pay a few sats to join a terminal room," and the **preimage
is a self-verifying proof of payment** that needs no trusted third party *if the
verifier knows the invoice's payment hash*. The hold-invoice variant gives us
atomic seat reservation for free.
The implementation is built behind an interface (§5) so Monero / L2-stablecoin
backends can be added without touching the handshake.
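
The "self-verifying" property of the preimage is just a hash comparison. The sketch below shows that check in isolation, using only the standard library; `preimage_matches` is a hypothetical helper name, and hex encoding of both values is an assumption.

```python
import hashlib
import hmac


def preimage_matches(preimage_hex: str, payment_hash_hex: str) -> bool:
    """True iff sha256(preimage) equals the invoice's payment hash."""
    try:
        preimage = bytes.fromhex(preimage_hex)
        expected = bytes.fromhex(payment_hash_hex)
    except ValueError:
        return False  # not valid hex
    if len(preimage) != 32 or len(expected) != 32:
        return False  # Lightning preimages and payment hashes are 32 bytes
    # constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(hashlib.sha256(preimage).digest(), expected)


# Demo values: a made-up preimage and the hash an invoice would carry for it.
demo_preimage = bytes(range(32)).hex()
demo_hash = hashlib.sha256(bytes(range(32))).hexdigest()
good = preimage_matches(demo_preimage, demo_hash)
bad = preimage_matches("00" * 32, demo_hash)
```

The check is only meaningful when the verifier already knows which `payment_hash` belongs to which join attempt, which is why the binding rules in §8 matter.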
---
## 4. The core idea: hold-invoice pay-to-join
Lightning **hold invoices** (a.k.a. HODL invoices, LND `addHoldInvoice` /
CLN `holdinvoice`) let the receiver *accept* a payment (funds locked by the
payer) and decide later whether to **settle** (claim) or **cancel** (refund):
1. Owner's node generates a hold invoice for the join fee, with a `payment_hash`
it controls and a description that **commits to `user_id`**.
2. Joiner pays; the HTLC is now *accepted* (locked) but **not settled** — the
payer cannot spend it elsewhere and the owner hasn't claimed it.
3. The gate **reserves a seat** (capacity check) and, iff a seat is available,
tells the node to **settle** → owner is paid, joiner is admitted.
4. If no seat / timeout / error → **cancel** the invoice → joiner is auto-refunded,
nobody paid for a seat they didn't get.
This is the cleanest way to satisfy principle #4 (atomicity).
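
The four steps above can be sketched as a tiny state machine. This is an illustration against an in-memory stand-in, not a real node client: `FakeHoldNode`, `State`, and `admit_or_refund` are hypothetical names, and the state set is simplified.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"          # invoice issued, not yet paid
    ACCEPTED = "accepted"  # HTLC locked by the payer, not yet claimed
    SETTLED = "settled"    # owner claimed the funds
    CANCELED = "canceled"  # HTLC released, payer refunded


class FakeHoldNode:
    """In-memory stand-in for a node's hold-invoice API (hypothetical)."""

    def __init__(self):
        self.invoices = {}

    def add_hold_invoice(self, payment_hash: str, amount_sats: int) -> None:
        self.invoices[payment_hash] = {"state": State.OPEN, "amount": amount_sats}

    def pay(self, payment_hash: str) -> None:
        inv = self.invoices[payment_hash]
        if inv["state"] is State.OPEN:
            inv["state"] = State.ACCEPTED

    def settle(self, payment_hash: str) -> None:
        self._resolve(payment_hash, State.SETTLED)

    def cancel(self, payment_hash: str) -> None:
        self._resolve(payment_hash, State.CANCELED)

    def _resolve(self, payment_hash: str, new_state: State) -> None:
        inv = self.invoices[payment_hash]
        if inv["state"] is not State.ACCEPTED:
            raise RuntimeError("only a held (ACCEPTED) invoice can be resolved here")
        inv["state"] = new_state


def admit_or_refund(node, payment_hash: str, seats_taken: int, max_seats: int) -> bool:
    """Settle iff a seat is free; otherwise cancel so the payer is refunded."""
    if node.invoices[payment_hash]["state"] is not State.ACCEPTED:
        return False  # unpaid or already resolved: nothing to settle
    if seats_taken < max_seats:
        node.settle(payment_hash)
        return True
    node.cancel(payment_hash)
    return False


node = FakeHoldNode()
for h in ("hash-a", "hash-b"):
    node.add_hold_invoice(h, 500)
    node.pay(h)
admitted = admit_or_refund(node, "hash-a", seats_taken=3, max_seats=4)
refused = admit_or_refund(node, "hash-b", seats_taken=4, max_seats=4)
```

The point of the sketch is the invariant: an invoice ends in `SETTLED` only on the branch that also admits the joiner.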
---
## 5. Pluggable `AdmissionGate` abstraction
Introduce one small interface so the payment rail is swappable and the
"no gate" path is the existing behavior (zero risk to current deployments).
```python
# cmd_chat/server/admission.py (illustrative)
from __future__ import annotations  # lets `Quote` be named before it is defined

from typing import Protocol


class AdmissionGate(Protocol):
    async def quote(self, user_id: str, username: str) -> Quote | None:
        """Return a payable Quote (invoice/LNURL/address + opaque challenge),
        or None if no payment is required (free room)."""

    async def verify(self, user_id: str, proof: dict) -> bool:
        """True iff `proof` settles `quote` for this user_id, single-use,
        unexpired, correct amount. Reserves+settles atomically for hold invoices."""


# backends:
# NullGate      -> always free (today's behavior; default)
# LightningGate -> Tier 1, self-hosted LNbits/LND/CLN
# VoucherGate   -> Tier 2, verifies owner-signed Ed25519 admission vouchers
# (future) MoneroGate, EvmGate
```
Server wiring (`cmd_chat/server/factory.py`): `app.ctx.admission = build_gate(cfg)`.
New `serve` flags (all optional; default = NullGate = unchanged):
```
cmd_chat.py serve <ip> <port> --password <pw> \
    --pay lightning \
    --pay-amount-sats 500 \
    --pay-backend lnbits --pay-url https://lnbits.local --pay-key-env HH_LNBITS_INVOICE_KEY \
    --pay-network signet
# or, trust-minimized:
    --pay voucher --admit-pubkey <ed25519-pub-b64>
```
---
## 6. Two implementation tiers
### Tier 1 — server-mediated Lightning gate (MVP, pragmatic)
The server talks to an **owner-operated** LN backend (LNbits invoice key, or
LND/CLN gRPC with an invoice-only macaroon). Justified because in the common
self-hosted case the **owner *is* the server operator**, so trusting the server
to confirm payment is trusting yourself.
Flow:
1. `POST /srp/init` unchanged, **plus** server returns `pay_required: true` and a
`pay_challenge` (random nonce bound to `user_id`) when a gate is active.
2. New `POST /pay/quote {user_id}` → server (after a *soft* capacity check) asks
the node for a **hold invoice** committing `user_id` in its `description_hash`,
returns `{bolt11, payment_hash, amount_sats, lnurl?, expires_at}`.
3. Client shows the invoice (BOLT11 + QR + LNURL) in the TUI and waits.
4. Joiner pays → HTLC accepted (held).
5. `POST /srp/verify {user_id, M, payment_hash}` → server: SRP-verify →
**authoritative capacity gate** → ask node "is this invoice ACCEPTED, amount ok,
user_id committed, not yet used?" → if yes **settle** + mark used + add session
+ issue `ws_token`; if no seat/timeout → **cancel** (refund) and return `402`.
HTTP semantics: `402 Payment Required` (+ a small JSON `{error, pay_required,
pay_challenge}`) when payment is missing/invalid; `409` still means full.
### Tier 2 — owner-signed admission vouchers (trust-minimized, end state)
Mirrors the existing "owner broadcasts the ACL, server relays blindly" model and
keeps the server **payment-blind** (no node creds, no chain access on the relay).
Components:
- The owner runs a tiny **doorman** next to their node (could live in the Rust
client or a sidecar): an LNURL-pay endpoint + voucher signer holding an
**Ed25519** key. The server is started with only the **public** key
(`--admit-pubkey`).
- A joiner pays the owner directly (LNURL/BOLT11). On settlement the doorman signs
an **admission voucher**:
`voucher = Ed25519_sign( sk_owner, {user_id, room_id, amount, nonce, exp} )`.
- Joiner submits the voucher at `POST /srp/verify {…, voucher}`.
- Server verifies: signature against `admit_pubkey`, `user_id` matches, `exp`
fresh, `nonce` unseen (single-use ledger), then admits. **The server verifies a
signature, not a payment** — it never sees funds, invoices, or chain data.
Tradeoff: the owner/doorman must be reachable to *issue* vouchers; the relay no
longer needs to be trusted with money. This is the philosophically correct
end-state for a zero-knowledge relay; Tier 1 is the faster path to ship.
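
The server-side voucher checks listed above (signature, `user_id` match, fresh `exp`, unseen `nonce`) can be sketched as follows. Assumptions are loud here: the JSON canonical encoding, the `VoucherChecker` class, and the voucher layout are hypothetical, and the signature check is injected as a callable. A real deployment would pass an Ed25519 verify function bound to `admit_pubkey`; the stub in the demo is NOT a signature scheme.

```python
import json
import time
from typing import Callable, Optional


def canonical_payload(fields: dict) -> bytes:
    """Deterministic byte encoding of the voucher fields (an assumed format)."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()


class VoucherChecker:
    def __init__(self, verify_sig: Callable[[bytes, bytes], bool], room_id: str):
        self.verify_sig = verify_sig  # e.g. Ed25519 verify bound to admit_pubkey
        self.room_id = room_id
        self.seen_nonces = set()      # in-RAM single-use ledger

    def check(self, voucher: dict, user_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        fields, sig = voucher.get("fields"), voucher.get("sig")
        if not isinstance(fields, dict) or not isinstance(sig, bytes):
            return False
        if not self.verify_sig(canonical_payload(fields), sig):
            return False
        if fields.get("user_id") != user_id or fields.get("room_id") != self.room_id:
            return False
        exp = fields.get("exp")
        if not isinstance(exp, (int, float)) or exp <= now:
            return False
        nonce = fields.get("nonce")
        if not isinstance(nonce, str) or nonce in self.seen_nonces:
            return False
        self.seen_nonces.add(nonce)   # burn the nonce only after every check passed
        return True


def stub_verify(message: bytes, signature: bytes) -> bool:
    """Demo-only placeholder; stands in for a real Ed25519 verification."""
    return signature == b"stub-signature"


checker = VoucherChecker(stub_verify, room_id="room-1")
voucher = {
    "fields": {"user_id": "u1", "room_id": "room-1", "amount": 500,
               "nonce": "n-1", "exp": 200},
    "sig": b"stub-signature",
}
first = checker.check(voucher, "u1", now=100)
replay = checker.check(voucher, "u1", now=100)
```

The first presentation is accepted and the replay is rejected, because the nonce is already in the ledger.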
---
## 7. End-to-end sequence (Tier 1, hold-invoice)
```
client                            relay (server)                   owner LN node
| POST /srp/init {A, user}        |                                |
|-------------------------------->| soft cap/username check        |
|<-- {user_id,B,salt,room_salt,   |                                |
|     pay_required, pay_challenge}|                                |
| process_challenge               |                                |
| POST /pay/quote {user_id}       |                                |
|-------------------------------->| addHoldInvoice(desc_hash=      |
|                                 |   H(user_id|nonce|room)) ----->|
|<-- {bolt11, payment_hash, amt,  |<------- invoice ---------------|
|     lnurl, expires_at}          |                                |
| [TUI shows invoice/QR; pay] ------------------- pay ------------>| (HTLC accepted/held)
| POST /srp/verify {user_id,M,    |                                |
|     payment_hash}               |                                |
|-------------------------------->| SRP verify (:87)               |
|                                 | AUTHORITATIVE cap gate (:84)   |
|                                 | lookupInvoice == ACCEPTED? --->|
|                                 | amount ok? user_id committed?  |
|                                 | unused? -> settle ------------>| (owner paid)
|                                 | session_store.add (:97)        |
|<-- {H_AMK, ws_token}            |                                |
| check H_AMK (MITM guard)        |                                |
| ws /ws/chat?user_id&ws_token    |                                |
|================ joined =========|                                |
(on any failure between verify+settle: node.cancelInvoice -> refund, return 402)
```
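
The `H(user_id|nonce|room)` commitment in the diagram leaves the byte encoding open. One possible sketch, assuming SHA-256 and length-prefixed fields so that different splits of the same bytes cannot collide; `description_commitment` is a hypothetical name and this encoding is an assumption, not something the design fixes.

```python
import hashlib


def description_commitment(user_id: str, nonce: bytes, room_id: str) -> bytes:
    """SHA-256 over length-prefixed (user_id, nonce, room_id)."""
    h = hashlib.sha256()
    for part in (user_id.encode(), nonce, room_id.encode()):
        h.update(len(part).to_bytes(4, "big"))  # 4-byte big-endian length prefix
        h.update(part)
    return h.digest()


c1 = description_commitment("ab", b"\x01" * 16, "c")
c2 = description_commitment("a", b"\x01" * 16, "bc")
same = description_commitment("ab", b"\x01" * 16, "c")
```

Without the length prefixes, a plain concatenation would make `("ab", "c")` and `("a", "bc")` indistinguishable whenever the nonce sits at a fixed position, which is the kind of ambiguity an identity binding should rule out.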
---
## 8. Anti-replay, binding, atomicity (the security-critical bits)
- **Per-join invoice.** A fresh invoice/nonce per join attempt; never a static
address. Static addresses enable replay and can't bind identity.
- **Identity binding.** Commit `user_id` (and room id) into the invoice
`description_hash` (BOLT11) or the voucher payload, so a proof minted for one
joiner can't admit another — defeats proof theft by a malicious relay/peer.
- **Single-use ledger.** In-RAM set of consumed `payment_hash`/`nonce` for the
server's lifetime; reject reuse. (Matches the project's ephemeral, RAM-only
ethos — no DB needed.)
- **Expiry.** Short invoice/voucher TTL (e.g. 5 min) tied to the SRP
`pay_challenge`; expired ⇒ re-quote.
- **Atomic seat.** Capacity is reserved at the authoritative gate
(`views.py:84-85`) *and* the hold invoice is only settled after the seat is
secured; otherwise cancel→refund. Consider a short-lived "seat hold" so two
payers don't race the last seat (reserve before settle, release on failure).
- **Amount check.** Enforce `amount >= price`; reject underpayment; for overpay,
settle and (optionally) note credit — never silently keep extra without policy.
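
The "seat hold" idea (reserve before settle, release on failure) can be sketched as a small in-RAM structure. `SeatHolds`, its method names, and the 300-second TTL are assumptions for illustration; the default of 4 mirrors the capacity default mentioned in §1.

```python
class SeatHolds:
    """Transient seat reservations so two payers cannot race the last seat."""

    def __init__(self, max_users: int = 4, ttl_s: float = 300.0):
        self.max_users = max_users
        self.ttl_s = ttl_s
        self._holds = {}  # user_id -> expiry timestamp

    def reserve(self, user_id: str, active_sessions: int, now: float) -> bool:
        # drop expired holds first so abandoned payments free their seat
        self._holds = {u: exp for u, exp in self._holds.items() if exp > now}
        if user_id in self._holds:
            return True  # this joiner already holds a seat
        if active_sessions + len(self._holds) >= self.max_users:
            return False  # full once pending holds are counted
        self._holds[user_id] = now + self.ttl_s
        return True

    def release(self, user_id: str) -> None:
        self._holds.pop(user_id, None)  # call on cancel, failure, or admission


holds = SeatHolds(max_users=4, ttl_s=300.0)
a = holds.reserve("alice", active_sessions=3, now=0.0)    # takes the last seat
b = holds.reserve("bob", active_sessions=3, now=1.0)      # refused: seat is held
holds.release("alice")                                    # alice's payment failed
c = holds.reserve("bob", active_sessions=3, now=2.0)      # seat is free again
d = holds.reserve("carol", active_sessions=3, now=3.0)    # refused: bob holds it
e = holds.reserve("carol", active_sessions=3, now=400.0)  # bob's hold expired
```

Settling the hold invoice would happen only after `reserve` returns `True`, and every failure path would call `release`.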
---
## 9. Client UX (Rust TUI + Python)
- **Connect command** gains optional payment handling. When `/srp/init` returns
`pay_required`, the client:
  - fetches a quote, renders a **payment panel**: amount (sats), a copyable BOLT11,
    an `lnurl:`/`lightning:` URI, and a **QR code** (ratatui can draw a QR via a
    unicode/half-block widget; Python client prints an ASCII QR).
  - shows live status: `waiting for payment → received (held) → admitted`.
  - then proceeds to `/srp/verify` automatically once paid.
- **CLI flags** for non-interactive/automation: `--pay-bolt11-out <file>` (dump the
invoice) or `--voucher <file>` (Tier 2, present a pre-obtained voucher).
- **Help menu:** add to the existing clustered help (`hh/src/ui.rs`
`help_clusters`) — a short note under a new `ACCESS` or extended `KEYS`/intro
cluster: *"rooms may require a Lightning payment to join; you'll be shown an
invoice after the password check."*
- **Failure copy:** `402` ⇒ "this house requires payment to enter"; expired ⇒
"invoice expired — press R to refresh"; full-after-pay ⇒ "house filled while
paying — you were refunded."
---
## 10. State, config, key management
- **New server ctx:** `admission` gate, `paid_nonces` (single-use set),
`seat_holds` (transient reservations). All **in-RAM**, cleared on restart —
consistent with the existing ephemeral stores (`cmd_chat/server/stores.py`).
- **Secrets via env only** (never in repo, never in `serve` argv where it'd hit
`ps`): `HH_LNBITS_INVOICE_KEY`, `HH_LND_MACAROON_PATH` (invoice-only macaroon),
`HH_LND_TLS_CERT`, etc. Pass *names* on the CLI (`--pay-key-env`), read values
from the environment.
- **Least privilege:** LNbits **invoice/read key** (not admin); LND macaroon baked
to `invoices:write/read` + `invoices:settle`/`cancel` only — **no `onchain`/
`offchain` spend** permissions.
- **Network guard:** default `--pay-network signet|regtest|testnet`; require an
explicit `--pay-network mainnet --i-understand` to use real funds.
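
A sketch of how the "names on the CLI, values from the environment" rule and the network guard might combine. The flag names come from this document; `parse_pay_args`, the error messages, and the `signet` default are assumptions.

```python
import argparse
import os

TEST_NETWORKS = {"signet", "regtest", "testnet"}


def parse_pay_args(argv, environ=None):
    """Return (network, secret); the secret is read from the env var named on the CLI."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="serve-pay-sketch")
    parser.add_argument("--pay-key-env", default="HH_LNBITS_INVOICE_KEY")
    parser.add_argument("--pay-network", default="signet",
                        choices=sorted(TEST_NETWORKS | {"mainnet"}))
    parser.add_argument("--i-understand", action="store_true")
    args = parser.parse_args(argv)

    # real funds need the explicit second flag
    if args.pay_network == "mainnet" and not args.i_understand:
        raise SystemExit("refusing mainnet without --i-understand")

    # only the variable *name* appears in argv; the value never reaches `ps`
    secret = environ.get(args.pay_key_env)
    if not secret:
        raise SystemExit(f"environment variable {args.pay_key_env} is not set")
    return args.pay_network, secret


network, key = parse_pay_args(
    ["--pay-network", "regtest"], {"HH_LNBITS_INVOICE_KEY": "example-key"}
)
```

Passing the environment as a parameter keeps the guard testable without touching the real process environment.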
---
## 11. Threat model
| Threat | Mitigation |
|--------|-----------|
| Replay a proof to join repeatedly / share with friends | per-join invoice + single-use nonce ledger + `user_id` binding in `description_hash`/voucher |
| Malicious relay steals the preimage to join itself | identity-bound invoice (preimage only admits the committed `user_id`); Tier 2 removes relay from the money path entirely |
| Pay but no seat (race / full) | hold invoice + seat reservation; cancel→auto-refund on failure; clear UX |
| Payment-backend down → fail-open | gate denies entry on backend error (fail-closed); never silently admits |
| Invoice-spam DoS / griefing | reuse existing per-IP rate limit (`helpers.py`) on `/pay/quote`; cap concurrent unpaid holds per IP; short TTLs |
| Front-running the last seat | reserve seat *before* settle; release on abort |
| Fiat-oracle manipulation | price natively in sats; no oracle in the admission path |
| Key leakage | invoice/read-only creds, env-only, no spend keys; mainnet behind explicit flag |
| MITM on the HTTP leg | unchanged SRP `H_AMK` mutual-auth guard (`net.rs:87-90`); run behind TLS in prod (today's `--no-tls` is dev-only) |
| Privacy deanonymization | Lightning over on-chain; document Monero option; never log payer metadata beyond the ephemeral nonce |
| Regulatory/custody risk | strictly non-custodial; funds never touch app-controlled spend keys |
---
## 12. Privacy & legal notes (design guidance, not legal advice)
- **Privacy:** Lightning leaks far less than on-chain; the relay should log only
the ephemeral nonce/payment_hash, never amounts tied to usernames/IPs longer
than the session. Offer Monero for the strongest payer privacy.
- **Compliance:** staying **non-custodial** (funds go payer→owner, app never
holds spend keys) keeps this closest to "the owner accepts tips/entry fees,"
but money-transmission / KYC-AML obligations vary by jurisdiction and volume.
Flag this to the operator; do not build custody or fiat on/off-ramps into the
app. This document is engineering guidance, not legal advice.
---
## 13. Testing strategy
- **Unit:** `NullGate` (free), `VoucherGate` signature + expiry + replay vectors
(golden Ed25519 cases, mirroring the existing offline SRP vectors in
`test_srp.py` / Rust `Selftest`).
- **Integration (regtest):** spin LND/CLN in `regtest` (Polar or docker), drive a
full pay-to-join: quote → pay → settle → join; and the negative paths
(underpay, expire, full-after-pay→cancel/refund, backend-down→deny).
- **Interop:** extend the Rust live `Handshake` self-test to optionally carry a
voucher; ensure Rust and Python clients produce identical proof framing.
- **Headless demo:** a `demo-pay-to-join.sh` (sibling of `demo-save-load.sh`)
using a regtest node to film the beat: password → invoice in the TUI → pay →
admitted.
---
## 14. Phased rollout
1. **Phase 0 — interface only.** Add `AdmissionGate` + `NullGate`, wire
`app.ctx.admission`, thread an optional `proof` field through `/srp/verify`.
Default behavior identical to today. (Lowest risk; everything else builds on it.)
2. **Phase 1 — VoucherGate (Tier 2 verify side).** Ed25519 voucher verification on
the server (`--admit-pubkey`), single-use ledger, `402` path + client flag to
present a voucher. Server stays payment-blind. Testable with a CLI signer.
3. **Phase 2 — LightningGate (Tier 1) + `/pay/quote`.** Hold-invoice issuance via
LNbits/LND, settle/cancel atomicity, TUI payment panel + QR. regtest e2e.
4. **Phase 3 — owner doorman (Tier 2 issue side)** + LNURL, so payment is fully
owner-authoritative and the relay never touches money. Optional Monero backend.
---
## 15. File-change map (when we implement)
| Area | File(s) | Change |
|------|---------|--------|
| Gate interface + backends | `cmd_chat/server/admission.py` (new) | `AdmissionGate`, `NullGate`, `VoucherGate`, `LightningGate` |
| Server wiring + flags | `cmd_chat/server/factory.py`, `cmd_chat.py` | build gate from `--pay*` flags; ctx state |
| Init advertises gate | `cmd_chat/server/views.py:30-63` | add `pay_required`, `pay_challenge` to `/srp/init` |
| New quote endpoint | `cmd_chat/server/routes.py`, `views.py` | `POST /pay/quote` (Tier 1) |
| Verify enforces payment | `cmd_chat/server/views.py:66-111` | accept `proof`/`payment_hash`; gate between `:87` and `:97`; `402` path; settle/cancel |
| Single-use + seat state | `cmd_chat/server/stores.py` | `paid_nonces`, `seat_holds` (in-RAM) |
| Python client UX | `cmd_chat/client/client.py` | quote fetch, invoice display/QR, wait-for-pay, send proof |
| Rust client UX | `hh/src/net.rs:44-104`, `hh/src/app.rs`, `hh/src/ui.rs` | quote step, payment panel + QR, proof in verify, help entry |
| Owner doorman (Tier 2) | new sidecar or `hh/` subcommand | LNURL-pay + Ed25519 voucher signer (holds keys) |
| Tests | `cmd_chat/tests/`, Rust `Cmd::*` | gate unit + regtest integration + interop |
---
## 16. Open decisions (need owner input)
1. **Trust posture:** ship Tier 1 (server-mediated, simplest) first, or hold out
for Tier 2 (relay stays payment-blind)? Recommendation: Phase 0→1 (voucher
verify) gets us trust-minimized verification quickly; add Lightning issuance
(Phase 2) for UX.
2. **Rail:** Lightning only at first? (Recommended.) Monero as a fast-follow for
privacy?
3. **Pricing:** flat sats per join? per-room configurable? time-boxed
(pay-per-hour) vs one-shot entry?
4. **Backend:** LNbits (fastest to integrate, semi-custodial unless self-hosted)
vs direct LND/CLN (more setup, fully self-custodial)?
5. **What payment buys:** plain entry, or also a role (e.g. auto-`/grant` drive)?
Note: roles are owner-broadcast ACL today, so coupling payment→role belongs in
the owner's client, not the relay.


@ -0,0 +1,88 @@
# PoC: persistent sandbox — fast-qwen build → save image → close → reload
**Goal of the video beat:** prove that a hack-house Docker sandbox is *durable
on demand*. A local, CPU-only **fast qwen coder** writes & runs code inside an
ephemeral Docker sandbox; we snapshot it to an image with `/sbx save`; we **fully
close the session** (container is purged on teardown); we relaunch the client and
`/sbx load` the snapshot — the code the model wrote is **still there**.
This is the headline pitch: *sandboxes are RAM-only/ephemeral by default, but you
can freeze a moment of work into an image and thaw it later — nothing leaks to the
server, the image lives only on the owner's box.*
## Why this is non-obvious / worth showing
- `/sbx stop` and client-quit both run `sbx::teardown` → `docker rm -f hack-house`.
The container is **gone**. Normally the work would be gone too.
- `/sbx save <label>` runs `docker commit hack-house hh-snap:<label>` *while the
container is alive*. The image is independent of the container, so it survives
the purge.
- `/sbx load <label>` runs a **fresh** container from `hh-snap:<label>` — same
filesystem state, new ephemeral instance.
## Models (CPU-only box: i5-8350U, no GPU)
| Path | Model | Why |
|------|-------|-----|
| chat (`/ai <q>`) | `qwen2.5:3b` | general, the locally-pulled default |
| sandbox `!task` | `qwen2.5-coder:1.5b` | auto-selected coder; fast TTFT on CPU, better shell/code |
The agent auto-selects the coder build for the `!task` (sandbox-driving) path when
the chat provider is Ollama and a `qwen2.5-coder` is present (it is — pulled).
## Storyboard (the cut)
1. **Title card** — "Ephemeral by default. Persistent on demand."
2. **Summon** — alice: `/sbx launch docker` → "summoned" sandbox bubble.
3. **Spawn the coder** — alice: `/ai start` → `oracle online — ollama/qwen2.5:3b`
(the coder model rides along for `!task`).
4. **Build, by the fast model** — alice:
`/ai oracle !write /root/fib.py that prints the first 10 Fibonacci numbers, then run it`
→ agent drives the shared shell; `fib.py` is written and executed; the
sandbox pane shows the Fibonacci output.
5. **Freeze it** — alice: `/sbx save buildbox`
→ `⛧ saved sandbox → image hh-snap:buildbox · reload with /sbx load buildbox`.
6. **Walk away** — alice: `/sbx stop` (or quits the client entirely). Container is
purged; prove it: `docker ps -a` shows no `hack-house`, but
`docker images hh-snap` still lists `buildbox`.
7. **Come back** — a *fresh* client session; alice: `/sbx load buildbox`.
8. **The reveal** — F2 to drive, `cat /root/fib.py && python3 /root/fib.py`
→ the model's code and output are exactly as left. **Persistence proven.**
9. **Result card** — "OPERATIONS CONDUCTED": built by local qwen-coder · saved to
image · session closed · reloaded intact.
## Acceptance (what the PoC script asserts)
- After step 4: `docker exec hack-house cat /root/fib.py` is non-empty AND running
it prints 10 Fibonacci numbers (`0 1 1 2 3 5 8 13 21 34`).
- After step 5: `docker images hh-snap --format '{{.Tag}}'` contains `buildbox`.
- After step 6 (stop): `docker ps -a --format '{{.Names}}'` has **no** `hack-house`;
the `hh-snap:buildbox` image still exists.
- After steps 7-8 (load): the **new** `hack-house` container's `/root/fib.py`
matches the original byte-for-byte.
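
The step-4 assertion boils down to comparing captured output with the first ten Fibonacci numbers. A minimal sketch of that comparison, assuming the script's output has already been captured as a string; `first_fibonacci` and `output_matches` are hypothetical helper names, and tolerating commas or newlines as separators is an assumption.

```python
def first_fibonacci(n: int) -> list:
    """First n Fibonacci numbers, starting from 0."""
    out, a, b = [], 0, 1
    for _ in range(n):
        out.append(a)
        a, b = b, a + b
    return out


def output_matches(stdout: str, n: int = 10) -> bool:
    """True iff stdout holds exactly the first n Fibonacci numbers, in order."""
    try:
        got = [int(tok) for tok in stdout.replace(",", " ").split()]
    except ValueError:
        return False  # something other than integers was printed
    return got == first_fibonacci(n)


expected = first_fibonacci(10)
one_line = output_matches("0 1 1 2 3 5 8 13 21 34\n")
per_line = output_matches("0\n1\n1\n2\n3\n5\n8\n13\n21\n34\n")
too_short = output_matches("0 1 1 2 3")
```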
## Execution
`hh/demo-save-load.sh` drives the whole thing headlessly over tmux (per the
TUI-tmux test recipe): boots the server, runs client **session A**, injects the
beats with `send-keys`, verifies via `capture-pane` + `docker exec`, then quits
session A and opens client **session B** to load and confirm. It is a PoC /
correctness harness first; once green it feeds the polished `video-toolkit`
render.
### Gotchas baked into the script
- TUI doesn't bind Ctrl-U (it inserts a literal `u`); clear input with `BSpace`.
Send text with `send-keys -l "<text>"` then a separate `Enter`; don't race renders.
- Agent name is hardcoded `oracle`; only one `/ai start` per room.
- Keep `!task` phrasing single-line; the agent's drive output lands in the sandbox
pane, not chat.
- `/sbx load` refuses if a sandbox is already running — stop first.
- Docker daemon must be up (`docker info`); `/sbx launch docker --start` can boot
it (sudo) but we pre-check instead.
- Snapshot label charset: alphanumerics, `.`, `_`, `-` (≤64).
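
The label rule in the last gotcha is easy to pre-check before calling `/sbx save`. A sketch, assuming a minimum length of 1 (the rule above only states the charset and the 64-character cap); `valid_label` is a hypothetical name.

```python
import re

# alphanumerics, '.', '_', '-'; 1 to 64 characters (the lower bound is assumed)
LABEL_RE = re.compile(r"[A-Za-z0-9._-]{1,64}")


def valid_label(label: str) -> bool:
    return LABEL_RE.fullmatch(label) is not None


checks = {
    "buildbox": valid_label("buildbox"),
    "v1.2_rc-3": valid_label("v1.2_rc-3"),
    "build box": valid_label("build box"),
    "empty": valid_label(""),
    "too long": valid_label("a" * 65),
}
```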
### Teardown / cleanup
The script removes the `hack-house` container and (optionally) the `hh-snap:*`
demo images it created, and kills the server + tmux sessions, so reruns are clean.


@ -0,0 +1,280 @@
# hack-house → VirtualBox Sandbox Backend — Spec
> **Status:** Draft v1 · **Date:** 2026-06-03
> **Scope:** Add VirtualBox as a sandbox backend, in two complementary modes:
> **(A)** a headless, owner-hosted VM driven through the existing shared PTY
> (drops into the current `Backend` abstraction), and **(B)** a *portable VM
> appliance* the room can hand out so each member boots the **actual GUI locally**
> on their own machine — including detecting and (with consent) installing
> VirtualBox if it's missing.
> **Baseline reviewed:** `hh/src/sbx.rs`, `hh/src/app.rs` @ `feat/ai-context`.
---
## 0. Decisions to lock
| # | Decision | Proposal |
|---|----------|----------|
| A | VirtualBox transport into the guest | **SSH** (NAT port-forward) as primary; `VBoxManage guestcontrol` as a no-SSH fallback. SSH gives a clean PTY and reuses the multipass provisioning model verbatim. |
| B | Single shared instance vs. per-user local copies | **Both, as two modes.** Mode A = one owner-hosted headless VM, shared PTY (zero-knowledge preserved). Mode B = export the VM as an `.ova`, distribute over the *existing* `/send` channel, each member imports + launches the GUI locally. |
| C | GUI sharing | **No live framebuffer relay.** Sharing the *desktop* = sharing the *appliance*, not the pixels. Sidesteps the zero-knowledge problem entirely (the image rides the encrypted file transfer). |
| D | Installation | **Detect-first, then opt-in install.** `ensure-vbox.sh` mirrors `ensure-docker.sh`: never installs silently; prints what it would do and requires an explicit `--yes` (or the `/sbx ... --install` flag). |
---
## 1. Why VirtualBox, and what's genuinely new
The existing backends (`Backend::{Local,Docker,Multipass}` in `hh/src/sbx.rs:51`)
are all **headless and text-only**. The owner hosts the box and runs a local PTY
into it (`command_for`, `sbx.rs:278`); the PTY bytes are encrypted with the room
key and relayed as `_sbx` frames, so the server only ever sees ciphertext.
VirtualBox adds two things the others can't:
1. **Arbitrary guest OSes** — Windows, BSD, old kernels, purpose-built
malware-analysis or CTF images — with a mature snapshot tree.
2. **A real graphical desktop.** This is the part that doesn't fit the PTY relay,
and it's the part you explicitly want: *people share a VM and each launches it
locally with the GUI.*
So the integration is deliberately split so each mode keeps the project's trust
model intact:
- **Mode A (shared shell):** one VM, owner-hosted, driven collaboratively through
the shared PTY — identical trust story to multipass.
- **Mode B (shared appliance):** the VM *image* is the shared artifact. It travels
over the existing E2E `/send` transfer; each member runs their **own local copy**
in the VirtualBox GUI. No pixels cross the wire — only the (encrypted) disk image.
---
## 2. Mode A — headless VirtualBox as a 4th backend
### 2.1 Enum + labels (`hh/src/sbx.rs`)
```rust
pub enum Backend { Local, Docker, Multipass, VirtualBox } // new variant
```
- `Backend::parse`: add `"virtualbox" | "vbox" => Some(Backend::VirtualBox)`.
- `label()`: `"virtualbox"`.
- `default_image()`: a named base appliance, e.g. `"hh-base"` (an Ubuntu image we
pre-register), since VirtualBox has no "pull by release string" like multipass.
### 2.2 Mapping every existing fn to `VBoxManage`
Each function in `sbx.rs` gets one new match arm. The transport (how a command
reaches *inside* the guest) is SSH over a NAT port-forward.
| Fn (`sbx.rs`) | Multipass today | VirtualBox arm |
|---|---|---|
| `prepare` (`:86`) | `multipass launch` | import appliance if absent (`VBoxManage import <ova> --vsys 0 --vmname <name>`), set forward (`modifyvm <name> --natpf1 "ssh,tcp,127.0.0.1,<port>,,22"`), then `startvm <name> --type headless`; if it already exists just `startvm`. Idempotent like the multipass arm. |
| `command_for` (`:278`) | `multipass exec … bash` | `ssh -tt -p <port> -o StrictHostKeyChecking=no <run_user>@127.0.0.1` (login shell). `run_user` empty ⇒ default account. |
| `provision` (`:355`) | `useradd` via `mp()` | identical `useradd`/sudoers scripts, run through an SSH helper `vbx()` (mirrors `mp()`/`dk()` at `sbx.rs:319`). Owner gets passwordless sudo via the same `mp_grant_sudo` script. |
| `set_sudo` (`:387`) | sudoers drop-in | same script over SSH; gate on `backend == VirtualBox`. |
| `save_state` (`:208`) | `multipass snapshot` | `VBoxManage snapshot <name> take <label> --pause` |
| `list_snapshots` (`:241`) | `multipass list --snapshots` | `VBoxManage snapshot <name> list --machinereadable`, parse `SnapshotName*=` lines |
| `teardown` (`:172`) | `multipass delete --purge` | `VBoxManage controlvm <name> poweroff` then `unregistervm <name> --delete` |
**Why SSH over `guestcontrol`:** `VBoxManage guestcontrol <vm> run --exe /bin/bash`
requires Guest Additions in the image and gives a rough PTY; interactive driving
is clunky. SSH needs only an sshd in the base image (cheap to bake once) and the
whole P4 permission stack (`/grant`, `/sudo`, drive ACL) works **unchanged**
because it's all "run a command through a transport." `guestcontrol` stays as a
documented fallback for images without sshd.
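
To make the `prepare` and `command_for` rows concrete, here is a sketch that only builds the argument lists named in the table; it runs nothing. The real arms would live in Rust in `sbx.rs`; `prepare_commands` and `ssh_command` are hypothetical names used for illustration.

```python
def prepare_commands(name: str, ova: str, port: int, exists: bool) -> list:
    """Argument lists for the VirtualBox `prepare` arm, in execution order."""
    cmds = []
    if not exists:
        # first launch: import the appliance and set the SSH NAT forward
        cmds.append(["VBoxManage", "import", ova, "--vsys", "0", "--vmname", name])
        cmds.append(["VBoxManage", "modifyvm", name,
                     "--natpf1", f"ssh,tcp,127.0.0.1,{port},,22"])
    # idempotent path: an existing VM is just started
    cmds.append(["VBoxManage", "startvm", name, "--type", "headless"])
    return cmds


def ssh_command(port: int, run_user: str = "") -> list:
    """PTY-capable login shell over the loopback forward; empty user = default account."""
    target = f"{run_user}@127.0.0.1" if run_user else "127.0.0.1"
    return ["ssh", "-tt", "-p", str(port), "-o", "StrictHostKeyChecking=no", target]


fresh = prepare_commands("hh-base", "hh-base.ova", 2222, exists=False)
again = prepare_commands("hh-base", "hh-base.ova", 2222, exists=True)
shell = ssh_command(2222, "alice")
```

Building argument lists (rather than one shell string) avoids quoting problems with the comma-separated `--natpf1` rule.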
### 2.3 Port allocation
Each headless VM needs a unique host loopback port for its SSH forward. Reuse the
free-port discovery already used by the save/load PoC (see
`docs/demo-save-load-poc.md` / `hh/demo-save-load.sh`) so two sandboxes on one
host don't collide. The owner is the only one who ever connects to it
(`127.0.0.1:<port>`), so it never leaves the host.
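
One common way to discover a free loopback port is to bind to port 0 and read back what the OS assigned. This is a sketch of that general technique, not a description of what the PoC script actually does; `free_loopback_port` is a hypothetical name.

```python
import socket


def free_loopback_port() -> int:
    """Ask the OS for an unused TCP port on 127.0.0.1."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 = let the OS choose
        return s.getsockname()[1]


port = free_loopback_port()
```

There is a small window between closing this socket and VirtualBox binding the forward, so a caller would likely want to retry with a new port if `modifyvm`/`startvm` reports the port as taken.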
### 2.4 Snapshots tie into existing `/sbx save`/`load`
`/sbx save`/`load`/`snaps` (`app.rs:1244`–`1307`) already branch on backend.
VirtualBox snapshots map cleanly onto the same commands, so the existing UX
("save state → quit → load") works for VBox with no new commands — just the new
match arms in `save_state`/`list_snapshots`, plus a VirtualBox arm in the `load`
path (today `load` hardcodes Docker at `app.rs:1282`; generalize it to the broker's
backend).
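
The `list_snapshots` arm needs to pull names out of `--machinereadable` output. A sketch of that parsing, assuming lines of the form `SnapshotName="..."` and `SnapshotName-1="..."`; the sample text below is made up for the demo, and `snapshot_names` is a hypothetical name.

```python
def snapshot_names(machinereadable: str) -> list:
    """Collect values of every key that starts with 'SnapshotName'."""
    names = []
    for line in machinereadable.splitlines():
        key, sep, value = line.partition("=")
        # 'CurrentSnapshotName' does not start with 'SnapshotName', so it is skipped
        if sep and key.startswith("SnapshotName"):
            names.append(value.strip().strip('"'))
    return names


sample = "\n".join([
    'SnapshotName="base"',
    'SnapshotUUID="00000000-0000-0000-0000-000000000001"',
    'SnapshotName-1="buildbox"',
    'SnapshotUUID-1="00000000-0000-0000-0000-000000000002"',
    'CurrentSnapshotName="buildbox"',
])
names = snapshot_names(sample)
```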
---
## 3. Mode B — share a VM, launch the GUI locally
This is the new product surface. The shared artifact is the **appliance**, not a
live session. Flow:
```
owner:  /sbx export [name]     → freezes the VM to an .ova on the owner's disk
owner:  /send hh-box.ova       → existing E2E file transfer (chunked, SHA-256)
member: /accept                → lands in ./downloads/ (existing path)
member: /sbx open ./downloads/hh-box.ova
          → ensure VirtualBox is installed (detect; offer install)
          → VBoxManage import … --vmname hh-box-<member>
          → VBoxManage startvm hh-box-<member> --type gui   ← real desktop window
```
Everyone ends up running an **identical local VM** — same disk, same tools, same
state at export time — but each on their own machine, with a full GUI. Because the
image moved over the encrypted `/send` channel, the server never saw it, and there
is no live cross-machine display traffic to secure.
### 3.1 New commands
| Command | Who | Action |
|---|---|---|
| `/sbx export [name]` | owner of a VBox sandbox | `VBoxManage export <name> -o <out>.ova` (VM should be powered off or snapshot-exported). Emits the path and hints `/send` it. |
| `/sbx open <file.ova> [--install]` | any member | Detect VirtualBox → (consent) install if missing → `import` under a per-member VM name → `startvm --type gui`. |
| `/sbx gui [name]` | any member | Launch the GUI for an already-imported VM (`startvm --type gui`), or attach a running headless one (`VBoxManage startvm <name> --type separate`). |
`/sbx open` and `/sbx export` are deliberately **local-only** operations (like
`/pw`): they never broadcast. The only thing that crosses the room is the `.ova`
you choose to `/send`.
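The import-then-launch half of `/sbx open` could look roughly like this, again as a dry run. `member_vm_name` and `open_plan` are hypothetical names, and `--vsys 0` is an assumption about how the import would address the appliance's single virtual system; the flow above elides that part of the command.

```shell
#!/usr/bin/env bash
# Sketch only: the two VBoxManage calls behind `/sbx open`, printed as a dry run.
# member_vm_name and open_plan are hypothetical helpers, not existing code.

# Build the per-member VM name from the appliance file name, e.g. hh-box-bob.
member_vm_name() {
  local ova="$1" member="$2" base
  base="$(basename "$ova" .ova)"
  printf '%s-%s\n' "$base" "$member"
}

# Print the import command, then the GUI launch command, for one member.
open_plan() {
  local ova="$1" member="$2" vm
  vm="$(member_vm_name "$ova" "$member")"
  printf 'VBoxManage import %s --vsys 0 --vmname %s\n' "$ova" "$vm"
  printf 'VBoxManage startvm %s --type gui\n' "$vm"
}
```

Nothing here touches the room: both commands run against the member's local VirtualBox only, which is the property this section relies on.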
### 3.2 Relationship between the two modes
They compose: the owner can run a **Mode A** headless VM, `/sbx save` a snapshot,
`/sbx export` it to an `.ova`, and `/send` it — at which point each member can
`/sbx open` it and keep working **locally in the GUI** from the exact same state.
"Collaborate live in one shared shell" and "everyone take a copy home and run the
desktop" become two ends of one workflow.
---
## 4. Installation handling — `ensure-vbox.sh` (detect first)
Mirror `hh/ensure-docker.sh` exactly in spirit: **a backend never installs
anything silently.**
### 4.1 Detection (always first, zero side effects)
```rust
pub fn vbox_installed() -> bool { // sbx.rs, beside docker_daemon_up()
Command::new("VBoxManage").arg("--version")
.stdout(Stdio::null()).stderr(Stdio::null())
.status().map(|s| s.success()).unwrap_or(false)
}
```
If present, every Mode A/B path proceeds normally. If absent, the command **fails
loud with the remedy**, exactly like the Docker daemon message at `app.rs:1206`:
> `VirtualBox isn't installed — retry with /sbx open <file> --install to install it (needs sudo), or run ./ensure-vbox.sh in a terminal first`
### 4.2 The installer script
`hh/ensure-vbox.sh`, invoked as `bash ensure-vbox.sh --yes` only when the user
passed `--install` (matching how `prepare` shells `ensure-docker.sh --yes` at
`sbx.rs:31`). It:
1. Re-checks `VBoxManage --version`; if found, exits 0 immediately (idempotent).
2. Detects the platform and prints the **exact** command it will run *before*
running it:
- **Debian/Ubuntu:** `sudo apt-get install -y virtualbox` (or add Oracle's repo
for a current build).
- **Fedora:** `sudo dnf install -y VirtualBox`.
- **Arch:** `sudo pacman -S --noconfirm virtualbox`.
- **macOS:** `brew install --cask virtualbox` (note: needs the kernel-extension
approval in System Settings; the script surfaces that as a manual step).
- **Windows / unknown:** do **not** attempt; point at the download page and the
`winget install Oracle.VirtualBox` one-liner.
3. On any failure, surfaces the last stderr line through the returned error (same
pattern as `start_docker_daemon` at `sbx.rs:31`) so it lands in the TUI error
popup, never bleeding raw onto the surface.
> **Honesty note for the spec:** VirtualBox needs a host kernel module
> (`vboxdrv`) and, on Secure-Boot machines, a signed/enrolled MOK. The script
> detects Secure Boot (`mokutil --sb-state`) and, rather than fail opaquely,
> tells the user the one manual step required. We check; we don't pretend it's
> always one command.
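Step 2 above (detect the platform, print the exact command before running it) might be sketched as follows. `install_cmd_for` and `detect_platform` are hypothetical helper names; the commands themselves are the ones listed above, and reading `ID` from `/etc/os-release` is an assumed detection method.

```shell
#!/usr/bin/env bash
# Sketch only: choose the install command per platform so it can be printed
# before it is run. Helper names are assumptions, not existing code.
install_cmd_for() {
  case "$1" in
    debian|ubuntu) echo "sudo apt-get install -y virtualbox" ;;
    fedora)        echo "sudo dnf install -y VirtualBox" ;;
    arch)          echo "sudo pacman -S --noconfirm virtualbox" ;;
    darwin)        echo "brew install --cask virtualbox" ;;
    *)             return 1 ;;  # Windows / unknown: never attempt an install
  esac
}

# Report "darwin" on macOS, the os-release ID on Linux, otherwise "unknown".
detect_platform() {
  if [ "$(uname -s)" = "Darwin" ]; then echo darwin; return; fi
  if [ -r /etc/os-release ]; then
    ( . /etc/os-release && echo "${ID:-unknown}" )  # subshell keeps vars local
    return
  fi
  echo unknown
}
```

A caller would print `install_cmd_for "$(detect_platform)"` first and execute it only when `--yes` was passed; a non-zero return is the cue to point at the download page instead.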
### 4.3 Consent UX
No surprise installs, no surprise sudo. The flow is: try → detect missing → tell
the user the remedy and the exact command → they re-issue with `--install`. This
matches the project's existing posture (`/sbx launch docker --start` is opt-in
daemon-start, not automatic).
---
## 5. Command surface (additions)
Extend the `/sbx` usage line (`app.rs:1309`):
```
/sbx launch [local|docker|multipass|virtualbox] [image]
/sbx stop | save [label] | load <label> | snaps
/sbx export [name] # freeze host VM → .ova (then /send it)
/sbx open <file.ova> [--install] # import + launch the GUI locally
/sbx gui [name] [--install] # launch GUI for an imported VM
```
| Command | Broadcasts? | Notes |
|---|---|---|
| `launch virtualbox` | yes (`_sbx status`) | Mode A; owner-hosted headless, shared PTY |
| `export` | no | local; produces an artifact to `/send` |
| `open` / `gui` | no | local; each member's own GUI window |
---
## 6. Code touchpoints (what actually changes)
Mode A is small: a new enum variant and about ten match arms (listed below); the broker, drive ACL,
sudo delegation, and save/load machinery are all backend-agnostic already.
| File | Change |
|---|---|
| `hh/src/sbx.rs` | `Backend::VirtualBox` variant; arms in `parse`/`label`/`default_image`/`prepare`/`command_for`/`provision`/`set_sudo`/`save_state`/`list_snapshots`/`teardown`; `vbox_installed()`, `vbx()` SSH helper; `export_ova()` + `open_local()` (Mode B, local-only). |
| `hh/src/app.rs` | accept `virtualbox`/`vbox` in `/sbx launch`; generalize `/sbx load` off hardcoded Docker (`:1282`) to the broker backend; new `/sbx export`, `/sbx open`, `/sbx gui` arms; extend usage string (`:1309`); install-missing error mirroring `:1206`. |
| `hh/ensure-vbox.sh` (new) | detect-first installer, per `§4`. |
| `hh/src/ui.rs` | add the three new commands to the clustered help menu. |
| `README.MD` | backend table (`:175`) gains a `virtualbox` row; a short "share a VM, run it locally" subsection. |
| `models.toml` / docs | none. |
---
## 7. Security & trust notes
- **Mode A preserves zero-knowledge**: the VM is owner-local, the SSH forward is
`127.0.0.1`-only, and only PTY ciphertext crosses the room — same as multipass.
- **Mode B preserves zero-knowledge** by *not* streaming a display at all. The
`.ova` is just a file through the existing SHA-256-verified, Fernet-encrypted
`/send` path (`hh/src/ft.rs`). Note the existing **50 MB** transfer cap (README
`:215`) — real VM images blow past it, so Mode B needs either a raised cap for
appliances or an out-of-band hand-off (documented honestly; see Open Questions).
- **No silent install, no silent sudo** (`§4.3`).
- **Appliance provenance**: a shared `.ova` is executable content, and importing it
runs whatever the sender put in the image. `/accept` already implies trust in the
sender; the `open` flow should say so explicitly with a one-line caution.
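To make the transfer-cap problem concrete, here is a small pre-flight check an `export` path could run before hinting `/send`. Reading the README's 50 MB as 50 * 1024 * 1024 bytes is an assumption, as is the helper name `fits_send_cap`.

```shell
#!/usr/bin/env bash
# Sketch only: does an appliance fit under the /send transfer cap?
# The byte value of the cap and the helper name are assumptions.
CAP_BYTES=$((50 * 1024 * 1024))

# Return 0 if the file fits, 1 if it is too large, 2 if it does not exist.
fits_send_cap() {
  local file="$1" size
  [ -f "$file" ] || return 2
  size="$(wc -c < "$file" | tr -d ' ')"  # strip padding some wc builds emit
  [ "$size" -le "$CAP_BYTES" ]
}
```

When the check fails, the command can surface the out-of-band option up front rather than letting the transfer fail partway through.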
---
## 8. Phasing
| Phase | Deliverable | Gate |
|---|---|---|
| **V0** | `vbox_installed()` + `ensure-vbox.sh` (detect-first, Linux apt path) | manual: missing → guided install → present |
| **V1** | `Backend::VirtualBox` Mode A: launch/stop/PTY over SSH, provision, sudo | shared shell into a headless VBox VM, two clients driving |
| **V2** | snapshots: `save`/`load`/`snaps` arms; generalize `/sbx load` backend | save → stop → load round-trips on VBox |
| **V3** | Mode B: `/sbx export` → `/send` → `/sbx open` GUI launch | a second machine boots the shared appliance's desktop |
| **V4** | macOS/other-distro install paths; transfer-cap handling for `.ova` | cross-platform `open` works end-to-end |
---
## 9. Open questions
1. **Appliance size vs. the 50 MB `/send` cap.** Options: raise the cap for `.ova`
only, compress (`.ova` + zstd), ship a thin base + a provisioning script instead
of a fat image, or accept out-of-band transfer for large VMs and keep `/send`
for small ones. Needs a product call.
2. **Base image sourcing.** Do we bake and ship an `hh-base.ova` (sshd + sudo
preinstalled) so Mode A "just works," or import whatever the user points at and
require them to have sshd? Baking one image once is the smoother UX.
3. **Per-member VM naming / cleanup** for Mode B locals — namespacing
(`hh-box-<member>`) and a `/sbx open --replace` to re-import cleanly.
4. **`guestcontrol` fallback** — ship it in V1 or document-only until someone needs
a no-sshd image?
```

240
hh/demo-save-load.sh Executable file

@ -0,0 +1,240 @@
#!/usr/bin/env bash
# demo-save-load.sh — PoC harness for the "persistent sandbox" video beat.
#
# Flow (see docs/demo-save-load-poc.md):
# session A: /sbx launch docker → /ai start → /grant oracle →
# /ai oracle !build fib.py & run it → /sbx save buildbox → quit
# prove: container purged on quit, but hh-snap:buildbox image survives
# session B: fresh client → /sbx load buildbox → the model's code is intact
#
# Headless: drives the ratatui client over tmux send-keys, asserts via
# capture-pane + `docker exec`. PoC/correctness first; feeds video-toolkit later.
#
# Usage: hh/demo-save-load.sh [--keep]
# --keep leave the server, container, image and tmux sessions up afterwards
set -uo pipefail
# ---- config -----------------------------------------------------------------
REPO="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Pick a free TCP port so we never collide with a stale server from another
# session (a leftover server on a fixed port answers SRP with its own password
# → spurious 401s). Honour an explicit $PORT if the caller forces one.
pick_port() { local p; for p in $(seq 4200 4280); do ss -ltn 2>/dev/null | grep -q ":$p " || { echo "$p"; return; }; done; echo 4173; }
PORT="${PORT:-$(pick_port)}"
PW="${PW:-malware-bless}"
LABEL="${LABEL:-buildbox}"
IMG="${IMG:-python:3.12-slim}" # base image: ships python3 so the built code runs
CTR="hack-house" # sbx::SBX_NAME — the container/instance name
SNAP="hh-snap:${LABEL}"
PY="$REPO/.venv/bin/python"
BIN="$REPO/hh/target/debug/hack-house"
SRV_SESS="hhpoc-srv"
A_SESS="hhpoc-a"
B_SESS="hhpoc-b"
EVID="$(mktemp -d /tmp/hh-poc.XXXXXX)"
KEEP=0; [[ "${1:-}" == "--keep" ]] && KEEP=1
GREEN=$'\e[32m'; RED=$'\e[31m'; YEL=$'\e[33m'; DIM=$'\e[2m'; RST=$'\e[0m'
step() { printf '\n%s== %s ==%s\n' "$YEL" "$*" "$RST"; }
ok() { printf '%s ok %s%s\n' "$GREEN" "$*" "$RST"; }
bad() { printf '%s XX %s%s\n' "$RED" "$*" "$RST"; }
note() { printf '%s %s%s\n' "$DIM" "$*" "$RST"; }
FAIL=0
fail() { bad "$*"; FAIL=1; }
cleanup() {
if [[ $KEEP -eq 1 ]]; then
note "--keep: leaving server/sessions/image up. Evidence: $EVID"
return
fi
step "cleanup"
tmux kill-session -t "$A_SESS" 2>/dev/null
tmux kill-session -t "$B_SESS" 2>/dev/null
tmux kill-session -t "$SRV_SESS" 2>/dev/null
docker rm -f "$CTR" >/dev/null 2>&1
docker rmi -f "$SNAP" >/dev/null 2>&1
note "removed container + $SNAP; sessions killed. Evidence kept: $EVID"
}
trap cleanup EXIT
# ---- helpers ----------------------------------------------------------------
# say <session> <text> : type a literal line then Enter (no Ctrl-U; renders race)
say() {
local sess="$1"; shift
tmux send-keys -t "$sess" -l "$*"
sleep 0.4
tmux send-keys -t "$sess" Enter
sleep 0.6
}
cap() { tmux capture-pane -t "$1" -p 2>/dev/null; } # snapshot a pane to stdout
snap_evid() { cap "$1" > "$EVID/$2.txt"; } # ...and save it
# wait_for <session> <regex> <timeout_s> : poll the pane until regex appears
wait_for() {
local sess="$1" re="$2" t="${3:-30}" i=0
while (( i < t*2 )); do
cap "$sess" | grep -qE "$re" && return 0
sleep 0.5; ((i++))
done
return 1
}
# wait_cmd <cmd...> : retry cmd once a second until it succeeds; give up after $WT seconds (default 30)
wait_cmd() {
local t="${WT:-30}" i=0
while (( i < t )); do "$@" >/dev/null 2>&1 && return 0; sleep 1; ((i++)); done
return 1
}
# ---- 0. preflight -----------------------------------------------------------
step "preflight"
command -v tmux >/dev/null || { echo "tmux required"; exit 2; }
[[ -x "$PY" ]] || { echo "venv python missing: $PY"; exit 2; }
docker info >/dev/null 2>&1 || { echo "docker daemon down - start it first"; exit 2; }
ollama list 2>/dev/null | grep -q 'qwen2.5-coder' || note "warn: qwen2.5-coder not in 'ollama list' (coder path may fall back)"
ollama list 2>/dev/null | grep -q 'qwen2.5:3b' || note "warn: qwen2.5:3b not present (chat default)"
docker image inspect "$IMG" >/dev/null 2>&1 || { echo "pulling $IMG..."; docker pull "$IMG"; }
if [[ ! -x "$BIN" ]]; then
step "building client (debug)"; ( cd "$REPO/hh" && cargo build ) || exit 2
fi
ok "tools present, docker up, models checked"
note "evidence dir: $EVID"
# clear any stale state
tmux kill-session -t "$A_SESS" 2>/dev/null; tmux kill-session -t "$B_SESS" 2>/dev/null
tmux kill-session -t "$SRV_SESS" 2>/dev/null
docker rm -f "$CTR" >/dev/null 2>&1
docker rmi -f "$SNAP" >/dev/null 2>&1
# ---- 1. server --------------------------------------------------------------
step "boot server :$PORT"
tmux new-session -d -s "$SRV_SESS" -x 200 -y 50 \
"cd '$REPO' && '$PY' cmd_chat.py serve 127.0.0.1 $PORT --password '$PW' --no-tls 2>&1 | tee '$EVID/server.log'"
WT=20 wait_cmd bash -c "grep -qiE 'listening|running|serving|started|websocket' '$EVID/server.log'" \
|| sleep 3 # some builds log nothing; give it a beat
ok "server session up"
# ---- 2. session A: client ---------------------------------------------------
step "session A - alice joins"
tmux new-session -d -s "$A_SESS" -x 200 -y 50 \
"'$BIN' connect 127.0.0.1 $PORT alice --password '$PW' --no-tls 2>&1 | tee '$EVID/clientA.log'"
wait_for "$A_SESS" 'alice|roster|hack-house|owner' 20 && ok "alice in the room" \
|| fail "alice never joined (see $EVID/clientA.log)"
snap_evid "$A_SESS" 01-joined
# ---- 3. launch docker sandbox ----------------------------------------------
step "launch docker sandbox ($IMG)"
say "$A_SESS" "/sbx launch docker $IMG"
# `docker ps` exits 0 even when the filter matches nothing, so grep for the name.
WT=60 wait_cmd bash -c "docker ps --format '{{.Names}}' | grep -qx '$CTR'" \
&& ok "container '$CTR' running" || fail "sandbox container never came up"
wait_for "$A_SESS" 'summoned|sandbox|ready|online' 60 >/dev/null
snap_evid "$A_SESS" 02-sandbox
# ---- 4. spawn the coder agent + grant drive --------------------------------
step "spawn oracle (qwen2.5:3b chat, qwen2.5-coder:1.5b for !task)"
say "$A_SESS" "/ai start"
wait_for "$A_SESS" 'oracle|online|ollama' 45 && ok "oracle announced" \
|| note "no 'online' line yet - agent log: ${TMPDIR:-/tmp}/hh-agent-oracle.log"
say "$A_SESS" "/grant oracle"
sleep 1
snap_evid "$A_SESS" 03-agent
# ---- 5. fast model builds code in the sandbox ------------------------------
step "fast qwen builds /root/fib.py in the sandbox"
say "$A_SESS" "/ai oracle !create /root/fib.py that prints the first 10 fibonacci numbers space-separated on one line, then run it with python3"
# Give the CPU coder model room to think, then poll for the file.
WT=150 wait_cmd docker exec "$CTR" test -s /root/fib.py
NEED='0 1 1 2 3 5 8 13 21 34'
runout() { docker exec "$CTR" sh -c 'cd /root && python3 fib.py' 2>&1; }
# Accept the model's work only if the file exists AND actually runs to the right
# sequence. A 1.5B model typed through a PTY sometimes drops indentation, so fall
# back to a known-good file (written BEFORE save, so the snapshot is meaningful).
if docker exec "$CTR" test -s /root/fib.py 2>/dev/null && runout | grep -qE "$NEED"; then
ok "model wrote a working /root/fib.py"
BUILT_BY="qwen2.5-coder"
else
note "model output missing or not runnable - writing deterministic fallback so the"
note "save/load proof still completes (retry for a clean model take in the video)."
docker exec "$CTR" sh -c 'cat > /root/fib.py <<"PY"
a, b = 0, 1
out = []
for _ in range(10):
    out.append(str(a))
    a, b = b, a + b
print(" ".join(out))
PY'
BUILT_BY="fallback"
fi
runout > "$EVID/fib-output.txt" 2>&1
ORIG_SHA="$(docker exec "$CTR" sha256sum /root/fib.py | awk '{print $1}')"
note "fib.py built by: $BUILT_BY"
note "fib.py output: $(cat "$EVID/fib-output.txt")"
docker exec "$CTR" cat /root/fib.py > "$EVID/fib-src-original.py"
snap_evid "$A_SESS" 04-built
grep -qE "$NEED" "$EVID/fib-output.txt" \
&& ok "fib.py prints the sequence" || fail "fib.py output unexpected"
# ---- 6. snapshot to an image -----------------------------------------------
step "/sbx save $LABEL (docker commit -> $SNAP)"
say "$A_SESS" "/sbx save $LABEL"
WT=40 wait_cmd bash -c "docker images $SNAP --format '{{.Tag}}' | grep -qx '$LABEL'" \
&& ok "image $SNAP created" || fail "snapshot image not found"
wait_for "$A_SESS" "saved|hh-snap|$LABEL" 10 >/dev/null
snap_evid "$A_SESS" 05-saved
# ---- 7. close the session (quit the client) --------------------------------
step "close session A (Ctrl-Q -> teardown purges the container)"
tmux send-keys -t "$A_SESS" C-q
sleep 3
tmux kill-session -t "$A_SESS" 2>/dev/null
WT=20 wait_cmd bash -c "! docker ps -a --format '{{.Names}}' | grep -qx '$CTR'" \
&& ok "container '$CTR' purged on quit" || fail "container still present after quit"
if docker images "$SNAP" --format '{{.Tag}}' | grep -qx "$LABEL"; then
ok "image $SNAP survived the purge"
else
fail "image $SNAP missing after purge"
fi
# ---- 8. session B: reopen and load -----------------------------------------
step "session B - fresh client, /sbx load $LABEL"
tmux new-session -d -s "$B_SESS" -x 200 -y 50 \
"'$BIN' connect 127.0.0.1 $PORT alice --password '$PW' --no-tls 2>&1 | tee '$EVID/clientB.log'"
wait_for "$B_SESS" 'alice|roster|hack-house|owner' 20 && ok "alice re-joined" \
|| fail "alice never re-joined"
say "$B_SESS" "/sbx load $LABEL"
# `docker ps` exits 0 even when the filter matches nothing, so grep for the name.
WT=60 wait_cmd bash -c "docker ps --format '{{.Names}}' | grep -qx '$CTR'" \
&& ok "container relaunched from $SNAP" || fail "load never started a container"
wait_for "$B_SESS" 'summoned|sandbox|ready|loading|online' 60 >/dev/null
snap_evid "$B_SESS" 06-loaded
# ---- 9. the reveal: the model's code is intact -----------------------------
step "verify the work persisted"
WT=30 wait_cmd docker exec "$CTR" test -s /root/fib.py
NEW_SHA="$(docker exec "$CTR" sha256sum /root/fib.py 2>/dev/null | awk '{print $1}')"
docker exec "$CTR" cat /root/fib.py > "$EVID/fib-src-loaded.py" 2>/dev/null
docker exec "$CTR" sh -c 'cd /root && python3 fib.py' > "$EVID/fib-output-loaded.txt" 2>&1
note "original sha: $ORIG_SHA"
note "loaded sha: $NEW_SHA"
note "loaded output: $(cat "$EVID/fib-output-loaded.txt" 2>/dev/null)"
if [[ -n "$NEW_SHA" && "$NEW_SHA" == "$ORIG_SHA" ]]; then
ok "fib.py is byte-for-byte identical after close+reload - PERSISTENCE PROVEN"
else
fail "fib.py differs or missing after reload"
fi
# show it on the TUI for the camera
tmux send-keys -t "$B_SESS" F2; sleep 1 # drive
say "$B_SESS" "cat /root/fib.py && python3 /root/fib.py"
sleep 2
snap_evid "$B_SESS" 07-reveal
# ---- summary ----------------------------------------------------------------
step "result"
if [[ $FAIL -eq 0 ]]; then
printf '%sPoC PASS%s - built-by=%s, saved=%s, purged-on-quit, reloaded-intact\n' \
"$GREEN" "$RST" "$BUILT_BY" "$SNAP"
else
printf '%sPoC FAIL%s - inspect captures in %s\n' "$RED" "$RST" "$EVID"
fi
note "captures: $EVID/{01-joined,02-sandbox,03-agent,04-built,05-saved,06-loaded,07-reveal}.txt"
note "code: $EVID/fib-src-original.py vs fib-src-loaded.py"
exit $FAIL

245
hh/film-save-load.sh Executable file

@ -0,0 +1,245 @@
#!/usr/bin/env bash
# film-save-load.sh — RECORD the "persistent sandbox" beat to an asciinema cast,
# then render an MP4. Sibling of demo-save-load.sh (the correctness harness):
# this one is for the camera, so it paces the beats and records a single,
# continuous take of the real flow:
#
# launch docker sandbox → /ai start (fast qwen) → agent builds code in it
# → /sbx save <label> → Ctrl-Q quit (container purged, image survives)
# → fresh client → /sbx load <label> → reveal: the work is intact
#
# Recording trick (per the TUI-tmux recipe): the demo runs in an inner tmux
# session; `asciinema rec` runs in its own detached session that `tmux attach`es
# to the inner one, so it mirrors exactly what we drive with send-keys.
#
# Usage: hh/film-save-load.sh [--keep] [--no-render]
# --keep leave server/sessions/container/image up afterwards
# --no-render stop after writing the .cast (skip the mp4 render)
set -uo pipefail
# ---- config -----------------------------------------------------------------
REPO="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
pick_port() { local p; for p in $(seq 4200 4280); do ss -ltn 2>/dev/null | grep -q ":$p " || { echo "$p"; return; }; done; echo 4173; }
PORT="${PORT:-$(pick_port)}"
PW="${PW:-malware-bless}"
LABEL="${LABEL:-buildbox}"
IMG="${IMG:-python:3.12-slim}"
CTR="hack-house"
SNAP="hh-snap:${LABEL}"
PY="$REPO/.venv/bin/python"
BIN="$REPO/hh/target/debug/hack-house"
COLS=110; ROWS=32
SRV_SESS="hhfilm-srv" # server (not recorded)
RUN_SESS="hhfilm" # the demo pane we drive
REC_SESS="hhfilm-rec" # asciinema attaches here and records
OUTDIR="$REPO/docs/demo"
CAST="$OUTDIR/save-load.cast"
MP4="$OUTDIR/save-load.mp4"
CODER="qwen2.5-coder:1.5b"
NEED='0 1 1 2 3 5 8 13 21 34'
KEEP=0; RENDER=1
for a in "$@"; do
case "$a" in
--keep) KEEP=1 ;;
--no-render) RENDER=0 ;;
esac
done
GREEN=$'\e[32m'; RED=$'\e[31m'; YEL=$'\e[33m'; DIM=$'\e[2m'; RST=$'\e[0m'
step() { printf '\n%s== %s ==%s\n' "$YEL" "$*" "$RST"; }
ok() { printf '%s ok %s%s\n' "$GREEN" "$*" "$RST"; }
bad() { printf '%s XX %s%s\n' "$RED" "$*" "$RST"; }
note() { printf '%s %s%s\n' "$DIM" "$*" "$RST"; }
FAIL=0; fail() { bad "$*"; FAIL=1; }
cleanup() {
if [[ $KEEP -eq 1 ]]; then
note "--keep: leaving server/sessions/container/image up."
return
fi
step "cleanup"
tmux kill-session -t "$REC_SESS" 2>/dev/null
tmux kill-session -t "$RUN_SESS" 2>/dev/null
tmux kill-session -t "$SRV_SESS" 2>/dev/null
docker rm -f "$CTR" >/dev/null 2>&1
docker rmi -f "$SNAP" >/dev/null 2>&1
note "removed container + $SNAP; sessions killed."
}
trap cleanup EXIT
# ---- helpers ----------------------------------------------------------------
# type into the recorded pane: literal text, a beat, then Enter (no Ctrl-U)
say() { tmux send-keys -t "$RUN_SESS" -l "$*"; sleep 0.5; tmux send-keys -t "$RUN_SESS" Enter; sleep 0.8; }
key() { tmux send-keys -t "$RUN_SESS" "$@"; }
cap() { tmux capture-pane -t "$RUN_SESS" -p 2>/dev/null; }
wait_for() { local re="$1" t="${2:-30}" i=0; while (( i < t*2 )); do cap | grep -qE "$re" && return 0; sleep 0.5; ((i++)); done; return 1; }
wait_cmd() { local t="${WT:-30}" i=0; while (( i < t )); do "$@" >/dev/null 2>&1 && return 0; sleep 1; ((i++)); done; return 1; }
runout() { docker exec "$CTR" sh -c 'cd /root && python3 fib.py' 2>&1; }
# ---- 0. preflight -----------------------------------------------------------
step "preflight"
command -v tmux >/dev/null || { echo "tmux required"; exit 2; }
command -v "$HOME/anaconda3/bin/asciinema" >/dev/null || command -v asciinema >/dev/null || { echo "asciinema required"; exit 2; }
ASCIINEMA="$( [[ -x "$HOME/anaconda3/bin/asciinema" ]] && echo "$HOME/anaconda3/bin/asciinema" || command -v asciinema )"
[[ -x "$PY" ]] || { echo "venv python missing: $PY"; exit 2; }
docker info >/dev/null 2>&1 || { echo "docker daemon down"; exit 2; }
ollama list 2>/dev/null | grep -q "$CODER" || { echo "coder model $CODER not pulled"; exit 2; }
docker image inspect "$IMG" >/dev/null 2>&1 || { echo "pulling $IMG..."; docker pull "$IMG"; }
[[ -x "$BIN" ]] || { step "building client"; ( cd "$REPO/hh" && cargo build ) || exit 2; }
mkdir -p "$OUTDIR"
ok "tools present, docker up, $CODER ready"
# clear stale state
tmux kill-session -t "$REC_SESS" 2>/dev/null; tmux kill-session -t "$RUN_SESS" 2>/dev/null
tmux kill-session -t "$SRV_SESS" 2>/dev/null
docker rm -f "$CTR" >/dev/null 2>&1; docker rmi -f "$SNAP" >/dev/null 2>&1
rm -f "$CAST"
# ---- 0b. pre-warm the coder so first-token latency on camera is short -------
step "pre-warm $CODER (off camera)"
"$PY" - "$CODER" <<'PY' 2>/dev/null || true
import sys, json, urllib.request
m = sys.argv[1]
req = urllib.request.Request("http://127.0.0.1:11434/api/generate",
    data=json.dumps({"model": m, "prompt": "ok", "stream": False}).encode(),
    headers={"Content-Type": "application/json"})
try:
    urllib.request.urlopen(req, timeout=120).read()
except Exception:
    pass
PY
ok "model warmed"
# ---- 1. server (not recorded) ----------------------------------------------
step "boot server :$PORT"
tmux new-session -d -s "$SRV_SESS" -x 200 -y 50 \
"cd '$REPO' && '$PY' cmd_chat.py serve 127.0.0.1 $PORT --password '$PW' --no-tls 2>&1 | tee /tmp/hhfilm-server.log"
WT=20 wait_cmd bash -c "grep -qiE 'listening|running|serving|started|websocket' /tmp/hhfilm-server.log" || sleep 3
ok "server up"
# ---- 2. inner demo pane + recorder -----------------------------------------
step "open recorded pane (${COLS}x${ROWS}) and start asciinema"
# inner demo session we drive (bash, sized for the cast)
tmux new-session -d -s "$RUN_SESS" -x "$COLS" -y "$ROWS" "bash --noprofile --norc"
sleep 0.5
tmux send-keys -t "$RUN_SESS" -l "cd '$REPO'"; tmux send-keys -t "$RUN_SESS" Enter
tmux send-keys -t "$RUN_SESS" -l "clear"; tmux send-keys -t "$RUN_SESS" Enter
sleep 0.5
# recorder session: same size, just attaches to the demo session and records it
tmux new-session -d -s "$REC_SESS" -x "$COLS" -y "$ROWS" \
"'$ASCIINEMA' rec --overwrite -c 'tmux attach -t $RUN_SESS' '$CAST'"
sleep 2
ok "recording → $CAST"
# ---- 3. title + join --------------------------------------------------------
say "echo '⛧ hack-house — ephemeral by default, persistent on demand'"
sleep 1.2
say "$BIN connect 127.0.0.1 $PORT alice --password '$PW' --no-tls"
wait_for 'alice|roster|hack-house|owner' 20 && ok "alice joined" || fail "alice never joined"
sleep 1.5
# ---- 4. launch docker sandbox ----------------------------------------------
step "launch docker sandbox"
say "/sbx launch docker $IMG"
WT=60 wait_cmd bash -c "docker ps --format '{{.Names}}' | grep -qx '$CTR'" && ok "container up" || fail "sandbox never came up"
wait_for 'summoned|sandbox|ready|online' 60 >/dev/null
sleep 1.5
# ---- 5. spawn fast qwen agent (auto-grant drive) ---------------------------
step "spawn oracle (auto-grant sandbox drive)"
say "/ai start $CODER allow"
wait_for 'oracle|online|ollama|qwen' 45 && ok "oracle online" || note "no online line yet"
sleep 1.5
# ---- 6. the fast model builds code in the sandbox --------------------------
# Transcription-only, ZERO-indentation task so the 1.5B coder can't break it
# through the PTY. Validate-by-running; retry once; abort before save if it
# still fails (no silent fallback in a film).
step "fast qwen writes /root/fib.py and runs it"
TASK="/ai oracle !create /root/fib.py with exactly two lines and nothing else: line 1 is nums = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34] and line 2 is print(*nums) then run it with: python3 /root/fib.py"
BUILT=0
for attempt in 1 2; do
note "build attempt $attempt"
say "$TASK"
WT=180 wait_cmd docker exec "$CTR" test -s /root/fib.py
if docker exec "$CTR" test -s /root/fib.py 2>/dev/null && runout | grep -qE "$NEED"; then
BUILT=1; break
fi
note "output not yet correct; re-prompting"
sleep 2
done
if [[ $BUILT -eq 1 ]]; then
ok "model wrote a working /root/fib.py"
else
fail "model never produced runnable fib.py after retries — aborting before save"
exit $FAIL
fi
ORIG_SHA="$(docker exec "$CTR" sha256sum /root/fib.py | awk '{print $1}')"
note "fib output: $(runout)"
sleep 1.5
# ---- 7. snapshot to an image -----------------------------------------------
step "/sbx save $LABEL"
say "/sbx save $LABEL"
WT=40 wait_cmd bash -c "docker images $SNAP --format '{{.Tag}}' | grep -qx '$LABEL'" && ok "image $SNAP created" || fail "snapshot not found"
wait_for "saved|hh-snap|$LABEL" 10 >/dev/null
sleep 2
# ---- 8. close the session (Ctrl-Q purges the container) --------------------
step "quit client (Ctrl-Q → teardown purges container)"
key C-q
sleep 3
WT=20 wait_cmd bash -c "! docker ps -a --format '{{.Names}}' | grep -qx '$CTR'" && ok "container purged" || fail "container still present"
docker images "$SNAP" --format '{{.Tag}}' | grep -qx "$LABEL" && ok "image survived purge" || fail "image missing"
# prove it on camera
sleep 1
say "docker ps -a --format '{{.Names}}' | grep hack-house || echo '(no hack-house container — purged)'"
sleep 1.5
say "docker images hh-snap --format '⛧ {{.Repository}}:{{.Tag}}'"
sleep 2
# ---- 9. fresh client → load -------------------------------------------------
step "fresh session → /sbx load $LABEL"
say "$BIN connect 127.0.0.1 $PORT alice --password '$PW' --no-tls"
wait_for 'alice|roster|hack-house|owner' 20 && ok "alice re-joined" || fail "alice never re-joined"
sleep 1.5
say "/sbx load $LABEL"
WT=60 wait_cmd bash -c "docker ps --format '{{.Names}}' | grep -qx '$CTR'" && ok "container relaunched" || fail "load never started"
wait_for 'summoned|sandbox|ready|loading|online' 60 >/dev/null
sleep 1.5
# ---- 10. the reveal ---------------------------------------------------------
step "reveal: the model's code is intact"
WT=30 wait_cmd docker exec "$CTR" test -s /root/fib.py
NEW_SHA="$(docker exec "$CTR" sha256sum /root/fib.py 2>/dev/null | awk '{print $1}')"
note "original sha: $ORIG_SHA"
note "loaded sha: $NEW_SHA"
[[ -n "$NEW_SHA" && "$NEW_SHA" == "$ORIG_SHA" ]] && ok "byte-for-byte identical — PERSISTENCE PROVEN" || fail "differs/missing after reload"
# show it on the TUI for the camera
key F2; sleep 1
say "cat /root/fib.py && echo '---' && python3 /root/fib.py"
sleep 3
# ---- 11. stop recording -----------------------------------------------------
step "stop recording"
tmux kill-session -t "$RUN_SESS" 2>/dev/null # attach exits → asciinema writes the cast
sleep 2
[[ -s "$CAST" ]] && ok "cast written: $CAST ($(du -h "$CAST" | cut -f1))" || fail "cast not written"
# ---- 12. render -------------------------------------------------------------
if [[ $RENDER -eq 1 && $FAIL -eq 0 && -s "$CAST" ]]; then
step "render mp4"
"$REPO/../../video-toolkit/bin/cast2mp4.sh" "$CAST" "$MP4" --font-size 28 --theme dracula \
|| bash ~/coding/video-toolkit/bin/cast2mp4.sh "$CAST" "$MP4" --font-size 28 --theme dracula
[[ -s "$MP4" ]] && ok "mp4: $MP4 ($(du -h "$MP4" | cut -f1))" || fail "render produced no mp4"
fi
# ---- summary ----------------------------------------------------------------
step "result"
if [[ $FAIL -eq 0 ]]; then
printf '%sFILM OK%s — cast=%s%s\n' "$GREEN" "$RST" "$CAST" "$( [[ -s "$MP4" ]] && echo " mp4=$MP4" )"
else
printf '%sFILM FAIL%s — inspect %s\n' "$RED" "$RST" "$CAST"
fi
exit $FAIL