docs(sbx): VirtualBox backend spec, crypto pay-gate, save/load PoC
Add the VirtualBox sandbox design spec (headless 4th backend + share-an-appliance GUI mode with detect-first install), the crypto pay-to-join gate design, and the save/load PoC writeup with its demo/film driver scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parent: 07e9c30846 · Commit: ca1666fbbb
docs/crypto-payment-gate.md (new file, 385 lines)
# Design: crypto-currency "pay-to-join" gate (second gate after the password)

**Status:** research + plan (no code yet)

**Goal:** require a crypto payment *in addition to* the SRP password before a
client is admitted to a hack-house room, **without weakening** the existing
end-to-end-encryption / zero-knowledge-relay properties.

---

## 1. What the gate must respect (current architecture)

Grounded in the present code so the design slots in cleanly:

- **Two-step SRP handshake over HTTP, then a WebSocket upgrade.**
  - `POST /srp/init` → returns `user_id, B, salt, room_salt` and enforces a *soft*
    capacity / username check (`cmd_chat/server/views.py:30-63`, capacity at `:48`).
  - `POST /srp/verify` → verifies the SRP proof `M`, then **commits the session**
    and issues the `ws_token` (`cmd_chat/server/views.py:66-111`). The
    **authoritative capacity gate is here** (`:84-85`), the proof check at `:87`,
    and `session_store.add(session)` at `:97`.
  - `GET (ws) /ws/chat?user_id=…&ws_token=…` → HMAC-checks the token and joins the
    room (`cmd_chat/server/views.py:114-149`).
- **Rust client mirror:** `hh/src/net.rs:44-104` (`authenticate`) does init →
  `process_challenge` → verify → checks server `H_AMK` (mutual auth, MITM guard)
  → derives the room Fernet key → builds the ws URL. The Python client mirrors this
  in `cmd_chat/client/client.py`.
- **Zero-knowledge relay.** The server never derives the room key
  (`HKDF(password, room_salt, "cmd-chat-room-key")` is computed **client-side
  only**) and only relays Fernet ciphertext. It *can* see: usernames, IPs,
  timestamps, `user_id`s, roster, ws-token validity. It *cannot* see: message
  plaintext, the room key, sandbox I/O, `_perm`/`_ft` control frames.
- **Owner is authoritative, not the server.** Chat rooms have no owner; the first
  user to `/sbx launch` becomes sandbox owner and the **owner's client**
  broadcasts the ACL (`_perm:acl`) encrypted — the server relays it blindly. This
  is the precedent we lean on for the trust-minimized tier below.
- **One password per server instance** (`serve --password`); a "room" is just a
  running server. No room-creation endpoint, no persistent accounts, no
  allow-lists/bans. Capacity defaults to 4 (`CMD_CHAT_MAX_USERS`). Per-IP rate
  limit 10/60s on both SRP endpoints.

**Key consequence:** payment is a *money* trust concern, orthogonal to the E2E
*message* trust concern. We must not route the room key or plaintext through any
payment logic, and we should keep the relay as close to "dumb" as possible.

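
To make the zero-knowledge point concrete, here is a minimal client-side sketch of that room-key derivation using only the Python standard library (RFC 5869 HKDF). It *assumes* HMAC-SHA256 and a 32-byte output wrapped as a urlsafe-base64 Fernet key; the real client's hash, output length, and encoding are not specified in this document and may differ, and `derive_room_key` is a hypothetical name. The only server-supplied input is the public `room_salt`, which is why the relay can hand it out without ever learning the key.

```python
import base64
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 digest size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869 HKDF (extract-then-expand) with HMAC-SHA256."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF output too long")
    # Extract: PRK = HMAC-Hash(salt, IKM); an empty salt means HashLen zero bytes.
    prk = hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC-Hash(PRK, T(i-1) | info | i), concatenated until `length`.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_room_key(password: str, room_salt: bytes) -> bytes:
    """Hypothetical helper: the room key as a urlsafe-base64 Fernet key.

    Assumes SHA-256 and a 32-byte output; only room_salt comes from the server.
    """
    raw = hkdf_sha256(password.encode("utf-8"), room_salt, b"cmd-chat-room-key", 32)
    return base64.urlsafe_b64encode(raw)
```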
---

## 2. Design principles / security requirements

1. **Non-custodial.** The app never holds user funds or spend-capable keys. Funds
   go **directly to the owner's** node/wallet. We only *issue invoices* and
   *verify receipt*. (Custody would add huge security + regulatory burden.)
2. **Two independent gates.** Password (SRP) and payment are separate; failing
   either denies entry. Payment is checked **only after** SRP succeeds, so an
   unpaid attacker who lacks the password learns nothing and never sees a
   payment request.
3. **Keep the relay dumb where possible.** Prefer designs where the server checks
   a *signature* or a *preimage* rather than running chain logic. Full chain
   access lives in an **owner-operated** component.
4. **Atomic pay-to-join.** Never take money for a seat that isn't granted. Use a
   capacity reservation + Lightning **hold invoice** so payment settles *iff* the
   seat is actually granted; otherwise it auto-cancels (refund).
5. **Single-use, identity-bound proofs.** Every proof is bound to one `user_id` +
   room + nonce, expires quickly, and is burned on use (anti-replay / anti-share).
6. **No fiat price oracle in the hot path.** Price in the native unit (sats /
   atomic units). Optional fiat display is cosmetic and never gates admission.
7. **Least-privilege keys, never in the repo.** Invoice/read-only node
   credentials via env; no admin/spend keys. Testnet/regtest/signet by default;
   mainnet is an explicit opt-in flag.
8. **Fail safe + DoS-resistant.** Payment-backend errors deny entry (don't
   fail-open) but never hang the SRP handshake; invoice issuance is rate-limited.
9. **Privacy-aware.** Prefer rails that don't dox the payer's whole graph
   (Lightning ≫ on-chain; Monero strongest).

---

## 3. Currency / rail choice

| Rail | Settlement | Fees | Privacy | Proof primitive | Verdict |
|------|-----------|------|---------|-----------------|---------|
| **Bitcoin Lightning** | ~instant | ~0 | good | **payment preimage** (sha256(P)=hash) | **Recommended** |
| Monero | ~2 min | low | **best** | tx proof (tx key + addr) | strong privacy alt |
| On-chain L2 stablecoin (USDC on Base/Arbitrum) | seconds–min | low | poor | tx receipt + log | fiat-stable alt |
| On-chain BTC / ETH L1 | 10 min / minutes | high | poor | tx receipt | bad UX, avoid |

**Recommendation: Bitcoin Lightning.** It matches the ephemeral, instant,
micro-fee nature of "pay a few sats to join a terminal room," and the **preimage
is a self-verifying proof of payment** that needs no trusted third party *if the
verifier knows the invoice's payment hash*. The hold-invoice variant gives us
atomic seat reservation for free.

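
A sketch of the check a verifier performs once it holds the payment hash of an invoice it issued itself. It assumes hex-encoded inputs (how the proof is framed on the wire is not yet specified) and uses a constant-time comparison; `preimage_matches` is a hypothetical helper, not an existing function.

```python
import hashlib
import hmac

PREIMAGE_LEN = 32  # Lightning payment preimages and payment hashes are 32 bytes


def preimage_matches(preimage_hex: str, payment_hash_hex: str) -> bool:
    """True iff sha256(preimage) == payment_hash; hex-encoded inputs assumed."""
    try:
        preimage = bytes.fromhex(preimage_hex)
        expected = bytes.fromhex(payment_hash_hex)
    except ValueError:
        return False  # malformed hex is simply "not a proof"
    if len(preimage) != PREIMAGE_LEN or len(expected) != PREIMAGE_LEN:
        return False
    # Constant-time compare so timing doesn't leak how many bytes matched.
    return hmac.compare_digest(hashlib.sha256(preimage).digest(), expected)
```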
The implementation is built behind an interface (§5) so Monero / L2-stablecoin
backends can be added without touching the handshake.

---

## 4. The core idea: hold-invoice pay-to-join

Lightning **hold invoices** (a.k.a. HODL invoices; LND `AddHoldInvoice`, or CLN
via a `holdinvoice` plugin) let the receiver *accept* a payment (the payer's HTLC
is locked in) and decide later whether to **settle** (claim) or **cancel** (refund):

1. Owner's node generates a hold invoice for the join fee, with a `payment_hash`
   whose preimage only the owner side knows, and a description that **commits to
   `user_id`**.
2. Joiner pays; the HTLC is now *accepted* (locked) but **not settled**: the
   payer cannot spend it elsewhere and the owner hasn't claimed it.
3. The gate **reserves a seat** (capacity check) and, iff a seat is available,
   tells the node to **settle** → owner is paid, joiner is admitted.
4. If no seat / timeout / error → **cancel** the invoice → joiner is auto-refunded,
   and nobody pays for a seat they didn't get.

This is the cleanest way to satisfy principle #4 (atomicity).

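
The four steps above amount to a small state machine. This toy model (hypothetical names, no real node API) shows which transitions are legal; the state names loosely follow LND's invoice states, and the property that matters is that an accepted invoice ends in exactly one of *settled* or *canceled*.

```python
from enum import Enum


class HoldState(Enum):
    OPEN = "open"          # invoice issued, nothing paid yet
    ACCEPTED = "accepted"  # HTLC locked by the payer, not yet claimed
    SETTLED = "settled"    # owner revealed the preimage and was paid
    CANCELED = "canceled"  # HTLC failed back, payer refunded


# Legal transitions for the pay-to-join flow; terminal states allow nothing.
_ALLOWED = {
    HoldState.OPEN: {HoldState.ACCEPTED, HoldState.CANCELED},
    HoldState.ACCEPTED: {HoldState.SETTLED, HoldState.CANCELED},
    HoldState.SETTLED: set(),
    HoldState.CANCELED: set(),
}


class HoldInvoice:
    """Toy model of one hold invoice; not a client for any real node."""

    def __init__(self, payment_hash: str) -> None:
        self.payment_hash = payment_hash
        self.state = HoldState.OPEN

    def _move(self, new: HoldState) -> None:
        if new not in _ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new.name}")
        self.state = new

    def htlc_arrived(self) -> None:
        """Step 2: the joiner's payment is locked in but not claimed."""
        self._move(HoldState.ACCEPTED)

    def resolve(self, seat_granted: bool) -> HoldState:
        """Steps 3-4: settle iff a seat was granted, otherwise cancel (refund)."""
        self._move(HoldState.SETTLED if seat_granted else HoldState.CANCELED)
        return self.state
```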
---

## 5. Pluggable `AdmissionGate` abstraction

Introduce one small interface so the payment rail is swappable and the
"no gate" path is the existing behavior (zero risk to current deployments).

```python
# cmd_chat/server/admission.py (illustrative)
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Quote:  # illustrative fields
    payable: str      # BOLT11 invoice, LNURL, or address
    challenge: str    # opaque nonce bound to user_id
    amount: int       # native units (sats / atomic units)
    expires_at: float


class AdmissionGate(Protocol):
    async def quote(self, user_id: str, username: str) -> Quote | None:
        """Return a payable Quote (invoice/LNURL/address + opaque challenge),
        or None if no payment is required (free room)."""
        ...

    async def verify(self, user_id: str, proof: dict) -> bool:
        """True iff `proof` settles the quote for this user_id, single-use,
        unexpired, correct amount. Reserves+settles atomically for hold invoices."""
        ...

# backends:
#   NullGate       -> always free (today's behavior; default)
#   LightningGate  -> Tier 1, self-hosted LNbits/LND/CLN
#   VoucherGate    -> Tier 2, verifies owner-signed Ed25519 admission vouchers
#   (future) MoneroGate, EvmGate
```

Server wiring (`cmd_chat/server/factory.py`): `app.ctx.admission = build_gate(cfg)`.
New `serve` flags (all optional; default = NullGate = unchanged):

```
cmd_chat.py serve <ip> <port> --password <pw> \
  --pay lightning \
  --pay-amount-sats 500 \
  --pay-backend lnbits --pay-url https://lnbits.local --pay-key-env HH_LNBITS_INVOICE_KEY \
  --pay-network signet

# or, trust-minimized:
cmd_chat.py serve <ip> <port> --password <pw> \
  --pay voucher --admit-pubkey <ed25519-pub-b64>
```

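
A minimal sketch of what `build_gate(cfg)` could look like, assuming the proposed flags have already been parsed into a `GateConfig`. The config fields and the stub `VoucherGate` / `LightningGate` classes are placeholders, not existing code. The design choice worth keeping is principle 8: an unknown or incomplete `--pay` setting raises at startup instead of silently falling back to the free `NullGate`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GateConfig:
    """Hypothetical parsed form of the proposed --pay* flags."""
    pay: str | None = None           # None (flag absent) | "lightning" | "voucher"
    admit_pubkey: str | None = None  # --admit-pubkey
    pay_amount_sats: int = 0         # --pay-amount-sats
    pay_network: str = "signet"      # --pay-network


class NullGate:
    """Today's behavior: everyone who passes SRP is admitted, no payment."""

    async def quote(self, user_id: str, username: str) -> None:
        return None

    async def verify(self, user_id: str, proof: dict) -> bool:
        return True


@dataclass
class VoucherGate:    # placeholder: the real class would verify Ed25519 vouchers
    admit_pubkey: str


@dataclass
class LightningGate:  # placeholder: the real class would talk to the owner's node
    amount_sats: int
    network: str


def build_gate(cfg: GateConfig):
    """Choose a backend; incomplete or unknown settings fail at startup."""
    if cfg.pay is None:
        return NullGate()
    if cfg.pay == "voucher":
        if not cfg.admit_pubkey:
            raise ValueError("--pay voucher requires --admit-pubkey")
        return VoucherGate(cfg.admit_pubkey)
    if cfg.pay == "lightning":
        if cfg.pay_amount_sats <= 0:
            raise ValueError("--pay lightning requires a positive --pay-amount-sats")
        return LightningGate(cfg.pay_amount_sats, cfg.pay_network)
    raise ValueError(f"unknown --pay backend: {cfg.pay!r}")
```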
---

## 6. Two implementation tiers

### Tier 1 — server-mediated Lightning gate (MVP, pragmatic)

The server talks to an **owner-operated** LN backend (LNbits invoice key, LND
gRPC with an invoice-only macaroon, or CLN with an equivalently restricted rune).
Justified because in the common self-hosted case the **owner *is* the server
operator**, so trusting the server to confirm payment is trusting yourself.

Flow:

1. `POST /srp/init` unchanged, **plus** the server returns `pay_required: true`
   and a `pay_challenge` (random nonce bound to `user_id`) when a gate is active.
2. New `POST /pay/quote {user_id}` → server (after a *soft* capacity check) asks
   the node for a **hold invoice** committing `user_id` in its `description_hash`,
   returns `{bolt11, payment_hash, amount_sats, lnurl?, expires_at}`.
3. Client shows the invoice (BOLT11 + QR + LNURL) in the TUI and waits.
4. Joiner pays → HTLC accepted (held).
5. `POST /srp/verify {user_id, M, payment_hash}` → server: SRP-verify →
   **authoritative capacity gate** → ask node "is this invoice ACCEPTED, amount ok,
   user_id committed, not yet used?" → if yes **settle** + mark used + add session +
   issue `ws_token`; if no seat/timeout → **cancel** (refund) and return `402`.

HTTP semantics: `402 Payment Required` (+ a small JSON `{error, pay_required,
pay_challenge}`) when payment is missing/invalid; `409` still means full.

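
A sketch of the step-5 ordering inside `/srp/verify`, assuming a single asyncio event loop and a node client exposing `lookup` / `settle` / `cancel` (hypothetical names; `FakeNode` stands in for LNbits/LND). The `401` for a failed SRP proof is also an assumption. An invoice bound to a different `user_id` is rejected *without* being canceled, so a caller cannot grief someone else's pending payment, and the seat is taken with no `await` between the checks and the reservation, so two payers cannot both claim the last seat.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Stand-in for the owner's LN backend; method names are hypothetical."""
    invoices: dict = field(default_factory=dict)  # payment_hash -> {"state", "amount", "user_id"}

    async def lookup(self, payment_hash: str):
        return self.invoices.get(payment_hash)

    async def settle(self, payment_hash: str) -> None:
        self.invoices[payment_hash]["state"] = "SETTLED"

    async def cancel(self, payment_hash: str) -> None:
        self.invoices[payment_hash]["state"] = "CANCELED"


async def verify_with_payment(srp_ok, user_id, payment_hash, node, sessions, used,
                              max_users, price_sats) -> int:
    """Return an HTTP-style status: 200 admitted, 401 bad SRP proof, 402 payment issue."""
    if not srp_ok:
        return 401  # password gate first: no payment logic runs for outsiders
    inv = await node.lookup(payment_hash)
    if inv is None or inv["user_id"] != user_id:
        return 402  # unknown, or bound to someone else: leave it untouched
    if payment_hash in used or inv["state"] != "ACCEPTED":
        return 402  # replay, not paid yet, or already resolved
    if len(sessions) >= max_users or inv["amount"] < price_sats:
        await node.cancel(payment_hash)  # no seat / underpaid: refund the HTLC
        return 402
    # No await between the checks above and these two lines, so on one event
    # loop nobody else can take the seat in between.
    used.add(payment_hash)
    sessions.add(user_id)
    try:
        await node.settle(payment_hash)
    except Exception:
        sessions.discard(user_id)  # release the seat, refund, fail closed
        await node.cancel(payment_hash)
        return 402
    return 200


if __name__ == "__main__":
    demo = FakeNode({"h1": {"state": "ACCEPTED", "amount": 500, "user_id": "u1"}})
    print(asyncio.run(verify_with_payment(True, "u1", "h1", demo, set(), set(), 4, 500)))
```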
### Tier 2 — owner-signed admission vouchers (trust-minimized, end state)

Mirrors the existing "owner broadcasts the ACL, server relays blindly" model and
keeps the server **payment-blind** (no node creds, no chain access on the relay).

Components:

- The owner runs a tiny **doorman** next to their node (could live in the Rust
  client or a sidecar): an LNURL-pay endpoint + voucher signer holding an
  **Ed25519** key. The server is started with only the **public** key
  (`--admit-pubkey`).
- A joiner pays the owner directly (LNURL/BOLT11). On settlement the doorman signs
  an **admission voucher**:
  `voucher = Ed25519_sign( sk_owner, {user_id, room_id, amount, nonce, exp} )`.
- Joiner submits the voucher at `POST /srp/verify {…, voucher}`.
- Server verifies: signature against `admit_pubkey`, `user_id` matches, `exp`
  fresh, `nonce` unseen (single-use ledger), then admits. **The server verifies a
  signature, not a payment** — it never sees funds, invoices, or chain data.

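
A sketch of the server-side voucher checks, with the signature primitive injected so the logic stays testable. In production `verify_sig` would be an Ed25519 verification against `--admit-pubkey` (for example the `cryptography` package's `Ed25519PublicKey.verify`, wrapped to return a bool instead of raising). The sorted-key compact JSON used as the signed bytes is an assumption, since this document does not fix a wire format yet; whatever is chosen must be byte-identical in the Rust and Python code. Nonces are burned only after every other check passes, so a rejected attempt does not consume a valid voucher. `VoucherChecker` is a hypothetical name.

```python
from __future__ import annotations

import json
import time
from typing import Callable


def canonical_payload(fields: dict) -> bytes:
    """Bytes the owner signs: sorted keys, compact separators (assumed encoding)."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


class VoucherChecker:
    """Tier 2 server-side checks: the relay verifies a signature, never a payment."""

    def __init__(self, verify_sig: Callable[[bytes, bytes], bool], room_id: str) -> None:
        # verify_sig(message, signature) -> bool; Ed25519 against --admit-pubkey in prod.
        self.verify_sig = verify_sig
        self.room_id = room_id
        self.seen_nonces: set[str] = set()  # in-RAM single-use ledger

    def admit(self, user_id: str, fields: dict, signature: bytes,
              now: float | None = None) -> bool:
        now = time.time() if now is None else now
        if not self.verify_sig(canonical_payload(fields), signature):
            return False
        if fields.get("user_id") != user_id or fields.get("room_id") != self.room_id:
            return False  # minted for someone else, or for another room
        if fields.get("exp", 0) <= now:
            return False  # expired: the joiner must obtain a fresh voucher
        nonce = fields.get("nonce")
        if not nonce or nonce in self.seen_nonces:
            return False  # missing nonce or replay
        self.seen_nonces.add(nonce)  # burn only after every other check passed
        return True
```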
Tradeoff: the owner/doorman must be reachable to *issue* vouchers; the relay no
longer needs to be trusted with money. This is the philosophically correct
end-state for a zero-knowledge relay; Tier 1 is the faster path to ship.

---

## 7. End-to-end sequence (Tier 1, hold-invoice)

```
client                     relay (server)                   owner LN node
| POST /srp/init {A, user}        |                               |
|-------------------------------->| soft cap/username check       |
|<-- {user_id,B,salt,room_salt,   |                               |
|    pay_required, pay_challenge} |                               |
| process_challenge               |                               |
| POST /pay/quote {user_id}       |                               |
|-------------------------------->| addHoldInvoice(desc_hash=     |
|                                 |  H(user_id|nonce|room)) ----->|
|<-- {bolt11, payment_hash, amt,  |<------- invoice --------------|
|     lnurl, expires_at}          |                               |
| [TUI shows invoice/QR; pay] ----------------- pay ------------->| (HTLC accepted/held)
| POST /srp/verify {user_id,M,    |                               |
|      payment_hash}              |                               |
|-------------------------------->| SRP verify (:87)              |
|                                 | AUTHORITATIVE cap gate (:84)  |
|                                 | lookupInvoice == ACCEPTED? -->|
|                                 | amount ok? user_id committed? |
|                                 | unused? -> settle ----------->| (owner paid)
|                                 | session_store.add (:97)       |
|<-- {H_AMK, ws_token}            |                               |
| check H_AMK (MITM guard)        |                               |
| ws /ws/chat?user_id&ws_token    |                               |
|================ joined =========|                               |
(on any failure between verify+settle: node.cancelInvoice -> refund, return 402)
```

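
A sketch of the `desc_hash = H(user_id|nonce|room)` commitment from the diagram. BOLT11's `h` field carries the SHA-256 of the full description, so the relay can recompute it from values it already knows and compare. The `hh-join` prefix, the field layout, and the helper names are assumptions; rejecting `|` inside fields keeps the concatenation unambiguous.

```python
import hashlib
import hmac
import secrets


def new_nonce() -> str:
    """Fresh per-join nonce (128 bits, hex)."""
    return secrets.token_hex(16)


def join_description(user_id: str, nonce: str, room_id: str) -> str:
    """Hypothetical invoice description committing to the joiner.

    '|' is rejected inside fields so "a|b"+"c" can't collide with "a"+"b|c".
    """
    for part in (user_id, nonce, room_id):
        if "|" in part:
            raise ValueError("fields may not contain '|'")
    return f"hh-join|{user_id}|{nonce}|{room_id}"


def description_hash(user_id: str, nonce: str, room_id: str) -> str:
    # BOLT11's `h` tag is the SHA-256 of the full description; hex-encoded here.
    desc = join_description(user_id, nonce, room_id)
    return hashlib.sha256(desc.encode("utf-8")).hexdigest()


def invoice_binds(invoice_desc_hash: str, user_id: str, nonce: str, room_id: str) -> bool:
    """Relay-side check: recompute from known values, compare in constant time."""
    return hmac.compare_digest(invoice_desc_hash, description_hash(user_id, nonce, room_id))
```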
---

## 8. Anti-replay, binding, atomicity (the security-critical bits)

- **Per-join invoice.** A fresh invoice/nonce per join attempt; never a static
  address. Static addresses enable replay and can't bind identity.
- **Identity binding.** Commit `user_id` (and room id) into the invoice
  `description_hash` (BOLT11) or the voucher payload, so a proof minted for one
  joiner can't admit another — defeats proof theft by a malicious relay/peer.
- **Single-use ledger.** In-RAM set of consumed `payment_hash`/`nonce` for the
  server's lifetime; reject reuse. (Matches the project's ephemeral, RAM-only
  ethos — no DB needed.)
- **Expiry.** Short invoice/voucher TTL (e.g. 5 min) tied to the SRP
  `pay_challenge`; expired ⇒ re-quote.
- **Atomic seat.** Capacity is reserved at the authoritative gate
  (`views.py:84-85`) *and* the hold invoice is only settled after the seat is
  secured; otherwise cancel→refund. Consider a short-lived "seat hold" so two
  payers don't race the last seat (reserve before settle, release on failure).
- **Amount check.** Enforce `amount >= price`; reject underpayment; for overpay,
  settle and (optionally) note credit — never silently keep extra without policy.

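
A sketch of the in-RAM seat holds and single-use ledger applied to the rules above: a seat is held *before* settling, holds expire on their own, and a proof can be burned exactly once. The class and method names and the 300-second hold TTL are assumptions; use from a single asyncio loop is assumed, so no locking is shown.

```python
from dataclasses import dataclass, field


@dataclass
class SeatBook:
    """Hypothetical in-RAM `seat_holds` + `paid_nonces`; cleared on restart."""
    capacity: int
    hold_ttl: float = 300.0
    sessions: set = field(default_factory=set)
    holds: dict = field(default_factory=dict)   # user_id -> hold expiry time
    burned: set = field(default_factory=set)    # consumed payment_hash / nonce

    def _expire(self, now: float) -> None:
        for uid in [u for u, exp in self.holds.items() if exp <= now]:
            del self.holds[uid]

    def reserve(self, user_id: str, now: float) -> bool:
        """Take a short-lived hold before settling; False if the house is full."""
        self._expire(now)
        if user_id in self.holds or user_id in self.sessions:
            return True
        if len(self.sessions) + len(self.holds) >= self.capacity:
            return False
        self.holds[user_id] = now + self.hold_ttl
        return True

    def commit(self, user_id: str, proof_id: str, now: float) -> bool:
        """Turn a live hold into a session, burning the proof exactly once."""
        self._expire(now)
        if proof_id in self.burned or user_id not in self.holds:
            return False
        self.burned.add(proof_id)
        del self.holds[user_id]
        self.sessions.add(user_id)
        return True

    def release(self, user_id: str) -> None:
        """Abort path (cancel -> refund): free the seat for someone else."""
        self.holds.pop(user_id, None)
```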
---

## 9. Client UX (Rust TUI + Python)

- **Connect command** gains optional payment handling. When `/srp/init` returns
  `pay_required`, the client:
  - fetches a quote, renders a **payment panel**: amount (sats), a copyable BOLT11,
    an `lnurl:`/`lightning:` URI, and a **QR code** (ratatui can draw a QR via a
    unicode/half-block widget; Python client prints an ASCII QR).
  - shows live status: `waiting for payment → received (held) → admitted`.
  - then proceeds to `/srp/verify` automatically once paid.
- **CLI flags** for non-interactive/automation: `--pay-bolt11-out <file>` (dump the
  invoice) or `--voucher <file>` (Tier 2, present a pre-obtained voucher).
- **Help menu:** add to the existing clustered help (`hh/src/ui.rs`
  `help_clusters`) — a short note under a new `ACCESS` or extended `KEYS`/intro
  cluster: *"rooms may require a Lightning payment to join; you'll be shown an
  invoice after the password check."*
- **Failure copy:** `402` ⇒ "this house requires payment to enter"; expired ⇒
  "invoice expired — press R to refresh"; full-after-pay ⇒ "house filled while
  paying — you were refunded."

---

## 10. State, config, key management

- **New server ctx:** `admission` gate, `paid_nonces` (single-use set),
  `seat_holds` (transient reservations). All **in-RAM**, cleared on restart —
  consistent with the existing ephemeral stores (`cmd_chat/server/stores.py`).
- **Secrets via env only** (never in repo, never in `serve` argv where they would
  show up in `ps`): `HH_LNBITS_INVOICE_KEY`, `HH_LND_MACAROON_PATH` (invoice-only
  macaroon), `HH_LND_TLS_CERT`, etc. Pass *names* on the CLI (`--pay-key-env`),
  read values from the environment.
- **Least privilege:** LNbits **invoice/read key** (not admin); LND macaroon baked
  to `invoices:read` + `invoices:write` only (write is what hold-invoice
  add/settle/cancel need) — **no `onchain`/`offchain` spend** permissions.
- **Network guard:** default `--pay-network signet|regtest|testnet`; require an
  explicit `--pay-network mainnet --i-understand` to use real funds.

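
A sketch of the two startup guards above: secrets are resolved from the environment by *name*, and mainnet needs the extra acknowledgement flag. The helper names and `ConfigError` are hypothetical; treating an empty variable as unset is a design choice that keeps a typo from starting a gate with a blank key.

```python
import os
from typing import Mapping


class ConfigError(Exception):
    """Raised at startup; the server refuses to run with a bad payment config."""


TEST_NETWORKS = {"signet", "regtest", "testnet"}


def resolve_secret(env_name: str, environ: Mapping[str, str] = os.environ) -> str:
    """--pay-key-env carries only the *name*; the value never appears in argv/ps."""
    value = environ.get(env_name, "")
    if not value:
        raise ConfigError(f"environment variable {env_name} is unset or empty")
    return value


def check_network(network: str, i_understand: bool) -> str:
    """Test networks pass; mainnet requires the explicit --i-understand flag."""
    if network in TEST_NETWORKS:
        return network
    if network == "mainnet":
        if not i_understand:
            raise ConfigError("mainnet requires --pay-network mainnet --i-understand")
        return network
    raise ConfigError(f"unknown network {network!r}")
```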
---

## 11. Threat model

| Threat | Mitigation |
|--------|-----------|
| Replay a proof to join repeatedly / share with friends | per-join invoice + single-use nonce ledger + `user_id` binding in `description_hash`/voucher |
| Malicious relay steals the preimage to join itself | identity-bound invoice (preimage only admits the committed `user_id`); Tier 2 removes relay from the money path entirely |
| Pay but no seat (race / full) | hold invoice + seat reservation; cancel→auto-refund on failure; clear UX |
| Payment-backend down → fail-open | gate denies entry on backend error (fail-closed); never silently admits |
| Invoice-spam DoS / griefing | reuse existing per-IP rate limit (`helpers.py`) on `/pay/quote`; cap concurrent unpaid holds per IP; short TTLs |
| Front-running the last seat | reserve seat *before* settle; release on abort |
| Fiat-oracle manipulation | price natively in sats; no oracle in the admission path |
| Key leakage | invoice/read-only creds, env-only, no spend keys; mainnet behind explicit flag |
| MITM on the HTTP leg | unchanged SRP `H_AMK` mutual-auth guard (`net.rs:87-90`); run behind TLS in prod (today's `--no-tls` is dev-only) |
| Privacy deanonymization | Lightning over on-chain; document Monero option; never log payer metadata beyond the ephemeral nonce |
| Regulatory/custody risk | strictly non-custodial; funds never touch app-controlled spend keys |

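
A sketch of the invoice-spam mitigation: a sliding-window limit with the same 10-per-60-seconds shape as the SRP endpoints, plus a cap on concurrently unpaid invoices per IP. The existing limiter in `helpers.py` may be structured differently; the class, its methods, and the default of two unpaid invoices per IP are assumptions. `closed()` has to be called on settle, cancel, and expiry, otherwise an IP stays blocked.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque


class QuoteThrottle:
    """Hypothetical /pay/quote guard: rate window + unpaid-invoice cap per IP."""

    def __init__(self, limit: int = 10, window: float = 60.0, max_unpaid: int = 2) -> None:
        self.limit, self.window, self.max_unpaid = limit, window, max_unpaid
        self.hits: dict[str, deque] = defaultdict(deque)  # ip -> recent quote times
        self.unpaid: dict[str, set] = defaultdict(set)    # ip -> open payment_hashes

    def allow(self, ip: str, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and q[0] <= now - self.window:
            q.popleft()  # drop hits that have left the sliding window
        if len(q) >= self.limit or len(self.unpaid[ip]) >= self.max_unpaid:
            return False
        q.append(now)
        return True

    def opened(self, ip: str, payment_hash: str) -> None:
        self.unpaid[ip].add(payment_hash)

    def closed(self, ip: str, payment_hash: str) -> None:
        """Call on settle, cancel, or expiry."""
        self.unpaid[ip].discard(payment_hash)
```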
---

## 12. Privacy & legal notes (design guidance, not legal advice)

- **Privacy:** Lightning leaks far less than on-chain; the relay should log only
  the ephemeral nonce/payment_hash, never amounts tied to usernames/IPs longer
  than the session. Offer Monero for the strongest payer privacy.
- **Compliance:** staying **non-custodial** (funds go joiner→owner, the app never
  holds spend keys) keeps this closest to "the owner accepts tips/entry fees,"
  but money-transmission / KYC-AML obligations vary by jurisdiction and volume.
  Flag this to the operator; do not build custody or fiat on/off-ramps into the
  app. This document is engineering guidance, not legal advice.

---

## 13. Testing strategy

- **Unit:** `NullGate` (free), `VoucherGate` signature + expiry + replay vectors
  (golden Ed25519 cases, mirroring the existing offline SRP vectors in
  `test_srp.py` / Rust `Selftest`).
- **Integration (regtest):** spin up LND/CLN in `regtest` (Polar or docker), drive
  a full pay-to-join: quote → pay → settle → join; and the negative paths
  (underpay, expire, full-after-pay→cancel/refund, backend-down→deny).
- **Interop:** extend the Rust live `Handshake` self-test to optionally carry a
  voucher; ensure Rust and Python clients produce identical proof framing.
- **Headless demo:** a `demo-pay-to-join.sh` (sibling of `demo-save-load.sh`)
  using a regtest node to film the beat: password → invoice in the TUI → pay →
  admitted.

---

## 14. Phased rollout

1. **Phase 0 — interface only.** Add `AdmissionGate` + `NullGate`, wire
   `app.ctx.admission`, thread an optional `proof` field through `/srp/verify`.
   Default behavior identical to today. (Lowest risk; everything else builds on it.)
2. **Phase 1 — VoucherGate (Tier 2 verify side).** Ed25519 voucher verification on
   the server (`--admit-pubkey`), single-use ledger, `402` path + client flag to
   present a voucher. Server stays payment-blind. Testable with a CLI signer.
3. **Phase 2 — LightningGate (Tier 1) + `/pay/quote`.** Hold-invoice issuance via
   LNbits/LND, settle/cancel atomicity, TUI payment panel + QR. regtest e2e.
4. **Phase 3 — owner doorman (Tier 2 issue side)** + LNURL, so payment is fully
   owner-authoritative and the relay never touches money. Optional Monero backend.

---

## 15. File-change map (when we implement)

| Area | File(s) | Change |
|------|---------|--------|
| Gate interface + backends | `cmd_chat/server/admission.py` (new) | `AdmissionGate`, `NullGate`, `VoucherGate`, `LightningGate` |
| Server wiring + flags | `cmd_chat/server/factory.py`, `cmd_chat.py` | build gate from `--pay*` flags; ctx state |
| Init advertises gate | `cmd_chat/server/views.py:30-63` | add `pay_required`, `pay_challenge` to `/srp/init` |
| New quote endpoint | `cmd_chat/server/routes.py`, `views.py` | `POST /pay/quote` (Tier 1) |
| Verify enforces payment | `cmd_chat/server/views.py:66-111` | accept `proof`/`payment_hash`; gate between `:87` and `:97`; `402` path; settle/cancel |
| Single-use + seat state | `cmd_chat/server/stores.py` | `paid_nonces`, `seat_holds` (in-RAM) |
| Python client UX | `cmd_chat/client/client.py` | quote fetch, invoice display/QR, wait-for-pay, send proof |
| Rust client UX | `hh/src/net.rs:44-104`, `hh/src/app.rs`, `hh/src/ui.rs` | quote step, payment panel + QR, proof in verify, help entry |
| Owner doorman (Tier 2) | new sidecar or `hh/` subcommand | LNURL-pay + Ed25519 voucher signer (holds keys) |
| Tests | `cmd_chat/tests/`, Rust `Cmd::*` | gate unit + regtest integration + interop |

---

## 16. Open decisions (need owner input)

1. **Trust posture:** ship Tier 1 (server-mediated, simplest) first, or hold out
   for Tier 2 (relay stays payment-blind)? Recommendation: Phase 0→1 (voucher
   verify) gets us trust-minimized verification quickly; add Lightning issuance
   (Phase 2) for UX.
2. **Rail:** Lightning only at first? (Recommended.) Monero as a fast-follow for
   privacy?
3. **Pricing:** flat sats per join? per-room configurable? time-boxed
   (pay-per-hour) vs one-shot entry?
4. **Backend:** LNbits (fastest to integrate, semi-custodial unless self-hosted)
   vs direct LND/CLN (more setup, fully self-custodial)?
5. **What payment buys:** plain entry, or also a role (e.g. auto-`/grant` drive)?
   Note: roles are owner-broadcast ACL today, so coupling payment→role belongs in
   the owner's client, not the relay.

docs/demo-save-load-poc.md (new file, 88 lines)
# PoC: persistent sandbox — fast-qwen build → save image → close → reload

**Goal of the video beat:** prove that a hack-house Docker sandbox is *durable
on demand*. A local, CPU-only **fast qwen coder** writes & runs code inside an
ephemeral Docker sandbox; we snapshot it to an image with `/sbx save`; we **fully
close the session** (container is purged on teardown); we relaunch the client and
`/sbx load` the snapshot — the code the model wrote is **still there**.

This is the headline pitch: *sandboxes are RAM-only/ephemeral by default, but you
can freeze a moment of work into an image and thaw it later — nothing leaks to the
server, the image lives only on the owner's box.*

## Why this is non-obvious / worth showing

- `/sbx stop` and client-quit both run `sbx::teardown` → `docker rm -f hack-house`.
  The container is **gone**. Normally the work would be gone too.
- `/sbx save <label>` runs `docker commit hack-house hh-snap:<label>` *while the
  container is alive*. The image is independent of the container, so it survives
  the purge.
- `/sbx load <label>` runs a **fresh** container from `hh-snap:<label>` — same
  filesystem state, new ephemeral instance.

## Models (CPU-only box: i5-8350U, no GPU)

| Path | Model | Why |
|------|-------|-----|
| chat (`/ai <q>`) | `qwen2.5:3b` | general, the locally-pulled default |
| sandbox `!task` | `qwen2.5-coder:1.5b` | auto-selected coder; fast TTFT on CPU, better shell/code |

The agent auto-selects the coder build for the `!task` (sandbox-driving) path when
the chat provider is Ollama and a `qwen2.5-coder` model is present (here it is,
already pulled).

## Storyboard (the cut)

1. **Title card** — "Ephemeral by default. Persistent on demand."
2. **Summon** — alice: `/sbx launch docker` → "summoned" sandbox bubble.
3. **Spawn the coder** — alice: `/ai start` → `oracle online — ollama/qwen2.5:3b`
   (the coder model rides along for `!task`).
4. **Build, by the fast model** — alice:
   `/ai oracle !write /root/fib.py that prints the first 10 Fibonacci numbers, then run it`
   → agent drives the shared shell; `fib.py` is written and executed; the
   sandbox pane shows the Fibonacci output.
5. **Freeze it** — alice: `/sbx save buildbox` →
   `⛧ saved sandbox → image hh-snap:buildbox · reload with /sbx load buildbox`.
6. **Walk away** — alice: `/sbx stop` (or quits the client entirely). Container is
   purged; prove it: `docker ps -a` shows no `hack-house`, but
   `docker images hh-snap` still lists `buildbox`.
7. **Come back** — a *fresh* client session; alice: `/sbx load buildbox`.
8. **The reveal** — F2 to drive, `cat /root/fib.py && python3 /root/fib.py` →
   the model's code and output are exactly as left. **Persistence proven.**
9. **Result card** — "OPERATIONS CONDUCTED": built by local qwen-coder · saved to
   image · session closed · reloaded intact.

## Acceptance (what the PoC script asserts)

- After step 4: `docker exec hack-house cat /root/fib.py` is non-empty AND running
  it prints 10 Fibonacci numbers (`0 1 1 2 3 5 8 13 21 34`).
- After step 5: `docker images hh-snap --format '{{.Tag}}'` contains `buildbox`.
- After step 6 (stop): `docker ps -a --format '{{.Names}}'` has **no** `hack-house`;
  the `hh-snap:buildbox` image still exists.
- After steps 7 and 8 (load): the **new** `hack-house` container's `/root/fib.py`
  matches the original byte-for-byte.

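
The driver itself is a shell script; for illustration, here are the acceptance predicates as small Python functions over the captured output of the commands listed above. All function names are hypothetical. The digest helper exists because the original container is purged between sessions: the harness would record the hash of `/root/fib.py` in session A and compare it in session B.

```python
import hashlib

EXPECTED_FIB = "0 1 1 2 3 5 8 13 21 34"


def first_fibs(n: int = 10) -> list[int]:
    """Reference sequence for the step-4 check (what fib.py should print)."""
    a, b, out = 0, 1, []
    for _ in range(n):
        out.append(a)
        a, b = b, a + b
    return out


def fib_output_ok(stdout: str) -> bool:
    # Accept one-number-per-line or space-separated output.
    return " ".join(stdout.split()) == EXPECTED_FIB


def tag_listed(images_out: str, tag: str = "buildbox") -> bool:
    # Input: output of `docker images hh-snap --format '{{.Tag}}'`.
    return tag in images_out.split()


def container_absent(ps_out: str, name: str = "hack-house") -> bool:
    # Input: output of `docker ps -a --format '{{.Names}}'`.
    return name not in ps_out.split()


def digest(data: bytes) -> str:
    """Record in session A, compare in session B (byte-for-byte via SHA-256)."""
    return hashlib.sha256(data).hexdigest()
```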
## Execution

`hh/demo-save-load.sh` drives the whole thing headlessly over tmux (per the
TUI-tmux test recipe): boots the server, runs client **session A**, injects the
beats with `send-keys`, verifies via `capture-pane` + `docker exec`, then quits
session A and opens client **session B** to load and confirm. It is a PoC /
correctness harness first; once green it feeds the polished `video-toolkit`
render.

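
A sketch of how each beat could be injected, written as tmux argv lists (hypothetical helper names; the target is whatever the script names its session/pane). It follows the gotchas in the next subsection: text goes in with `send-keys -l`, `Enter` is sent as its own key, a short pause avoids racing the render, and input is cleared with `BSpace` because the TUI has no Ctrl-U binding.

```python
import subprocess
import time


def send_line(target: str, text: str) -> list[list[str]]:
    """Type one TUI command: literal text first, then Enter as a separate key."""
    return [
        ["tmux", "send-keys", "-t", target, "-l", text],
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def clear_input(target: str, count: int = 80) -> list[str]:
    # No Ctrl-U binding in the TUI, so erase with repeated BSpace key presses.
    return ["tmux", "send-keys", "-t", target] + ["BSpace"] * count


def capture(target: str) -> list[str]:
    # -p prints the visible pane to stdout so the script can search it.
    return ["tmux", "capture-pane", "-p", "-t", target]


def run_beat(target: str, text: str, settle: float = 0.5) -> None:
    """Clear, type, submit; sleep between steps so renders aren't raced."""
    subprocess.run(clear_input(target), check=True)
    for argv in send_line(target, text):
        subprocess.run(argv, check=True)
        time.sleep(settle)
```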
### Gotchas baked into the script

- TUI doesn't bind Ctrl-U (it inserts a literal `u`); clear input with `BSpace`.
  Send text with `send-keys -l "<text>"` then a separate `Enter`; don't race renders.
- Agent name is hardcoded `oracle`; only one `/ai start` per room.
- Keep `!task` phrasing single-line; the agent's drive output lands in the sandbox
  pane, not chat.
- `/sbx load` refuses if a sandbox is already running — stop first.
- Docker daemon must be up (`docker info`); `/sbx launch docker --start` can boot
  it (sudo) but we pre-check instead.
- Snapshot label charset: alphanumerics, `.`, `_`, `-` (≤64).

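
A validator for the label rule above. It is slightly stricter than the gotcha as written: Docker tags must not start with `.` or `-`, so the first character is limited to alphanumerics and `_`. The helper names are hypothetical.

```python
import re

# Charset and length from the gotcha; the first character follows Docker's tag rule.
_LABEL = re.compile(r"[A-Za-z0-9_][A-Za-z0-9._-]{0,63}")


def valid_snapshot_label(label: str) -> bool:
    return _LABEL.fullmatch(label) is not None


def snapshot_image(label: str) -> str:
    """Image reference used by `/sbx save` / `/sbx load`."""
    if not valid_snapshot_label(label):
        raise ValueError(f"bad snapshot label: {label!r}")
    return f"hh-snap:{label}"
```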
### Teardown / cleanup

The script removes the `hack-house` container and (optionally) the `hh-snap:*`
demo images it created, and kills the server + tmux sessions, so reruns are clean.

docs/spec-virtualbox-sandbox.md (new file, 280 lines)
# hack-house → VirtualBox Sandbox Backend — Spec

> **Status:** Draft v1 · **Date:** 2026-06-03
>
> **Scope:** Add VirtualBox as a sandbox backend, in two complementary modes:
> **(A)** a headless, owner-hosted VM driven through the existing shared PTY
> (drops into the current `Backend` abstraction), and **(B)** a *portable VM
> appliance* the room can hand out so each member boots the **actual GUI locally**
> on their own machine — including detecting and (with consent) installing
> VirtualBox if it's missing.
>
> **Baseline reviewed:** `hh/src/sbx.rs`, `hh/src/app.rs` @ `feat/ai-context`.

---

## 0. Decisions to lock

| # | Decision | Proposal |
|---|----------|----------|
| A | VirtualBox transport into the guest | **SSH** (NAT port-forward) as primary; `VBoxManage guestcontrol` as a no-SSH fallback. SSH gives a clean PTY and reuses the multipass provisioning model verbatim. |
| B | Single shared instance vs. per-user local copies | **Both, as two modes.** Mode A = one owner-hosted headless VM, shared PTY (zero-knowledge preserved). Mode B = export the VM as an `.ova`, distribute over the *existing* `/send` channel, each member imports + launches the GUI locally. |
| C | GUI sharing | **No live framebuffer relay.** Sharing the *desktop* = sharing the *appliance*, not the pixels. Sidesteps the zero-knowledge problem entirely (the image rides the encrypted file transfer). |
| D | Installation | **Detect-first, then opt-in install.** `ensure-vbox.sh` mirrors `ensure-docker.sh`: never installs silently; prints what it would do and requires an explicit `--yes` (or the `/sbx ... --install` flag). |

---

## 1. Why VirtualBox, and what's genuinely new

The existing backends (`Backend::{Local,Docker,Multipass}` in `hh/src/sbx.rs:51`)
are all **headless and text-only**. The owner hosts the box and runs a local PTY
into it (`command_for`, `sbx.rs:278`); the PTY bytes are encrypted with the room
key and relayed as `_sbx` frames, so the server only ever sees ciphertext.

VirtualBox adds two things the others can't:

1. **Arbitrary guest OSes** — Windows, BSD, old kernels, purpose-built
   malware-analysis or CTF images — with a mature snapshot tree.
2. **A real graphical desktop.** This is the part that doesn't fit the PTY relay,
   and it's the part you explicitly want: *people share a VM and each launches it
   locally with the GUI.*

So the integration is deliberately split so each mode keeps the project's trust
model intact:

- **Mode A (shared shell):** one VM, owner-hosted, driven collaboratively through
  the shared PTY — identical trust story to multipass.
- **Mode B (shared appliance):** the VM *image* is the shared artifact. It travels
  over the existing E2E `/send` transfer; each member runs their **own local copy**
  in the VirtualBox GUI. No pixels cross the wire — only the (encrypted) disk image.

---
|
||||
|
||||
## 2. Mode A — headless VirtualBox as a 4th backend

### 2.1 Enum + labels (`hh/src/sbx.rs`)

```rust
pub enum Backend { Local, Docker, Multipass, VirtualBox }   // new variant
```

- `Backend::parse`: add `"virtualbox" | "vbox" => Some(Backend::VirtualBox)`.
- `label()`: `"virtualbox"`.
- `default_image()`: a named base appliance, e.g. `"hh-base"` (an Ubuntu image we
  pre-register), since VirtualBox has no "pull by release string" like multipass.
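The three trivial arms could look like the sketch below. It assumes that `parse` maps a `&str` to `Option<Backend>` and that `label`/`default_image` return string slices. Only the VirtualBox values come from this spec. The derive list is an assumption, and the other backends' default images are left out (`None`) rather than guessed.

```rust
/// Sandbox backends. `VirtualBox` is the proposed fourth variant; the derive
/// list is an assumption, not copied from sbx.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Local,
    Docker,
    Multipass,
    VirtualBox,
}

impl Backend {
    /// Parse the backend word from `/sbx launch <backend>`.
    pub fn parse(s: &str) -> Option<Backend> {
        match s {
            "local" => Some(Backend::Local),
            "docker" => Some(Backend::Docker),
            "multipass" => Some(Backend::Multipass),
            // New arm: the full name plus the short alias.
            "virtualbox" | "vbox" => Some(Backend::VirtualBox),
            _ => None,
        }
    }

    /// Label shown in status lines (always the canonical name).
    pub fn label(self) -> &'static str {
        match self {
            Backend::Local => "local",
            Backend::Docker => "docker",
            Backend::Multipass => "multipass",
            Backend::VirtualBox => "virtualbox",
        }
    }

    /// Default image. Only the VirtualBox value is specified here; the
    /// existing backends' defaults live in sbx.rs and are elided (None).
    pub fn default_image(self) -> Option<&'static str> {
        match self {
            Backend::VirtualBox => Some("hh-base"),
            _ => None,
        }
    }
}
```

The `vbox` alias exists only in `parse`. `label()` always reports `virtualbox`, so status lines stay uniform whichever spelling the user typed.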
### 2.2 Mapping every existing fn to `VBoxManage`

Each function in `sbx.rs` gets one new match arm. The transport (how a command
reaches *inside* the guest) is SSH over a NAT port-forward.

| Fn (`sbx.rs`) | Multipass today | VirtualBox arm |
|---|---|---|
| `prepare` (`:86`) | `multipass launch` | import appliance if absent (`VBoxManage import <ova> --vsys 0 --vmname <name>`), set forward (`modifyvm <name> --natpf1 "ssh,tcp,127.0.0.1,<port>,,22"`), then `startvm <name> --type headless`; if it already exists just `startvm`. Idempotent like the multipass arm. |
| `command_for` (`:278`) | `multipass exec … bash` | `ssh -tt -p <port> -o StrictHostKeyChecking=no <run_user>@127.0.0.1` (login shell). `run_user` empty ⇒ default account. |
| `provision` (`:355`) | `useradd` via `mp()` | identical `useradd`/sudoers scripts, run through an SSH helper `vbx()` (mirrors `mp()`/`dk()` at `sbx.rs:319`). Owner gets passwordless sudo via the same `mp_grant_sudo` script. |
| `set_sudo` (`:387`) | sudoers drop-in | same script over SSH; gate on `backend == VirtualBox`. |
| `save_state` (`:208`) | `multipass snapshot` | `VBoxManage snapshot <name> take <label> --pause` |
| `list_snapshots` (`:241`) | `multipass list --snapshots` | `VBoxManage snapshot <name> list --machinereadable`, parse `SnapshotName*=` lines |
| `teardown` (`:172`) | `multipass delete --purge` | `VBoxManage controlvm <name> poweroff` then `unregistervm <name> --delete` |

**Why SSH over `guestcontrol`:** `VBoxManage guestcontrol <vm> run --exe /bin/bash`
requires Guest Additions in the image and gives a rough PTY; interactive driving
is clunky. SSH needs only an sshd in the base image (cheap to bake once) and the
whole P4 permission stack (`/grant`, `/sudo`, drive ACL) works **unchanged**
because it's all "run a command through a transport." `guestcontrol` stays as a
documented fallback for images without sshd.
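The sketch below keeps the argument lists behind the `prepare`, `teardown` and `command_for` arms as plain data, so they can be unit-tested without VirtualBox installed. The helper names are hypothetical, and the flags are the ones in the table above. Once the NAT rule is a single argv element, it needs no shell quoting. Reading "default account" as "omit `user@` and let ssh decide" is an assumption.

```rust
/// Hypothetical helpers. A real arm would hand each Vec to
/// `std::process::Command::new("VBoxManage").args(...)` (or `"ssh"`).
fn to_args(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|p| p.to_string()).collect()
}

/// `prepare`: import if absent, add the loopback SSH forward, boot headless.
/// An already-registered VM only needs `startvm`.
pub fn prepare_argv(name: &str, ova: &str, port: u16, exists: bool) -> Vec<Vec<String>> {
    let start = to_args(&["startvm", name, "--type", "headless"]);
    if exists {
        return vec![start];
    }
    // Rule fields: name,proto,host-ip,host-port,guest-ip,guest-port.
    // host-ip 127.0.0.1 keeps the forward on loopback only.
    let rule = format!("ssh,tcp,127.0.0.1,{port},,22");
    vec![
        to_args(&["import", ova, "--vsys", "0", "--vmname", name]),
        to_args(&["modifyvm", name, "--natpf1", rule.as_str()]),
        start,
    ]
}

/// `teardown`: hard power-off, then unregister and delete the VM's files.
pub fn teardown_argv(name: &str) -> Vec<Vec<String>> {
    vec![
        to_args(&["controlvm", name, "poweroff"]),
        to_args(&["unregistervm", name, "--delete"]),
    ]
}

/// `command_for`: arguments for the `ssh` program (interactive login shell over
/// the forward). An empty `run_user` omits `user@`, leaving the account to
/// ssh's own default (local user name or ~/.ssh/config).
pub fn ssh_argv(port: u16, run_user: &str) -> Vec<String> {
    let target = if run_user.is_empty() {
        "127.0.0.1".to_string()
    } else {
        format!("{run_user}@127.0.0.1")
    };
    let port = port.to_string();
    to_args(&["-tt", "-p", port.as_str(), "-o", "StrictHostKeyChecking=no", target.as_str()])
}
```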
### 2.3 Port allocation

Each headless VM needs a unique host loopback port for its SSH forward. Reuse the
free-port discovery already used by the save/load PoC (see
`docs/demo-save-load-poc.md` / `hh/demo-save-load.sh`) so two sandboxes on one
host don't collide. The forward binds only `127.0.0.1`, and the owner's client is
the only thing that ever connects to it, so the guest's SSH port is never exposed
beyond the host.
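A Rust equivalent of that discovery could probe by binding a port rather than parsing `ss` output. This sketch assumes the PoC's 4200–4280 range is acceptable for SSH forwards too. `pick_free_port` is a hypothetical name.

```rust
use std::net::TcpListener;
use std::ops::RangeInclusive;

/// First port in `range` that can be bound on 127.0.0.1 right now, or None.
///
/// The probe listener is dropped immediately, so another process could take
/// the port before VirtualBox binds its forward (a check-then-use race).
/// Callers should confirm the forward works (e.g. SSH connects) and re-pick
/// on failure.
pub fn pick_free_port(range: RangeInclusive<u16>) -> Option<u16> {
    range
        .into_iter()
        .find(|&port| TcpListener::bind(("127.0.0.1", port)).is_ok())
}
```

Probing by `bind` behaves the same on Linux and macOS and needs no `ss`. It has the same race as the shell version, so neither approach replaces a post-launch check.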
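### 2.4 Snapshots tie into existing `/sbx save`/`load`

`/sbx save`/`load`/`snaps` (`app.rs:1244`–`1307`) already branch on backend, and
VirtualBox snapshots map cleanly onto the same commands. The existing UX
("save state → quit → load") therefore works for VBox with no new commands, only:

- new match arms in `save_state`/`list_snapshots`;
- a VirtualBox arm in the `load` path. Today `load` hardcodes Docker at
  `app.rs:1282`; generalize it to the broker's backend. For VBox the arm restores
  with `VBoxManage snapshot <name> restore <label>` (the VM must be powered off
  first), then `startvm`;
- a `teardown` that keeps snapshots alive. Unlike Docker's `hh-snap:<label>`
  images, VirtualBox snapshots live inside the VM, so the `unregistervm --delete`
  in the `teardown` arm would destroy them. When the VM has saved snapshots, quit
  should power it off rather than delete it.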
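`list_snapshots` has to turn VBoxManage's listing into labels. The sketch below assumes the `--machinereadable` format we expect: a root `SnapshotName="..."`, descendants as `SnapshotName-1="..."`, `SnapshotName-1-1="..."` and so on, plus `CurrentSnapshotName`/`CurrentSnapshotNode` bookkeeping keys. Check it against the installed VirtualBox before relying on it. `parse_snapshot_names` is a hypothetical name.

```rust
/// Collect snapshot labels from `VBoxManage snapshot <vm> list --machinereadable`.
/// Only `SnapshotName` and `SnapshotName-<path>` keys count; `CurrentSnapshotName`
/// repeats an existing entry and is skipped, as are UUID/description keys.
/// Output without any `key=value` lines (no snapshots) yields an empty list.
pub fn parse_snapshot_names(out: &str) -> Vec<String> {
    out.lines()
        .filter_map(|line| {
            let (key, value) = line.split_once('=')?;
            if key != "SnapshotName" && !key.starts_with("SnapshotName-") {
                return None;
            }
            // Values are double-quoted; strip one surrounding pair if present.
            let v = value.trim();
            let v = v
                .strip_prefix('"')
                .and_then(|s| s.strip_suffix('"'))
                .unwrap_or(v);
            Some(v.to_string())
        })
        .collect()
}
```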
---

## 3. Mode B — share a VM, launch the GUI locally

This is the new product surface. The shared artifact is the **appliance**, not a
live session. Flow:

```
owner:  /sbx export [name]       → freezes the VM to an .ova on the owner's disk
owner:  /send hh-box.ova          → existing E2E file transfer (chunked, SHA-256)
member: /accept                   → lands in ./downloads/ (existing path)
member: /sbx open ./downloads/hh-box.ova
          → ensure VirtualBox is installed (detect; offer install)
          → VBoxManage import … --vmname hh-box-<member>
          → VBoxManage startvm hh-box-<member> --type gui   ← real desktop window
```

Everyone ends up running an **identical local VM** — same disk, same tools, same
state at export time — but each on their own machine, with a full GUI. Because the
image moved over the encrypted `/send` channel, the server never saw it, and there
is no live cross-machine display traffic to secure.
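After the install check, the member side of that flow reduces to two VBoxManage calls under a per-member name. In this sketch, `member_vm_name` and `open_argv` are hypothetical. Sanitising member names to `[A-Za-z0-9_-]` is our assumption: a room nickname should not be trusted as a VirtualBox identifier or a file-system name.

```rust
/// Per-member VM name for a Mode B import, e.g. `hh-box-alice`.
/// Characters outside [A-Za-z0-9_-] become '_' (assumed policy).
pub fn member_vm_name(base: &str, member: &str) -> String {
    let safe: String = member
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '-' || c == '_' { c } else { '_' })
        .collect();
    format!("{base}-{safe}")
}

fn to_args(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|p| p.to_string()).collect()
}

/// `/sbx open` once VirtualBox is known to be present: import the shared
/// appliance under the member's own name, then boot it with a GUI window.
pub fn open_argv(ova_path: &str, vm_name: &str) -> [Vec<String>; 2] {
    [
        to_args(&["import", ova_path, "--vsys", "0", "--vmname", vm_name]),
        to_args(&["startvm", vm_name, "--type", "gui"]),
    ]
}
```

Each member imports under their own name, so imports from different members sharing one host don't collide. Re-importing over an existing copy is still the `--replace` question in the open questions.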
### 3.1 New commands

| Command | Who | Action |
|---|---|---|
| `/sbx export [name]` | owner of a VBox sandbox | `VBoxManage export <name> -o <out>.ova` (the VM should be powered off; alternatively clone a snapshot with `VBoxManage clonevm <name> --snapshot <label> --name <copy> --register` and export the copy). Emits the path and hints `/send` it. |
| `/sbx open <file.ova> [--install]` | any member | Detect VirtualBox → (consent) install if missing → `import` under a per-member VM name → `startvm --type gui`. |
| `/sbx gui [name]` | any member | Launch the GUI for an already-imported VM (`startvm --type gui`). `startvm` fails on a VM that is already running, so it cannot attach a window to a running headless VM; a VM that should gain a window later is better started with `--type separate` (detachable UI) in the first place. |

`/sbx open`, `/sbx gui` and `/sbx export` are deliberately **local-only**
operations (like `/pw`): they never broadcast. The only thing that crosses the
room is the `.ova` you choose to `/send`.
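`/sbx export` can check the VM state before calling `export`. This sketch assumes that `VBoxManage showvminfo <name> --machinereadable` reports the state as a `VMState="..."` line (e.g. `running`, `poweroff`). Treating only `poweroff` as exportable is a deliberately conservative assumption, and both function names are hypothetical.

```rust
/// Read `VMState` from `showvminfo --machinereadable` output.
/// `strip_prefix("VMState=")` skips look-alike keys such as `VMStateChangeTime`.
pub fn vm_state(showvminfo: &str) -> Option<&str> {
    showvminfo.lines().find_map(|line| {
        let v = line.strip_prefix("VMState=")?;
        Some(v.trim().trim_matches('"'))
    })
}

/// Gate for `/sbx export`: only a powered-off VM is exported (assumed policy).
pub fn export_allowed(state: Option<&str>) -> Result<(), String> {
    match state {
        Some("poweroff") => Ok(()),
        Some(other) => Err(format!("VM is {other}; power it off before /sbx export")),
        None => Err("could not read VMState from showvminfo".to_string()),
    }
}
```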
### 3.2 Relationship between the two modes

They compose: the owner can run a **Mode A** headless VM, `/sbx save` a snapshot,
`/sbx export` it to an `.ova`, and `/send` it — at which point each member can
`/sbx open` it and keep working **locally in the GUI** from the exact same state.
"Collaborate live in one shared shell" and "everyone take a copy home and run the
desktop" become two ends of one workflow.

---

## 4. Installation handling — `ensure-vbox.sh` (detect first)

Mirror the spirit of `hh/ensure-docker.sh`: **a backend never installs anything
silently.**
### 4.1 Detection (always first, zero side effects)

```rust
use std::process::{Command, Stdio};

pub fn vbox_installed() -> bool {            // sbx.rs, beside docker_daemon_up()
    Command::new("VBoxManage").arg("--version")
        .stdout(Stdio::null()).stderr(Stdio::null())
        .status().map(|s| s.success()).unwrap_or(false)
}
```

If present, every Mode A/B path proceeds normally. If absent, the command **fails
loud with the remedy**, exactly like the Docker daemon message at `app.rs:1206`:

> `VirtualBox isn't installed — retry with /sbx open <file> --install to install it (needs sudo), or run ./ensure-vbox.sh in a terminal first`
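The detect-then-consent decision fits in one small pure function, which keeps it testable on machines without VirtualBox. In this sketch, `InstallDecision` and `decide` are hypothetical names. The caller would pass the result of `vbox_installed()` as `installed`, and the remedy text is the message above.

```rust
/// Outcome of the detect-first check (names are hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum InstallDecision {
    /// VirtualBox present: run the command.
    Proceed,
    /// Missing and the user passed `--install`: run `bash ensure-vbox.sh --yes`.
    RunInstaller,
    /// Missing, no consent: fail loud with the remedy and install nothing.
    FailWithRemedy(String),
}

pub fn decide(installed: bool, install_flag: bool, file: &str) -> InstallDecision {
    match (installed, install_flag) {
        (true, _) => InstallDecision::Proceed,
        (false, true) => InstallDecision::RunInstaller,
        (false, false) => InstallDecision::FailWithRemedy(format!(
            "VirtualBox isn't installed — retry with /sbx open {file} --install \
             to install it (needs sudo), or run ./ensure-vbox.sh in a terminal first"
        )),
    }
}
```

Only `RunInstaller` ever leads to `sudo`, and the only way to reach it is the explicit flag.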
### 4.2 The installer script

`hh/ensure-vbox.sh` is invoked as `bash ensure-vbox.sh --yes`, and only when the
user passed `--install` (matching how `prepare` shells `ensure-docker.sh --yes` at
`sbx.rs:31`). It:

1. Re-checks `VBoxManage --version`; if found, exits 0 immediately (idempotent).
2. Detects the platform and prints the **exact** command it will run *before*
   running it:
   - **Debian/Ubuntu:** `sudo apt-get install -y virtualbox` (or add Oracle's repo
     for a current build).
   - **Fedora:** `sudo dnf install -y VirtualBox` (the package comes from RPM
     Fusion, which must be enabled first).
   - **Arch:** `sudo pacman -S --noconfirm virtualbox`.
   - **macOS:** `brew install --cask virtualbox` (note: needs the kernel-extension
     approval in System Settings; the script surfaces that as a manual step).
   - **Windows / unknown:** do **not** attempt; point at the download page and the
     `winget install Oracle.VirtualBox` one-liner.
3. On any failure, exits non-zero; the caller then surfaces the script's last
   stderr line through the returned error (same pattern as `start_docker_daemon`
   at `sbx.rs:31`), so it lands in the TUI error popup instead of bleeding raw
   onto the surface.
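Step 2's platform table can be expressed as data. The sketch below is a Rust illustration, not the script itself: it maps `/etc/os-release` to the command the script would print. It assumes distro detection through the standard `ID`/`ID_LIKE` fields, with `ID` checked first. Anything unrecognised returns `None`, meaning "do not attempt".

```rust
/// Value of `key` in os-release content, with surrounding quotes removed.
fn field<'a>(os_release: &'a str, key: &str) -> Option<&'a str> {
    os_release.lines().find_map(|line| {
        let (k, v) = line.split_once('=')?;
        (k == key).then(|| v.trim().trim_matches('"'))
    })
}

/// Install command for the detected distro, or None ("point at the docs").
pub fn install_command(os_release: &str) -> Option<&'static str> {
    let id = field(os_release, "ID").unwrap_or("");
    let like = field(os_release, "ID_LIKE").unwrap_or("");
    // ID first, then each ID_LIKE entry (e.g. Mint: ID_LIKE="ubuntu debian").
    std::iter::once(id)
        .chain(like.split_whitespace())
        .find_map(|distro| match distro {
            "debian" | "ubuntu" => Some("sudo apt-get install -y virtualbox"),
            // Fedora's package lives in RPM Fusion, which must be enabled.
            "fedora" => Some("sudo dnf install -y VirtualBox"),
            "arch" => Some("sudo pacman -S --noconfirm virtualbox"),
            _ => None,
        })
}
```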
> **Honesty note for the spec:** VirtualBox needs a host kernel module
> (`vboxdrv`) and, on Secure-Boot machines, that module must be signed with a key
> enrolled as a MOK. The script detects Secure Boot (`mokutil --sb-state`) and,
> rather than fail opaquely, tells the user the one manual step required. We
> check; we don't pretend it's always one command.

### 4.3 Consent UX

No surprise installs, no surprise sudo. The flow is: try → detect missing → tell
the user the remedy and the exact command → they re-issue with `--install`. This
matches the project's existing posture (`/sbx launch docker --start` is opt-in
daemon-start, not automatic).

---
## 5. Command surface (additions)

Extend the `/sbx` usage line (`app.rs:1309`):

```
/sbx launch [local|docker|multipass|virtualbox] [image]
/sbx stop | save [label] | load <label> | snaps
/sbx export [name]                 # freeze host VM → .ova (then /send it)
/sbx open <file.ova> [--install]   # import + launch the GUI locally
/sbx gui [name] [--install]        # launch GUI for an imported VM
```

| Command | Broadcasts? | Notes |
|---|---|---|
| `launch virtualbox` | yes (`_sbx status`) | Mode A; owner-hosted headless, shared PTY |
| `export` | no | local; produces an artifact to `/send` |
| `open` / `gui` | no | local; each member's own GUI window |
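The sketch below shows how `app.rs` might recognise the new subcommands and derive the table's *Broadcasts?* column from the parsed form. `SbxCmd` is hypothetical, and the existing subcommands (`stop`, `save`, `load`, `snaps`) are left out. Words beginning with `--` are treated as flags.

```rust
/// Parsed form of the new `/sbx` subcommands (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum SbxCmd {
    Launch { backend: Option<String>, image: Option<String> },
    Export { name: Option<String> },
    Open { file: String, install: bool },
    Gui { name: Option<String> },
}

impl SbxCmd {
    /// Parse a `/sbx ...` line; None for other input or a missing required arg.
    pub fn parse(line: &str) -> Option<SbxCmd> {
        let mut words = line.split_whitespace();
        if words.next()? != "/sbx" {
            return None;
        }
        let sub = words.next()?;
        let rest: Vec<&str> = words.collect();
        let install = rest.contains(&"--install");
        // Positional arguments are the words that are not flags.
        let mut pos = rest.iter().copied().filter(|w| !w.starts_with("--"));
        match sub {
            "launch" => Some(SbxCmd::Launch {
                backend: pos.next().map(str::to_string),
                image: pos.next().map(str::to_string),
            }),
            "export" => Some(SbxCmd::Export { name: pos.next().map(str::to_string) }),
            "open" => Some(SbxCmd::Open { file: pos.next()?.to_string(), install }),
            "gui" => Some(SbxCmd::Gui { name: pos.next().map(str::to_string) }),
            _ => None,
        }
    }

    /// Only `launch` announces itself to the room (`_sbx status`);
    /// export/open/gui stay local.
    pub fn broadcasts(&self) -> bool {
        matches!(self, SbxCmd::Launch { .. })
    }
}
```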
---

## 6. Code touchpoints (what actually changes)

Mode A is small: a new enum variant and about ten match arms (three one-liners in
`parse`/`label`/`default_image`, seven for the §2.2 functions). The broker, drive
ACL, and sudo delegation are backend-agnostic already; save/load needs only the
`load` generalization from §2.4.

| File | Change |
|---|---|
| `hh/src/sbx.rs` | `Backend::VirtualBox` variant; arms in `parse`/`label`/`default_image`/`prepare`/`command_for`/`provision`/`set_sudo`/`save_state`/`list_snapshots`/`teardown`; `vbox_installed()`, `vbx()` SSH helper; `export_ova()` + `open_local()` (Mode B, local-only). |
| `hh/src/app.rs` | accept `virtualbox`/`vbox` in `/sbx launch`; generalize `/sbx load` off hardcoded Docker (`:1282`) to the broker backend; new `/sbx export`, `/sbx open`, `/sbx gui` arms; extend usage string (`:1309`); install-missing error mirroring `:1206`. |
| `hh/ensure-vbox.sh` (new) | detect-first installer, per `§4`. |
| `hh/src/ui.rs` | add the three new commands to the clustered help menu. |
| `README.MD` | backend table (`:175`) gains a `virtualbox` row; a short "share a VM, run it locally" subsection. |
| `models.toml` / docs | none. |

---
## 7. Security & trust notes

- **Mode A preserves zero-knowledge**: the VM is owner-local, the SSH forward is
  `127.0.0.1`-only, and only PTY ciphertext crosses the room — same as multipass.
- **Mode B preserves zero-knowledge** by *not* streaming a display at all. The
  `.ova` is just a file through the existing SHA-256-verified, Fernet-encrypted
  `/send` path (`hh/src/ft.rs`). Note the existing **50 MB** transfer cap (README
  `:215`) — real VM images blow past it, so Mode B needs either a raised cap for
  appliances or an out-of-band hand-off (documented honestly; see Open Questions).
- **No silent install, no silent sudo** (`§4.3`).
- **Appliance provenance**: a shared `.ova` is executable content. Just as
  `/accept` already implies trust in the sender, importing a VM means running the
  sender's code, so the `open` flow should print an explicit one-line caution
  before importing.
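Two of these notes can be enforced in code. The size cap can be checked before an `.ova` is offered to `/send`, so the owner sees the limit up front rather than mid-transfer. The provenance caution can live as a constant that the `open` flow prints. This is a sketch: whether the README's "50 MB" means 50 * 10^6 or 50 * 2^20 bytes is an assumption (the code uses MiB), and the names and wording are hypothetical.

```rust
/// `/send` cap from the README; the MiB interpretation is an assumption,
/// so match whatever hh/src/ft.rs actually enforces.
pub const SEND_CAP_BYTES: u64 = 50 * 1024 * 1024;

/// One-line caution for `/sbx open`, printed before importing (wording assumed).
pub const OPEN_CAUTION: &str =
    "caution: an imported VM runs the sender's code; only open appliances from people you trust";

/// Pre-check an appliance's size against the `/send` cap.
pub fn send_precheck(file_name: &str, size_bytes: u64) -> Result<(), String> {
    if size_bytes <= SEND_CAP_BYTES {
        return Ok(());
    }
    let mib = size_bytes as f64 / (1024.0 * 1024.0);
    Err(format!(
        "{file_name} is {mib:.0} MiB, over the {} MiB /send cap; hand it off out-of-band",
        SEND_CAP_BYTES / (1024 * 1024)
    ))
}
```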
---

## 8. Phasing

| Phase | Deliverable | Gate |
|---|---|---|
| **V0** | `vbox_installed()` + `ensure-vbox.sh` (detect-first, Linux apt path) | manual: missing → guided install → present |
| **V1** | `Backend::VirtualBox` Mode A: launch/stop/PTY over SSH, provision, sudo | shared shell into a headless VBox VM, two clients driving |
| **V2** | snapshots: `save`/`load`/`snaps` arms; generalize `/sbx load` backend | save → stop → load round-trips on VBox |
| **V3** | Mode B: `/sbx export` → `/send` → `/sbx open` GUI launch | a second machine boots the shared appliance's desktop |
| **V4** | macOS/other-distro install paths; transfer-cap handling for `.ova` | cross-platform `open` works end-to-end |

---
## 9. Open questions

1. **Appliance size vs. the 50 MB `/send` cap.** Options: raise the cap for `.ova`
   only, compress (`.ova` + zstd, though gains may be modest because VirtualBox
   already writes compressed stream-optimized VMDKs into the `.ova`), ship a thin
   base + a provisioning script instead of a fat image, or accept out-of-band
   transfer for large VMs and keep `/send` for small ones. Needs a product call.
2. **Base image sourcing.** Do we bake and ship an `hh-base.ova` (sshd + sudo
   preinstalled) so Mode A "just works," or import whatever the user points at and
   require them to have sshd? Baking one image once is the smoother UX.
3. **Per-member VM naming / cleanup** for Mode B locals — namespacing
   (`hh-box-<member>`) and a `/sbx open --replace` to re-import cleanly.
4. **`guestcontrol` fallback** — ship it in V1 or document-only until someone needs
   a no-sshd image?
```
240
hh/demo-save-load.sh
Executable file
@@ -0,0 +1,240 @@
#!/usr/bin/env bash
# demo-save-load.sh — PoC harness for the "persistent sandbox" video beat.
#
# Flow (see docs/demo-save-load-poc.md):
#   session A: /sbx launch docker → /ai start → /grant oracle →
#              /ai oracle !build fib.py & run it → /sbx save buildbox → quit
#   prove:     container purged on quit, but hh-snap:buildbox image survives
#   session B: fresh client → /sbx load buildbox → the model's code is intact
#
# Headless: drives the ratatui client over tmux send-keys, asserts via
# capture-pane + `docker exec`. PoC/correctness first; feeds video-toolkit later.
#
# Usage: hh/demo-save-load.sh [--keep]
#   --keep   leave the server, container, image and tmux sessions up afterwards
set -uo pipefail
# ---- config -----------------------------------------------------------------
REPO="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Pick a free TCP port so we never collide with a stale server from another
# session (a leftover server on a fixed port answers SRP with its own password
# → spurious 401s). Honour an explicit $PORT if the caller forces one.
pick_port() { local p; for p in $(seq 4200 4280); do ss -ltn 2>/dev/null | grep -q ":$p " || { echo "$p"; return; }; done; echo 4173; }
PORT="${PORT:-$(pick_port)}"
PW="${PW:-malware-bless}"
LABEL="${LABEL:-buildbox}"
IMG="${IMG:-python:3.12-slim}"   # base image: ships python3 so the built code runs
CTR="hack-house"                 # sbx::SBX_NAME — the container/instance name
SNAP="hh-snap:${LABEL}"
PY="$REPO/.venv/bin/python"
BIN="$REPO/hh/target/debug/hack-house"
SRV_SESS="hhpoc-srv"
A_SESS="hhpoc-a"
B_SESS="hhpoc-b"
EVID="$(mktemp -d /tmp/hh-poc.XXXXXX)"
KEEP=0; [[ "${1:-}" == "--keep" ]] && KEEP=1

GREEN=$'\e[32m'; RED=$'\e[31m'; YEL=$'\e[33m'; DIM=$'\e[2m'; RST=$'\e[0m'
step() { printf '\n%s== %s ==%s\n' "$YEL" "$*" "$RST"; }
ok()   { printf '%s ok %s%s\n' "$GREEN" "$*" "$RST"; }
bad()  { printf '%s XX %s%s\n' "$RED" "$*" "$RST"; }
note() { printf '%s %s%s\n' "$DIM" "$*" "$RST"; }

FAIL=0
fail() { bad "$*"; FAIL=1; }

cleanup() {
  if [[ $KEEP -eq 1 ]]; then
    note "--keep: leaving server/sessions/image up. Evidence: $EVID"
    return
  fi
  step "cleanup"
  tmux kill-session -t "$A_SESS" 2>/dev/null
  tmux kill-session -t "$B_SESS" 2>/dev/null
  tmux kill-session -t "$SRV_SESS" 2>/dev/null
  docker rm -f "$CTR" >/dev/null 2>&1
  docker rmi -f "$SNAP" >/dev/null 2>&1
  note "removed container + $SNAP; sessions killed. Evidence kept: $EVID"
}
trap cleanup EXIT
# ---- helpers ----------------------------------------------------------------
# say <session> <text> : type a literal line then Enter (no Ctrl-U; renders race)
say() {
  local sess="$1"; shift
  tmux send-keys -t "$sess" -l "$*"
  sleep 0.4
  tmux send-keys -t "$sess" Enter
  sleep 0.6
}
cap()       { tmux capture-pane -t "$1" -p 2>/dev/null; }   # snapshot a pane to stdout
snap_evid() { cap "$1" > "$EVID/$2.txt"; }                   # ...and save it

# wait_for <session> <regex> <timeout_s> : poll the pane until regex appears
wait_for() {
  local sess="$1" re="$2" t="${3:-30}" i=0
  while (( i < t*2 )); do
    cap "$sess" | grep -qE "$re" && return 0
    sleep 0.5; ((i++))
  done
  return 1
}
# wait_cmd <cmd...> : retry <cmd> once a second until it exits 0
# (timeout in seconds via $WT, default 30)
wait_cmd() {
  local t="${WT:-30}" i=0
  while (( i < t )); do "$@" >/dev/null 2>&1 && return 0; sleep 1; ((i++)); done
  return 1
}
# ---- 0. preflight -----------------------------------------------------------
step "preflight"
command -v tmux >/dev/null || { echo "tmux required"; exit 2; }
[[ -x "$PY" ]] || { echo "venv python missing: $PY"; exit 2; }
docker info >/dev/null 2>&1 || { echo "docker daemon down - start it first"; exit 2; }
ollama list 2>/dev/null | grep -q 'qwen2.5-coder' || note "warn: qwen2.5-coder not in 'ollama list' (coder path may fall back)"
ollama list 2>/dev/null | grep -q 'qwen2.5:3b' || note "warn: qwen2.5:3b not present (chat default)"
docker image inspect "$IMG" >/dev/null 2>&1 || { echo "pulling $IMG..."; docker pull "$IMG"; }
if [[ ! -x "$BIN" ]]; then
  step "building client (debug)"; ( cd "$REPO/hh" && cargo build ) || exit 2
fi
ok "tools present, docker up, models checked"
note "evidence dir: $EVID"

# clear any stale state
tmux kill-session -t "$A_SESS" 2>/dev/null; tmux kill-session -t "$B_SESS" 2>/dev/null
tmux kill-session -t "$SRV_SESS" 2>/dev/null
docker rm -f "$CTR" >/dev/null 2>&1
docker rmi -f "$SNAP" >/dev/null 2>&1

# ---- 1. server --------------------------------------------------------------
step "boot server :$PORT"
tmux new-session -d -s "$SRV_SESS" -x 200 -y 50 \
  "cd '$REPO' && '$PY' cmd_chat.py serve 127.0.0.1 $PORT --password '$PW' --no-tls 2>&1 | tee '$EVID/server.log'"
WT=20 wait_cmd bash -c "grep -qiE 'listening|running|serving|started|websocket' '$EVID/server.log'" \
  || sleep 3   # some builds log nothing; give it a beat
ok "server session up"
# ---- 2. session A: client ---------------------------------------------------
step "session A - alice joins"
tmux new-session -d -s "$A_SESS" -x 200 -y 50 \
  "'$BIN' connect 127.0.0.1 $PORT alice --password '$PW' --no-tls 2>&1 | tee '$EVID/clientA.log'"
wait_for "$A_SESS" 'alice|roster|hack-house|owner' 20 && ok "alice in the room" \
  || fail "alice never joined (see $EVID/clientA.log)"
snap_evid "$A_SESS" 01-joined

# ---- 3. launch docker sandbox ----------------------------------------------
step "launch docker sandbox ($IMG)"
say "$A_SESS" "/sbx launch docker $IMG"
# `docker ps` exits 0 even when the filter matches nothing, so grep for the name.
WT=60 wait_cmd bash -c "docker ps --format '{{.Names}}' --filter 'name=^${CTR}\$' | grep -qx '$CTR'" \
  && ok "container '$CTR' running" || fail "sandbox container never came up"
wait_for "$A_SESS" 'summoned|sandbox|ready|online' 60 >/dev/null
snap_evid "$A_SESS" 02-sandbox

# ---- 4. spawn the coder agent + grant drive --------------------------------
step "spawn oracle (qwen2.5:3b chat, qwen2.5-coder:1.5b for !task)"
say "$A_SESS" "/ai start"
wait_for "$A_SESS" 'oracle|online|ollama' 45 && ok "oracle announced" \
  || note "no 'online' line yet - agent log: ${TMPDIR:-/tmp}/hh-agent-oracle.log"
say "$A_SESS" "/grant oracle"
sleep 1
snap_evid "$A_SESS" 03-agent
# ---- 5. fast model builds code in the sandbox ------------------------------
step "fast qwen builds /root/fib.py in the sandbox"
say "$A_SESS" "/ai oracle !create /root/fib.py that prints the first 10 fibonacci numbers space-separated on one line, then run it with python3"
# Give the CPU coder model room to think, then poll for the file.
WT=150 wait_cmd docker exec "$CTR" test -s /root/fib.py
NEED='0 1 1 2 3 5 8 13 21 34'
runout() { docker exec "$CTR" sh -c 'cd /root && python3 fib.py' 2>&1; }
# Accept the model's work only if the file exists AND actually runs to the right
# sequence. A 1.5B model typed through a PTY sometimes drops indentation, so fall
# back to a known-good file (written BEFORE save, so the snapshot is meaningful).
if docker exec "$CTR" test -s /root/fib.py 2>/dev/null && runout | grep -qE "$NEED"; then
  ok "model wrote a working /root/fib.py"
  BUILT_BY="qwen2.5-coder"
else
  note "model output missing or not runnable - writing deterministic fallback so the"
  note "save/load proof still completes (retry for a clean model take in the video)."
  # Heredoc body stays at column 0: Python needs its loop body indented, and the
  # closing PY delimiter must start its line.
  docker exec "$CTR" sh -c 'cat > /root/fib.py <<"PY"
a, b = 0, 1
out = []
for _ in range(10):
    out.append(str(a))
    a, b = b, a + b
print(" ".join(out))
PY'
  BUILT_BY="fallback"
fi
runout > "$EVID/fib-output.txt" 2>&1
ORIG_SHA="$(docker exec "$CTR" sha256sum /root/fib.py | awk '{print $1}')"
note "fib.py built by: $BUILT_BY"
note "fib.py output: $(cat "$EVID/fib-output.txt")"
docker exec "$CTR" cat /root/fib.py > "$EVID/fib-src-original.py"
snap_evid "$A_SESS" 04-built
grep -qE "$NEED" "$EVID/fib-output.txt" \
  && ok "fib.py prints the sequence" || fail "fib.py output unexpected"
# ---- 6. snapshot to an image -----------------------------------------------
step "/sbx save $LABEL (docker commit -> $SNAP)"
say "$A_SESS" "/sbx save $LABEL"
WT=40 wait_cmd bash -c "docker images $SNAP --format '{{.Tag}}' | grep -qx '$LABEL'" \
  && ok "image $SNAP created" || fail "snapshot image not found"
wait_for "$A_SESS" "saved|hh-snap|$LABEL" 10 >/dev/null
snap_evid "$A_SESS" 05-saved

# ---- 7. close the session (quit the client) --------------------------------
step "close session A (Ctrl-Q -> teardown purges the container)"
tmux send-keys -t "$A_SESS" C-q
sleep 3
tmux kill-session -t "$A_SESS" 2>/dev/null
WT=20 wait_cmd bash -c "! docker ps -a --format '{{.Names}}' | grep -qx '$CTR'" \
  && ok "container '$CTR' purged on quit" || fail "container still present after quit"
if docker images "$SNAP" --format '{{.Tag}}' | grep -qx "$LABEL"; then
  ok "image $SNAP survived the purge"
else
  fail "image $SNAP missing after purge"
fi
# ---- 8. session B: reopen and load -----------------------------------------
step "session B - fresh client, /sbx load $LABEL"
tmux new-session -d -s "$B_SESS" -x 200 -y 50 \
  "'$BIN' connect 127.0.0.1 $PORT alice --password '$PW' --no-tls 2>&1 | tee '$EVID/clientB.log'"
wait_for "$B_SESS" 'alice|roster|hack-house|owner' 20 && ok "alice re-joined" \
  || fail "alice never re-joined"
say "$B_SESS" "/sbx load $LABEL"
# `docker ps` exits 0 even when the filter matches nothing, so grep for the name.
WT=60 wait_cmd bash -c "docker ps --format '{{.Names}}' --filter 'name=^${CTR}\$' | grep -qx '$CTR'" \
  && ok "container relaunched from $SNAP" || fail "load never started a container"
wait_for "$B_SESS" 'summoned|sandbox|ready|loading|online' 60 >/dev/null
snap_evid "$B_SESS" 06-loaded
# ---- 9. the reveal: the model's code is intact -----------------------------
step "verify the work persisted"
WT=30 wait_cmd docker exec "$CTR" test -s /root/fib.py
NEW_SHA="$(docker exec "$CTR" sha256sum /root/fib.py 2>/dev/null | awk '{print $1}')"
docker exec "$CTR" cat /root/fib.py > "$EVID/fib-src-loaded.py" 2>/dev/null
docker exec "$CTR" sh -c 'cd /root && python3 fib.py' > "$EVID/fib-output-loaded.txt" 2>&1
note "original sha: $ORIG_SHA"
note "loaded sha: $NEW_SHA"
note "loaded output: $(cat "$EVID/fib-output-loaded.txt" 2>/dev/null)"
if [[ -n "$NEW_SHA" && "$NEW_SHA" == "$ORIG_SHA" ]]; then
  ok "fib.py is byte-for-byte identical after close+reload - PERSISTENCE PROVEN"
else
  fail "fib.py differs or missing after reload"
fi
# show it on the TUI for the camera
tmux send-keys -t "$B_SESS" F2; sleep 1   # drive
say "$B_SESS" "cat /root/fib.py && python3 /root/fib.py"
sleep 2
snap_evid "$B_SESS" 07-reveal

# ---- summary ----------------------------------------------------------------
step "result"
if [[ $FAIL -eq 0 ]]; then
  printf '%sPoC PASS%s - built-by=%s, saved=%s, purged-on-quit, reloaded-intact\n' \
    "$GREEN" "$RST" "$BUILT_BY" "$SNAP"
else
  printf '%sPoC FAIL%s - inspect captures in %s\n' "$RED" "$RST" "$EVID"
fi
note "captures: $EVID/{01-joined,02-sandbox,03-agent,04-built,05-saved,06-loaded,07-reveal}.txt"
note "code: $EVID/fib-src-original.py vs fib-src-loaded.py"
exit $FAIL
245
hh/film-save-load.sh
Executable file
@@ -0,0 +1,245 @@
#!/usr/bin/env bash
# film-save-load.sh — RECORD the "persistent sandbox" beat to an asciinema cast,
# then render an MP4. Sibling of demo-save-load.sh (the correctness harness):
# this one is for the camera, so it paces the beats and records a single,
# continuous take of the real flow:
#
#   launch docker sandbox → /ai start (fast qwen) → agent builds code in it
#   → /sbx save <label> → Ctrl-Q quit (container purged, image survives)
#   → fresh client → /sbx load <label> → reveal: the work is intact
#
# Recording trick (per the TUI-tmux recipe): the demo runs in an inner tmux
# session; `asciinema rec` runs in its own detached session that `tmux attach`es
# to the inner one, so it mirrors exactly what we drive with send-keys.
#
# Usage: hh/film-save-load.sh [--keep] [--no-render]
#   --keep        leave server/sessions/container/image up afterwards
#   --no-render   stop after writing the .cast (skip the mp4 render)
set -uo pipefail
# ---- config -----------------------------------------------------------------
REPO="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
pick_port() { local p; for p in $(seq 4200 4280); do ss -ltn 2>/dev/null | grep -q ":$p " || { echo "$p"; return; }; done; echo 4173; }
PORT="${PORT:-$(pick_port)}"
PW="${PW:-malware-bless}"
LABEL="${LABEL:-buildbox}"
IMG="${IMG:-python:3.12-slim}"
CTR="hack-house"
SNAP="hh-snap:${LABEL}"
PY="$REPO/.venv/bin/python"
BIN="$REPO/hh/target/debug/hack-house"
COLS=110; ROWS=32
SRV_SESS="hhfilm-srv"     # server (not recorded)
RUN_SESS="hhfilm"         # the demo pane we drive
REC_SESS="hhfilm-rec"     # asciinema attaches here and records
OUTDIR="$REPO/docs/demo"
CAST="$OUTDIR/save-load.cast"
MP4="$OUTDIR/save-load.mp4"
CODER="qwen2.5-coder:1.5b"
NEED='0 1 1 2 3 5 8 13 21 34'

KEEP=0; RENDER=1
for a in "$@"; do
  case "$a" in
    --keep)      KEEP=1 ;;
    --no-render) RENDER=0 ;;
  esac
done

GREEN=$'\e[32m'; RED=$'\e[31m'; YEL=$'\e[33m'; DIM=$'\e[2m'; RST=$'\e[0m'
step() { printf '\n%s== %s ==%s\n' "$YEL" "$*" "$RST"; }
ok()   { printf '%s ok %s%s\n' "$GREEN" "$*" "$RST"; }
bad()  { printf '%s XX %s%s\n' "$RED" "$*" "$RST"; }
note() { printf '%s %s%s\n' "$DIM" "$*" "$RST"; }
FAIL=0; fail() { bad "$*"; FAIL=1; }
cleanup() {
  if [[ $KEEP -eq 1 ]]; then
    note "--keep: leaving server/sessions/container/image up."
    return
  fi
  step "cleanup"
  tmux kill-session -t "$REC_SESS" 2>/dev/null
  tmux kill-session -t "$RUN_SESS" 2>/dev/null
  tmux kill-session -t "$SRV_SESS" 2>/dev/null
  docker rm -f "$CTR" >/dev/null 2>&1
  docker rmi -f "$SNAP" >/dev/null 2>&1
  note "removed container + $SNAP; sessions killed."
}
trap cleanup EXIT

# ---- helpers ----------------------------------------------------------------
# type into the recorded pane: literal text, a beat, then Enter (no Ctrl-U)
say() { tmux send-keys -t "$RUN_SESS" -l "$*"; sleep 0.5; tmux send-keys -t "$RUN_SESS" Enter; sleep 0.8; }
key() { tmux send-keys -t "$RUN_SESS" "$@"; }
cap() { tmux capture-pane -t "$RUN_SESS" -p 2>/dev/null; }
wait_for() { local re="$1" t="${2:-30}" i=0; while (( i < t*2 )); do cap | grep -qE "$re" && return 0; sleep 0.5; ((i++)); done; return 1; }
wait_cmd() { local t="${WT:-30}" i=0; while (( i < t )); do "$@" >/dev/null 2>&1 && return 0; sleep 1; ((i++)); done; return 1; }
runout() { docker exec "$CTR" sh -c 'cd /root && python3 fib.py' 2>&1; }
# ---- 0. preflight -----------------------------------------------------------
step "preflight"
command -v tmux >/dev/null || { echo "tmux required"; exit 2; }
command -v "$HOME/anaconda3/bin/asciinema" >/dev/null || command -v asciinema >/dev/null || { echo "asciinema required"; exit 2; }
ASCIINEMA="$( [[ -x "$HOME/anaconda3/bin/asciinema" ]] && echo "$HOME/anaconda3/bin/asciinema" || command -v asciinema )"
[[ -x "$PY" ]] || { echo "venv python missing: $PY"; exit 2; }
docker info >/dev/null 2>&1 || { echo "docker daemon down"; exit 2; }
ollama list 2>/dev/null | grep -q "$CODER" || { echo "coder model $CODER not pulled"; exit 2; }
docker image inspect "$IMG" >/dev/null 2>&1 || { echo "pulling $IMG..."; docker pull "$IMG"; }
[[ -x "$BIN" ]] || { step "building client"; ( cd "$REPO/hh" && cargo build ) || exit 2; }
mkdir -p "$OUTDIR"
ok "tools present, docker up, $CODER ready"

# clear stale state
tmux kill-session -t "$REC_SESS" 2>/dev/null; tmux kill-session -t "$RUN_SESS" 2>/dev/null
tmux kill-session -t "$SRV_SESS" 2>/dev/null
docker rm -f "$CTR" >/dev/null 2>&1; docker rmi -f "$SNAP" >/dev/null 2>&1
rm -f "$CAST"
# ---- 0b. pre-warm the coder so first-token latency on camera is short -------
step "pre-warm $CODER (off camera)"
"$PY" - "$CODER" <<'PY' 2>/dev/null || true
import sys, json, urllib.request
m = sys.argv[1]
req = urllib.request.Request("http://127.0.0.1:11434/api/generate",
                             data=json.dumps({"model": m, "prompt": "ok", "stream": False}).encode(),
                             headers={"Content-Type": "application/json"})
try:
    urllib.request.urlopen(req, timeout=120).read()
except Exception:
    pass
PY
ok "model warmed"
# ---- 1. server (not recorded) ----------------------------------------------
step "boot server :$PORT"
tmux new-session -d -s "$SRV_SESS" -x 200 -y 50 \
  "cd '$REPO' && '$PY' cmd_chat.py serve 127.0.0.1 $PORT --password '$PW' --no-tls 2>&1 | tee /tmp/hhfilm-server.log"
WT=20 wait_cmd bash -c "grep -qiE 'listening|running|serving|started|websocket' /tmp/hhfilm-server.log" || sleep 3
ok "server up"

# ---- 2. inner demo pane + recorder -----------------------------------------
step "open recorded pane (${COLS}x${ROWS}) and start asciinema"
# inner demo session we drive (bash, sized for the cast)
tmux new-session -d -s "$RUN_SESS" -x "$COLS" -y "$ROWS" "bash --noprofile --norc"
sleep 0.5
tmux send-keys -t "$RUN_SESS" -l "cd '$REPO'"; tmux send-keys -t "$RUN_SESS" Enter
tmux send-keys -t "$RUN_SESS" -l "clear"; tmux send-keys -t "$RUN_SESS" Enter
sleep 0.5
# recorder session: same size, just attaches to the demo session and records it
|
||||
tmux new-session -d -s "$REC_SESS" -x "$COLS" -y "$ROWS" \
|
||||
"'$ASCIINEMA' rec --overwrite -c 'tmux attach -t $RUN_SESS' '$CAST'"
|
||||
sleep 2
|
||||
ok "recording → $CAST"
|
||||
|
||||
# ---- 3. title + join --------------------------------------------------------
|
||||
say "echo '⛧ hack-house — ephemeral by default, persistent on demand'"
sleep 1.2
say "$BIN connect 127.0.0.1 $PORT alice --password '$PW' --no-tls"
wait_for 'alice|roster|hack-house|owner' 20 && ok "alice joined" || fail "alice never joined"
sleep 1.5

# ---- 4. launch docker sandbox ----------------------------------------------
step "launch docker sandbox"
say "/sbx launch docker $IMG"
# `docker ps --filter` exits 0 even when nothing matches, so test the name list with grep
WT=60 wait_cmd bash -c "docker ps --format '{{.Names}}' | grep -qx '$CTR'" && ok "container up" || fail "sandbox never came up"
wait_for 'summoned|sandbox|ready|online' 60 >/dev/null
sleep 1.5

# ---- 5. spawn fast qwen agent (auto-grant drive) ---------------------------
step "spawn oracle (auto-grant sandbox drive)"
say "/ai start $CODER allow"
wait_for 'oracle|online|ollama|qwen' 45 && ok "oracle online" || note "no online line yet"
sleep 1.5

# ---- 6. the fast model builds code in the sandbox --------------------------
# Transcription-only, ZERO-indentation task so the 1.5B coder can't break it
# through the PTY. Validate-by-running; retry once; abort before save if it
# still fails (no silent fallback in a film).
step "fast qwen writes /root/fib.py and runs it"
TASK="/ai oracle !create /root/fib.py with exactly two lines and nothing else: line 1 is nums = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34] and line 2 is print(*nums) then run it with: python3 /root/fib.py"
BUILT=0
for attempt in 1 2; do
  note "build attempt $attempt"
  say "$TASK"
  WT=180 wait_cmd docker exec "$CTR" test -s /root/fib.py
  if docker exec "$CTR" test -s /root/fib.py 2>/dev/null && runout | grep -qE "$NEED"; then
    BUILT=1; break
  fi
  note "output not yet correct; re-prompting"
  sleep 2
done
if [[ $BUILT -eq 1 ]]; then
  ok "model wrote a working /root/fib.py"
else
  fail "model never produced runnable fib.py after retries — aborting before save"
  exit $FAIL
fi
ORIG_SHA="$(docker exec "$CTR" sha256sum /root/fib.py | awk '{print $1}')"
note "fib output: $(runout)"
sleep 1.5

# ---- 7. snapshot to an image -----------------------------------------------
step "/sbx save $LABEL"
say "/sbx save $LABEL"
WT=40 wait_cmd bash -c "docker images $SNAP --format '{{.Tag}}' | grep -qx '$LABEL'" && ok "image $SNAP created" || fail "snapshot not found"
wait_for "saved|hh-snap|$LABEL" 10 >/dev/null
sleep 2

# ---- 8. close the session (Ctrl-Q purges the container) --------------------
step "quit client (Ctrl-Q → teardown purges container)"
key C-q
sleep 3
WT=20 wait_cmd bash -c "! docker ps -a --format '{{.Names}}' | grep -qx '$CTR'" && ok "container purged" || fail "container still present"
docker images "$SNAP" --format '{{.Tag}}' | grep -qx "$LABEL" && ok "image survived purge" || fail "image missing"
# prove it on camera
sleep 1
say "docker ps -a --format '{{.Names}}' | grep hack-house || echo '(no hack-house container — purged)'"
sleep 1.5
say "docker images hh-snap --format '⛧ {{.Repository}}:{{.Tag}}'"
sleep 2

# ---- 9. fresh client → load -------------------------------------------------
step "fresh session → /sbx load $LABEL"
say "$BIN connect 127.0.0.1 $PORT alice --password '$PW' --no-tls"
wait_for 'alice|roster|hack-house|owner' 20 && ok "alice re-joined" || fail "alice never re-joined"
sleep 1.5
say "/sbx load $LABEL"
# `docker ps --filter` exits 0 even when nothing matches, so test the name list with grep
WT=60 wait_cmd bash -c "docker ps --format '{{.Names}}' | grep -qx '$CTR'" && ok "container relaunched" || fail "load never started"
wait_for 'summoned|sandbox|ready|loading|online' 60 >/dev/null
sleep 1.5

# ---- 10. the reveal ---------------------------------------------------------
step "reveal: the model's code is intact"
WT=30 wait_cmd docker exec "$CTR" test -s /root/fib.py
NEW_SHA="$(docker exec "$CTR" sha256sum /root/fib.py 2>/dev/null | awk '{print $1}')"
note "original sha: $ORIG_SHA"
note "loaded sha: $NEW_SHA"
[[ -n "$NEW_SHA" && "$NEW_SHA" == "$ORIG_SHA" ]] && ok "byte-for-byte identical — PERSISTENCE PROVEN" || fail "differs/missing after reload"
# show it on the TUI for the camera
key F2; sleep 1
say "cat /root/fib.py && echo '---' && python3 /root/fib.py"
sleep 3

# ---- 11. stop recording -----------------------------------------------------
step "stop recording"
tmux kill-session -t "$RUN_SESS" 2>/dev/null  # attach exits → asciinema writes the cast
sleep 2
[[ -s "$CAST" ]] && ok "cast written: $CAST ($(du -h "$CAST" | cut -f1))" || fail "cast not written"

# ---- 12. render -------------------------------------------------------------
if [[ $RENDER -eq 1 && $FAIL -eq 0 && -s "$CAST" ]]; then
  step "render mp4"
  "$REPO/../../video-toolkit/bin/cast2mp4.sh" "$CAST" "$MP4" --font-size 28 --theme dracula \
    || bash ~/coding/video-toolkit/bin/cast2mp4.sh "$CAST" "$MP4" --font-size 28 --theme dracula
  [[ -s "$MP4" ]] && ok "mp4: $MP4 ($(du -h "$MP4" | cut -f1))" || fail "render produced no mp4"
fi

# ---- summary ----------------------------------------------------------------
step "result"
if [[ $FAIL -eq 0 ]]; then
  printf '%sFILM OK%s — cast=%s%s\n' "$GREEN" "$RST" "$CAST" "$( [[ -s "$MP4" ]] && echo " mp4=$MP4" )"
else
  printf '%sFILM FAIL%s — inspect %s\n' "$RED" "$RST" "$CAST"
fi
exit $FAIL