PoC: persistent sandbox — fast-qwen build → save image → close → reload
Goal of the video beat: prove that a hack-house Docker sandbox is durable
on demand. A local, CPU-only fast qwen coder writes & runs code inside an
ephemeral Docker sandbox; we snapshot it to an image with /sbx save; we fully
close the session (container is purged on teardown); we relaunch the client and
/sbx load the snapshot — the code the model wrote is still there.
This is the headline pitch: sandboxes are RAM-only/ephemeral by default, but you can freeze a moment of work into an image and thaw it later — nothing leaks to the server, the image lives only on the owner's box.
Why this is non-obvious / worth showing
- `/sbx stop` and client-quit both run `sbx::teardown` → `docker rm -f hack-house`. The container is gone. Normally the work would be gone too.
- `/sbx save <label>` runs `docker commit hack-house hh-snap:<label>` while the container is alive. The image is independent of the container, so it survives the purge.
- `/sbx load <label>` runs a fresh container from `hh-snap:<label>`: same filesystem state, new ephemeral instance.
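The three operations above can be sketched as the Docker command lines they map to. This is a minimal illustration, not the project's code: `docker commit` and `docker rm -f` come from the text, while the `docker run` flags in `load_cmd` (`-d`, `--name`) are assumptions, since the text only says that a fresh container is started from the snapshot image. The function names are hypothetical.

```python
CONTAINER = "hack-house"  # container name, from the text
SNAP_REPO = "hh-snap"     # snapshot image repository, from the text


def save_cmd(label: str) -> list[str]:
    """Snapshot the live container's filesystem into an image."""
    return ["docker", "commit", CONTAINER, f"{SNAP_REPO}:{label}"]


def teardown_cmd() -> list[str]:
    """Force-remove the container; images are not touched by this."""
    return ["docker", "rm", "-f", CONTAINER]


def load_cmd(label: str) -> list[str]:
    """Start a fresh container from the snapshot image.

    ASSUMPTION: -d and --name are illustrative; the real /sbx load
    may pass different options.
    """
    return ["docker", "run", "-d", "--name", CONTAINER, f"{SNAP_REPO}:{label}"]


if __name__ == "__main__":
    # Print the save -> purge -> reload sequence for the demo label.
    for cmd in (save_cmd("buildbox"), teardown_cmd(), load_cmd("buildbox")):
        print(" ".join(cmd))
```

The point of the sequence is that `teardown_cmd` only names the container, so the `hh-snap:buildbox` image created by `save_cmd` is still available when `load_cmd` runs.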
Models (CPU-only box: i5-8350U, no GPU)
| Path | Model | Why |
|---|---|---|
| chat (`/ai <q>`) | `qwen2.5:3b` | general, the locally-pulled default |
| sandbox `!task` | `qwen2.5-coder:1.5b` | auto-selected coder; fast TTFT on CPU, better shell/code |
The agent auto-selects the coder build for the `!task` (sandbox-driving) path when
the chat provider is Ollama and a `qwen2.5-coder` model is present (on this box it is already pulled).
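The selection rule can be sketched as a small function. This is a hypothetical illustration: the function name, the `"chat"`/`"task"` path labels, and the fallback to the chat default when no coder model is found are assumptions, not the agent's actual code.

```python
DEFAULT_CHAT_MODEL = "qwen2.5:3b"  # chat default, from the table above
CODER_PREFIX = "qwen2.5-coder"     # coder family, from the table above


def pick_model(path: str, provider: str, pulled: list[str]) -> str:
    """Return the model name for a request path ("chat" or "task").

    ASSUMPTION: falling back to the chat default when no coder model
    is pulled is a guess at reasonable behaviour.
    """
    if path == "task" and provider == "ollama":
        for name in pulled:
            if name.startswith(CODER_PREFIX):
                return name
    return DEFAULT_CHAT_MODEL


if __name__ == "__main__":
    pulled = ["qwen2.5:3b", "qwen2.5-coder:1.5b"]
    print(pick_model("chat", "ollama", pulled))
    print(pick_model("task", "ollama", pulled))
```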
Storyboard (the cut)
1. Title card: "Ephemeral by default. Persistent on demand."
2. Summon: alice runs `/sbx launch docker` → "summoned" sandbox bubble.
3. Spawn the coder: alice runs `/ai start` → `oracle online — ollama/qwen2.5:3b` (the coder model rides along for `!task`).
4. Build, by the fast model: alice sends `/ai oracle !write /root/fib.py that prints the first 10 Fibonacci numbers, then run it` → agent drives the shared shell; `fib.py` is written and executed; the sandbox pane shows the Fibonacci output.
5. Freeze it: alice runs `/sbx save buildbox` → `⛧ saved sandbox → image hh-snap:buildbox · reload with /sbx load buildbox`.
6. Walk away: alice runs `/sbx stop` (or quits the client entirely). Container is purged; prove it: `docker ps -a` shows no `hack-house`, but `docker images hh-snap` still lists `buildbox`.
7. Come back: a fresh client session; alice runs `/sbx load buildbox`.
8. The reveal: F2 to drive, `cat /root/fib.py && python3 /root/fib.py` → the model's code and output are exactly as left. Persistence proven.
9. Result card: "OPERATIONS CONDUCTED": built by local qwen-coder · saved to image · session closed · reloaded intact.
Acceptance (what the PoC script asserts)
- After step 4: `docker exec hack-house cat /root/fib.py` is non-empty AND running it prints 10 Fibonacci numbers (`0 1 1 2 3 5 8 13 21 34`).
- After step 5: `docker images hh-snap --format '{{.Tag}}'` contains `buildbox`.
- After step 6 (stop): `docker ps -a --format '{{.Names}}'` has no `hack-house`; the `hh-snap:buildbox` image still exists.
- After steps 7-8 (load): the new `hack-house` container's `/root/fib.py` matches the original byte-for-byte.
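The assertions above reduce to a few pure checks on captured command output. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the Fibonacci output is plain integers separated by whitespace or commas and that the `--format` output has one name per line.

```python
import hashlib


def first_fibonacci(n: int) -> list[int]:
    """Return the first n Fibonacci numbers, starting from 0."""
    seq, a, b = [], 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq


def prints_first_ten(output: str) -> bool:
    """True if output holds exactly the first 10 Fibonacci numbers."""
    tokens = output.replace(",", " ").split()
    try:
        numbers = [int(t) for t in tokens]
    except ValueError:
        return False  # non-numeric text in the output
    return numbers == first_fibonacci(10)


def listed(format_output: str, name: str) -> bool:
    """True if name is one of the lines in one-name-per-line output."""
    return name in [line.strip() for line in format_output.splitlines()]


def same_bytes(original: bytes, reloaded: bytes) -> bool:
    """Byte-for-byte equality, compared via SHA-256 digests."""
    return hashlib.sha256(original).digest() == hashlib.sha256(reloaded).digest()


if __name__ == "__main__":
    print(prints_first_ten("0 1 1 2 3 5 8 13 21 34"))
    print(listed("buildbox\n", "buildbox"))
```

In a harness, `listed` would be fed the output of the `docker images` and `docker ps -a` commands quoted above, and `same_bytes` the two `docker exec ... cat /root/fib.py` captures.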
Execution
hh/demo-save-load.sh drives the whole thing headlessly over tmux (per the
TUI-tmux test recipe): boots the server, runs client session A, injects the
beats with send-keys, verifies via capture-pane + docker exec, then quits
session A and opens client session B to load and confirm. It is a PoC /
correctness harness first; once green it feeds the polished video-toolkit
render.
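The inject-then-verify loop might look like the following sketch. The tmux subcommands and flags used (`send-keys -t`, `send-keys -l`, `capture-pane -p -t`) are standard tmux; the target name `hh-a`, the helper names, and the polling timeouts are assumptions for illustration, not taken from the script.

```python
import time
from typing import Callable


def type_line(target: str, text: str) -> list[list[str]]:
    """Commands to send text literally, then press Enter separately."""
    return [
        ["tmux", "send-keys", "-t", target, "-l", text],
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def capture_cmd(target: str) -> list[str]:
    """Command that prints the pane's visible contents to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", target]


def wait_for(read_pane: Callable[[], str], needle: str,
             timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll the pane until needle appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if needle in read_pane():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # "hh-a" is a hypothetical tmux target for client session A.
    for cmd in type_line("hh-a", "/sbx save buildbox"):
        print(cmd)
    print(capture_cmd("hh-a"))
```

Polling the captured pane for an expected string, rather than sleeping a fixed time, is one way to avoid racing the TUI's renders; `read_pane` would wrap a `subprocess.run` call on `capture_cmd`.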
Gotchas baked into the script
- TUI doesn't bind Ctrl-U (it inserts a literal `u`); clear input with `BSpace`. Send text with `send-keys -l "<text>"` then a separate `Enter`; don't race renders.
- Agent name is hardcoded `oracle`; only one `/ai start` per room.
- Keep `!task` phrasing single-line; the agent's drive output lands in the sandbox pane, not chat.
- `/sbx load` refuses if a sandbox is already running; stop first.
- Docker daemon must be up (`docker info`); `/sbx launch docker --start` can boot it (sudo) but we pre-check instead.
- Snapshot label charset: alphanumerics, `.`, `_`, `-` (≤64).
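The label rule in the last bullet can be checked before calling `/sbx save`. A minimal sketch, assuming ASCII alphanumerics and that an empty label is rejected (the text states only the charset and the 64-character cap); the function name is hypothetical.

```python
import re

# Charset and length cap from the text; the minimum of 1 is an assumption.
LABEL_RE = re.compile(r"[A-Za-z0-9._-]{1,64}")


def valid_label(label: str) -> bool:
    """True if label uses only the allowed characters and is 1-64 long."""
    return LABEL_RE.fullmatch(label) is not None


if __name__ == "__main__":
    for candidate in ("buildbox", "bad label", "a" * 65):
        print(repr(candidate[:12]), valid_label(candidate))
```

Docker itself additionally rejects image tags that start with `.` or `-`; whether `/sbx save` enforces that is not stated here, so the sketch leaves it out.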
Teardown / cleanup
The script removes the hack-house container and (optionally) the hh-snap:*
demo images it created, and kills the server + tmux sessions, so reruns are clean.