Closes the cross-language half of token streaming (perf-plan A3). On the
CPU-only box perceived latency is time-to-first-token, so showing the reply
as it generates makes a slow model feel live.
- Agent: OllamaProvider.stream() runs on a worker thread; bridge relays
cumulative previews as throttled (~5/sec) `_ai:"stream"` control frames,
then a `done` frame clears the preview as the final persisted chat message
is posted. Providers without stream() fall back to blocking complete().
- Rust client: new Net::AiStream variant + parse_ai branch; App.ai_stream
map holds the in-progress text per agent; draw_chat renders it as a dim,
italic preview bubble below history. Cleared on done and on agent leave.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tier A/B/C wins for the CPU-only Ollama box (no GPU → optimize TTFT and
tokens/sec, not VRAM):
- Separate qwen2.5-coder provider for the sandbox `!task` path; chat keeps
the general model. Auto-selected when chat is Ollama and a coder build is
present, override with --code-model.
- OllamaProvider num_ctx default 8192→4096 (8192 was a GPU-mindset default
that inflates prefill/TTFT on CPU); expose num_thread; add --num-ctx,
--num-thread, --num-predict. token_budget default 3000→2000 to fit.
- OllamaProvider.stream() generator over Ollama's stream=True chat endpoint
(provider half of token streaming; agent/Rust rendering is a follow-up).
- Few-shot request→shell exemplars in SANDBOX_SYSTEM to anchor the small
model's fenced-command output.
- Matryoshka embedding truncation: OllamaEmbedder truncate_dim=256 (--embed-dim)
for faster pure-Python cosine and less RAM; query+stored share the dim.
- docs/ai-perf-plan.md records all 8 items with status and the server-side
env (OLLAMA_NUM_PARALLEL=1, keep_alive) that must be set where ollama serve runs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Give the agent recall of things said beyond the verbatim window, without
breaking the RAM-only philosophy — nothing is persisted to disk.
- MemoryIndex: a capped, in-memory pool of embedded messages with pure-Python
cosine search (no numpy). Retains far more than the rolling transcript so old
lines can be surfaced on demand; oldest evicted past the cap to bound RAM.
- OllamaEmbedder: local embeddings via nomic-embed-text, on by default and
independent of the chat provider (reuses the Ollama host when chat is Ollama).
- Bridge: captured room messages (live + backfilled) are embedded on a
background worker so a slow embedder can't stall frame draining. On a /ai
question the agent retrieves top-k relevant lines, drops weak (<min_score) and
windowed-duplicate hits, and prepends them as a clearly-fenced "recalled
context" preamble — kept at user role, never elevated to system, so untrusted
room text informs without instructing. Falls back to recency-only if the
embedder is unreachable.
- CLI: --no-rag, --embed-model, --embed-host, --rag-top-k.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OllamaProvider now sends keep_alive (default 30m) so the model stays resident
in VRAM between /ai calls instead of cold-reloading, and sets explicit options
(num_ctx 8192, num_predict 512) — Ollama otherwise caps context at 2048, which
would silently truncate the larger backfilled window.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The server already ships the full RAM message backlog in the init frame; the
agent was discarding it. _seed_transcript now decrypts that history with the
room key (skipping our own lines, control frames, and undecryptable blobs) so
the agent has context the moment it joins instead of starting amnesiac.
_window() replaces the fixed last-12 slice on both the answer and sandbox
paths: it walks newest-to-oldest and keeps messages up to --token-budget
(approx, ~4 chars/token), still capped at --context-window count. Keeps small
local models inside their effective context. Nothing touches disk.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agents can now run commands and build files in the shared sandbox, but
only when explicitly invoked with the `!` verb and only while the owner
has granted drive. Reuses the existing driver ACL + `_sbx:input` frames:
the Python agent emits the same input frames a human driver does, gated
by the broker's `app.drivers` check — no new transport.
Guardrails: a regex gate holds destructive commands until `/ai <name>
confirm`; blast-radius caps (20 cmds / 8KB); the agent echoes its plan to
the room before running (audit trail). Owner controls: `/grant`, `/ai
start <model> allow` to pre-grant on spawn, and a Ctrl-X panic kill
switch (revoke all non-owner drive + Ctrl-C the shell). The broker now
re-broadcasts the ACL on join so a freshly-summoned agent actually
receives its grant.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make connecting any model a config step, not a code change:
- models.toml named profiles (api_key_env names an env var, never the key)
- providers gain available_models(); add preflight + --list-models/--check
- /ai list and /ai models in-room; client probes local Ollama for
/ai models when no agent is running, and /ai list hints to summon one
- docs/providers.md provider guide + examples/echo_provider.py
- README: command table, AI section, layout updated
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Owner of the spawning client can summon/dismiss a local AI agent from inside
the room (default ollama/qwen2.5:3b); the agent emits encrypted typing frames
that drive a "thinking" spinner in the client.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add cmd_chat/agent: a headless client that joins a room via SRP, decrypts
broadcasts, and answers /ai <question> through a pluggable model provider
(ollama default + anthropic + openai-compatible + module:Class). Server and
zero-knowledge guarantees unchanged; the agent is just another encrypted client.
Also pin the lets-hack demo to a detached worktree of main (default) so running
it from dev still demos stable main without touching the working checkout.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- add /pw (alias /password): reveal this room's password locally (never
broadcast); surfaced in the F1 help overlay and the join hint
- direnv-autostart/: cd-to-launch a single real-user session via direnv;
password is minted in memory at launch (never written to disk, matching the
RAM-only model) and scoped to the child process. setup.sh installs direnv,
hooks bash/zsh, and `direnv allow`s the dir
- lets-hack.sh: boot a FRESH server by default (replacing any live one) with a
--reuse opt-out; add -h/--help/-help; guard against killing the tmux session
you're attached to; switch-client into the coven when run inside tmux
- rename coven→clergy across rust/python/scripts; tests/test_coven.py→test_clergy.py
- snapshots in-progress hack-house client work (sandbox, themes, net, ui)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Begin the coven evolution of cmd-chat (see docs/spec-collaborative-sandbox.md):
a Rust/ratatui client for the unchanged Python Sanic server, plus the
multi-user + zero-knowledge groundwork.
P0 — crypto parity (the spec's #1 risk), proven three ways:
- Hand-rolled SRP-6a (NG_2048, SHA-256, rfc5054 padding) matching pysrp
byte-for-byte, incl. the fixed b"chat" SRP identity and minimal-vs-256B
width quirks. Golden-vector unit test + offline selftest.
- Live handshake against the running server (H_AMK verified).
- Cross-language E2E: Python client decrypts a Rust-encrypted Fernet message.
P2 — multi-user coven (server):
- CMD_CHAT_MAX_USERS capacity cap (default 4, infra-for-more).
- Authoritative roster + user_joined broadcasts.
- Free the slot/username on ws disconnect (was held until 1h stale sweep).
Also: fix requirements.txt (was UTF-16, unparseable by pip).
coven/ : Rust crate (crypto.rs proven; main.rs spike CLI: selftest/handshake/srpm)
docs/ : full feature spec for the 6 requested features.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New commands: /send <filepath>, /accept, /reject
Protocol:
- Sender proposes file (name, size, SHA-256 hash)
- Recipient sees offer and chooses /accept or /reject
- On accept: file chunked (64KB), encrypted with room key, sent over WebSocket
- On receive: chunks reassembled, SHA-256 verified, saved to ./downloads/
- Server never sees file content (E2E encrypted, same as messages)
Limits: 50MB max file size. Files saved with collision-safe naming.
No server changes — server remains a dumb encrypted relay.
All 79 existing tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CRITICAL fixes:
- Auto-generated self-signed TLS certs (HTTPS/WSS by default)
- Removed session_key from /srp/verify response (was sent in plaintext)
- Replaced with HMAC-SHA256 ws_token for WebSocket authentication
HIGH fixes:
- WebSocket auth now validates ws_token via hmac.compare_digest()
- /clear endpoint requires Bearer admin_token (printed at server start)
- Password no longer required as CLI arg — supports env var + getpass prompt
- Removed user_ip from Message model (no longer broadcast to clients)
MEDIUM fixes:
- Rate limiter on /srp/init and /srp/verify (10 req/min/IP)
- MessageStore capped at 1000 messages (prevents RAM DoS)
- access_log disabled (was leaking request metadata)
LOW fixes:
- Username sanitization against rich markup injection
- Dead code removed from helpers.py
All 79 tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix abstract renderer signatures and add small stubs so type checkers can
see expected attributes (e.g. username, _decrypt). This removes several
mypy false-positives that were caused by mixin/ABC mismatches.
Preserve message text containing ':' by using split(':', 1) in both
DefaultClientRenderer and RichClientRenderer.
Normalize renderer APIs: print_chat(...) now takes the response mapping
and returns None (matches runtime behavior).
Make RSA symmetric-key request more robust: read r.content instead of a
fixed-size r.raw.read(999), avoiding truncated key material.
Improve _connect_ws exception handling in client to ensure a valid
Exception is re-raised if connection attempts fail.
Correct server/service typing: memory_msgs is now typed as
list[Message] and we null-check incoming payload text before creating a
new Message.
Replace manual package list in setup.py with setuptools.find_packages()
so packaging uses valid Python package names.
Installed types-requests in the project venv so mypy no longer flags the
requests import.
Verification: ran python -m compileall and mypy cmd_chat — no issues
remain.
Notes:
Wire format still uses Python literal evaluation in some places (existing
behavior); switching to JSON for client/server payloads is recommended as a
follow-up for robustness and security.