Give the agent recall of things said beyond the verbatim window, without
breaking the RAM-only philosophy — nothing is persisted to disk.
- MemoryIndex: a capped, in-memory pool of embedded messages with pure-Python
cosine search (no numpy). It retains far more than the rolling transcript so
old lines can be surfaced on demand; once the cap is reached, the oldest
entries are evicted to bound RAM.
- OllamaEmbedder: local embeddings via nomic-embed-text, on by default and
independent of the chat provider (reuses the Ollama host when chat is Ollama).
- Bridge: captured room messages (live + backfilled) are embedded on a
background worker so a slow embedder can't stall frame draining. On a /ai
question the agent retrieves top-k relevant lines, drops weak (<min_score) and
windowed-duplicate hits, and prepends them as a clearly-fenced "recalled
context" preamble — kept at user role, never elevated to system, so untrusted
room text informs without instructing. Falls back to recency-only if the
embedder is unreachable.
- CLI: --no-rag, --embed-model, --embed-host, --rag-top-k.
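
The retrieval path described above can be sketched roughly as follows. This is
a minimal illustration, not the shipped MemoryIndex: the class name
`TinyMemoryIndex`, the default cap, the `min_score` value, and the toy
two-dimensional vectors are all assumptions made for the example.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyMemoryIndex:
    """Hypothetical stand-in for MemoryIndex: capped, RAM-only, pure Python."""

    def __init__(self, cap=1000):
        # deque(maxlen=...) silently drops the oldest entry once the cap is hit.
        self._items = deque(maxlen=cap)

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, top_k=3, min_score=0.3):
        scored = [(cosine(query_vector, vec), text) for text, vec in self._items]
        # Drop weak hits first, then keep the best top_k by score.
        strong = [hit for hit in scored if hit[0] >= min_score]
        strong.sort(key=lambda hit: hit[0], reverse=True)
        return strong[:top_k]


index = TinyMemoryIndex(cap=2)
index.add("deploy is on friday", [1.0, 0.0])
index.add("lunch at noon", [0.0, 1.0])
index.add("deploy moved to monday", [0.9, 0.1])  # evicts the oldest entry
hits = index.search([1.0, 0.0], top_k=2, min_score=0.3)
```

Here `hits` contains only the "deploy moved to monday" line: the first message
was evicted by the cap and the lunch line scores below `min_score`.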
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The server already ships the full RAM message backlog in the init frame; the
agent was discarding it. _seed_transcript now decrypts that history with the
room key (skipping our own lines, control frames, and undecryptable blobs) so
the agent has context the moment it joins instead of starting amnesiac.
_window() replaces the fixed last-12 slice on both the answer and sandbox
paths: it walks newest-to-oldest and keeps messages until --token-budget is
reached (estimated at ~4 chars/token), still capped at --context-window
messages. This keeps small local models inside their effective context.
Nothing touches disk.
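
A rough sketch of the budgeted window, assuming messages are plain strings.
The function name `window` and the default values are placeholders for
illustration; only the newest-to-oldest walk, the ~4 chars/token estimate, and
the count cap come from the description above.

```python
def window(messages, token_budget=1500, max_count=12, chars_per_token=4):
    """Return the newest messages that fit the budget, in chronological order."""
    kept = []
    used = 0
    for text in reversed(messages):  # walk newest first
        if len(kept) >= max_count:
            break
        cost = max(1, len(text) // chars_per_token)  # rough token estimate
        if used + cost > token_budget:
            break
        kept.append(text)
        used += cost
    kept.reverse()  # restore oldest-to-newest order for the prompt
    return kept


history = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]  # about 10 tokens each
recent = window(history, token_budget=25, max_count=3)
```

With a budget of 25 estimated tokens, only the two newest messages fit, so
`recent` holds the "c" and "d" lines in their original order.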
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agents can now run commands and build files in the shared sandbox, but
only when explicitly invoked with the `!` verb and only while the owner
has granted drive. Reuses the existing driver ACL + `_sbx:input` frames:
the Python agent emits the same input frames a human driver does, gated
by the broker's `app.drivers` check — no new transport.
Guardrails: a regex gate holds destructive commands until `/ai <name>
confirm`; blast-radius caps (20 cmds / 8KB); the agent echoes its plan to
the room before running (audit trail). Owner controls: `/grant`, `/ai
start <model> allow` to pre-grant on spawn, and a Ctrl-X panic kill
switch (revoke all non-owner drive + Ctrl-C the shell). The broker now
re-broadcasts the ACL on join so a freshly-summoned agent actually
receives its grant.
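
The guardrail flow might look something like the sketch below. The regex
patterns, the function name `review_plan`, and the reading of the 8KB cap as
total command bytes per invocation are assumptions; the actual gate's patterns
are not shown here.

```python
import re

# Hypothetical patterns for illustration only.
DESTRUCTIVE = re.compile(
    r"\brm\s+-[a-zA-Z]*[rf]|\bmkfs\b|\bdd\s+if=|>\s*/dev/sd",
    re.IGNORECASE,
)

MAX_COMMANDS = 20     # blast-radius cap: commands per invocation
MAX_BYTES = 8 * 1024  # blast-radius cap: total command bytes (assumed meaning)


def review_plan(commands, confirmed=False):
    """Return ("run" | "hold" | "reject", reason) for a planned command list."""
    if len(commands) > MAX_COMMANDS:
        return "reject", "too many commands"
    if sum(len(c.encode("utf-8")) for c in commands) > MAX_BYTES:
        return "reject", "plan too large"
    risky = [c for c in commands if DESTRUCTIVE.search(c)]
    if risky and not confirmed:
        # Held until the owner confirms in the room.
        return "hold", "needs confirm: " + "; ".join(risky)
    return "run", "ok"


safe = review_plan(["ls -la", "mkdir build"])
held = review_plan(["rm -rf build"])
released = review_plan(["rm -rf build"], confirmed=True)
```

A regex gate like this is a coarse filter, which is why it is paired with the
caps, the plan echo, and the owner's kill switch rather than trusted alone.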
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make connecting any model a config step, not a code change:
- models.toml named profiles (api_key_env names an env var, never the key)
- providers gain available_models(); add preflight + --list-models/--check
- /ai list and /ai models in-room; client probes local Ollama for
/ai models when no agent is running, and /ai list hints to summon one
- docs/providers.md provider guide + examples/echo_provider.py
- README: command table, AI section, layout updated
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The owner of the spawning client can summon or dismiss a local AI agent from
inside the room (default ollama/qwen2.5:3b); the agent emits encrypted typing
frames that drive a "thinking" spinner in the client.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add cmd_chat/agent: a headless client that joins a room via SRP, decrypts
broadcasts, and answers /ai <question> through a pluggable model provider
(ollama default + anthropic + openai-compatible + module:Class). Server and
zero-knowledge guarantees unchanged; the agent is just another encrypted client.
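
One plausible way to resolve a `module:Class` provider spec is sketched below.
The helper name `load_class`, the `EchoProvider` class, and its `complete`
method are assumptions for illustration and may not match the agent's actual
provider interface.

```python
import importlib


def load_class(spec):
    """Resolve a "module:Class" string to the class object it names."""
    module_name, sep, class_name = spec.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:Class', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


class EchoProvider:
    """Hypothetical provider that repeats the question back."""

    def complete(self, question):
        return f"echo: {question}"


# Demonstrated with a standard-library class so the example runs anywhere.
ordered_dict_cls = load_class("collections:OrderedDict")
reply = EchoProvider().complete("hi")
```

Resolving the class at startup keeps third-party providers out of the agent's
import graph unless a user explicitly asks for them.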
Also pin the lets-hack demo to a detached worktree of main (default) so running
it from dev still demos stable main without touching the working checkout.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>