Give the agent recall of things said beyond the verbatim window, without
breaking the RAM-only philosophy — nothing is persisted to disk.
- MemoryIndex: a capped, in-memory pool of embedded messages with pure-Python
cosine search (no numpy). Retains far more than the rolling transcript so old
lines can be surfaced on demand; oldest evicted past the cap to bound RAM.
- OllamaEmbedder: local embeddings via nomic-embed-text, on by default and
independent of the chat provider (reuses the Ollama host when chat is Ollama).
- Bridge: captured room messages (live + backfilled) are embedded on a
background worker so a slow embedder can't stall frame draining. On a /ai
question the agent retrieves top-k relevant lines, drops weak (<min_score) and
windowed-duplicate hits, and prepends them as a clearly-fenced "recalled
context" preamble — kept at user role, never elevated to system, so untrusted
room text informs without instructing. Falls back to recency-only if the
embedder is unreachable.
- CLI: --no-rag, --embed-model, --embed-host, --rag-top-k.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Roadmap for deepening the /ai agent's conversational context while keeping
the RAM-only philosophy, plus Ollama latency wins. Marks Tier 1 (backfill,
token-budget window) and the perf tuning as in-scope now; RAG and in-RAM
compaction staged next. Grounded in public Anthropic docs, not leaked source.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bump from 960px/12fps to 1280px/15fps with floyd-steinberg dithering
for crisper, retina-legible terminal text — 7.4MB, under GitHub's 10MB
inline-render limit. Exceeds the upstream example.gif (800px/15fps).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
4.7MB looping GIF rendered from the latest demo capture (alice+bob
sharing a multipass box: summon, drive, per-user sudo).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make connecting any model a config step, not a code change:
- models.toml named profiles (api_key_env names an env var, never the key)
- providers gain available_models(); add preflight + --list-models/--check
- /ai list and /ai models in-room; client probes local Ollama for
/ai models when no agent is running, and /ai list hints to summon one
- docs/providers.md provider guide + examples/echo_provider.py
- README: command table, AI section, layout updated
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Begin the coven evolution of cmd-chat (see docs/spec-collaborative-sandbox.md):
a Rust/ratatui client for the unchanged Python Sanic server, plus the
multi-user + zero-knowledge groundwork.
P0 — crypto parity (the spec's #1 risk), proven three ways:
- Hand-rolled SRP-6a (NG_2048, SHA-256, rfc5054 padding) matching pysrp
byte-for-byte, incl. the fixed b"chat" SRP identity and minimal-vs-256B
width quirks. Golden-vector unit test + offline selftest.
- Live handshake against the running server (H_AMK verified).
- Cross-language E2E: Python client decrypts a Rust-encrypted Fernet message.
P2 — multi-user coven (server):
- CMD_CHAT_MAX_USERS capacity cap (default 4, infra-for-more).
- Authoritative roster + user_joined broadcasts.
- Free the slot/username on ws disconnect (was held until 1h stale sweep).
Also: fix requirements.txt (was UTF-16, unparseable by pip).
coven/ : Rust crate (crypto.rs proven; main.rs spike CLI: selftest/handshake/srpm)
docs/ : full feature spec for the 6 requested features.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>