Commit Graph

6 Commits

Author SHA1 Message Date
leetcrypt
07e9c30846 feat(sbx,ui): VM snapshot save/load + collapsible clustered help menu
- /sbx save|load|snaps: docker commit → hh-snap:<label> image that
  survives /sbx stop; load relaunches a fresh sandbox from it; multipass
  delegates to `multipass snapshot`. Local backend unsupported.
- Help overlay redesigned into topical clusters (SANDBOX, AI AGENTS,
  PERMISSIONS, FILES, APPEARANCE, KEYS, ROSTER GLYPHS), collapsed by
  default; up/down highlight a cluster, left/right/Enter expand-collapse
  it (tmux-style), PgUp/PgDn scroll overflow, Esc closes.
- docstring: example uses --model qwen2.5:3b (the locally-pulled model),
  not llama3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-02 23:03:00 -07:00
leetcrypt
26c651e9ac perf(ai): CPU-tuned local inference + qwen2.5-coder sandbox path
Tier A/B/C wins for the CPU-only Ollama box (no GPU → optimize TTFT and
tokens/sec, not VRAM):

- Separate qwen2.5-coder provider for the sandbox `!task` path; chat keeps
  the general model. Auto-selected when chat is Ollama and a coder build is
  present, override with --code-model.
- OllamaProvider num_ctx default 8192→4096 (8192 was a GPU-mindset default
  that inflates prefill/TTFT on CPU); expose num_thread; add --num-ctx,
  --num-thread, --num-predict. token_budget default 3000→2000 to fit.
- OllamaProvider.stream() generator over Ollama's stream=True chat endpoint
  (provider half of token streaming; agent/Rust rendering is a follow-up).
- Few-shot request→shell exemplars in SANDBOX_SYSTEM to anchor the small
  model's fenced-command output.
- Matryoshka embedding truncation: OllamaEmbedder truncate_dim=256 (--embed-dim)
  for faster pure-Python cosine and less RAM; query+stored share the dim.
- docs/ai-perf-plan.md records all 8 items with status and the server-side
  env (OLLAMA_NUM_PARALLEL=1, keep_alive) that must be set where ollama serve runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-02 22:37:59 -07:00
leetcrypt
e5e1ad8dee feat(ai): in-RAM semantic recall (RAG) for conversation context
Give the agent recall of things said beyond the verbatim window, without
breaking the RAM-only philosophy — nothing is persisted to disk.

- MemoryIndex: a capped, in-memory pool of embedded messages with pure-Python
  cosine search (no numpy). Retains far more than the rolling transcript so old
  lines can be surfaced on demand; oldest evicted past the cap to bound RAM.
- OllamaEmbedder: local embeddings via nomic-embed-text, on by default and
  independent of the chat provider (reuses the Ollama host when chat is Ollama).
- Bridge: captured room messages (live + backfilled) are embedded on a
  background worker so a slow embedder can't stall frame draining. On a /ai
  question the agent retrieves top-k relevant lines, drops weak (<min_score) and
  windowed-duplicate hits, and prepends them as a clearly-fenced "recalled
  context" preamble — kept at user role, never elevated to system, so untrusted
  room text informs without instructing. Falls back to recency-only if the
  embedder is unreachable.
- CLI: --no-rag, --embed-model, --embed-host, --rag-top-k.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-02 17:59:01 -07:00
leetcrypt
9b85255d80 feat(ai): backfill context on join + token-budget window
The server already ships the full RAM message backlog in the init frame; the
agent was discarding it. _seed_transcript now decrypts that history with the
room key (skipping our own lines, control frames, and undecryptable blobs) so
the agent has context the moment it joins instead of starting amnesiac.

_window() replaces the fixed last-12 slice on both the answer and sandbox
paths: it walks newest-to-oldest and keeps messages up to --token-budget
(approx, ~4 chars/token), still capped at --context-window count. Keeps small
local models inside their effective context. Nothing touches disk.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-02 17:43:02 -07:00
leetcrypt
65df12de9e feat(ai): model profiles, capability discovery, and agentless /ai list|models
Make connecting any model a config step, not a code change:
- models.toml named profiles (api_key_env names an env var, never the key)
- providers gain available_models(); add preflight + --list-models/--check
- /ai list and /ai models in-room; client probes local Ollama for
  /ai models when no agent is running, and /ai list hints to summon one
- docs/providers.md provider guide + examples/echo_provider.py
- README: command table, AI section, layout updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-01 15:25:07 -07:00
leetcrypt
54b7637ec8 feat(agent): model-agnostic AI agent bridge (PoC) + pin lets-hack demo to main
Add cmd_chat/agent: a headless client that joins a room via SRP, decrypts
broadcasts, and answers /ai <question> through a pluggable model provider
(ollama default + anthropic + openai-compatible + module:Class). Server and
zero-knowledge guarantees unchanged; the agent is just another encrypted client.

Also pin the lets-hack demo to a detached worktree of main (default) so running
it from dev still demos stable main without touching the working checkout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-01 02:05:48 -07:00