hack-house

Files

T

leetcrypt 26c651e9ac perf(ai): CPU-tuned local inference + qwen2.5-coder sandbox path

Tier A/B/C wins for the CPU-only Ollama box (no GPU → optimize TTFT and
tokens/sec, not VRAM):

- Separate qwen2.5-coder provider for the sandbox `!task` path; chat keeps
  the general model. Auto-selected when chat is Ollama and a coder build is
  present, override with --code-model.
- OllamaProvider num_ctx default 8192→4096 (8192 was a GPU-mindset default
  that inflates prefill/TTFT on CPU); expose num_thread; add --num-ctx,
  --num-thread, --num-predict. token_budget default 3000→2000 to fit.
- OllamaProvider.stream() generator over Ollama's stream=True chat endpoint
  (provider half of token streaming; agent/Rust rendering is a follow-up).
- Few-shot request→shell exemplars in SANDBOX_SYSTEM to anchor the small
  model's fenced-command output.
- Matryoshka embedding truncation: OllamaEmbedder truncate_dim=256 (--embed-dim)
  for faster pure-Python cosine and less RAM; query+stored share the dim.
- docs/ai-perf-plan.md records all 8 items with status and the server-side
  env (OLLAMA_NUM_PARALLEL=1, keep_alive) that must be set where ollama serve runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-06-02 22:37:59 -07:00

ai-context-plan.md

feat(ai): in-RAM semantic recall (RAG) for conversation context

2026-06-02 17:59:01 -07:00

ai-perf-plan.md

perf(ai): CPU-tuned local inference + qwen2.5-coder sandbox path

2026-06-02 22:37:59 -07:00

hack-house-demo.gif

docs: higher-quality demo GIF (1280px, 15fps)

2026-06-01 15:56:01 -07:00

providers.md

feat(ai): model profiles, capability discovery, and agentless /ai list|models

2026-06-01 15:25:07 -07:00

spec-agent-bridge.md

docs: AI agent bridge spec (model-agnostic, /ai command, owner-gated ops)

2026-06-01 01:24:48 -07:00

spec-collaborative-sandbox.md

feat(coven): SRP/Fernet crypto parity + multi-user coven foundation ⛧

2026-05-30 11:47:25 -07:00