hack-house

Trilltechnician/hack-house

Fork 0

Commit Graph

Author	SHA1	Message	Date
leetcrypt	e5e1ad8dee	feat(ai): in-RAM semantic recall (RAG) for conversation context Give the agent recall of things said beyond the verbatim window, without breaking the RAM-only philosophy — nothing is persisted to disk. - MemoryIndex: a capped, in-memory pool of embedded messages with pure-Python cosine search (no numpy). Retains far more than the rolling transcript so old lines can be surfaced on demand; oldest evicted past the cap to bound RAM. - OllamaEmbedder: local embeddings via nomic-embed-text, on by default and independent of the chat provider (reuses the Ollama host when chat is Ollama). - Bridge: captured room messages (live + backfilled) are embedded on a background worker so a slow embedder can't stall frame draining. On a /ai question the agent retrieves top-k relevant lines, drops weak (<min_score) and windowed-duplicate hits, and prepends them as a clearly-fenced "recalled context" preamble — kept at user role, never elevated to system, so untrusted room text informs without instructing. Falls back to recency-only if the embedder is unreachable. - CLI: --no-rag, --embed-model, --embed-host, --rag-top-k. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-02 17:59:01 -07:00
leetcrypt	bbb9e82425	docs: plan for AI agent context + local-perf improvements Roadmap for deepening the /ai agent's conversational context while keeping the RAM-only philosophy, plus Ollama latency wins. Marks Tier 1 (backfill, token-budget window) and the perf tuning as in-scope now; RAG and in-RAM compaction staged next. Grounded in public Anthropic docs, not leaked source. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-02 17:43:02 -07:00

Author

SHA1

Message

Date

leetcrypt

e5e1ad8dee

feat(ai): in-RAM semantic recall (RAG) for conversation context

Give the agent recall of things said beyond the verbatim window, without
breaking the RAM-only philosophy — nothing is persisted to disk.

- MemoryIndex: a capped, in-memory pool of embedded messages with pure-Python
  cosine search (no numpy). Retains far more than the rolling transcript so old
  lines can be surfaced on demand; oldest evicted past the cap to bound RAM.
- OllamaEmbedder: local embeddings via nomic-embed-text, on by default and
  independent of the chat provider (reuses the Ollama host when chat is Ollama).
- Bridge: captured room messages (live + backfilled) are embedded on a
  background worker so a slow embedder can't stall frame draining. On a /ai
  question the agent retrieves top-k relevant lines, drops weak (<min_score) and
  windowed-duplicate hits, and prepends them as a clearly-fenced "recalled
  context" preamble — kept at user role, never elevated to system, so untrusted
  room text informs without instructing. Falls back to recency-only if the
  embedder is unreachable.
- CLI: --no-rag, --embed-model, --embed-host, --rag-top-k.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-06-02 17:59:01 -07:00

leetcrypt

bbb9e82425

docs: plan for AI agent context + local-perf improvements

Roadmap for deepening the /ai agent's conversational context while keeping
the RAM-only philosophy, plus Ollama latency wins. Marks Tier 1 (backfill,
token-budget window) and the perf tuning as in-scope now; RAG and in-RAM
compaction staged next. Grounded in public Anthropic docs, not leaked source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-06-02 17:43:02 -07:00

2 Commits