hack-house/cmd_chat/agent
leetcrypt 69bce5ead8 feat(ai): stream agent replies token-by-token to the room
Closes the cross-language half of token streaming (perf-plan A3). On the
CPU-only box, perceived latency is dominated by time-to-first-token, so
showing the reply as it is generated makes a slow model feel live.

- Agent: OllamaProvider.stream() runs on a worker thread; bridge relays
  cumulative previews as throttled (~5/sec) `_ai:"stream"` control frames,
  then a `done` frame clears the preview as the final persisted chat message
  is posted. Providers without stream() fall back to blocking complete().
- Rust client: new Net::AiStream variant + parse_ai branch; App.ai_stream
  map holds the in-progress text per agent; draw_chat renders it as a dim,
  italic preview bubble below history. Cleared on done and on agent leave.
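
A minimal sketch of the agent-side relay described in the first bullet, not the actual bridge.py code. The function name `relay_stream`, the `send_frame` callback, the frame dictionaries, and the `min_interval`/`clock` parameters are all assumptions made for illustration; only `stream()`, `complete()`, the `_ai:"stream"` marker, and the `done` frame come from the commit message.

```python
import queue
import threading
import time


def relay_stream(provider, prompt, send_frame, min_interval=0.2, clock=time.monotonic):
    """Relay a reply as throttled cumulative previews, then a done frame.

    Hypothetical helper: `send_frame` is any callable that ships one control
    frame to the room. Returns the final text so the caller can post it as
    the persisted chat message.
    """
    stream = getattr(provider, "stream", None)
    if stream is None:
        # Provider has no stream(): fall back to the blocking call.
        # No preview was ever shown, so no control frames are sent.
        return provider.complete(prompt)

    chunks = queue.Queue()
    end = object()  # sentinel marking the end of the token stream
    errors = []

    def worker():
        # Generation runs off the main thread so the bridge stays responsive.
        try:
            for token in stream(prompt):
                chunks.put(token)
        except Exception as exc:  # surfaced to the caller after cleanup
            errors.append(exc)
        finally:
            chunks.put(end)

    threading.Thread(target=worker, daemon=True).start()

    text = ""
    last_sent = None
    while True:
        item = chunks.get()
        if item is end:
            break
        text += item
        now = clock()
        # Throttle: send a cumulative preview at most once per min_interval
        # (0.2 s gives roughly 5 frames per second).
        if last_sent is None or now - last_sent >= min_interval:
            send_frame({"_ai": "stream", "text": text})
            last_sent = now

    # The done frame tells clients to clear the preview bubble.
    send_frame({"_ai": "done"})
    if errors:
        raise errors[0]
    return text
```

Sending the cumulative text rather than per-token deltas means a dropped or skipped preview frame costs nothing: the next frame replaces the whole preview, and the final message is posted separately from the returned text.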

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-02 22:42:08 -07:00
__init__.py feat(agent): model-agnostic AI agent bridge (PoC) + pin lets-hack demo to main 2026-06-01 02:05:48 -07:00
__main__.py perf(ai): CPU-tuned local inference + qwen2.5-coder sandbox path 2026-06-02 22:37:59 -07:00
bridge.py feat(ai): stream agent replies token-by-token to the room 2026-06-02 22:42:08 -07:00
memory.py feat(ai): in-RAM semantic recall (RAG) for conversation context 2026-06-02 17:59:01 -07:00
profiles.py feat(ai): model profiles, capability discovery, and agentless /ai list|models 2026-06-01 15:25:07 -07:00
providers.py perf(ai): CPU-tuned local inference + qwen2.5-coder sandbox path 2026-06-02 22:37:59 -07:00