mirror of https://github.com/khodges42/glassMind.git synced 2026-06-14 18:18:36 +00:00
K. Hodges 18c39f3674 Embedding backend trait plus local deterministic embedding backend
2026-05-24 16:00:05 -07:00
16 KiB

# tasks.md

# Glassmind Tasks

## Project Rules

- Prefer small, shippable tasks.
- Every stage should leave the project runnable.
- Avoid premature abstraction.
- Favor inspectability over magic.
- Follow a small-application philosophy.
- Markdown files are canonical.
- Database state must be rebuildable.
- Local-first is a hard requirement.
- No cloud dependency in core architecture.
- No enshittification.

---

# Phase 1 — Project Skeleton & Foundations

## [x] GM-001 — Initialize Rust workspace

### Goals
- Create Rust project
- Verify build pipeline
- Establish workspace structure

### Tasks
- Run `cargo init`
- Create `/src`
- Create `/examples`
- Create `/fixtures`
- Create `/scripts`
- Create initial `.gitignore`
- Add a GPL license file
- Verify clean build

### Acceptance Criteria
- `cargo build` succeeds
- Repo structure exists
- Project compiles on a clean machine

---

## [x] GM-002 — Add core dependencies

### Goals
Install foundational crates.

### Tasks
Add:
- `clap`
- `serde`
- `serde_json`
- `toml`
- `tracing`
- `tracing-subscriber`
- `anyhow`

### Acceptance Criteria
- Project builds
- Logging works
- Config parsing stub exists

---

## [x] GM-003 — Implement CLI skeleton

### Goals
Create top-level CLI interface.

### Tasks
Add commands:
- `init`
- `index`
- `search`
- `context`
- `serve`
- `stats`

### Acceptance Criteria
- `glassmind --help` works
- Subcommands render correctly
- Unknown commands fail cleanly

---

## [x] GM-004 — Create config loader

### Goals
Load user config from disk.

### Tasks
- Define `glassmind.toml`
- Create config structs
- Implement config parsing
- Add defaults
- Add validation
- Add config path resolution

### Acceptance Criteria
- Config loads successfully
- Missing config generates defaults
- Invalid config errors clearly

---

## [x] GM-005 — Implement logging setup

### Goals
Establish consistent logging.

### Tasks
- Configure tracing subscriber
- Add log levels
- Add debug mode
- Add structured logs
- Add startup logging

### Acceptance Criteria
- Logs visible in CLI
- Debug mode works
- Errors produce backtraces

---

# Phase 2 — Vault Discovery

## [x] GM-006 — Implement vault walker

### Goals
Recursively discover markdown files.

### Tasks
- Add `walkdir`
- Walk configured vault path
- Detect `.md` files
- Skip ignored directories
- Support nested folders
- Add file count metrics

### Acceptance Criteria
- Vault scan succeeds
- Ignores work correctly
- Correct markdown count displayed
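
The recursive walk can be sketched with the standard library alone. This is a minimal illustration that uses `std::fs::read_dir` as a stand-in for `walkdir`; the function name `collect_markdown` and the `skip_dirs` parameter are hypothetical and not taken from the Glassmind source.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects `.md` files under `dir`, skipping any directory
/// whose name appears in `skip_dirs`. (Hypothetical helper for illustration.)
fn collect_markdown(dir: &Path, skip_dirs: &[&str], found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // `DirEntry::file_type` does not follow symlinks.
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            let name = entry.file_name();
            if skip_dirs.iter().any(|skip| name.to_str() == Some(*skip)) {
                continue;
            }
            collect_markdown(&path, skip_dirs, found)?;
        } else if file_type.is_file()
            && path.extension().and_then(|ext| ext.to_str()) == Some("md")
        {
            found.push(path);
        }
    }
    Ok(())
}
```

The file count metric is then just `found.len()`. Since symlinked directories are reported as symlinks rather than directories, this sketch never descends into them, which is one simple way to avoid accidental recursion.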

---

## [x] GM-007 — Implement ignore handling

### Goals
Allow configurable ignore patterns.

### Tasks
Ignore:
- `.git`
- `.obsidian`
- `.trash`
- `.agent/cache`

Add configurable ignores.

### Acceptance Criteria
- Ignored folders skipped
- Configurable ignores work
- No accidental recursion
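
One way to match both single-folder patterns such as `.git` and nested ones such as `.agent/cache` is to compare path components. The sketch below assumes patterns are plain `/`-separated folder names (no globs); `is_ignored` and `DEFAULT_IGNORES` are illustrative names, not the project's actual identifiers.

```rust
use std::path::{Component, Path};

/// The built-in ignores listed above; configurable ignores would be appended.
const DEFAULT_IGNORES: &[&str] = &[".git", ".obsidian", ".trash", ".agent/cache"];

/// Returns true if any pattern appears as a contiguous run of path components.
fn is_ignored(relative: &Path, patterns: &[&str]) -> bool {
    let parts: Vec<&str> = relative
        .components()
        .filter_map(|component| match component {
            Component::Normal(name) => name.to_str(),
            _ => None,
        })
        .collect();

    patterns.iter().any(|pattern| {
        let wanted: Vec<&str> = pattern.split('/').filter(|s| !s.is_empty()).collect();
        // Guard against empty patterns: `windows(0)` would panic.
        !wanted.is_empty()
            && parts
                .windows(wanted.len())
                .any(|window| window == wanted.as_slice())
    })
}
```

Matching whole components means a note named `gitlog.md` is not caught by the `.git` pattern, and `.agent/memories` stays indexable while `.agent/cache` is skipped.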

---

## [x] GM-008 — Add note metadata extraction

### Goals
Extract basic note metadata.

### Tasks
Extract:
- path
- filename
- title
- modified timestamp
- file size

### Acceptance Criteria
- Metadata visible in debug output
- Data stored internally

---

## [x] GM-009 — Add markdown parsing

### Goals
Parse markdown structure.

### Tasks
Add:
- heading extraction
- paragraph extraction
- code block detection
- list detection

Suggested crate:
- `pulldown-cmark`

### Acceptance Criteria
- Headings parsed correctly
- Parser handles malformed markdown gracefully

---

## [x] GM-010 — Extract wikilinks

### Goals
Detect Obsidian-style links.

### Tasks
Support:
- `[[note]]`
- `[[note|alias]]`
- `[[folder/note]]`

Store:
- source
- target
- alias

### Acceptance Criteria
- Links parsed correctly
- Links stored in memory
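
The three link forms above can be handled by a small hand-written scanner. This is a sketch under simple assumptions: links do not span lines, and embeds, heading anchors, and escaped brackets are not treated specially. The `WikiLink` struct and `extract_wikilinks` function are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
struct WikiLink {
    source: String,
    target: String,
    alias: Option<String>,
}

/// Scans `text` for `[[target]]` and `[[target|alias]]` links.
fn extract_wikilinks(source: &str, text: &str) -> Vec<WikiLink> {
    let mut links = Vec::new();
    let mut rest = text;
    while let Some(open) = rest.find("[[") {
        let after_open = &rest[open + 2..];
        let close = match after_open.find("]]") {
            Some(index) => index,
            None => break, // unterminated link: stop scanning
        };
        let inner = &after_open[..close];
        rest = &after_open[close + 2..];

        let (target, alias) = match inner.split_once('|') {
            Some((target, alias)) => (target.trim(), Some(alias.trim())),
            None => (inner.trim(), None),
        };
        // Skip empty targets and anything that spans a line break.
        if target.is_empty() || inner.contains('\n') {
            continue;
        }
        links.push(WikiLink {
            source: source.to_string(),
            target: target.to_string(),
            alias: alias.filter(|a| !a.is_empty()).map(str::to_string),
        });
    }
    links
}
```

A folder-qualified link such as `[[folder/note]]` keeps its full path in `target`; resolving it to an actual note is left to a later step.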

---

## [x] GM-011 — Extract tags

### Goals
Parse tags from notes.

### Tasks
Support:
- inline tags
- frontmatter tags

Normalize:
- lowercase
- trim whitespace

### Acceptance Criteria
- Tags extracted consistently
- Duplicate tags removed
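
Normalization and deduplication can be sketched as follows. The example assumes frontmatter tags have already been parsed into strings, and it uses a deliberately simple rule for inline tags: a whitespace-delimited word starting with `#`. All function names here are illustrative.

```rust
use std::collections::BTreeSet;

fn is_tag_char(c: char) -> bool {
    c.is_alphanumeric() || matches!(c, '-' | '_' | '/')
}

/// Finds `#tag` words in note text. Headings such as `# Title` yield no tag
/// because nothing follows the `#` within the same word.
fn inline_tags(text: &str) -> Vec<&str> {
    text.split_whitespace()
        .filter_map(|word| word.strip_prefix('#'))
        // Drop trailing punctuation such as `,` or `.`.
        .map(|rest| rest.trim_end_matches(|c: char| !c.is_alphanumeric()))
        .filter(|tag| !tag.is_empty() && tag.chars().all(is_tag_char))
        .collect()
}

/// Lowercases and trims a raw tag; returns `None` if nothing is left.
fn normalize_tag(raw: &str) -> Option<String> {
    let tag = raw.trim().trim_start_matches('#').to_lowercase();
    if tag.is_empty() {
        None
    } else {
        Some(tag)
    }
}

/// Merges frontmatter and inline tags into a sorted, duplicate-free list.
fn collect_tags(frontmatter: &[&str], body: &str) -> Vec<String> {
    let mut unique = BTreeSet::new();
    unique.extend(frontmatter.iter().filter_map(|raw| normalize_tag(raw)));
    unique.extend(inline_tags(body).into_iter().filter_map(normalize_tag));
    unique.into_iter().collect()
}
```

Using a `BTreeSet` gives both deduplication and a stable order, which keeps debug output and tests deterministic.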

---

# Phase 3 — Database Layer

## [x] GM-012 — Add SQLite integration

### Goals
Create local metadata database.

### Tasks
- Add SQLite crate
- Create DB initialization
- Create migrations
- Create schema bootstrap

### Acceptance Criteria
- DB initializes automatically
- Schema created successfully

---

## [x] GM-013 — Create notes table

### Goals
Store note metadata.

### Tasks
Create schema for:
- notes
- paths
- timestamps
- hashes

### Acceptance Criteria
- Notes persist correctly
- Duplicate handling works

---

## [x] GM-014 — Create chunks table

### Goals
Store retrieval chunks.

### Tasks
Store:
- note ID
- chunk content
- heading path
- line numbers
- token estimates

### Acceptance Criteria
- Chunks persist correctly
- Relationships resolve correctly

---

## [x] GM-015 — Add content hashing

### Goals
Detect changed notes efficiently.

### Tasks
- Add SHA256 hashing
- Hash note content
- Compare hashes on reindex
- Skip unchanged files

### Acceptance Criteria
- Incremental indexing works
- Unchanged files skipped
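
The comparison step can be sketched independently of the hash function. Below, hashes are treated as opaque strings (in practice they would be the SHA256 hex digests mentioned above, which need a crate outside the standard library); `ReindexPlan` and `plan_reindex` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, Default, PartialEq)]
struct ReindexPlan {
    added: Vec<String>,
    modified: Vec<String>,
    deleted: Vec<String>,
    unchanged: usize,
}

/// Compares stored `path -> hash` entries against freshly computed ones.
fn plan_reindex(
    stored: &HashMap<String, String>,
    current: &HashMap<String, String>,
) -> ReindexPlan {
    let mut plan = ReindexPlan::default();
    for (path, hash) in current {
        match stored.get(path) {
            None => plan.added.push(path.clone()),
            Some(old) if old != hash => plan.modified.push(path.clone()),
            Some(_) => plan.unchanged += 1, // same hash: skip this file
        }
    }
    for path in stored.keys() {
        if !current.contains_key(path) {
            plan.deleted.push(path.clone());
        }
    }
    // HashMap iteration order is arbitrary, so sort for stable output.
    plan.added.sort();
    plan.modified.sort();
    plan.deleted.sort();
    plan
}
```

Only the `added` and `modified` paths need to be re-parsed and re-chunked; `deleted` paths have their rows removed.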

---

# Phase 4 — Chunking

## [x] GM-016 — Implement heading-based chunking

### Goals
Split notes into useful retrieval units.

### Tasks
- Split by heading
- Preserve heading hierarchy
- Preserve ordering
- Preserve note references

### Acceptance Criteria
- Chunks remain readable
- Context boundaries make sense
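
A minimal sketch of heading-based splitting, assuming ATX headings (`#` through `######`) and ignoring the complication that a `#` line inside a fenced code block is not a heading. A real implementation would more likely drive this from the markdown parser's events. `Chunk`, `heading_level`, and `chunk_by_heading` are illustrative names.

```rust
#[derive(Debug, PartialEq)]
struct Chunk {
    /// Titles of the enclosing headings, outermost first.
    heading_path: Vec<String>,
    /// 1-based line of the heading that opens the chunk (1 for a preamble).
    start_line: usize,
    content: String,
}

/// Returns `(level, title)` for an ATX heading line, otherwise `None`.
fn heading_level(line: &str) -> Option<(usize, &str)> {
    let hashes = line.chars().take_while(|c| *c == '#').count();
    if hashes == 0 || hashes > 6 {
        return None;
    }
    let rest = &line[hashes..];
    if rest.starts_with(' ') {
        Some((hashes, rest.trim()))
    } else {
        None
    }
}

/// Emits the buffered body as a chunk, if it contains any text.
fn flush(chunks: &mut Vec<Chunk>, path: &[(usize, String)], body: &mut Vec<&str>, start_line: usize) {
    let content = body.join("\n").trim().to_string();
    body.clear();
    if !content.is_empty() {
        chunks.push(Chunk {
            heading_path: path.iter().map(|(_, title)| title.clone()).collect(),
            start_line,
            content,
        });
    }
}

fn chunk_by_heading(text: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut path: Vec<(usize, String)> = Vec::new();
    let mut body: Vec<&str> = Vec::new();
    let mut start_line = 1;

    for (index, line) in text.lines().enumerate() {
        if let Some((level, title)) = heading_level(line) {
            flush(&mut chunks, &path, &mut body, start_line);
            // Pop siblings and deeper headings before entering the new one.
            while path.last().map_or(false, |(l, _)| *l >= level) {
                path.pop();
            }
            path.push((level, title.to_string()));
            start_line = index + 1;
        } else {
            body.push(line);
        }
    }
    flush(&mut chunks, &path, &mut body, start_line);
    chunks
}
```

Chunks come out in document order, and each carries its full heading path, so a chunk under `## B` inside `# A` is labeled `["A", "B"]`.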

---

## [x] GM-017 — Add fallback chunk splitting

### Goals
Handle giant sections safely.

### Tasks
- Add max chunk size
- Add overlap windows
- Preserve sentence boundaries if possible

### Acceptance Criteria
- Large files chunk correctly
- No giant retrieval blobs
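
The size cap and overlap window can be illustrated with a word-based sliding window. This sketch measures size in whitespace-separated words and does not attempt the sentence-boundary preservation listed above; the function name and parameters are assumptions.

```rust
/// Splits `text` into windows of at most `max_words` words, where each
/// window repeats the last `overlap` words of the previous one.
fn split_with_overlap(text: &str, max_words: usize, overlap: usize) -> Vec<String> {
    assert!(
        max_words > 0 && overlap < max_words,
        "overlap must be smaller than max_words"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = max_words - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_words).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

For example, seven words with `max_words = 3` and `overlap = 1` produce three windows, each sharing one word with its neighbor.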

---

## [x] GM-018 — Estimate token counts

### Goals
Prepare for LLM context budgeting.

### Tasks
- Add rough token estimator
- Store token counts
- Expose in debug mode

### Acceptance Criteria
- Estimates reasonably accurate
- Context budgeting possible
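
A rough estimator can be as small as a character count divided by a constant. The sketch below assumes the common rule of thumb of about four characters per token for English text; the ratio is an assumption and varies by tokenizer and language.

```rust
/// Assumed average characters per token (a heuristic, not a tokenizer).
const CHARS_PER_TOKEN: usize = 4;

/// Estimates the token count of `text`, rounding up.
fn estimate_tokens(text: &str) -> usize {
    let chars = text.chars().count();
    (chars + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
}
```

Rounding up errs on the side of overestimating, which is the safer direction when the number is used to stay under a context budget.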

---

# Phase 5 — Search

## [x] GM-019 — Implement SQLite FTS search

### Goals
Add keyword search.

### Tasks
- Enable FTS5
- Create search index
- Implement search query
- Add snippet extraction
- Add ranking

### Acceptance Criteria
- Search returns relevant results
- Results ranked correctly
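
The SQL side of these tasks might look like the statements below, together with a helper that quotes user input before it reaches `MATCH`. The table name `chunks_fts` and its columns are hypothetical, and executing the statements would need a SQLite crate, which is omitted here.

```rust
// Hypothetical schema; assumes SQLite was built with FTS5 enabled.
const CREATE_FTS: &str =
    "CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts \
     USING fts5(content, heading_path, chunk_id UNINDEXED)";

// bm25() returns lower values for better matches, so ascending order
// puts the most relevant rows first. snippet() highlights column 0.
const SEARCH_SQL: &str =
    "SELECT chunk_id, snippet(chunks_fts, 0, '[', ']', '...', 12) AS snippet \
     FROM chunks_fts WHERE chunks_fts MATCH ?1 \
     ORDER BY bm25(chunks_fts) LIMIT ?2";

/// Turns free-form user input into an FTS5 query where every term is a
/// quoted string, so characters like `-` or `:` are not read as operators.
/// Embedded double quotes are escaped by doubling them.
fn to_fts5_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(|term| format!("\"{}\"", term.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Quoted terms separated by spaces are combined with an implicit AND, so `rust async` only matches chunks containing both words.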

---

## [x] GM-020 — Implement basic CLI search command

### Goals
Expose usable search interface.

### Tasks
- Add search formatting
- Show paths
- Show headings
- Show snippets
- Add JSON output option

### Acceptance Criteria
- `glassmind search` usable daily
- Results readable
- JSON output valid

---

## Embeddings

### [x] GM-021 — Create embedding backend trait

#### Goals
Abstract embedding providers behind a common interface.

#### Tasks
- Create `EmbeddingBackend` trait
- Define embedding request/response types
- Add async support if needed
- Add error handling
- Add provider config support

#### Acceptance Criteria
- Multiple backends can implement trait
- Search pipeline independent from provider implementation
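
As an illustration of what such a trait could look like, here is a synchronous sketch with a toy backend. The method names, the error type, and `HashingBackend` are all assumptions rather than the project's real definitions. The backend uses feature hashing with FNV-1a, so it is deterministic and needs no model, but it captures word overlap rather than meaning.

```rust
use std::fmt;

#[derive(Debug)]
struct EmbeddingError(String);

impl fmt::Display for EmbeddingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "embedding error: {}", self.0)
    }
}

impl std::error::Error for EmbeddingError {}

/// Hypothetical provider interface: one vector per input text.
trait EmbeddingBackend {
    fn dimensions(&self) -> usize;
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, EmbeddingError>;
}

/// Toy backend: hashes each word into a bucket, then L2-normalizes.
struct HashingBackend {
    dims: usize,
}

impl EmbeddingBackend for HashingBackend {
    fn dimensions(&self) -> usize {
        self.dims
    }

    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, EmbeddingError> {
        if self.dims == 0 {
            return Err(EmbeddingError("dimensions must be greater than zero".into()));
        }
        Ok(texts.iter().map(|text| hash_embed(text, self.dims)).collect())
    }
}

fn hash_embed(text: &str, dims: usize) -> Vec<f32> {
    let mut vector = vec![0.0f32; dims];
    for word in text.split_whitespace() {
        // 64-bit FNV-1a, stable across runs and platforms.
        let mut hash: u64 = 0xcbf29ce484222325;
        for byte in word.to_lowercase().bytes() {
            hash ^= byte as u64;
            hash = hash.wrapping_mul(0x100000001b3);
        }
        vector[(hash % dims as u64) as usize] += 1.0;
    }
    let norm = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut vector {
            *x /= norm;
        }
    }
    vector
}
```

Code that depends only on `EmbeddingBackend` can be exercised in tests with a backend like this one and pointed at a real provider through configuration.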

---

### [x] GM-022 — Implement Ollama embedding backend

#### Goals
Generate embeddings locally using Ollama.

#### Tasks
- Add Ollama HTTP client
- Implement embedding requests
- Add configurable embedding model
- Add retry handling
- Add timeout handling

#### Acceptance Criteria
- Query embeddings generated successfully
- Chunk embeddings generated successfully
- Backend configurable through TOML

---

### [x] GM-023 — Add embedding generation pipeline

#### Goals
Generate embeddings during indexing.

#### Tasks
- Embed chunks during index phase
- Skip unchanged embeddings
- Batch embedding requests
- Add embedding queue abstraction
- Add progress reporting

#### Acceptance Criteria
- Vault indexing produces embeddings
- Reindex skips unchanged chunks

---

### [x] GM-024 — Integrate sqlite-vec

#### Goals
Store and search vectors locally.

#### Tasks
- Add sqlite-vec dependency
- Create vector schema
- Store chunk vectors
- Add nearest-neighbor search
- Validate vector dimensions

#### Acceptance Criteria
- Embeddings persist correctly
- Similarity search returns results

---

### [x] GM-025 — Implement semantic search

#### Goals
Search by meaning instead of keywords.

#### Tasks
- Embed query text
- Retrieve nearest vectors
- Rank results by similarity
- Return chunk metadata
- Add configurable result limits

#### Acceptance Criteria
- Semantically related notes retrieved
- Search quality noticeably useful

---

## Hybrid Retrieval

### [x] GM-026 — Create retrieval scoring model

#### Goals
Combine multiple ranking systems.

#### Tasks
Add weighted scoring for:
- semantic similarity
- keyword relevance
- recency
- tags
- wikilinks
- path/project affinity

#### Acceptance Criteria
- Final ranking combines all scoring sources
- Weights configurable
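
One straightforward way to combine these sources is a normalized weighted sum. The sketch assumes every signal has already been scaled to the range 0 to 1; the struct names and the default weights are placeholders, not tuned or project-defined values.

```rust
#[derive(Debug, Clone, Copy, Default)]
struct Signals {
    semantic: f32,
    keyword: f32,
    recency: f32,
    tags: f32,
    links: f32,
    path: f32,
}

#[derive(Debug, Clone, Copy)]
struct Weights {
    semantic: f32,
    keyword: f32,
    recency: f32,
    tags: f32,
    links: f32,
    path: f32,
}

impl Default for Weights {
    // Placeholder values; in practice these would come from config.
    fn default() -> Self {
        Weights { semantic: 0.4, keyword: 0.3, recency: 0.1, tags: 0.1, links: 0.05, path: 0.05 }
    }
}

/// Weighted average of all signals; dividing by the total weight keeps the
/// result in the same 0..=1 range regardless of how the weights are set.
fn final_score(s: &Signals, w: &Weights) -> f32 {
    let total = w.semantic + w.keyword + w.recency + w.tags + w.links + w.path;
    if total <= 0.0 {
        return 0.0;
    }
    let weighted = w.semantic * s.semantic
        + w.keyword * s.keyword
        + w.recency * s.recency
        + w.tags * s.tags
        + w.links * s.links
        + w.path * s.path;
    weighted / total
}
```

Keeping the per-signal values in a struct also makes it easy to print them next to the final score when debugging a ranking.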

---

### [x] GM-027 — Add recency boosting

#### Goals
Favor recently active notes.

#### Tasks
- Define recency decay function
- Add configurable recency weights
- Support pinned notes
- Add debug scoring output

#### Acceptance Criteria
- Recent notes boosted appropriately
- Old notes still retrievable
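
A common choice for the decay function is exponential decay with a half-life, sketched below. The half-life form, the parameter names, and the rule that pinned notes always score 1.0 are assumptions made for illustration.

```rust
/// Recency score in 0..=1: 1.0 for a note modified just now, halving
/// every `half_life_days`. Pinned notes are exempt from decay.
fn recency_score(age_days: f64, half_life_days: f64, pinned: bool) -> f64 {
    if pinned {
        return 1.0;
    }
    if half_life_days <= 0.0 {
        return 0.0; // treat a non-positive half-life as "no recency boost"
    }
    // Clamp negative ages (e.g. clock skew) to zero.
    0.5f64.powf(age_days.max(0.0) / half_life_days)
}
```

With a 30-day half-life, a note edited 30 days ago scores 0.5 and one edited 60 days ago scores 0.25. The score approaches zero but never goes negative, so when it is combined additively with other signals an old note can still rank on semantic or keyword strength alone.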

---

### [x] GM-028 — Add wikilink graph weighting

#### Goals
Use note relationships during retrieval.

#### Tasks
- Calculate link adjacency
- Boost linked neighbors
- Support bidirectional relationships
- Add graph traversal depth limit

#### Acceptance Criteria
- Related linked notes boosted
- Retrieval continuity improved

---

### [x] GM-029 — Add retrieval debug mode

#### Goals
Make ranking explainable.

#### Tasks
Display:
- semantic score
- keyword score
- recency score
- tag score
- link score
- final score

#### Acceptance Criteria
- Users can inspect ranking behavior
- Retrieval tuning becomes practical

---

## Context Bundles

### [x] GM-030 — Create context bundle builder

#### Goals
Generate LLM-ready retrieval payloads.

#### Tasks
- Define context bundle structure
- Deduplicate overlapping chunks
- Group by note
- Preserve ordering
- Add metadata blocks

#### Acceptance Criteria
- Context bundles readable
- Context bundles useful for LLM prompts

---

### [x] GM-031 — Add token budgeting

#### Goals
Prevent oversized context payloads.

#### Tasks
- Track token estimates
- Add configurable token budget
- Trim low-priority chunks
- Preserve high-score chunks first

#### Acceptance Criteria
- Context stays within configured budget
- Retrieval quality remains useful
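
Budgeting can be sketched as a greedy selection: sort candidates by score and keep each one that still fits. `Candidate` and `fit_to_budget` are hypothetical names, and `tokens` is assumed to be the stored per-chunk estimate.

```rust
#[derive(Debug, Clone)]
struct Candidate {
    id: u32,
    score: f32,
    tokens: usize,
}

/// Keeps the highest-scoring chunks whose combined token estimate fits
/// within `budget`. A chunk that does not fit is skipped, and lower-scoring
/// but smaller chunks may still be added after it.
fn fit_to_budget(mut candidates: Vec<Candidate>, budget: usize) -> Vec<Candidate> {
    // Highest score first; `total_cmp` gives a total order even with NaN.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut used = 0;
    let mut kept = Vec::new();
    for candidate in candidates {
        if used + candidate.tokens <= budget {
            used += candidate.tokens;
            kept.push(candidate);
        }
    }
    kept
}
```

Greedy selection is not optimal in the knapsack sense, but it is predictable and easy to explain in debug output, which fits the preference for inspectability.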

---

### [x] GM-032 — Add context summarization hooks

#### Goals
Prepare for future summarization support.

#### Tasks
- Define summarizer interface
- Add optional summarization stage
- Add summary metadata fields
- Support disabling summarization

#### Acceptance Criteria
- Pipeline supports optional summarization
- Core retrieval still functions without summaries

---

### [x] GM-033 — Implement `glassmind context`

#### Goals
Expose high-level retrieval workflow.

#### Tasks
- Add CLI command
- Format markdown output
- Add JSON mode
- Include sources
- Include retrieval metadata

#### Acceptance Criteria
- Command usable directly by humans
- Output usable by agents

---

## HTTP API

### [x] GM-034 — Add Axum server skeleton

#### Goals
Expose Glassmind over HTTP.

#### Tasks
- Add Axum dependency
- Create server bootstrap
- Add config support
- Add graceful shutdown
- Bind localhost by default

#### Acceptance Criteria
- Server starts successfully
- Local requests succeed

---

### [x] GM-035 — Implement `/search` endpoint

#### Goals
Expose search over HTTP.

#### Tasks
- Define request schema
- Define response schema
- Add pagination
- Add JSON serialization
- Add validation

#### Acceptance Criteria
- Endpoint returns valid search results
- Errors handled cleanly

---

### [x] GM-036 — Implement `/context` endpoint

#### Goals
Expose context retrieval API.

#### Tasks
- Add context request schema
- Support token budget parameter
- Return structured context bundles
- Include source metadata

#### Acceptance Criteria
- API returns usable context payloads
- Response structure documented

---

### [x] GM-037 — Implement `/notes/{id}` endpoint

#### Goals
Allow direct note retrieval.

#### Tasks
- Fetch note metadata
- Fetch chunk data
- Return markdown content
- Add error handling

#### Acceptance Criteria
- Notes retrievable by ID
- Missing notes handled correctly

---

### [x] GM-038 — Add `/health` and `/stats`

#### Goals
Support monitoring/debugging.

#### Tasks
- Add health endpoint
- Add DB stats
- Add vault metrics
- Add embedding counts

#### Acceptance Criteria
- Health checks usable
- Stats endpoint informative

---

## MCP Support

### [x] GM-039 — Create MCP server skeleton

#### Goals
Allow AI tools to call Glassmind directly.

#### Tasks
- Add MCP transport support
- Define tool registry
- Implement request dispatch
- Add structured tool responses

#### Acceptance Criteria
- MCP server starts successfully
- Tool calls function correctly

---

### [x] GM-040 — Implement `glassmind_search` MCP tool

#### Goals
Expose search through MCP.

#### Tasks
- Define tool schema
- Add search execution
- Return structured results
- Include source paths

#### Acceptance Criteria
- MCP clients can search successfully

---

### [x] GM-041 — Implement `glassmind_context` MCP tool

#### Goals
Expose context bundles through MCP.

#### Tasks
- Add context generation
- Add token budgeting
- Return structured context payloads

#### Acceptance Criteria
- MCP clients receive usable context bundles

---

### [x] GM-042 — Implement `glassmind_read` MCP tool

#### Goals
Allow agents to inspect notes directly.

#### Tasks
- Fetch note content
- Support chunk-specific reads
- Add note metadata
- Add error handling

#### Acceptance Criteria
- Agents can retrieve note contents reliably

---

### [x] GM-043 — Add MCP integration examples

#### Goals
Document real-world integration.

#### Tasks
- Add Claude Desktop example
- Add Codex example
- Add local agent example
- Add config examples

#### Acceptance Criteria
- Users can integrate Glassmind without guesswork

---

## Incremental Indexing

### [x] GM-044 — Add file change detection

#### Goals
Avoid full vault reindexing.

#### Tasks
- Compare content hashes
- Detect added files
- Detect deleted files
- Detect modified files

#### Acceptance Criteria
- Incremental indexing functions correctly
- Unchanged notes skipped

---

### [x] GM-045 — Add filesystem watch mode

#### Goals
Support live vault updates.

#### Tasks
- Add filesystem watcher
- Debounce rapid changes
- Trigger partial reindex
- Add watch logging

#### Acceptance Criteria
- File edits reflected automatically
- No runaway indexing loops

---

### [x] GM-046 — Add partial embedding regeneration

#### Goals
Avoid recomputing unchanged vectors.

#### Tasks
- Detect changed chunks
- Recompute only dirty embeddings
- Preserve existing vectors
- Handle deleted chunks

#### Acceptance Criteria
- Reindex significantly faster after small edits

---

## Agent Workspace

### [x] GM-047 — Create `.agent/` workspace structure

#### Goals
Establish safe agent-owned storage.

#### Tasks
Create:
- `.agent/memories`
- `.agent/tasks`
- `.agent/summaries`
- `.agent/logs`
- `.agent/cache`

#### Acceptance Criteria
- Workspace generated automatically
- Structure documented

---

### [x] GM-048 — Add memory capture commands

#### Goals
Allow structured memory persistence.

#### Tasks
Add:
- `capture-memory`
- `capture-task`
- `capture-decision`

Store entries as markdown.

#### Acceptance Criteria
- Commands append correctly
- Entries index correctly

---

### [x] GM-049 — Index `.agent/` content

#### Goals
Allow generated memory retrieval.

#### Tasks
- Include `.agent/` in indexing pipeline
- Tag generated content
- Preserve provenance metadata

#### Acceptance Criteria
- Agent-generated notes searchable
- Provenance visible

---

### [x] GM-050 — Add retrieval audit logging

#### Goals
Track retrieval behavior for debugging.

#### Tasks
Log:
- query
- retrieved chunks
- retrieval scores
- timestamp
- requesting client

#### Acceptance Criteria
- Retrievals traceable
- Logs useful for tuning/debugging

---

# What's Next

## Retrieval Quality

- Evaluation datasets
- Ranking tuning
- Query debugging
- Explainable scoring

## Performance

- Parallel indexing
- Cached embeddings
- Batch embedding generation
- Large vault optimization

## Future Ideas

- Git history awareness
- Temporal retrieval
- Canvas parsing
- Code-aware chunking
- Multi-vault support
- Graph exploration
- Retrieval visualization
- Vault analytics
- Semantic diffing
- “What changed?” context reports
- Local reranking models
- Session continuity memory
- Agent-safe write proposals