mirror of
https://github.com/khodges42/glassMind.git
synced 2026-06-14 18:18:36 +00:00
# tasks.md
# Glassmind Tasks
## Project Rules
- Prefer small, shippable tasks.
- Every stage should leave the project runnable.
- Avoid premature abstraction.
- Favor inspectability over magic.
- Follow a small-application philosophy.
- Markdown files are canonical.
- Database state must be rebuildable.
- Local-first is a hard requirement.
- No cloud dependency in core architecture.
- No enshittification.
---
# Phase 1 — Project Skeleton & Foundations
## [x] GM-001 — Initialize Rust workspace
### Goals
- Create Rust project
- Verify build pipeline
- Establish workspace structure
### Tasks
- Run `cargo init`
- Create `/src`
- Create `/examples`
- Create `/fixtures`
- Create `/scripts`
- Create initial `.gitignore`
- Add GPL license file
- Verify clean build
### Acceptance Criteria
- `cargo build` succeeds
- Repo structure exists
- Project compiles on clean machine
---
## [x] GM-002 — Add core dependencies
### Goals
Install foundational crates.
### Tasks
Add:
- `clap`
- `serde`
- `serde_json`
- `toml`
- `tracing`
- `tracing-subscriber`
- `anyhow`
### Acceptance Criteria
- Project builds
- Logging works
- Config parsing stub exists
---
## [x] GM-003 — Implement CLI skeleton
### Goals
Create top-level CLI interface.
### Tasks
Add commands:
- `init`
- `index`
- `search`
- `context`
- `serve`
- `stats`
### Acceptance Criteria
- `glassmind --help` works
- Subcommands render correctly
- Unknown commands fail cleanly
---
## [x] GM-004 — Create config loader
### Goals
Load user config from disk.
### Tasks
- Define `glassmind.toml`
- Create config structs
- Implement config parsing
- Add defaults
- Add validation
- Add config path resolution
### Acceptance Criteria
- Config loads successfully
- Missing config generates defaults
- Invalid config errors clearly
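
A minimal sketch of the defaults and validation steps, using only the standard library. The field names (`vault_path`, `ignore`, `max_chunk_chars`) and the default values are illustrative assumptions, not the project's actual schema. TOML parsing through `toml` and `serde` is left out.

```rust
use std::path::PathBuf;

/// Hypothetical config shape; the real `glassmind.toml` schema may differ.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    vault_path: PathBuf,
    ignore: Vec<String>,
    max_chunk_chars: usize,
}

impl Default for Config {
    /// Values used when no config file is found (assumed defaults).
    fn default() -> Self {
        Config {
            vault_path: PathBuf::from("."),
            ignore: vec![
                ".git".into(),
                ".obsidian".into(),
                ".trash".into(),
                ".agent/cache".into(),
            ],
            max_chunk_chars: 2000,
        }
    }
}

impl Config {
    /// Rejects values that would make indexing meaningless, with a clear message.
    fn validate(&self) -> Result<(), String> {
        if self.vault_path.as_os_str().is_empty() {
            return Err("vault_path must not be empty".to_string());
        }
        if self.max_chunk_chars == 0 {
            return Err("max_chunk_chars must be greater than zero".to_string());
        }
        Ok(())
    }
}
```

Calling `Config::default().validate()` returns `Ok(())`, while a config with `max_chunk_chars: 0` returns an error string that can be shown to the user as-is.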
---
## [x] GM-005 — Implement logging setup
### Goals
Establish consistent logging.
### Tasks
- Configure tracing subscriber
- Add log levels
- Add debug mode
- Add structured logs
- Add startup logging
### Acceptance Criteria
- Logs visible in CLI
- Debug mode works
- Errors produce backtraces
---
# Phase 2 — Vault Discovery
## [x] GM-006 — Implement vault walker
### Goals
Recursively discover markdown files.
### Tasks
- Add `walkdir`
- Walk configured vault path
- Detect `.md` files
- Skip ignored directories
- Support nested folders
- Add file count metrics
### Acceptance Criteria
- Vault scan succeeds
- Ignores work correctly
- Correct markdown count displayed
---
## [x] GM-007 — Implement ignore handling
### Goals
Allow configurable ignore patterns.
### Tasks
Ignore:
- `.git`
- `.obsidian`
- `.trash`
- `.agent/cache`
Add configurable ignores.
### Acceptance Criteria
- Ignored folders skipped
- Configurable ignores work
- No accidental recursion
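
One possible way to match the ignore list above, sketched with the standard library only. It assumes patterns are plain path fragments such as `.git` or `.agent/cache` (no globbing) and that paths are relative to the vault root. The function name `is_ignored` is hypothetical.

```rust
use std::path::{Component, Path};

/// Returns true when `rel_path` contains any pattern as a contiguous
/// run of path components, so `.agent/cache` matches `x/.agent/cache/y.md`
/// but not `.agent/memories/y.md`.
fn is_ignored(rel_path: &Path, patterns: &[&str]) -> bool {
    // Keep only normal components; skip `.`, `..`, and root markers.
    let parts: Vec<&str> = rel_path
        .components()
        .filter_map(|c| match c {
            Component::Normal(s) => s.to_str(),
            _ => None,
        })
        .collect();

    patterns.iter().any(|pattern| {
        let want: Vec<&str> = pattern.split('/').filter(|s| !s.is_empty()).collect();
        // An empty pattern never matches; `windows(0)` would panic.
        !want.is_empty() && parts.windows(want.len()).any(|w| w == want.as_slice())
    })
}
```

Matching on whole components means a note called `git.md` is not caught by the `.git` pattern. A walker can call this check before descending into a directory so ignored trees are never traversed.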
---
## [x] GM-008 — Add note metadata extraction
### Goals
Extract basic note metadata.
### Tasks
Extract:
- path
- filename
- title
- modified timestamp
- file size
### Acceptance Criteria
- Metadata visible in debug output
- Data stored internally
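
A rough sketch of collecting the five fields listed above with `std::fs`. The struct name `NoteMeta` and the title rule (first `# ` heading, falling back to the file stem) are assumptions made for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical in-memory record for one note.
#[derive(Debug)]
struct NoteMeta {
    path: PathBuf,
    filename: String,
    title: String,
    modified: SystemTime,
    size: u64,
}

fn note_metadata(path: &Path) -> io::Result<NoteMeta> {
    let meta = fs::metadata(path)?;
    let text = fs::read_to_string(path)?;

    // Fallback title: file name without its extension.
    let stem = path
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();

    // Assumed rule: the first level-1 heading is the title.
    let title = text
        .lines()
        .find_map(|line| line.strip_prefix("# "))
        .map(|t| t.trim().to_string())
        .unwrap_or(stem);

    Ok(NoteMeta {
        path: path.to_path_buf(),
        filename: path
            .file_name()
            .map(|s| s.to_string_lossy().into_owned())
            .unwrap_or_default(),
        title,
        modified: meta.modified()?,
        size: meta.len(),
    })
}
```

Because `NoteMeta` derives `Debug`, printing it with `{:?}` is enough to satisfy the debug-output criterion in this sketch.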
---
## [x] GM-009 — Add markdown parsing
### Goals
Parse markdown structure.
### Tasks
Add:
- heading extraction
- paragraph extraction
- code block detection
- list detection
Suggested crate:
- `pulldown-cmark`
### Acceptance Criteria
- Headings parsed correctly
- Parser handles malformed markdown gracefully
---
## [x] GM-010 — Extract wikilinks
### Goals
Detect Obsidian-style links.
### Tasks
Support:
- `[[note]]`
- `[[note|alias]]`
- `[[folder/note]]`
Store:
- source
- target
- alias
### Acceptance Criteria
- Links parsed correctly
- Links stored in memory
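
The three link forms above can be handled by a small hand-written scanner. This is a minimal sketch under stated assumptions: the `WikiLink` struct and `extract_wikilinks` function are hypothetical names, and the scanner does not yet skip code blocks or treat embeds (`![[...]]`) specially.

```rust
#[derive(Debug, PartialEq)]
struct WikiLink {
    source: String,        // path of the note containing the link
    target: String,        // note name or folder/note path
    alias: Option<String>, // display text after `|`, if any
}

fn extract_wikilinks(source: &str, text: &str) -> Vec<WikiLink> {
    let mut links = Vec::new();
    let mut rest = text;

    while let Some(open) = rest.find("[[") {
        let after_open = &rest[open + 2..];
        let close = match after_open.find("]]") {
            Some(i) => i,
            None => break, // unterminated link: stop scanning
        };
        let inner = &after_open[..close];
        rest = &after_open[close + 2..];

        // `[[target|alias]]` splits on the first pipe.
        let (target, alias) = match inner.split_once('|') {
            Some((t, a)) => (t.trim(), Some(a.trim().to_string())),
            None => (inner.trim(), None),
        };

        // Skip empty targets and matches that span multiple lines.
        if target.is_empty() || target.contains('\n') {
            continue;
        }
        links.push(WikiLink {
            source: source.to_string(),
            target: target.to_string(),
            alias,
        });
    }
    links
}
```

Folder paths need no special handling here: `[[folder/note]]` simply yields the target `folder/note`, and resolving it to a file is left to a later stage.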
---
## [x] GM-011 — Extract tags
### Goals
Parse tags from notes.
### Tasks
Support:
- inline tags
- frontmatter tags
Normalize:
- lowercase
- trim whitespace
### Acceptance Criteria
- Tags extracted consistently
- Duplicate tags removed
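
The normalization rules above fit in a few lines. In this sketch, stripping a leading `#` and returning tags in sorted order are assumptions, and `normalize_tags` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Trims, lowercases, and deduplicates raw tags from either source
/// (inline or frontmatter). Returns tags in sorted order.
fn normalize_tags(raw: &[&str]) -> Vec<String> {
    let set: BTreeSet<String> = raw
        .iter()
        .map(|t| t.trim().trim_start_matches('#').trim().to_lowercase())
        .filter(|t| !t.is_empty())
        .collect();
    set.into_iter().collect()
}
```

A `BTreeSet` removes duplicates and gives a stable order, which keeps debug output and stored rows deterministic between runs.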
---
# Phase 3 — Database Layer
## [x] GM-012 — Add SQLite integration
### Goals
Create local metadata database.
### Tasks
- Add SQLite crate
- Create DB initialization
- Create migrations
- Create schema bootstrap
### Acceptance Criteria
- DB initializes automatically
- Schema created successfully
---
## [x] GM-013 — Create notes table
### Goals
Store note metadata.
### Tasks
Create schema for:
- notes
- paths
- timestamps
- hashes
### Acceptance Criteria
- Notes persist correctly
- Duplicate handling works
---
## [x] GM-014 — Create chunks table
### Goals
Store retrieval chunks.
### Tasks
Store:
- note ID
- chunk content
- heading path
- line numbers
- token estimates
### Acceptance Criteria
- Chunks persist correctly
- Relationships resolve correctly
---
## [x] GM-015 — Add content hashing
### Goals
Detect changed notes efficiently.
### Tasks
- Add SHA-256 hashing
- Hash note content
- Compare hashes on reindex
- Skip unchanged files
### Acceptance Criteria
- Incremental indexing works
- Unchanged files skipped
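
A sketch of the compare-and-skip step. To stay dependency-free it uses the standard library's `DefaultHasher` as a stand-in for SHA-256. That hasher is not guaranteed stable across Rust releases, so it would not be suitable for hashes persisted in the database. The function names and the in-memory `HashMap` store are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in digest; a real index would store a SHA-256 hex string instead.
fn content_hash(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Returns the paths whose content is new or changed, and records
/// their hashes in `stored` so the next run can skip them.
fn changed_paths(stored: &mut HashMap<String, u64>, notes: &[(&str, &str)]) -> Vec<String> {
    let mut changed = Vec::new();
    for (path, content) in notes {
        let hash = content_hash(content);
        if stored.get(*path) != Some(&hash) {
            stored.insert(path.to_string(), hash);
            changed.push(path.to_string());
        }
    }
    changed
}
```

On a first run every note is reported as changed. On a second run with one edited note, only that note's path comes back, which is the behavior the acceptance criteria describe.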
---
# Phase 4 — Chunking
## [x] GM-016 — Implement heading-based chunking
### Goals
Split notes into useful retrieval units.
### Tasks
- Split by heading
- Preserve heading hierarchy
- Preserve ordering
- Preserve note references
### Acceptance Criteria
- Chunks remain readable
- Context boundaries make sense
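
A minimal sketch of heading-based splitting that keeps the heading hierarchy and source order. It scans lines directly rather than using `pulldown-cmark`, recognizes only ATX headings (`#` through `######`), and ignores `#` lines inside fenced code blocks. The `Chunk` struct and its fields are hypothetical.

```rust
#[derive(Debug)]
struct Chunk {
    heading_path: Vec<String>, // e.g. ["Project", "Setup"]
    start_line: usize,         // 1-based line where the chunk starts
    content: String,           // body text below the heading
}

fn chunk_by_heading(markdown: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut path: Vec<(usize, String)> = Vec::new(); // (level, title) stack
    let mut current: Option<Chunk> = None;
    let mut in_fence = false;

    for (idx, line) in markdown.lines().enumerate() {
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
        }
        let level = line.chars().take_while(|c| *c == '#').count();
        let is_heading =
            !in_fence && (1..=6).contains(&level) && line[level..].starts_with(' ');

        if is_heading {
            if let Some(done) = current.take() {
                chunks.push(done);
            }
            // Drop headings at the same or deeper level, then push this one.
            path.retain(|(l, _)| *l < level);
            path.push((level, line[level..].trim().to_string()));
            current = Some(Chunk {
                heading_path: path.iter().map(|(_, t)| t.clone()).collect(),
                start_line: idx + 1,
                content: String::new(),
            });
        } else {
            // Text before the first heading becomes a chunk with an empty path.
            let chunk = current.get_or_insert_with(|| Chunk {
                heading_path: Vec::new(),
                start_line: idx + 1,
                content: String::new(),
            });
            chunk.content.push_str(line);
            chunk.content.push('\n');
        }
    }
    if let Some(done) = current {
        chunks.push(done);
    }
    chunks
}
```

Storing the full heading path with each chunk lets search results show where a snippet sits in its note without re-parsing the file.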
---
## [x] GM-017 — Add fallback chunk splitting
### Goals
Handle giant sections safely.
### Tasks
- Add max chunk size
- Add overlap windows
- Preserve sentence boundaries if possible
### Acceptance Criteria
- Large files chunk correctly
- No giant retrieval blobs
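
One way to combine a size cap, overlap, and sentence boundaries, sketched under simplifying assumptions: sentences end at `.`, `!`, or `?`, size is measured in bytes, and overlap is counted in whole sentences. The function name and parameters are hypothetical.

```rust
/// Packs sentences into chunks of at most `max_bytes`, repeating the last
/// `overlap_sentences` sentences of each chunk at the start of the next.
/// A single sentence longer than `max_bytes` is emitted whole.
fn split_with_overlap(text: &str, max_bytes: usize, overlap_sentences: usize) -> Vec<String> {
    let sentences: Vec<&str> = text
        .split_inclusive(|c: char| matches!(c, '.' | '!' | '?'))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect();

    let mut chunks = Vec::new();
    let mut start = 0;
    while start < sentences.len() {
        let mut end = start;
        let mut len = 0;
        while end < sentences.len() {
            // One extra byte for the joining space after the first sentence.
            let extra = sentences[end].len() + if end > start { 1 } else { 0 };
            // Always accept the first sentence so the loop cannot stall.
            if end > start && len + extra > max_bytes {
                break;
            }
            len += extra;
            end += 1;
        }
        chunks.push(sentences[start..end].join(" "));
        if end == sentences.len() {
            break;
        }
        // Step back for overlap, but always advance by at least one sentence.
        start = end.saturating_sub(overlap_sentences).max(start + 1);
    }
    chunks
}
```

With the input `"One. Two. Three. Four."`, a 12-byte cap, and one sentence of overlap, this yields `"One. Two."`, `"Two. Three."`, and `"Three. Four."`. Abbreviations such as "e.g." would be split too early, so a real splitter would need a better sentence detector.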
---
## [x] GM-018 — Estimate token counts
### Goals
Prepare for LLM context budgeting.
### Tasks
- Add rough token estimator
- Store token counts
- Expose in debug mode
### Acceptance Criteria
- Estimates reasonably accurate
- Context budgeting possible
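
A rough estimator can be as simple as a character-count heuristic. The ratio of about four characters per token is a common rule of thumb for English text and is an assumption here, not a measured property of any particular tokenizer. Both function names are hypothetical.

```rust
/// Rough token estimate: about one token per four characters, rounded up.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Sums estimates across chunks, e.g. to compare against a context budget.
fn total_tokens(chunks: &[&str]) -> usize {
    chunks.iter().map(|c| estimate_tokens(c)).sum()
}
```

Storing the per-chunk estimate at index time means budgeting later only needs to add up integers rather than re-read chunk text.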
---
# Phase 5 — Search
## [x] GM-019 — Implement SQLite FTS search
### Goals
Add keyword search.
### Tasks
- Enable FTS5
- Create search index
- Implement search query
- Add snippet extraction
- Add ranking
### Acceptance Criteria
- Search returns relevant results
- Results ranked correctly
---
## [x] GM-020 — Implement basic CLI search command
### Goals
Expose usable search interface.
### Tasks
- Add search formatting
- Show paths
- Show headings
- Show snippets
- Add JSON output option
### Acceptance Criteria
- `glassmind search` usable daily
- Results readable
- JSON output valid
---
## Embeddings
### [ ] GM-021 — Create embedding backend trait
#### Goals
Abstract embedding providers behind a common interface.
#### Tasks
- Create `EmbeddingBackend` trait
- Define embedding request/response types
- Add async support if needed
- Add error handling
- Add provider config support
#### Acceptance Criteria
- Multiple backends can implement trait
- Search pipeline independent from provider implementation
---
### [ ] GM-022 — Implement Ollama embedding backend
#### Goals
Generate embeddings locally using Ollama.
#### Tasks
- Add Ollama HTTP client
- Implement embedding requests
- Add configurable embedding model
- Add retry handling
- Add timeout handling
#### Acceptance Criteria
- Query embeddings generated successfully
- Chunk embeddings generated successfully
- Backend configurable through TOML
---
### [ ] GM-023 — Add embedding generation pipeline
#### Goals
Generate embeddings during indexing.
#### Tasks
- Embed chunks during index phase
- Skip unchanged embeddings
- Batch embedding requests
- Add embedding queue abstraction
- Add progress reporting
#### Acceptance Criteria
- Vault indexing produces embeddings
- Reindex skips unchanged chunks
---
### [ ] GM-024 — Integrate sqlite-vec
#### Goals
Store and search vectors locally.
#### Tasks
- Add sqlite-vec dependency
- Create vector schema
- Store chunk vectors
- Add nearest-neighbor search
- Validate vector dimensions
#### Acceptance Criteria
- Embeddings persist correctly
- Similarity search returns results
---
### [ ] GM-025 — Implement semantic search
#### Goals
Search by meaning instead of keywords.
#### Tasks
- Embed query text
- Retrieve nearest vectors
- Rank results by similarity
- Return chunk metadata
- Add configurable result limits
#### Acceptance Criteria
- Semantically related notes retrieved
- Search results are noticeably useful
---
## Hybrid Retrieval
### [ ] GM-026 — Create retrieval scoring model
#### Goals
Combine multiple ranking systems.
#### Tasks
Add weighted scoring for:
- semantic similarity
- keyword relevance
- recency
- tags
- wikilinks
- path/project affinity
#### Acceptance Criteria
- Final ranking combines all scoring sources
- Weights configurable
---
### [ ] GM-027 — Add recency boosting
#### Goals
Favor recently active notes.
#### Tasks
- Define recency decay function
- Add configurable recency weights
- Support pinned notes
- Add debug scoring output
#### Acceptance Criteria
- Recent notes boosted appropriately
- Old notes still retrievable
---
### [ ] GM-028 — Add wikilink graph weighting
#### Goals
Use note relationships during retrieval.
#### Tasks
- Calculate link adjacency
- Boost linked neighbors
- Support bidirectional relationships
- Add graph traversal depth limit
#### Acceptance Criteria
- Related linked notes boosted
- Retrieval continuity improved
---
### [ ] GM-029 — Add retrieval debug mode
#### Goals
Make ranking explainable.
#### Tasks
Display:
- semantic score
- keyword score
- recency score
- tag score
- link score
- final score
#### Acceptance Criteria
- Users can inspect ranking behavior
- Retrieval tuning becomes practical
---
## Context Bundles
### [ ] GM-030 — Create context bundle builder
#### Goals
Generate LLM-ready retrieval payloads.
#### Tasks
- Define context bundle structure
- Deduplicate overlapping chunks
- Group by note
- Preserve ordering
- Add metadata blocks
#### Acceptance Criteria
- Context bundles readable
- Context bundles useful for LLM prompts
---
### [ ] GM-031 — Add token budgeting
#### Goals
Prevent oversized context payloads.
#### Tasks
- Track token estimates
- Add configurable token budget
- Trim low-priority chunks
- Preserve high-score chunks first
#### Acceptance Criteria
- Context stays within configured budget
- Retrieval quality remains useful
---
### [ ] GM-032 — Add context summarization hooks
#### Goals
Prepare for future summarization support.
#### Tasks
- Define summarizer interface
- Add optional summarization stage
- Add summary metadata fields
- Support disabling summarization
#### Acceptance Criteria
- Pipeline supports optional summarization
- Core retrieval still functions without summaries
---
### [ ] GM-033 — Implement `glassmind context`
#### Goals
Expose high-level retrieval workflow.
#### Tasks
- Add CLI command
- Format markdown output
- Add JSON mode
- Include sources
- Include retrieval metadata
#### Acceptance Criteria
- Command usable directly by humans
- Output usable by agents
---
## HTTP API
### [ ] GM-034 — Add Axum server skeleton
#### Goals
Expose Glassmind over HTTP.
#### Tasks
- Add Axum dependency
- Create server bootstrap
- Add config support
- Add graceful shutdown
- Bind localhost by default
#### Acceptance Criteria
- Server starts successfully
- Local requests succeed
---
### [ ] GM-035 — Implement `/search` endpoint
#### Goals
Expose search over HTTP.
#### Tasks
- Define request schema
- Define response schema
- Add pagination
- Add JSON serialization
- Add validation
#### Acceptance Criteria
- Endpoint returns valid search results
- Errors handled cleanly
---
### [ ] GM-036 — Implement `/context` endpoint
#### Goals
Expose context retrieval API.
#### Tasks
- Add context request schema
- Support token budget parameter
- Return structured context bundles
- Include source metadata
#### Acceptance Criteria
- API returns usable context payloads
- Response structure documented
---
### [ ] GM-037 — Implement `/notes/{id}` endpoint
#### Goals
Allow direct note retrieval.
#### Tasks
- Fetch note metadata
- Fetch chunk data
- Return markdown content
- Add error handling
#### Acceptance Criteria
- Notes retrievable by ID
- Missing notes handled correctly
---
### [ ] GM-038 — Add `/health` and `/stats`
#### Goals
Support monitoring/debugging.
#### Tasks
- Add health endpoint
- Add DB stats
- Add vault metrics
- Add embedding counts
#### Acceptance Criteria
- Health checks usable
- Stats endpoint informative
---
## MCP Support
### [ ] GM-039 — Create MCP server skeleton
#### Goals
Allow AI tools to call Glassmind directly.
#### Tasks
- Add MCP transport support
- Define tool registry
- Implement request dispatch
- Add structured tool responses
#### Acceptance Criteria
- MCP server starts successfully
- Tool calls function correctly
---
### [ ] GM-040 — Implement `glassmind_search` MCP tool
#### Goals
Expose search through MCP.
#### Tasks
- Define tool schema
- Add search execution
- Return structured results
- Include source paths
#### Acceptance Criteria
- MCP clients can search successfully
---
### [ ] GM-041 — Implement `glassmind_context` MCP tool
#### Goals
Expose context bundles through MCP.
#### Tasks
- Add context generation
- Add token budgeting
- Return structured context payloads
#### Acceptance Criteria
- MCP clients receive usable context bundles
---
### [ ] GM-042 — Implement `glassmind_read` MCP tool
#### Goals
Allow agents to inspect notes directly.
#### Tasks
- Fetch note content
- Support chunk-specific reads
- Add note metadata
- Add error handling
#### Acceptance Criteria
- Agents can retrieve note contents reliably
---
### [ ] GM-043 — Add MCP integration examples
#### Goals
Document real-world integration.
#### Tasks
- Add Claude Desktop example
- Add Codex example
- Add local agent example
- Add config examples
#### Acceptance Criteria
- Users can integrate Glassmind without guesswork
---
## Incremental Indexing
### [ ] GM-044 — Add file change detection
#### Goals
Avoid full vault reindexing.
#### Tasks
- Compare content hashes
- Detect added files
- Detect deleted files
- Detect modified files
#### Acceptance Criteria
- Incremental indexing functions correctly
- Unchanged notes skipped
---
### [ ] GM-045 — Add filesystem watch mode
#### Goals
Support live vault updates.
#### Tasks
- Add filesystem watcher
- Debounce rapid changes
- Trigger partial reindex
- Add watch logging
#### Acceptance Criteria
- File edits reflected automatically
- No runaway indexing loops
---
### [ ] GM-046 — Add partial embedding regeneration
#### Goals
Avoid recomputing unchanged vectors.
#### Tasks
- Detect changed chunks
- Recompute only dirty embeddings
- Preserve existing vectors
- Handle deleted chunks
#### Acceptance Criteria
- Reindex significantly faster after small edits
---
## Agent Workspace
### [ ] GM-047 — Create `.agent/` workspace structure
#### Goals
Establish safe agent-owned storage.
#### Tasks
Create:
- `.agent/memories`
- `.agent/tasks`
- `.agent/summaries`
- `.agent/logs`
- `.agent/cache`
#### Acceptance Criteria
- Workspace generated automatically
- Structure documented
---
### [ ] GM-048 — Add memory capture commands
#### Goals
Allow structured memory persistence.
#### Tasks
Add:
- `capture-memory`
- `capture-task`
- `capture-decision`
Store entries as markdown.
#### Acceptance Criteria
- Commands append correctly
- Entries index correctly
---
### [ ] GM-049 — Index `.agent/` content
#### Goals
Allow generated memory retrieval.
#### Tasks
- Include `.agent/` in indexing pipeline
- Tag generated content
- Preserve provenance metadata
#### Acceptance Criteria
- Agent-generated notes searchable
- Provenance visible
---
### [ ] GM-050 — Add retrieval audit logging
#### Goals
Track retrieval behavior for debugging.
#### Tasks
Log:
- query
- retrieved chunks
- retrieval scores
- timestamp
- requesting client
#### Acceptance Criteria
- Retrievals traceable
- Logs useful for tuning/debugging
---
## What's Next
### Retrieval Quality
- Evaluation datasets
- Ranking tuning
- Query debugging
- Explainable scoring
### Performance
- Parallel indexing
- Cached embeddings
- Batch embedding generation
- Large vault optimization
### Future Ideas
- Git history awareness
- Temporal retrieval
- Canvas parsing
- Code-aware chunking
- Multi-vault support
- Graph exploration
- Retrieval visualization
- Vault analytics
- Semantic diffing
- “What changed?” context reports
- Local reranking models
- Session continuity memory
- Agent-safe write proposals