mirror of https://github.com/khodges42/glassMind.git synced 2026-06-14 18:18:36 +00:00

K. Hodges 966b7e5757 Explainer and readme update. we are working.

2026-05-24 16:05:24 -07:00


Glassmind Technical Explainer

This document explains how Glassmind works for someone who is comfortable with software development, but new to RAG, embeddings, vector search, Obsidian-style markdown indexing, or MCP-style tool surfaces.

Glassmind is a local retrieval layer for a directory of markdown files. It treats your notes as the source of truth, builds a rebuildable SQLite cache from them, and then uses that cache to answer search and context requests.

The short version:

markdown files
  -> scanner
  -> markdown parser
  -> chunker
  -> SQLite cache
  -> keyword index
  -> embedding cache
  -> hybrid retriever
  -> search results / context bundle / HTTP / MCP-style tools

What Problem Glassmind Solves

Large language models do not automatically know what is in your local notes. Even if an AI tool can read files, dumping an entire vault into a prompt is slow, expensive, noisy, and usually impossible because context windows are limited.

RAG stands for Retrieval-Augmented Generation. The idea is simple:

user asks something
system retrieves relevant source material
LLM receives only that relevant context
LLM answers with better grounding

Glassmind is the retrieval part. It does not try to be the chatbot. It finds the pieces of your markdown vault that matter.

Core Design Rule

Markdown is canonical.

That means:

your .md files are the real data
SQLite is only a cache
embeddings are only derived data
deleting the database should not delete knowledge
indexing should be repeatable

By default, Glassmind only writes generated project data under .agent/. Normal notes are read, not edited.

Vault Scanning

The scanner walks the configured vault path and finds markdown files.

By default it skips folders such as:

.git
.obsidian
.trash
.agent/cache

It intentionally does not skip all of .agent/, because generated memories, decisions, and task notes should be searchable. It only skips .agent/cache, which is where the SQLite database lives.
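The skip rule above can be sketched as a small path filter. This is a minimal illustration, not Glassmind's actual scanner: the names `should_skip` and `is_markdown` are hypothetical, and it assumes paths are relative to the vault root and that only `.md` files count as markdown.

```rust
use std::path::{Component, Path};

/// Directory names skipped wherever they appear (taken from the list above).
const SKIPPED_DIRS: [&str; 3] = [".git", ".obsidian", ".trash"];

/// Returns true if a vault-relative path should be ignored by the scanner.
/// Hypothetical helper; the real scanner may be structured differently.
fn should_skip(rel_path: &Path) -> bool {
    let parts: Vec<&str> = rel_path
        .components()
        .filter_map(|c| match c {
            Component::Normal(s) => s.to_str(),
            _ => None,
        })
        .collect();

    // Skip anything that passes through one of the skipped directories.
    if parts.iter().any(|p| SKIPPED_DIRS.contains(p)) {
        return true;
    }
    // Skip .agent/cache only, so the rest of .agent/ stays searchable.
    parts.starts_with(&[".agent", "cache"])
}

/// Assumption: only files with the `.md` extension are indexed.
fn is_markdown(rel_path: &Path) -> bool {
    rel_path.extension().and_then(|e| e.to_str()) == Some("md")
}
```

With this rule, `.agent/memories/note.md` is scanned while `.agent/cache/glassmind.sqlite3` is not.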

For each markdown file, Glassmind records metadata:

relative path
filename
title
modified timestamp
file size
SHA-256 content hash

The hash is important. It lets Glassmind tell whether a note changed since the last index run.

Markdown Parsing

After reading a file, Glassmind parses useful markdown structure.

It extracts:

headings
paragraphs
code blocks
list items
Obsidian-style wikilinks
tags

Supported wikilinks include:

[[note]]
[[note|alias]]
[[folder/note]]

Tags can come from inline markdown:

This is about #rust and #local-first tooling.

Or frontmatter:

---
tags: [rust, retrieval, notes]
---

Tags are normalized to lowercase and deduplicated.
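A minimal sketch of inline tag extraction plus the normalization step, assuming tags are whitespace-delimited `#word` tokens. The function names are hypothetical, and frontmatter parsing is left out.

```rust
use std::collections::BTreeSet;

/// Pulls `#tag` tokens out of a line of prose. A bare `#` or a heading
/// marker such as `# Title` or `## Title` is not treated as a tag.
fn inline_tags(line: &str) -> Vec<String> {
    line.split_whitespace()
        .filter_map(|word| word.strip_prefix('#'))
        // Drop trailing punctuation, such as the period in "#rust."
        .map(|t| t.trim_end_matches(|c: char| !c.is_alphanumeric()))
        .filter(|t| !t.is_empty() && !t.starts_with('#'))
        .map(str::to_string)
        .collect()
}

/// Lowercases and deduplicates tags; the BTreeSet also gives a stable order.
fn normalize_tags<I: IntoIterator<Item = String>>(tags: I) -> Vec<String> {
    let set: BTreeSet<String> = tags.into_iter().map(|t| t.to_lowercase()).collect();
    set.into_iter().collect()
}
```

Inline and frontmatter tags can be fed through the same `normalize_tags` call, so `Rust` and `#rust` end up as one tag.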

Chunking

Search works better on smaller pieces of text than whole files. Those pieces are called chunks.

Glassmind currently chunks by heading section first. For example:

# Project

Intro text.

## Design

Design text.

## Tasks

Task text.

This file produces a separate retrieval chunk for the top-level section and for each child section. Each chunk keeps its heading path, so results can point back to where they came from:

Project > Design
Project > Tasks
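Heading paths like the ones above can be derived with a stack of open headings. The following is a simplified sketch, not the project's parser: `heading_paths` is a hypothetical name, it only understands `#`-style headings, and it does not special-case fenced code blocks.

```rust
/// Returns the heading path for each `#`-style heading, in document order.
fn heading_paths(markdown: &str) -> Vec<String> {
    let mut stack: Vec<(usize, String)> = Vec::new(); // (level, title)
    let mut paths = Vec::new();
    for line in markdown.lines() {
        let level = line.chars().take_while(|&c| c == '#').count();
        // A heading needs 1 to 6 '#' characters followed by a space.
        if level == 0 || level > 6 || !line[level..].starts_with(' ') {
            continue;
        }
        let title = line[level..].trim().to_string();
        // Pop siblings and deeper headings before pushing the new one.
        while stack.last().map_or(false, |(l, _)| *l >= level) {
            stack.pop();
        }
        stack.push((level, title));
        let path: Vec<&str> = stack.iter().map(|(_, t)| t.as_str()).collect();
        paths.push(path.join(" > "));
    }
    paths
}
```

For the example document this yields `Project`, `Project > Design`, and `Project > Tasks`.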

If a section is too large, Glassmind splits it into smaller overlapping chunks. Overlap helps avoid cutting useful context exactly at a boundary.
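Overlapping splits are easy to get subtly wrong, so here is a small sketch of the idea. It works on words for simplicity; the function name, the word-based unit, and the parameters are assumptions, not Glassmind's actual chunker settings.

```rust
/// Splits `words` into windows of at most `max_words`, where each window
/// starts with the last `overlap` words of the previous one.
fn split_with_overlap<'a>(
    words: &[&'a str],
    max_words: usize,
    overlap: usize,
) -> Vec<Vec<&'a str>> {
    assert!(
        max_words > 0 && overlap < max_words,
        "overlap must be smaller than the window"
    );
    let step = max_words - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_words).min(words.len());
        chunks.push(words[start..end].to_vec());
        if end == words.len() {
            break; // the last window already reaches the end
        }
        start += step;
    }
    chunks
}
```

With seven words, a window of four, and an overlap of one, the second chunk begins with the last word of the first, so a sentence that straddles the boundary appears in both.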

Each chunk stores:

note id
chunk index
heading path
content
chunk type
start line
end line
rough token estimate
chunk content hash

The token estimate is currently a simple word count. It is not exact, but it is good enough for budgeting context bundles.

SQLite Cache

The local database lives here by default:

.agent/cache/glassmind.sqlite3

It is ignored by Git and can be rebuilt from markdown.

The main tables are:

notes: one row per markdown note
chunks: retrieval chunks
tags: normalized tag names
note_tags: many-to-many join table
links: wikilinks from notes
embeddings: vector cache for chunks
retrieval_audit: search history for debugging retrieval behavior
memory_events: generated memory records
migrations: schema bootstrap marker

On index, Glassmind compares the current file hash with the hash stored in notes.

If the hash matches and the index version matches, the note is skipped.

If the note changed, Glassmind first deletes the rows derived from the previous version:

old chunks
old FTS rows
old embeddings
old tag mappings
old links

Then it inserts the fresh metadata and the newly derived rows.

If a file was deleted from the vault, the indexer removes that note and its derived rows from the cache.
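The skip, reindex, and delete decisions described above can be sketched without any database. The struct fields and function names below are hypothetical, and hashes are plain strings here; in Glassmind they would be the stored and freshly computed content hashes.

```rust
use std::collections::HashMap;

/// What the cache remembers about a note. Field names are hypothetical.
struct CachedNote {
    content_hash: String,
    index_version: u32,
}

#[derive(Debug, PartialEq)]
enum Action {
    Skip,
    Reindex,
}

/// Decides what to do with a file found on disk.
fn plan_file(cached: Option<&CachedNote>, current_hash: &str, index_version: u32) -> Action {
    match cached {
        Some(n) if n.content_hash == current_hash && n.index_version == index_version => {
            Action::Skip
        }
        // New note, changed content, or a different index version.
        _ => Action::Reindex,
    }
}

/// Paths that are in the cache but no longer on disk should be removed.
fn deleted_paths(cache: &HashMap<String, CachedNote>, on_disk: &[&str]) -> Vec<String> {
    let mut gone: Vec<String> = cache
        .keys()
        .filter(|p| !on_disk.contains(&p.as_str()))
        .cloned()
        .collect();
    gone.sort();
    gone
}
```

Checking the index version alongside the hash means a change to the chunking or parsing logic can force a rebuild even when the files themselves are untouched.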

Keyword Search With FTS

SQLite includes a full-text search engine called FTS5. Glassmind creates an FTS table for chunk content.

When chunks are written, matching FTS rows are written too.

A keyword search runs roughly like this:

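-- Column and key names are illustrative; the real schema may differ.
SELECT n.path, c.heading_path,
       snippet(chunks_fts, 0, '[', ']', '...', 12) AS snippet,
       bm25(chunks_fts) AS rank
FROM chunks_fts
JOIN chunks AS c ON c.id = chunks_fts.rowid
JOIN notes AS n ON n.id = c.note_id
WHERE chunks_fts MATCH :query
ORDER BY rank;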

FTS gives Glassmind:

fast local keyword search
ranked results
snippets with matched terms highlighted

This is the most reliable baseline search mode because it does not require a model.

Embeddings

Embeddings are numeric vectors that represent the meaning of a piece of text.

Conceptually:

"local memory for agents"
  -> [0.12, -0.04, 0.77, ...]

Texts with similar meaning should produce vectors that are close to each other.

Glassmind has an EmbeddingBackend trait:

text in
vector out

Right now there is a deterministic local embedding backend. It is not a real language model embedding, but it lets the full pipeline work locally and predictably while the storage and retrieval flow stabilizes.
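To make the "text in, vector out" boundary concrete, here is a toy deterministic backend. Everything in it is an assumption for illustration: the trait signature, the `HashingBackend` name, and the word-hashing scheme are not taken from the Glassmind source, which may work differently.

```rust
/// Hypothetical shape of the trait: text in, vector out.
trait EmbeddingBackend {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Toy deterministic backend: hashes each word into one of `dims` buckets.
/// `dims` must be greater than zero.
struct HashingBackend {
    dims: usize,
}

/// FNV-1a, used here only because it is tiny and stable across runs.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

impl EmbeddingBackend for HashingBackend {
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; self.dims];
        for word in text.split_whitespace() {
            let bucket = (fnv1a(word.to_lowercase().as_bytes()) % self.dims as u64) as usize;
            v[bucket] += 1.0;
        }
        // L2-normalize so the vector length does not depend on text length.
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            v.iter_mut().for_each(|x| *x /= norm);
        }
        v
    }
}
```

A backend like this only captures word overlap, not meaning, which is why it is useful for testing the pipeline but not for judging search quality.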

There is also an Ollama-shaped backend stub. The code has the right boundary for an Ollama implementation, but the current version does not call Ollama over HTTP yet.

Embeddings are stored in SQLite as JSON arrays in the embeddings table.

This is not the final high-performance vector storage design. The intended future path is native sqlite-vec. The current implementation keeps everything runnable with plain SQLite while preserving the architecture.

Semantic Search

Semantic search compares the query embedding to chunk embeddings.

The comparison uses cosine similarity:

1.0  = very similar
0.0  = unrelated
-1.0 = opposite direction

In practice, Glassmind:

embeds the query
loads candidate chunks
compares query vector to chunk vectors
assigns a semantic score

The current semantic path is useful as plumbing and scoring infrastructure. Search quality should improve once the Ollama backend produces real model embeddings; native sqlite-vec storage is mainly about making the vector comparison faster.
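The comparison step can be sketched in a few lines. This is a generic brute-force version under stated assumptions: the function names are hypothetical, and candidates are held in memory rather than loaded from the embeddings table.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns 0.0 if either vector is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force ranking: score every candidate, then sort best-first.
fn rank<'a>(query: &[f32], candidates: &'a [(&'a str, Vec<f32>)]) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = candidates
        .iter()
        .map(|(id, v)| (*id, cosine_similarity(query, v)))
        .collect();
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored
}
```

Brute force is linear in the number of candidate chunks, which is fine for a personal vault and is the part a native vector index would replace.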

Hybrid Retrieval

Pure keyword search is brittle. Pure semantic search can be fuzzy or surprising.

Glassmind combines multiple scoring signals:

keyword score
semantic score
recency score
tag score
wikilink score

The config has weights:

[search]
semantic_weight = 0.55
keyword_weight = 0.25
recency_weight = 0.10
link_weight = 0.05
tag_weight = 0.05

The final score is a weighted blend of these signals. The weights shown above add up to 1.0.
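One plausible reading of the blend is a plain weighted sum, sketched below. This is an assumption: the source does not spell out the formula, the struct and function names are hypothetical, and each signal is assumed to be normalized to the range 0.0 to 1.0 before blending.

```rust
/// One value per signal. Used both for a chunk's scores and for the weights.
#[derive(Clone, Copy)]
struct Signals {
    semantic: f32,
    keyword: f32,
    recency: f32,
    link: f32,
    tag: f32,
}

/// Weights mirroring the [search] config values shown above.
const WEIGHTS: Signals = Signals {
    semantic: 0.55,
    keyword: 0.25,
    recency: 0.10,
    link: 0.05,
    tag: 0.05,
};

/// Weighted sum of the signals (an assumed formula for the blend).
fn final_score(s: Signals, w: Signals) -> f32 {
    s.semantic * w.semantic
        + s.keyword * w.keyword
        + s.recency * w.recency
        + s.link * w.link
        + s.tag * w.tag
}
```

Under this formula a chunk that matches only on keywords can score at most 0.25, so a strong semantic match will usually outrank it.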

You can inspect the pieces with:

cargo run -- search "local memory" --debug-scores

That makes retrieval behavior less magical. If a result is weird, you can see whether it came from keyword matching, semantic similarity, recency, tags, or links.

Context Bundles

Search results are useful for humans, but agents usually need a compact context packet.

That is what glassmind context builds.

Example:

cargo run -- context "continue glassmind" --budget 4000

The context builder:

runs retrieval
takes the highest-scoring chunks
respects the token budget
outputs markdown by default
includes source paths

The result is meant to be pasted into an LLM prompt or returned to an agent.
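The budgeting step can be sketched as a greedy pack over scored chunks. The `Chunk` fields, the output layout, and the choice to skip an oversized chunk and keep trying smaller ones are all assumptions for illustration; the real builder may stop at the first chunk that does not fit.

```rust
/// A retrieved chunk; a hypothetical subset of what Glassmind stores.
struct Chunk {
    path: String,
    heading_path: String,
    content: String,
    score: f32,
}

/// Rough token estimate by word count.
fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Greedily packs the highest-scoring chunks into a markdown bundle
/// without letting the estimated token total exceed `budget`.
fn build_context(mut chunks: Vec<Chunk>, budget: usize) -> String {
    chunks.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut used = 0;
    let mut out = String::new();
    for c in &chunks {
        let cost = estimate_tokens(&c.content);
        if used + cost > budget {
            continue; // too big; a smaller, lower-ranked chunk may still fit
        }
        used += cost;
        // Each entry carries its heading path and source path.
        out.push_str(&format!("## {} ({})\n\n{}\n\n", c.heading_path, c.path, c.content));
    }
    out
}
```

Because only chunk content is counted, the headings and paths add a little overhead on top of the budget; a stricter version would count them too.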

There is also a summarizer hook. It is disabled right now, but the interface exists so local summarization can be added later without changing the bundle format.

Agent Workspace

Glassmind owns .agent/.

The current structure is:

.agent/
  memories/
  summaries/
  tasks/
  decisions/
  logs/
  cache/

Capture commands append markdown into this workspace:

cargo run -- capture memory --project Glassmind --text "SQLite is rebuildable cache."
cargo run -- capture task --project Glassmind --text "Wire real Ollama embeddings."
cargo run -- capture decision --project Glassmind --text "Markdown remains canonical."

These generated files are indexed like normal markdown. That gives agents a place to write memory without touching user-owned notes.

HTTP API

glassmind serve starts a small localhost HTTP server.

Default bind:

127.0.0.1:7331

Endpoints:

GET /health
GET /stats
POST /search
POST /context
GET /notes/{path}

Example:

curl http://127.0.0.1:7331/health

Search request:

{
  "query": "local memory",
  "limit": 5
}

Context request:

{
  "query": "continue glassmind",
  "limit": 8,
  "budget": 6000
}
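For callers that prefer Rust over curl, a request can be sent with the standard library alone. This is a sketch under assumptions: the function names are hypothetical, and whether the server requires the `Content-Type` header or closes the connection after responding is not stated in this document.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Builds a minimal HTTP/1.1 POST request carrying a JSON body.
fn build_post(host: &str, path: &str, json_body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {len}\r\nConnection: close\r\n\r\n{json_body}",
        len = json_body.len()
    )
}

/// Sends the request and returns the raw response text
/// (status line, headers, and body). Requires a running server.
fn post_json(host: &str, path: &str, json_body: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(build_post(host, path, json_body).as_bytes())?;
    let mut response = String::new();
    // Reads until the server closes the connection.
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With `glassmind serve` running on the default bind, a search would look like `post_json("127.0.0.1:7331", "/search", r#"{"query": "local memory", "limit": 5}"#)`.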

The current server is intentionally simple and uses the Rust standard library. A later version can swap this for Axum without changing the core indexing and retrieval flow.

MCP-Style Tool Commands

MCP stands for Model Context Protocol. It is an open protocol that lets AI applications call external tools and read external data.

Glassmind currently has an MCP-style command surface:

cargo run -- mcp tools
cargo run -- mcp search "local memory"
cargo run -- mcp context "continue glassmind"
cargo run -- mcp read "README.md"

This is not a full MCP transport yet. It is the command and response shape that a real MCP server can reuse later.

Watch Mode

There is a simple polling watch mode:

cargo run -- index --watch

It reindexes every five seconds.

This is intentionally plain. A real filesystem watcher can replace it later, but the current loop proves the live indexing behavior without another moving part.
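A polling loop of this kind is only a few lines. The sketch below is illustrative: `watch` is a hypothetical name, and the `max_ticks` parameter exists only so the loop can be exercised in a test.

```rust
use std::thread;
use std::time::Duration;

/// Calls `reindex` once per `interval`.
/// `max_ticks` limits the number of runs; `None` loops until the process stops.
fn watch(interval: Duration, max_ticks: Option<u32>, mut reindex: impl FnMut()) {
    let mut ticks = 0;
    loop {
        reindex();
        ticks += 1;
        if max_ticks.map_or(false, |max| ticks >= max) {
            break;
        }
        thread::sleep(interval);
    }
}
```

A five-second poll would be `watch(Duration::from_secs(5), None, || { /* run the indexer */ })`. Since unchanged notes are skipped by hash, each tick is cheap when nothing has changed.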

Retrieval Audit Logging

Hybrid searches write audit rows into SQLite.

The audit log stores:

query
returned paths
timestamp
client label

This is for tuning. Retrieval systems are hard to improve if you cannot inspect what they returned and why.

Typical Local Workflow

For this repo:

cargo run -- index --embeddings
cargo run -- search "glassmind local memory" --debug-scores
cargo run -- context "continue glassmind" --budget 3000

For a personal Obsidian vault:

cargo run -- --vault "E:\notes\Brain" index --embeddings
cargo run -- --vault "E:\notes\Brain" search "project ideas" --debug-scores
cargo run -- --vault "E:\notes\Brain" context "what was I thinking about local agents?"

If the path has spaces, keep the quotes.

What Is Real Now

The working spine is:

scan
parse
chunk
hash
cache
FTS search
embedding cache
hybrid scoring
context bundles
HTTP surface
MCP-style commands
agent memory capture
audit logging

That is enough to test the product shape end to end.

What Is Still Placeholder Or Lightweight

Some pieces are intentionally MVP-level:

the Ollama backend has the right interface but does not call Ollama yet
vector storage is SQLite JSON, not native sqlite-vec
semantic search is brute-force over candidate chunks
HTTP uses a tiny standard-library server, not Axum
MCP is command-shaped, not a full MCP protocol server
watch mode is polling, not filesystem events

These are implementation swaps, not architecture rewrites.

The main architecture is already pointing in the right direction:

markdown source of truth
rebuildable local cache
inspectable retrieval
agent-safe writes
human-readable output
