# Architecture Overview
Everything lives in one SQLite file; no external services.
## module map

```
engram/
├── store.py           # SQLite schema, CRUD, FTS5, entity graph, ANN lifecycle
├── ann_index.py       # HNSW approximate nearest neighbor (hnswlib)
├── embeddings.py      # multi-backend: MLX, sentence-transformers, Voyage, OpenAI, Gemini
├── retrieval.py       # 8-stage hybrid pipeline
├── extractor.py       # LLM fact extraction + hypothetical query generation
├── entities.py        # regex entity extraction, relationship graph
├── surprise.py        # k-NN novelty scoring at write time
├── deep_retrieval.py  # learned MLP reranker
├── skill_select.py    # task-aware skill selection gate
├── lifecycle.py       # retention regularization, 9-factor importance, promotion
├── consolidator.py    # dream cycle (7 steps)
├── codebase.py        # project scanner → codebase layer
├── conversations.py   # conversation ingest + classification
├── dedup.py           # semantic deduplication
├── layers.py          # L0-L3 graduated context
├── compress.py        # token-budget compression
├── formats.py         # parsers: Markdown, JSON, PDF, Slack
├── llm.py             # Claude CLI + MLX backend
├── evolution.py       # memory enrichment, evolution, CRUD, trust, canonicalization
├── drift.py           # memory drift detection + auto-fix
├── patterns.py        # procedural pattern extraction
├── quantize.py        # lifecycle embedding compression (FRQAD)
├── communities.py     # label propagation community detection
├── hopfield.py        # Hopfield associative retrieval
├── mcp_server.py      # 63-tool MCP server (stdio + SSE)
├── cli.py             # 15 CLI commands
├── config.py          # YAML config with env overrides, auto-dim
└── web/
    ├── app.py         # FastAPI with auth, model warmup
    ├── routes.py      # 57 REST endpoints
    └── templates/
        └── index.html # single-page dashboard
```
## data flow
### write path

```
content → canonicalize → enrich (keywords + tags + summary) → embed
        → surprise gate (k-NN novelty check)
        → CRUD classification (ADD/UPDATE/NOOP)
        → memory evolution (update neighbors if context changed)
        → save to SQLite + update FTS5 + add to ANN index
        → extract entities + build relationships
        → compute importance (9-factor)
```
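The front of the write path can be sketched as a small pipeline. Function names and the in-memory `store` list are illustrative, not the actual engram API; the character-frequency `embed` is a stand-in for a real embedding model:

```python
def canonicalize(text: str) -> str:
    # normalize whitespace so duplicates compare equal
    return " ".join(text.split())

def embed(text: str) -> list[float]:
    # toy embedding: normalized letter-frequency vector (real code calls a model)
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def write_memory(content: str, store: list[dict]) -> tuple[str, dict]:
    """Canonicalize → embed → CRUD-classify → save (surprise gate omitted)."""
    text = canonicalize(content)
    for rec in store:
        if rec["content"] == text:
            return "NOOP", rec            # exact duplicate: nothing to write
    rec = {"content": text, "embedding": embed(text)}
    store.append(rec)                     # "save" step of the pipeline
    return "ADD", rec
```

In the real pipeline the UPDATE branch and the surprise gate sit between classification and save; here the classifier only distinguishes ADD from NOOP.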
### read path

```
query → intent classification → 4 parallel channels
      → RRF fusion → temporal boost → cross-encoder rerank
      → deep MLP rerank → noise + threshold gate
      → record access → return results
```
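The RRF fusion step at the center of the read path is simple: each channel returns a ranked list, and a document's fused score is the sum of `1/(k + rank)` across channels. A minimal sketch, with `k = 60` as the conventional constant (the real pipeline layers temporal boosts and reranking on top):

```python
def rrf_fuse(channels: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: sum 1/(k + rank) over each channel's ranking."""
    scores: dict[str, float] = {}
    for ranking in channels:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks matter, channels with incomparable score scales (BM25 vs. cosine similarity) fuse cleanly without normalization.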
### lifecycle

```
dream cycle (consolidate):
forgetting curve → cluster + merge → peer cards
                 → cross-domain bridges → belief probing
                 → drift detection → archive old + prune logs
```
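The forgetting-curve step can be modeled as exponential decay of importance with time since last access, slowed by repeated recalls. This is an Ebbinghaus-style sketch with illustrative constants; the actual 9-factor scoring lives in `lifecycle.py`:

```python
import math

def retention(importance: float, days_since_access: float,
              access_count: int, half_life_days: float = 30.0) -> float:
    """Exponential decay of importance, with recalls extending the half-life
    (a rough analogue of the spacing effect)."""
    effective_half_life = half_life_days * (1.0 + math.log1p(access_count))
    return importance * 0.5 ** (days_since_access / effective_half_life)
```

Memories whose retained importance falls below a threshold become candidates for archival in the dream cycle.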
## storage

one SQLite database with WAL mode:

- `memories` — content, embedding (BLOB), importance, layer, timestamps
- `memories_fts` — FTS5 virtual table for BM25
- `entities` — canonical name, aliases, type, metadata
- `entity_mentions` — memory ↔ entity links
- `relationships` — entity ↔ entity with type, strength, temporal validity
- `access_log` — every recall recorded for reranker training
- `events` — all reads/writes for the web dashboard
- `diary_entries` — session notes
- `importance_history` — importance score over time
- `hypothetical_queries` — generated questions per memory (docTTTTTquery)
- `ingest_log` — file hash tracking for dedup
The HNSW index persists separately at `~/.local/share/engram/hnsw.index`.
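A minimal sketch of the core tables (column sets are illustrative; the full schema in `store.py` carries more fields and indexes):

```python
import sqlite3

SCHEMA = """
CREATE TABLE memories (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,
    embedding  BLOB,               -- packed float32 vector bytes
    importance REAL DEFAULT 0.5,
    layer      INTEGER DEFAULT 0,  -- L0-L3 graduated context
    created_at TEXT,
    updated_at TEXT
);
CREATE VIRTUAL TABLE memories_fts USING fts5(content);  -- BM25 channel
CREATE TABLE entities (
    id      INTEGER PRIMARY KEY,
    name    TEXT UNIQUE,           -- canonical name
    aliases TEXT,                  -- JSON array
    type    TEXT
);
CREATE TABLE entity_mentions (
    memory_id INTEGER REFERENCES memories(id),
    entity_id INTEGER REFERENCES entities(id)
);
"""

db = sqlite3.connect(":memory:")   # the real store opens a file, where WAL applies
db.executescript(SCHEMA)
db.execute("PRAGMA journal_mode=WAL")
```

FTS5 content is kept in a separate virtual table here; the real schema may instead use an external-content FTS table synced by triggers.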
## design decisions
- SQLite over Postgres — single file, zero ops, WAL handles concurrent readers. good enough for 1M+ memories
- hnswlib over FAISS — lighter, pip-installable, native cosine space, simpler API
- hybrid retrieval over single-channel — each channel catches what the others miss, and RRF fusion consistently beats any single signal
- local-first — everything runs on your machine by default; API backends are optional
- surprise at write time — prevents garbage in, not just garbage out
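The write-time surprise gate from the last point can be sketched as k-NN novelty scoring: embed the candidate, compare it to its nearest stored neighbors, and reject near-duplicates. Threshold and scoring below are illustrative; the real gate lives in `surprise.py` and queries the HNSW index instead of scanning:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def surprise(candidate: list[float], stored: list[list[float]], k: int = 3) -> float:
    """Novelty = 1 - mean cosine similarity to the k nearest stored vectors."""
    if not stored:
        return 1.0                          # empty store: everything is novel
    sims = sorted((cosine(candidate, v) for v in stored), reverse=True)[:k]
    return 1.0 - sum(sims) / len(sims)

def passes_gate(candidate: list[float], stored: list[list[float]],
                threshold: float = 0.15) -> bool:
    # only sufficiently novel content is worth a write
    return surprise(candidate, stored) >= threshold
```

Filtering at write time keeps redundant content out of the store entirely, so the read path never has to rank it down.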