Retrieval Pipeline¶
engram runs a seven-stage hybrid retrieval pipeline (stages 0–6 below) that fuses four parallel search channels.
the pipeline¶
```
query
 │
 ├── intent classification (why/when/who/how/what)
 │      → dynamic signal weights per intent type
 │
 ├── dense HNSW search (bge-small, 384-dim, hnswlib)        → top 3k candidates
 ├── BM25 via sqlite FTS5 (content + hypothetical queries)  → top 3k candidates
 ├── entity graph BFS (1-hop traversal, strength-weighted)  → top k candidates
 └── Hopfield associative (pattern completion, β=8.0)       → top k candidates
        │
        ▼
 intent-weighted reciprocal rank fusion (k=60)
        │
        ▼
 temporal + importance boosting
        │
        ▼
 cross-encoder reranking (ms-marco-MiniLM or Voyage rerank-2.5)
        │
        ▼
 deep MLP reranker (optional, trained on access patterns)
        │
        ▼
 gaussian noise (ACT-R, σ=0.02) + threshold gate
        │
        ▼
 final top-k results
```
stage 0: intent classification¶
queries are classified into 5 intent types using regex patterns:
| intent | triggers | effect |
|---|---|---|
| why | "why", "because", "reason" | boost graph (causal reasoning) |
| when | "when", "date", "timeline" | boost BM25 (date matching) |
| who | "who", "person", "built" | boost graph (entity lookup) |
| how | "how to", "steps", "fix" | boost dense (procedural) |
| what | default | balanced weights |
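the classification described above can be sketched as a first-match regex lookup. the pattern lists below are illustrative stand-ins built from the trigger words in the table, not engram's actual tables:

```python
import re

# illustrative sketch of a regex-based intent classifier; patterns are
# assembled from the trigger words above, not copied from engram's source
INTENT_PATTERNS = {
    "why":  r"\b(why|because|reason)\b",
    "when": r"\b(when|date|timeline)\b",
    "who":  r"\b(who|person|built)\b",
    "how":  r"\b(how to|steps|fix)\b",
}

def classify_intent(query: str) -> str:
    q = query.lower()
    for intent, pattern in INTENT_PATTERNS.items():
        if re.search(pattern, q):
            return intent
    return "what"  # default: balanced weights
```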
stage 1: parallel candidate generation¶
four channels run independently:
dense search (HNSW)¶
embeds the query with bge-small-en-v1.5, then searches the HNSW index for approximate nearest neighbors. query time is O(log n) in the number of stored vectors.
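for intuition, here is a brute-force version of what the HNSW index computes: cosine nearest neighbors over normalized embeddings. hnswlib replaces this O(n) scan with an O(log n) graph search; the random vectors stand in for bge-small embeddings:

```python
import numpy as np

# brute-force reference for what HNSW approximates: cosine nearest
# neighbors over 384-dim embeddings (hnswlib does this in O(log n))
def dense_search(query_vec, memory_vecs, k):
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity to every memory
    top = np.argsort(-sims)[:k]       # best-first candidate ids
    return top, sims[top]

rng = np.random.default_rng(0)
memories = rng.normal(size=(1000, 384))        # stand-in embedding store
ids, scores = dense_search(rng.normal(size=384), memories, k=30)
```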
BM25 (FTS5)¶
full-text search via SQLite's FTS5 extension. matches on content text and hypothetical queries (generated at ingestion time via docTTTTTquery).
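a minimal FTS5 sketch (the table and column names are illustrative, not engram's schema): content and its ingestion-time hypothetical queries are both indexed, and sqlite's `bm25()` returns lower-is-better scores, so ascending order ranks best matches first:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# hypothetical schema: one FTS5 table over content + hypothetical queries
con.execute("CREATE VIRTUAL TABLE mem USING fts5(content, hypo_queries)")
con.executemany(
    "INSERT INTO mem VALUES (?, ?)",
    [
        ("deployed the API on 2024-03-01", "when was the API deployed"),
        ("notes on lunch options", "where to eat"),
    ],
)
# bm25() is lower-is-better, so ascending ORDER BY puts best matches first
rows = con.execute(
    "SELECT rowid, bm25(mem) FROM mem WHERE mem MATCH ? ORDER BY bm25(mem)",
    ("deployed",),
).fetchall()
```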
entity graph BFS¶
extracts entity names from the query, finds matching entities, retrieves their memories (hop 0, score 1.0), then traverses 1-hop related entities (score 0.5 * relationship strength).
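a sketch of that traversal, assuming entities, memories, and relationships live in plain dicts (the data shapes and function name are hypothetical; only the hop-0 score of 1.0 and the 0.5 × strength rule come from the description above):

```python
# hypothetical data shapes: entity -> memories, entity -> neighbors,
# (entity, neighbor) -> relationship strength
def graph_candidates(query_entities, entity_memories, related, strength):
    scores = {}
    for ent in query_entities:
        for mem in entity_memories.get(ent, []):       # hop 0: score 1.0
            scores[mem] = max(scores.get(mem, 0.0), 1.0)
        for nbr in related.get(ent, []):               # hop 1
            s = 0.5 * strength.get((ent, nbr), 0.0)    # 0.5 * strength
            for mem in entity_memories.get(nbr, []):
                scores[mem] = max(scores.get(mem, 0.0), s)
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranked = graph_candidates(
    ["alice"],
    entity_memories={"alice": ["m1"], "acme": ["m2"]},
    related={"alice": ["acme"]},
    strength={("alice", "acme"): 0.8},
)
```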
Hopfield associative¶
pattern completion via modern Hopfield network: ξ_new = X^T · softmax(β · X · ξ). retrieves memories by associative recall, not just similarity.
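the update rule above in numpy. the formula and β=8.0 come from the description; the toy data is illustrative. a noisy cue attends over the stored patterns X and is pulled toward the closest one:

```python
import numpy as np

# one modern-Hopfield retrieval step: xi_new = X.T @ softmax(beta * X @ xi)
def hopfield_retrieve(X, xi, beta=8.0):
    logits = beta * (X @ xi)
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return X.T @ p, p                   # completed pattern, per-memory attention

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 384))                    # 50 stored memory embeddings
X /= np.linalg.norm(X, axis=1, keepdims=True)
noisy_cue = X[7] + 0.1 * rng.normal(size=384)     # partial/noisy query pattern
completed, attn = hopfield_retrieve(X, noisy_cue)
```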
stage 2: RRF fusion¶
reciprocal rank fusion combines all four channels. each document's fused score is a weighted sum of `w_c / (k + rank_c)` over the channels that returned it. the k=60 constant comes from Cormack et al. 2009. intent-specific weights `w_c` adjust channel importance per query type.
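a weighted RRF sketch; the channel names and weight values are illustrative:

```python
# rankings: channel name -> doc ids in best-first order
# weights:  per-channel weight from the intent classifier (illustrative)
def rrf(rankings, weights, k=60):
    fused = {}
    for channel, docs in rankings.items():
        w = weights.get(channel, 1.0)
        for rank, doc in enumerate(docs, start=1):
            fused[doc] = fused.get(doc, 0.0) + w / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

ranked = rrf(
    {"dense": ["a", "b", "c"], "bm25": ["b", "a"], "graph": ["b"]},
    weights={"dense": 1.0, "bm25": 1.0, "graph": 1.0},
)
```

"b" wins here because appearing near the top of several channels beats a single high rank in one.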
stage 3: temporal + importance boosting¶
- temporal boost for date-matching memories
- importance scaling: `score *= (0.8 + 0.4 * importance)`
- recency decay (Ebbinghaus): `exp(-0.693 * age_days / half_life)` for episodic memories
- access frequency boost: `1.0 + 0.1 * log(1 + access_count)`
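a sketch of how the factors above might compose. the individual formulas come from the list; treating them as multiplicative factors on the fused score is an assumption, and the function name is hypothetical:

```python
import math

# assumed composition: each boost multiplies the fused score
def boost(score, importance, age_days, half_life_days, access_count):
    score *= 0.8 + 0.4 * importance                        # importance scaling
    score *= math.exp(-0.693 * age_days / half_life_days)  # Ebbinghaus decay
    score *= 1.0 + 0.1 * math.log(1 + access_count)        # frequency boost
    return score
```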
stage 4: cross-encoder reranking¶
top-20 candidates are jointly scored by a cross-encoder (query + document → relevance score). this is more accurate than bi-encoder similarity but costs O(n·q), so it is only applied to the shortlist.
supports local (cross-encoder/ms-marco-MiniLM-L-6-v2) and cloud (rerank-2.5 via Voyage API).
stage 5: deep MLP reranker¶
optional learned reranker trained on access patterns. 2-layer MLP taking 10 features (cosine similarity, importance, access count, age, layer, retention score) → relevance prediction. <1ms per query.
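a minimal sketch of the reranker's shape (10 features → hidden layer → relevance score). the hidden width and the random weights are placeholders for what training on access patterns would produce:

```python
import numpy as np

# hypothetical 2-layer MLP reranker; weights here are random stand-ins
# for parameters learned from access patterns
class MLPReranker:
    def __init__(self, n_features=10, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, feats):
        # feats: (n_candidates, n_features) -> (n_candidates,) scores
        h = np.maximum(0.0, feats @ self.W1 + self.b1)   # ReLU hidden layer
        return (h @ self.W2 + self.b2).ravel()

scores = MLPReranker().predict(np.random.rand(20, 10))
```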
train it with `train_reranker` after accumulating usage data.
stage 6: noise + threshold¶
small gaussian noise (σ=0.02, ACT-R inspired) for beneficial retrieval variation. minimum score threshold gates out low-quality results.
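a sketch of this final gate, assuming the noise is added directly to the reranked scores and the threshold matches the `min_confidence` config default; the function name is hypothetical:

```python
import numpy as np

# ACT-R-style retrieval noise plus a minimum-score gate
def gate(scores, min_confidence=0.60, sigma=0.02, seed=None):
    rng = np.random.default_rng(seed)
    noisy = scores + rng.normal(0.0, sigma, size=len(scores))
    return [i for i, s in enumerate(noisy) if s >= min_confidence]

kept = gate(np.array([0.9, 0.61, 0.3]), seed=0)
```

scores well below the threshold are always dropped; scores near it may flip across runs, which is the intended retrieval variation.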
tuning¶
all parameters are in config.yaml:
```yaml
retrieval:
  top_k: 10              # final results
  rrf_k: 60              # RRF constant
  min_confidence: 0.60   # threshold gate
  rerank_candidates: 20  # cross-encoder shortlist
  dense_multiplier: 3    # dense candidates = top_k * 3
  bm25_multiplier: 3     # BM25 candidates = top_k * 3
```
debugging¶
use `--debug` on the CLI or `debug=true` on the API to see a per-stage breakdown:
returns dense candidates, BM25 candidates, graph candidates, RRF scores, boosted scores, reranked scores, and total latency.