Memory
Snippbot does not just process conversations — it learns from them. A 5-tier cognitive architecture filters noise at input, manages active context, stores episodes with semantic search, builds a knowledge graph of entities and relations, and closes the loop with recall feedback that improves memory quality over time. All data stays on your machine. No cloud. No sync. Your agent’s memory is yours alone.
The 5-tier cognitive architecture
Every piece of information flows through five tiers — from raw input filtering all the way to closed-loop learning. Each tier serves a distinct role in how your agent remembers.
Tier 1: Sensory Buffer
The first line of defense against memory noise. Before any conversation exchange becomes a permanent episode, the sensory buffer applies the following checks (sketched after this list):
- Minimum-length filtering — trivial exchanges like “ok” or “sure” (under 100 characters) are skipped entirely.
- Near-duplicate detection — Jaccard word-set similarity checks prevent storing the same information twice. Episodes with greater than 80% overlap are deduplicated.
- Content normalization — whitespace cleanup, Unicode NFC normalization, and truncation at 10,000 characters.
- Importance and valence scoring — calculated from content keywords and sentiment analysis, not hardcoded defaults.
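A minimal sketch of these Tier 1 checks, assuming illustrative function names and a small list of recently stored episodes (the thresholds are the documented values above):

```python
import unicodedata

MIN_LENGTH = 100       # trivial exchanges below this are skipped
MAX_LENGTH = 10_000    # truncation limit
DUP_THRESHOLD = 0.8    # Jaccard overlap above which an episode is a near-duplicate

def normalize(text: str) -> str:
    """Whitespace cleanup, Unicode NFC normalization, truncation."""
    return unicodedata.normalize("NFC", " ".join(text.split()))[:MAX_LENGTH]

def jaccard(a: str, b: str) -> float:
    """Word-set similarity between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def passes_buffer(content: str, recent: list[str]) -> bool:
    """True if the exchange should become a permanent episode."""
    content = normalize(content)
    if len(content) < MIN_LENGTH:
        return False  # minimum-length filter
    return all(jaccard(content, r) <= DUP_THRESHOLD for r in recent)
```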
Tier 2: Working Memory
What your agent is “thinking about” right now. The context window manager enforces token budgets (up to 1M tokens for supported models) and compacts older messages via LLM summarization when limits are reached. A session entity tracker monitors which entities are “hot” in the current conversation — topics mentioned repeatedly get priority in recall, making the agent more contextually aware as conversations deepen.
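A sketch of the compaction loop, with `count_tokens` and `summarize` standing in for the model's tokenizer and an LLM summarization call (both names are illustrative, not Snippbot's API):

```python
def compact_if_needed(messages: list[str], token_budget: int,
                      count_tokens, summarize) -> list[str]:
    """Fold the oldest messages into LLM summaries until the
    conversation fits the token budget again."""
    while sum(count_tokens(m) for m in messages) > token_budget and len(messages) > 2:
        # Replace the two oldest messages with a single summary message
        messages = [summarize(messages[:2])] + messages[2:]
    return messages
```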
Tier 3: Episodic Memory
Long-term storage for everything your agent has experienced. Every conversation is captured as an episode — stored in SQLite with full-text search via FTS5/BM25 and semantic similarity via 384-dimensional embeddings in an HNSW index. Episodes are scored for importance (0.0 to 1.0) and valence (negative to positive sentiment), enabling nuanced retrieval that prioritizes breakthroughs, failures, and decisions.
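The storage layer might look roughly like this — a sketch, not Snippbot's actual schema:

```python
import sqlite3

con = sqlite3.connect("memory.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS episodes (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,
    importance REAL NOT NULL,   -- 0.0 to 1.0
    valence    REAL NOT NULL,   -- -1.0 (negative) to +1.0 (positive)
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Full-text index over episode content, ranked with BM25
CREATE VIRTUAL TABLE IF NOT EXISTS episodes_fts USING fts5(content);
""")

# BM25 scores are lower-is-better in SQLite, so sort ascending
rows = con.execute(
    "SELECT rowid, bm25(episodes_fts) FROM episodes_fts "
    "WHERE episodes_fts MATCH ? ORDER BY bm25(episodes_fts) LIMIT 5",
    ("async patterns",),
).fetchall()
```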
Tier 4: Knowledge Graph
A structured entity-relation graph that captures what your agent knows about the world. As conversations are processed, the system extracts entities (languages, frameworks, tools, concepts, people) and relations (uses, knows, depends_on, prefers) with confidence scoring. The graph enables multi-hop traversal — for example, “Python depends_on pip, which is used_by Django, which the user prefers” — powering the Active Association recall principle.
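A sketch of multi-hop traversal over a toy triple store (the triples and helper names are illustrative):

```python
from collections import deque

# (subject, relation, object) triples, as extracted from conversations
TRIPLES = [
    ("user", "prefers", "Django"),
    ("Django", "depends_on", "Python"),
    ("Python", "depends_on", "pip"),
]

def neighbors(entity: str):
    for s, r, o in TRIPLES:
        if s == entity:
            yield r, o
        elif o == entity:
            yield r, s  # follow edges in both directions

def multi_hop(start: str, max_hops: int = 2) -> list[str]:
    """Breadth-first walk up to max_hops away from a seed entity."""
    seen, frontier, found = {start}, deque([(start, 0)]), []
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _, nxt in neighbors(entity):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, depth + 1))
    return found

print(multi_hop("Django"))  # ['user', 'Python', 'pip']
```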
Tier 5: Meta-Cognitive Layer
The system that closes the learning loop. After every chat response, the meta-cognitive layer detects whether recalled memories actually influenced the LLM’s output using key-phrase overlap analysis. Memories that are consistently useful get their importance boosted (+0.05); memories that are injected but ignored get decayed (-0.02). An Ebbinghaus forgetting curve applies time-based exponential decay weighted by importance, automatically archiving stale memories. The query analyzer adapts search weights per query type — keyword-heavy for error codes, semantic-heavy for conceptual questions.
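The feedback and decay rules condense to a few lines. The +0.05/-0.02 deltas come from this page; the half-life and the exact importance weighting below are assumptions for illustration:

```python
import math

BOOST, DECAY = 0.05, 0.02

def feedback(importance: float, was_used: bool) -> float:
    """Boost memories the LLM actually drew on; decay injected-but-ignored ones."""
    return min(1.0, max(0.0, importance + (BOOST if was_used else -DECAY)))

def retention(age_days: float, importance: float, half_life_days: float = 30.0) -> float:
    """Ebbinghaus-style exponential decay, slowed for important memories.
    Memories whose retention falls low enough are archived."""
    strength = half_life_days * (1.0 + importance)  # importance weighting (assumed form)
    return math.exp(-age_days / strength)
```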
The 3 recall principles
Inspired by how expert learners build knowledge, every memory recall follows three principles: relevance first, hierarchy second, connections third.
1. Establish Relevance
Before injecting any memory into context, results must pass a minimum relevance threshold (configurable, default 0.25). Below that threshold, nothing is injected — silence is better than noise. Every result that passes includes a “why” annotation showing the matched keyword or semantic highlight, so the agent knows why this memory surfaced.
2. Semantic Tree
Recalled memory is organized hierarchically, not dumped as a flat list:
- Trunk — high-importance entities you work with (e.g., Python, FastAPI). These are core knowledge anchors with importance scores of 0.7 or higher.
- Branches — related entities discovered via the knowledge graph (e.g., Django depends_on Python). Up to 3 neighbors per entity are traversed.
- Leaves — specific episode content that matched the query. These are the concrete details and conversations.
The agent anchors new information to what it already knows, understanding fundamentals before details.
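A sketch of how the tree might be assembled; the dictionary shapes and `graph_neighbors` helper are illustrative:

```python
def build_semantic_tree(entities: list[dict], graph_neighbors, episodes: list[str]) -> dict:
    """Organize recall into trunk (anchors), branches (graph links), leaves (episodes)."""
    trunk = [e for e in entities if e["importance"] >= 0.7]   # core knowledge anchors
    branches = {e["name"]: list(graph_neighbors(e["name"]))[:3] for e in trunk}
    return {"trunk": trunk, "branches": branches, "leaves": episodes}
```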
3. Active Association
Beyond keyword and semantic search, the knowledge graph discovers related episodes via entity links. Ask about Django? The graph knows you use Django, that Django depends on Python, and that you discussed Python async patterns last week. Those connections surface automatically — not because the words matched, but because the concepts are linked.
Hybrid search
The memory system combines two search strategies and merges them using Reciprocal Rank Fusion (RRF) for the best of both.
Keyword search (FTS5/BM25)
SQLite FTS5 with BM25 ranking — the same algorithm behind search engines. Excels at exact matches, error codes, version numbers, and technical identifiers. Includes a 7-day recency boost that gives recent memories up to 10% more weight. Supports phrase matching, boolean operators, and prefix wildcards.
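The recency boost reduces to a simple multiplier. The 7-day window and 10% ceiling are documented above; the linear ramp is an assumption:

```python
def recency_boost(score: float, age_days: float) -> float:
    """Up to 10% extra weight for memories less than 7 days old."""
    if age_days < 7:
        return score * (1.0 + 0.10 * (1.0 - age_days / 7.0))
    return score
```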
Vector search (HNSW)
Dense 384-dimensional embeddings via the all-MiniLM-L6-v2 model, indexed in an HNSW graph with cosine similarity. Understands meaning, not just words — “how do I handle errors?” finds memories about exception handling, fault tolerance, and retry patterns even without keyword overlap. Configuration: M=16 connections, EF construction=200, EF search=50.
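A sketch of an equivalent setup using the `sentence-transformers` and `hnswlib` packages, with the parameters documented above (Snippbot's internals may differ):

```python
import hnswlib
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dim embeddings
index = hnswlib.Index(space="cosine", dim=384)
index.init_index(max_elements=10_000, M=16, ef_construction=200)
index.set_ef(50)                                   # EF at search time

texts = ["handling exceptions in Python", "retry patterns for fault tolerance"]
index.add_items(model.encode(texts), ids=list(range(len(texts))))

# Semantic match despite zero keyword overlap with the stored texts
labels, distances = index.knn_query(model.encode(["how do I handle errors?"]), k=2)
```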
Reciprocal Rank Fusion (RRF)
Both search strategies run in parallel and their results are merged using RRF — a proven technique from information retrieval that combines rankings without requiring score normalization. A query analyzer auto-detects the optimal blend:
| Query Type | Keyword Weight | Vector Weight | Example |
|---|---|---|---|
| Keyword-heavy | 70% | 30% | Error codes, acronyms, version numbers |
| Balanced (default) | 30% | 70% | General queries |
| Semantic-heavy | 15% | 85% | “How do I…”, conceptual questions |
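A minimal weighted-RRF sketch. The constant k=60 is the conventional choice from the original RRF paper, not a documented Snippbot value:

```python
def rrf(rankings: list[list[str]], weights: list[float], k: int = 60) -> list[str]:
    """score(d) = sum over lists of  w_i / (k + rank_i(d))."""
    scores: dict[str, float] = {}
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["e1", "e3", "e2"]   # from FTS5/BM25
vector_hits  = ["e2", "e1", "e4"]   # from HNSW
fused = rrf([keyword_hits, vector_hits], weights=[0.3, 0.7])  # balanced blend
```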
Knowledge graph
The knowledge graph is built automatically from your conversations. As you interact with your agent, entities and relationships are extracted and linked together.
Entity types
| Type | Examples |
|---|---|
| Languages | Python, JavaScript, TypeScript, Rust, Go, Java, C++, Ruby, Swift, Kotlin, and more |
| Frameworks | React, Angular, Vue, Django, Flask, FastAPI, Express, Next.js, TensorFlow, PyTorch, and more |
| Tools | Git, Docker, Kubernetes, AWS, GCP, Azure, Terraform, GitHub, VS Code, npm, pip, and more |
| Concepts | Machine learning, microservices, serverless, CI/CD, testing, TDD, database, API, and more |
| People | Names extracted from conversations |
Relation types
| Relation | Detection Keywords |
|---|---|
| uses | “use”, “using”, “used”, “utilize” |
| knows | “know”, “familiar with”, “experience with” |
| prefers | “prefer”, “like”, “love”, “favorite” |
| dislikes | “dislike”, “hate”, “avoid” |
| depends_on | “depend”, “requires”, “needs”, “built on” |
| related_to | “related”, “similar”, “comparable” |
Entities matched from the known sets receive a confidence score of 0.8. Pattern-matched entities not in the known sets receive 0.5. Entities below 0.3 confidence are filtered out.
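In sketch form, with the known set abbreviated to a few examples:

```python
KNOWN_ENTITIES = {"Python", "Django", "Docker", "React"}  # abbreviated known set

def entity_confidence(name: str) -> float | None:
    """0.8 for known-set matches, 0.5 for pattern-only matches; return
    None below the 0.3 floor (the floor matters once confidence is
    adjusted downward elsewhere in the pipeline)."""
    confidence = 0.8 if name in KNOWN_ENTITIES else 0.5
    return confidence if confidence >= 0.3 else None
```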
Episode importance and valence
Every episode is scored for importance and valence (sentiment) to help the system prioritize what matters.
Importance scoring
| Factor | Boost | Description |
|---|---|---|
| Task failure | +0.2 over base | Failures are boosted because agents learn from mistakes |
| Project completion | 0.9 base | Major milestones are high-value |
| High-importance keywords | +0.1 | Words like “critical”, “decision”, “breakthrough”, “learned” |
| Long content (>1,000 chars) | +0.1 | Substantial conversations carry more weight |
| Complex content (>5,000 tokens) | +0.1 | Token complexity indicates depth |
Importance is capped at 1.0. Episodes are categorized as high (0.7 or above), medium (0.3 to 0.69), or low (below 0.3).
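Combining the factors from the table might look like this; the boosts, the 0.9 completion base, and the 1.0 cap are documented above, while the 0.5 default base is an assumption:

```python
HIGH_VALUE_KEYWORDS = ("critical", "decision", "breakthrough", "learned")

def score_importance(content: str, token_count: int,
                     failed: bool = False, completed: bool = False) -> float:
    """Apply the boost factors from the table; cap at 1.0."""
    score = 0.9 if completed else 0.5 + (0.2 if failed else 0.0)  # 0.5 base assumed
    if any(kw in content.lower() for kw in HIGH_VALUE_KEYWORDS):
        score += 0.1   # high-importance keywords
    if len(content) > 1_000:
        score += 0.1   # long content
    if token_count > 5_000:
        score += 0.1   # complex content
    return min(score, 1.0)
```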
Valence
Valence measures emotional tone on a scale from -1.0 (negative) to +1.0 (positive). Positive keywords include “success”, “completed”, “fixed”, “resolved”. Negative keywords include “error”, “failed”, “broken”, “crashed”. The raw sentiment ratio is dampened by 0.8 to avoid extreme swings.
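As a sketch, with the keyword lists abbreviated to the examples above:

```python
POSITIVE = {"success", "completed", "fixed", "resolved"}
NEGATIVE = {"error", "failed", "broken", "crashed"}

def score_valence(content: str) -> float:
    """Dampened sentiment ratio in [-1.0, +1.0]."""
    words = content.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0
    return 0.8 * (pos - neg) / (pos + neg)  # 0.8 dampening factor
```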
Memory settings
Configure memory from Settings in the sidebar, under the Memory section.
| Setting | Default | Description |
|---|---|---|
| Memory Enabled | On | Master toggle — enables or disables the entire memory system |
| Auto Recall | On | Automatically surface relevant memories during conversations |
| Recall Scope | All | Which memories to search: All (global), Project (current project only), Session (current session only), or None |
| Retention Policy | Forever | How long to keep episodes: Forever, 1 Year, 6 Months, 3 Months, or 1 Month |
| Max Episodes | 10,000 | Maximum number of episodes to store (range: 100 to 100,000) |
| Auto Summarize | On | Automatically compress older episodes via LLM summarization |
| Summarize Threshold | 30 | Number of messages before auto-summarization triggers (range: 7 to 90) |
| Min Relevance Threshold | 0.25 | Minimum similarity score (0.0 to 1.0) for injecting memories into chat context. Below this threshold, nothing is injected. |
Privacy and data sovereignty
All memory data stays on your local machine. There is no cloud sync, no data transmission, and no external storage. Your agent’s memory exists only on your hardware.
Privacy settings
Configure privacy from Settings in the sidebar, under the Memory section alongside the general memory settings.
| Setting | Default | Description |
|---|---|---|
| Redact Secrets | On | Automatically strips API keys, passwords, and other secrets from stored episodes before they are written to the database |
| Anonymize Names | Off | Removes or replaces personal names in stored episodes |
| Local Only | On | Enforces that all memory data remains on the local machine — no cloud sync |
| Exclude Patterns | Empty | Regex or glob patterns to exclude from memory storage (e.g., *.env, sensitive file paths) |
Data controls
- Export Memory — download a JSON file containing all episodes and metadata.
- Clear All Memory — cascading delete of all episodes, the vector index, and the knowledge graph. This action requires confirmation and is irreversible.
- Retention Policy — automatically prunes episodes older than the configured retention period.
- Episode Limit — when the maximum is reached, the oldest low-importance episodes are removed first.
Secret store encryption
API keys, OAuth tokens, and credentials managed by Snippbot are encrypted with AES-256-GCM and PBKDF2 key derivation (600,000 iterations). The OS keychain holds the master key — never a plaintext file. Episodic memory content is stored locally in SQLite with configurable secret redaction to strip sensitive values before storage.
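This scheme maps onto standard primitives; here is a sketch using the Python `cryptography` package (not Snippbot's actual code):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def encrypt_secret(master_key: bytes, plaintext: bytes) -> bytes:
    """Derive an AES-256 key with PBKDF2 (600,000 iterations), seal with GCM."""
    salt, nonce = os.urandom(16), os.urandom(12)
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                     salt=salt, iterations=600_000)
    key = kdf.derive(master_key)
    # Prepend salt and nonce so the ciphertext is self-describing
    return salt + nonce + AESGCM(key).encrypt(nonce, plaintext, None)
```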
How auto-capture works
Memory capture is fully automatic and non-blocking. After every successful chat response:
- The user message and assistant response are combined into a single content block.
- The Sensory Buffer (Tier 1) filters the content — trivial exchanges are skipped, near-duplicates are detected, and content is normalized.
- Importance and valence are calculated from the content.
- An episode is created in the SQLite database and automatically indexed by FTS5.
- Entities are extracted and added to (or updated in) the knowledge graph.
- The content is embedded into a 384-dimensional vector and inserted into the HNSW index.
This entire process runs as a background task and does not block the chat response.
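Putting the steps together in one place — every function name here is an illustrative stand-in reusing the sketches above, not Snippbot's API:

```python
import asyncio

async def capture(user_msg: str, assistant_msg: str) -> None:
    """Background capture pipeline; never blocks the chat response."""
    content = f"User: {user_msg}\nAssistant: {assistant_msg}"      # step 1: combine
    if not passes_buffer(content, recent_episodes()):              # step 2: Tier 1 filters
        return
    importance = score_importance(content, len(content) // 4)     # step 3: scoring
    valence = score_valence(content)
    episode_id = store_episode(content, importance, valence)      # step 4: SQLite + FTS5
    update_graph(extract_entities(content))                       # step 5: knowledge graph
    index.add_items(model.encode([content]), ids=[episode_id])    # step 6: HNSW

# Fired after each successful chat response:
# asyncio.create_task(capture(user_msg, assistant_msg))
```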
Related
- Manage Memory guide — step-by-step instructions for searching, configuring, and managing memory through the UI