Memory

Snippbot does not just process conversations — it learns from them. A 5-tier cognitive architecture filters noise at input, manages active context, stores episodes with semantic search, builds a knowledge graph of entities and relations, and closes the loop with recall feedback that improves memory quality over time. All data stays on your machine. No cloud. No sync. Your agent’s memory is yours alone.

Every piece of information flows through five tiers — from raw input filtering all the way to closed-loop learning. Each tier serves a distinct role in how your agent remembers.

The first line of defense against memory noise. Before any conversation exchange becomes a permanent episode, the sensory buffer applies:

  • Minimum-length filtering — trivial exchanges like “ok” or “sure” (under 100 characters) are skipped entirely.
  • Near-duplicate detection — Jaccard word-set similarity checks prevent storing the same information twice. Episodes with greater than 80% overlap are deduplicated.
  • Content normalization — whitespace cleanup, Unicode NFC normalization, and truncation at 10,000 characters.
  • Importance and valence scoring — calculated from content keywords and sentiment analysis, not hardcoded defaults.

What your agent is “thinking about” right now. The context window manager enforces token budgets (up to 1M tokens for supported models) and compacts older messages via LLM summarization when limits are reached. A session entity tracker monitors which entities are “hot” in the current conversation — topics mentioned repeatedly get priority in recall, making the agent more contextually aware as conversations deepen.

Long-term storage for everything your agent has experienced. Every conversation is captured as an episode — stored in SQLite with full-text search via FTS5/BM25 and semantic similarity via 384-dimensional embeddings in an HNSW index. Episodes are scored for importance (0.0 to 1.0) and valence (negative to positive sentiment), enabling nuanced retrieval that prioritizes breakthroughs, failures, and decisions.
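The keyword half of this storage layer can be sketched with Python's built-in sqlite3 module. The schema and function names below are illustrative, not Snippbot's actual tables; only the FTS5/BM25 mechanics match the description above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Episodes table plus an FTS5 full-text index (schema is illustrative).
con.executescript("""
CREATE TABLE episodes (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    importance REAL DEFAULT 0.5,    -- 0.0 .. 1.0
    valence REAL DEFAULT 0.0        -- -1.0 .. +1.0
);
CREATE VIRTUAL TABLE episodes_fts USING fts5(content);
""")

def add_episode(content, importance=0.5, valence=0.0):
    cur = con.execute(
        "INSERT INTO episodes (content, importance, valence) VALUES (?, ?, ?)",
        (content, importance, valence))
    # Keep the FTS index's rowid aligned with the episode id.
    con.execute("INSERT INTO episodes_fts (rowid, content) VALUES (?, ?)",
                (cur.lastrowid, content))
    return cur.lastrowid

def keyword_search(query, limit=5):
    # bm25() scores are lower-is-better, so order ascending.
    return con.execute(
        "SELECT rowid, content FROM episodes_fts WHERE episodes_fts MATCH ? "
        "ORDER BY bm25(episodes_fts) LIMIT ?", (query, limit)).fetchall()
```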

A structured entity-relation graph that captures what your agent knows about the world. As conversations are processed, the system extracts entities (languages, frameworks, tools, concepts, people) and relations (uses, knows, depends_on, prefers) with confidence scoring. The graph enables multi-hop traversal — for example, “Python depends_on pip, which is used_by Django, which the user prefers” — powering the Active Association recall principle.
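The multi-hop traversal can be illustrated with a toy triple store. The data and helper names here are hypothetical; only the traversal idea mirrors the description above:

```python
# Toy triple store: (subject, relation, object, confidence).
TRIPLES = [
    ("Python", "depends_on", "pip", 0.8),
    ("Django", "uses", "Python", 0.8),
    ("user", "prefers", "Django", 0.8),
]

def neighbors(entity):
    """Entities directly linked to `entity`, in either direction."""
    linked = set()
    for subj, _rel, obj, _conf in TRIPLES:
        if subj == entity:
            linked.add(obj)
        elif obj == entity:
            linked.add(subj)
    return linked

def multi_hop(start, hops=2):
    """Breadth-first traversal up to `hops` steps from a seed entity."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {n for e in frontier for n in neighbors(e)} - seen
        seen |= frontier
    return seen - {start}
```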

The system that closes the learning loop. After every chat response, the meta-cognitive layer detects whether recalled memories actually influenced the LLM’s output using key-phrase overlap analysis. Memories that are consistently useful get their importance boosted (+0.05); memories that are injected but ignored get decayed (-0.02). An Ebbinghaus forgetting curve applies time-based exponential decay weighted by importance, automatically archiving stale memories. The query analyzer adapts search weights per query type — keyword-heavy for error codes, semantic-heavy for conceptual questions.
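The boost/decay deltas and the forgetting curve can be sketched as follows. The +0.05/-0.02 values come from the paragraph above; the importance-scaled half-life is an illustrative assumption, not Snippbot's shipped formula:

```python
import math

BOOST, DECAY = 0.05, 0.02   # feedback deltas quoted above

def update_importance(importance: float, was_influential: bool) -> float:
    """Boost memories the response actually used; decay injected-but-ignored ones."""
    delta = BOOST if was_influential else -DECAY
    return min(1.0, max(0.0, importance + delta))

def retention(importance: float, age_days: float,
              half_life_days: float = 30.0) -> float:
    """Ebbinghaus-style exponential decay, slowed for important memories.
    The half-life scaling is an assumed parameterization for illustration."""
    effective_half_life = half_life_days * (1.0 + importance)
    return math.exp(-math.log(2) * age_days / effective_half_life)
```

A memory whose retention falls below some archive threshold would then be moved out of active recall.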

Inspired by how expert learners build knowledge, every memory recall follows three principles: relevance first, hierarchy second, connections third.

Before injecting any memory into context, results must pass a minimum relevance threshold (configurable, default 0.25). Below that threshold, nothing is injected — silence is better than noise. Every result that passes includes a “why” annotation showing the matched keyword or semantic highlight, so the agent knows why this memory surfaced.
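A minimal sketch of this gate, assuming recall results arrive as (score, content, matched_term) tuples (the tuple shape and function name are illustrative):

```python
MIN_RELEVANCE = 0.25   # configurable default, per the settings below

def filter_recall(results):
    """Drop results below the relevance floor; annotate survivors with a 'why'."""
    kept = []
    for score, content, matched in sorted(results, reverse=True):
        if score < MIN_RELEVANCE:
            continue   # silence is better than noise
        kept.append({"score": score, "content": content,
                     "why": f"matched: {matched}"})
    return kept
```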

Recalled memory is organized hierarchically, not dumped as a flat list:

  • Trunk — high-importance entities you work with (e.g., Python, FastAPI). These are core knowledge anchors with importance scores of 0.7 or higher.
  • Branches — related entities discovered via the knowledge graph (e.g., Django depends_on Python). Up to 3 neighbors per entity are traversed.
  • Leaves — specific episode content that matched the query. These are the concrete details and conversations.
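Assembling that hierarchy might look like the sketch below, using the 0.7 importance cutoff and 3-neighbor cap from the list above; `graph_neighbors` is a hypothetical lookup into the knowledge graph:

```python
def organize_recall(entities, graph_neighbors, matched_episodes):
    """Assemble the trunk/branch/leaf recall hierarchy."""
    trunk = [e for e in entities if e["importance"] >= 0.7]   # core anchors
    branches = {e["name"]: graph_neighbors(e["name"])[:3]     # up to 3 neighbors
                for e in trunk}
    leaves = matched_episodes                                 # concrete details
    return {"trunk": trunk, "branches": branches, "leaves": leaves}
```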

The agent anchors new information to what it already knows, understanding fundamentals before details.

Beyond keyword and semantic search, the knowledge graph discovers related episodes via entity links. Ask about Django? The graph knows you use Django, that Django depends on Python, and that you discussed Python async patterns last week. Those connections surface automatically — not because the words matched, but because the concepts are linked.

The memory system runs two search strategies and merges their results using Reciprocal Rank Fusion (RRF), getting the best of both worlds.

SQLite FTS5 with BM25 ranking — the same algorithm behind search engines. Excels at exact matches, error codes, version numbers, and technical identifiers. Includes a 7-day recency boost that gives recent memories up to 10% more weight. Supports phrase matching, boolean operators, and prefix wildcards.

Dense 384-dimensional embeddings via the all-MiniLM-L6-v2 model, indexed in an HNSW graph with cosine similarity. Understands meaning, not just words — “how do I handle errors?” finds memories about exception handling, fault tolerance, and retry patterns even without keyword overlap. Configuration: M=16 connections, EF construction=200, EF search=50.
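For intuition, the cosine-similarity search can be shown with a brute-force stand-in; a real deployment would use an HNSW index (384 dimensions, M=16, ef_construction=200, ef_search=50, as stated above) rather than the linear scan sketched here:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class VectorIndex:
    """Brute-force stand-in for the HNSW index, for clarity only."""
    def __init__(self):
        self.items = []   # (episode_id, embedding) pairs

    def add(self, episode_id, vector):
        self.items.append((episode_id, vector))

    def search(self, query, k=5):
        scored = [(cosine(query, vec), eid) for eid, vec in self.items]
        return sorted(scored, reverse=True)[:k]
```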

Both search strategies run in parallel and their results are merged using RRF — a proven technique from information retrieval that combines rankings without requiring score normalization. A query analyzer auto-detects the optimal blend:

Query Type           Keyword Weight   Vector Weight   Example
Keyword-heavy        70%              30%             Error codes, acronyms, version numbers
Balanced (default)   30%              70%             General queries
Semantic-heavy       15%              85%             “How do I…”, conceptual questions
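RRF itself is simple: each result contributes weight / (k + rank) per list it appears in, so items ranked well by both strategies rise to the top without any score normalization. A sketch (k=60 is the conventional RRF constant; the weight defaults mirror the balanced blend above):

```python
def rrf_merge(keyword_ranked, vector_ranked, k=60, w_kw=0.3, w_vec=0.7):
    """Reciprocal Rank Fusion over two ranked lists of episode ids."""
    scores = {}
    for rank, eid in enumerate(keyword_ranked, start=1):
        scores[eid] = scores.get(eid, 0.0) + w_kw / (k + rank)
    for rank, eid in enumerate(vector_ranked, start=1):
        scores[eid] = scores.get(eid, 0.0) + w_vec / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```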

The knowledge graph is built automatically from your conversations. As you interact with your agent, entities and relationships are extracted and linked together.

Entity types and examples:

  • Languages: Python, JavaScript, TypeScript, Rust, Go, Java, C++, Ruby, Swift, Kotlin, and more
  • Frameworks: React, Angular, Vue, Django, Flask, FastAPI, Express, Next.js, TensorFlow, PyTorch, and more
  • Tools: Git, Docker, Kubernetes, AWS, GCP, Azure, Terraform, GitHub, VS Code, npm, pip, and more
  • Concepts: Machine learning, microservices, serverless, CI/CD, testing, TDD, database, API, and more
  • People: Names extracted from conversations
Relations and their detection keywords:

  • uses: “use”, “using”, “used”, “utilize”
  • knows: “know”, “familiar with”, “experience with”
  • prefers: “prefer”, “like”, “love”, “favorite”
  • dislikes: “dislike”, “hate”, “avoid”
  • depends_on: “depend”, “requires”, “needs”, “built on”
  • related_to: “related”, “similar”, “comparable”

Entities matched from the known sets receive a confidence score of 0.8. Pattern-matched entities not in the known sets receive 0.5. Entities below 0.3 confidence are filtered out.

Every episode is scored for importance and valence (sentiment) to help the system prioritize what matters.

  • Task failure: +0.2 over base. Failures are boosted because agents learn from mistakes.
  • Project completion: 0.9 base. Major milestones are high-value.
  • High-importance keywords: +0.1. Words like “critical”, “decision”, “breakthrough”, “learned”.
  • Long content (>1,000 chars): +0.1. Substantial conversations carry more weight.
  • Complex content (>5,000 tokens): +0.1. Token complexity indicates depth.

Importance is capped at 1.0. Episodes are categorized as high (0.7 or above), medium (0.3 to 0.69), or low (below 0.3).
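The scoring factors above can be sketched as a small heuristic. The 0.5 neutral base is an assumption (the docs only give boosts and the 0.9 completion base); the token-complexity boost is omitted because it would need a tokenizer:

```python
def score_importance(content, task_failed=False, project_completed=False):
    """Heuristic importance score following the factor list above."""
    score = 0.9 if project_completed else 0.5   # assumed 0.5 neutral base
    if task_failed:
        score += 0.2
    if any(kw in content.lower()
           for kw in ("critical", "decision", "breakthrough", "learned")):
        score += 0.1
    if len(content) > 1_000:
        score += 0.1
    return min(score, 1.0)   # capped at 1.0

def importance_band(score):
    """Categorize per the thresholds above."""
    return "high" if score >= 0.7 else "medium" if score >= 0.3 else "low"
```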

Valence measures emotional tone on a scale from -1.0 (negative) to +1.0 (positive). Positive keywords include “success”, “completed”, “fixed”, “resolved”. Negative keywords include “error”, “failed”, “broken”, “crashed”. The raw sentiment ratio is dampened by 0.8 to avoid extreme swings.

Configure memory from Settings in the sidebar, under the Memory section.

  • Memory Enabled (default: On): Master toggle that enables or disables the entire memory system.
  • Auto Recall (default: On): Automatically surface relevant memories during conversations.
  • Recall Scope (default: All): Which memories to search: All (global), Project (current project only), Session (current session only), or None.
  • Retention Policy (default: Forever): How long to keep episodes: Forever, 1 Year, 6 Months, 3 Months, or 1 Month.
  • Max Episodes (default: 10,000): Maximum number of episodes to store (range: 100 to 100,000).
  • Auto Summarize (default: On): Automatically compress older episodes via LLM summarization.
  • Summarize Threshold (default: 30): Number of messages before auto-summarization triggers (range: 7 to 90).
  • Min Relevance Threshold (default: 0.25): Minimum similarity score (0.0 to 1.0) for injecting memories into chat context. Below this threshold, nothing is injected.

All memory data stays on your local machine. There is no cloud sync, no data transmission, and no external storage. Your agent’s memory exists only on your hardware.

Configure privacy from Settings in the sidebar, under the Memory section alongside the general memory settings.

  • Redact Secrets (default: On): Automatically strips API keys, passwords, and other secrets from stored episodes before they are written to the database.
  • Anonymize Names (default: Off): Removes or replaces personal names in stored episodes.
  • Local Only (default: On): Enforces that all memory data remains on the local machine, with no cloud sync.
  • Exclude Patterns (default: empty): Regex or glob patterns to exclude from memory storage (e.g., *.env, sensitive file paths).

  • Export Memory — download a JSON file containing all episodes and metadata.
  • Clear All Memory — cascading delete of all episodes, the vector index, and the knowledge graph. This action requires confirmation and is irreversible.
  • Retention Policy — automatically prunes episodes older than the configured retention period.
  • Episode Limit — when the maximum is reached, the oldest low-importance episodes are removed first.

API keys, OAuth tokens, and credentials managed by Snippbot are encrypted with AES-256-GCM and PBKDF2 key derivation (600,000 iterations). The OS keychain holds the master key — never a plaintext file. Episodic memory content is stored locally in SQLite with configurable secret redaction to strip sensitive values before storage.

Memory capture is fully automatic and non-blocking. After every successful chat response:

  1. The user message and assistant response are combined into a single content block.
  2. The Sensory Buffer (Tier 1) filters the content — trivial exchanges are skipped, near-duplicates are detected, and content is normalized.
  3. Importance and valence are calculated from the content.
  4. An episode is created in the SQLite database and automatically indexed by FTS5.
  5. Entities are extracted and added to (or updated in) the knowledge graph.
  6. The content is embedded into a 384-dimensional vector and inserted into the HNSW index.

This entire process runs as a background task and does not block the chat response.
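The six steps above can be sketched as a background pipeline. Every helper on the `store` object here is hypothetical; only the step order and the non-blocking thread hand-off reflect the description:

```python
import threading

def capture_episode(user_msg, assistant_msg, store):
    """Run the six capture steps in order (helper names are illustrative)."""
    content = f"User: {user_msg}\nAssistant: {assistant_msg}"  # 1. combine
    content = store.sensory_filter(content)                    # 2. Tier 1 filter
    if content is None:
        return                                                 # skipped or duplicate
    importance, valence = store.score(content)                 # 3. score
    episode_id = store.insert_episode(content, importance, valence)  # 4. SQLite + FTS5
    store.update_graph(episode_id, content)                    # 5. knowledge graph
    store.index_vector(episode_id, content)                    # 6. HNSW embedding

def capture_async(user_msg, assistant_msg, store):
    """Run capture off the request path so the chat response is never blocked."""
    worker = threading.Thread(target=capture_episode,
                              args=(user_msg, assistant_msg, store), daemon=True)
    worker.start()
    return worker
```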

  • Manage Memory guide — step-by-step instructions for searching, configuring, and managing memory through the UI