Memory

Snippbot does not just process conversations — it learns from them. A 5-tier cognitive architecture filters noise at input, manages active context, stores episodes with semantic search, builds a knowledge graph of entities and relations, and closes the loop with recall feedback that improves memory quality over time. All data stays on your machine. No cloud. No sync. Your agent’s memory is yours alone.

Every piece of information flows through five tiers — from raw input filtering all the way to closed-loop learning. Each tier serves a distinct role in how your agent remembers.

The first line of defense against memory noise. Before any conversation exchange becomes a permanent episode, the sensory buffer applies:

  • Minimum-length filtering — trivial exchanges like “ok” or “sure” (under 100 characters) are skipped entirely.
  • Near-duplicate detection — Jaccard word-set similarity checks prevent storing the same information twice. Episodes with greater than 80% overlap are deduplicated.
  • Content normalization — whitespace cleanup, Unicode NFC normalization, and truncation at 10,000 characters.
  • Importance and valence scoring — calculated from content keywords and sentiment analysis, not hardcoded defaults.

What your agent is “thinking about” right now. The context window manager enforces token budgets (up to 1M tokens for supported models) and compacts older messages via LLM summarization when limits are reached. A session entity tracker monitors which entities are “hot” in the current conversation — topics mentioned repeatedly get priority in recall, making the agent more contextually aware as conversations deepen.

Long-term storage for everything your agent has experienced. Every conversation is captured as an episode — stored in SQLite with full-text search via FTS5/BM25 and semantic similarity via 384-dimensional embeddings in an HNSW index. Episodes are scored for importance (0.0 to 1.0) and valence (negative to positive sentiment), enabling nuanced retrieval that prioritizes breakthroughs, failures, and decisions.
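The keyword half of this storage layer can be sketched with Python's built-in sqlite3 module. The schema and function names below are illustrative, not Snippbot's actual tables; only the FTS5/BM25 mechanics match the description above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Episodes table plus an FTS5 full-text index (schema is illustrative).
con.executescript("""
CREATE TABLE episodes (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    importance REAL DEFAULT 0.5,    -- 0.0 .. 1.0
    valence REAL DEFAULT 0.0        -- -1.0 .. +1.0
);
CREATE VIRTUAL TABLE episodes_fts USING fts5(content);
""")

def add_episode(content, importance=0.5, valence=0.0):
    cur = con.execute(
        "INSERT INTO episodes (content, importance, valence) VALUES (?, ?, ?)",
        (content, importance, valence))
    # Keep the FTS index's rowid aligned with the episode id.
    con.execute("INSERT INTO episodes_fts (rowid, content) VALUES (?, ?)",
                (cur.lastrowid, content))
    return cur.lastrowid

def keyword_search(query, limit=5):
    # bm25() scores are lower-is-better, so order ascending.
    return con.execute(
        "SELECT rowid, content FROM episodes_fts WHERE episodes_fts MATCH ? "
        "ORDER BY bm25(episodes_fts) LIMIT ?", (query, limit)).fetchall()
```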

A structured entity-relation graph that captures what your agent knows about the world. As conversations are processed, the system extracts entities (languages, frameworks, tools, concepts, people) and relations (uses, knows, depends_on, prefers) with confidence scoring. The graph enables multi-hop traversal — for example, “Python depends_on pip, which is used_by Django, which the user prefers” — powering the Active Association recall principle.
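The multi-hop traversal can be illustrated with a toy triple store. The data and helper names here are hypothetical; only the traversal idea mirrors the description above:

```python
# Toy triple store: (subject, relation, object, confidence).
TRIPLES = [
    ("Python", "depends_on", "pip", 0.8),
    ("Django", "uses", "Python", 0.8),
    ("user", "prefers", "Django", 0.8),
]

def neighbors(entity):
    """Entities directly linked to `entity`, in either direction."""
    linked = set()
    for subj, _rel, obj, _conf in TRIPLES:
        if subj == entity:
            linked.add(obj)
        elif obj == entity:
            linked.add(subj)
    return linked

def multi_hop(start, hops=2):
    """Breadth-first traversal up to `hops` steps from a seed entity."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {n for e in frontier for n in neighbors(e)} - seen
        seen |= frontier
    return seen - {start}
```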

The system that closes the learning loop. After every chat response, the meta-cognitive layer detects whether recalled memories actually influenced the LLM’s output using key-phrase overlap analysis. Memories that are consistently useful get their importance boosted (+0.05); memories that are injected but ignored get decayed (-0.02). An Ebbinghaus forgetting curve applies time-based exponential decay weighted by importance, automatically archiving stale memories. The query analyzer adapts search weights per query type — keyword-heavy for error codes, semantic-heavy for conceptual questions.
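The boost/decay deltas and the forgetting curve can be sketched as follows. The +0.05/-0.02 values come from the paragraph above; the importance-scaled half-life is an illustrative assumption, not Snippbot's shipped formula:

```python
import math

BOOST, DECAY = 0.05, 0.02   # feedback deltas quoted above

def update_importance(importance: float, was_influential: bool) -> float:
    """Boost memories the response actually used; decay injected-but-ignored ones."""
    delta = BOOST if was_influential else -DECAY
    return min(1.0, max(0.0, importance + delta))

def retention(importance: float, age_days: float,
              half_life_days: float = 30.0) -> float:
    """Ebbinghaus-style exponential decay, slowed for important memories.
    The half-life scaling is an assumed parameterization for illustration."""
    effective_half_life = half_life_days * (1.0 + importance)
    return math.exp(-math.log(2) * age_days / effective_half_life)
```

A memory whose retention falls below some archive threshold would then be moved out of active recall.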

Inspired by how expert learners build knowledge, every memory recall follows three principles: relevance first, hierarchy second, connections third.

Before injecting any memory into context, results must pass a minimum relevance threshold (configurable, default 0.25). Below that threshold, nothing is injected — silence is better than noise. Every result that passes includes a “why” annotation showing the matched keyword or semantic highlight, so the agent knows why this memory surfaced.
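A minimal sketch of this gate, assuming recall results arrive as (score, content, matched_term) tuples (the tuple shape and function name are illustrative):

```python
MIN_RELEVANCE = 0.25   # configurable default, per the settings below

def filter_recall(results):
    """Drop results below the relevance floor; annotate survivors with a 'why'."""
    kept = []
    for score, content, matched in sorted(results, reverse=True):
        if score < MIN_RELEVANCE:
            continue   # silence is better than noise
        kept.append({"score": score, "content": content,
                     "why": f"matched: {matched}"})
    return kept
```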

Recalled memory is organized hierarchically, not dumped as a flat list:

  • Trunk — high-importance entities you work with (e.g., Python, FastAPI). These are core knowledge anchors with importance scores of 0.7 or higher.
  • Branches — related entities discovered via the knowledge graph (e.g., Django depends_on Python). Up to 3 neighbors per entity are traversed.
  • Leaves — specific episode content that matched the query. These are the concrete details and conversations.
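Assembling that hierarchy might look like the sketch below, using the 0.7 importance cutoff and 3-neighbor cap from the list above; `graph_neighbors` is a hypothetical lookup into the knowledge graph:

```python
def organize_recall(entities, graph_neighbors, matched_episodes):
    """Assemble the trunk/branch/leaf recall hierarchy."""
    trunk = [e for e in entities if e["importance"] >= 0.7]   # core anchors
    branches = {e["name"]: graph_neighbors(e["name"])[:3]     # up to 3 neighbors
                for e in trunk}
    leaves = matched_episodes                                 # concrete details
    return {"trunk": trunk, "branches": branches, "leaves": leaves}
```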

The agent anchors new information to what it already knows, understanding fundamentals before details.

Beyond keyword and semantic search, the knowledge graph discovers related episodes via entity links. Ask about Django? The graph knows you use Django, that Django depends on Python, and that you discussed Python async patterns last week. Those connections surface automatically — not because the words matched, but because the concepts are linked.

The memory system runs two search strategies and merges their results using Reciprocal Rank Fusion (RRF), getting the best of both worlds.

SQLite FTS5 with BM25 ranking — the same algorithm behind search engines. Excels at exact matches, error codes, version numbers, and technical identifiers. Includes a 7-day recency boost that gives recent memories up to 10% more weight. Supports phrase matching, boolean operators, and prefix wildcards.

Dense 384-dimensional embeddings via the all-MiniLM-L6-v2 model, indexed in an HNSW graph with cosine similarity. Understands meaning, not just words — “how do I handle errors?” finds memories about exception handling, fault tolerance, and retry patterns even without keyword overlap. Configuration: M=16 connections, EF construction=200, EF search=50.
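For intuition, the cosine-similarity search can be shown with a brute-force stand-in; a real deployment would use an HNSW index (384 dimensions, M=16, ef_construction=200, ef_search=50, as stated above) rather than the linear scan sketched here:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class VectorIndex:
    """Brute-force stand-in for the HNSW index, for clarity only."""
    def __init__(self):
        self.items = []   # (episode_id, embedding) pairs

    def add(self, episode_id, vector):
        self.items.append((episode_id, vector))

    def search(self, query, k=5):
        scored = [(cosine(query, vec), eid) for eid, vec in self.items]
        return sorted(scored, reverse=True)[:k]
```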

Both search strategies run in parallel and their results are merged using RRF — a proven technique from information retrieval that combines rankings without requiring score normalization. A query analyzer auto-detects the optimal blend:

Query Type           Keyword Weight   Vector Weight   Example
Keyword-heavy        70%              30%             Error codes, acronyms, version numbers
Balanced (default)   30%              70%             General queries
Semantic-heavy       15%              85%             “How do I…”, conceptual questions
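RRF itself is simple: each result contributes weight / (k + rank) per list it appears in, so items ranked well by both strategies rise to the top without any score normalization. A sketch (k=60 is the conventional RRF constant; the weight defaults mirror the balanced blend above):

```python
def rrf_merge(keyword_ranked, vector_ranked, k=60, w_kw=0.3, w_vec=0.7):
    """Reciprocal Rank Fusion over two ranked lists of episode ids."""
    scores = {}
    for rank, eid in enumerate(keyword_ranked, start=1):
        scores[eid] = scores.get(eid, 0.0) + w_kw / (k + rank)
    for rank, eid in enumerate(vector_ranked, start=1):
        scores[eid] = scores.get(eid, 0.0) + w_vec / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```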

The knowledge graph is built automatically from your conversations. As you interact with your agent, entities and relationships are extracted and linked together.

Entity types and examples:

  • Languages: Python, JavaScript, TypeScript, Rust, Go, Java, C++, Ruby, Swift, Kotlin, and more
  • Frameworks: React, Angular, Vue, Django, Flask, FastAPI, Express, Next.js, TensorFlow, PyTorch, and more
  • Tools: Git, Docker, Kubernetes, AWS, GCP, Azure, Terraform, GitHub, VS Code, npm, pip, and more
  • Concepts: Machine learning, microservices, serverless, CI/CD, testing, TDD, database, API, and more
  • People: Names extracted from conversations
Relations and their detection keywords:

  • uses: “use”, “using”, “used”, “utilize”
  • knows: “know”, “familiar with”, “experience with”
  • prefers: “prefer”, “like”, “love”, “favorite”
  • dislikes: “dislike”, “hate”, “avoid”
  • depends_on: “depend”, “requires”, “needs”, “built on”
  • related_to: “related”, “similar”, “comparable”

Entities matched from the known sets receive a confidence score of 0.8. Pattern-matched entities not in the known sets receive 0.5. Entities below 0.3 confidence are filtered out.

Every episode is scored for importance and valence (sentiment) to help the system prioritize what matters.

  • Task failure: +0.2 over base. Failures are boosted because agents learn from mistakes.
  • Project completion: 0.9 base. Major milestones are high-value.
  • High-importance keywords: +0.1. Words like “critical”, “decision”, “breakthrough”, “learned”.
  • Long content (>1,000 chars): +0.1. Substantial conversations carry more weight.
  • Complex content (>5,000 tokens): +0.1. Token complexity indicates depth.

Importance is capped at 1.0. Episodes are categorized as high (0.7 or above), medium (0.3 to 0.69), or low (below 0.3).
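The scoring factors above can be sketched as a small heuristic. The 0.5 neutral base is an assumption (the docs only give boosts and the 0.9 completion base); the token-complexity boost is omitted because it would need a tokenizer:

```python
def score_importance(content, task_failed=False, project_completed=False):
    """Heuristic importance score following the factor list above."""
    score = 0.9 if project_completed else 0.5   # assumed 0.5 neutral base
    if task_failed:
        score += 0.2
    if any(kw in content.lower()
           for kw in ("critical", "decision", "breakthrough", "learned")):
        score += 0.1
    if len(content) > 1_000:
        score += 0.1
    return min(score, 1.0)   # capped at 1.0

def importance_band(score):
    """Categorize per the thresholds above."""
    return "high" if score >= 0.7 else "medium" if score >= 0.3 else "low"
```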

Valence measures emotional tone on a scale from -1.0 (negative) to +1.0 (positive). Positive keywords include “success”, “completed”, “fixed”, “resolved”. Negative keywords include “error”, “failed”, “broken”, “crashed”. The raw sentiment ratio is dampened by 0.8 to avoid extreme swings.

Configure memory from Settings in the sidebar, under the Memory section.

  • Memory Enabled (default: On): Master toggle that enables or disables the entire memory system.
  • Auto Recall (default: On): Automatically surface relevant memories during conversations.
  • Recall Scope (default: All): Which memories to search: All (global), Project (current project only), Session (current session only), or None.
  • Retention Policy (default: Forever): How long to keep episodes: Forever, 1 Year, 6 Months, 3 Months, or 1 Month.
  • Max Episodes (default: 10,000): Maximum number of episodes to store (range: 100 to 100,000).
  • Auto Summarize (default: On): Automatically compress older episodes via LLM summarization.
  • Summarize Threshold (default: 30): Number of messages before auto-summarization triggers (range: 7 to 90).
  • Min Relevance Threshold (default: 0.25): Minimum similarity score (0.0 to 1.0) for injecting memories into chat context. Below this threshold, nothing is injected.

All memory data stays on your local machine. There is no cloud sync, no data transmission, and no external storage. Your agent’s memory exists only on your hardware.

Configure privacy from Settings in the sidebar, under the Memory section alongside the general memory settings.

  • Redact Secrets (default: On): Automatically strips API keys, passwords, and other secrets from stored episodes before they are written to the database.
  • Anonymize Names (default: Off): Removes or replaces personal names in stored episodes.
  • Local Only (default: On): Enforces that all memory data remains on the local machine, with no cloud sync.
  • Exclude Patterns (default: empty): Regex or glob patterns to exclude from memory storage (e.g., *.env, sensitive file paths).

  • Export Memory — download a JSON file containing all episodes and metadata.
  • Clear All Memory — cascading delete of all episodes, the vector index, and the knowledge graph. This action requires confirmation and is irreversible.
  • Retention Policy — automatically prunes episodes older than the configured retention period.
  • Episode Limit — when the maximum is reached, the oldest low-importance episodes are removed first.

API keys, OAuth tokens, and credentials managed by Snippbot are encrypted with AES-256-GCM and PBKDF2 key derivation (600,000 iterations). The OS keychain holds the master key — never a plaintext file. Episodic memory content is stored locally in SQLite with configurable secret redaction to strip sensitive values before storage.

Memory capture is fully automatic and non-blocking. After every successful chat response:

  1. The user message and assistant response are combined into a single content block.
  2. The Sensory Buffer (Tier 1) filters the content — trivial exchanges are skipped, near-duplicates are detected, and content is normalized.
  3. Importance and valence are calculated from the content.
  4. An episode is created in the SQLite database and automatically indexed by FTS5.
  5. Entities are extracted and added to (or updated in) the knowledge graph.
  6. The content is embedded into a 384-dimensional vector and inserted into the HNSW index.

This entire process runs as a background task and does not block the chat response.
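The six steps above can be sketched as a background pipeline. Every helper on the `store` object here is hypothetical; only the step order and the non-blocking thread hand-off reflect the description:

```python
import threading

def capture_episode(user_msg, assistant_msg, store):
    """Run the six capture steps in order (helper names are illustrative)."""
    content = f"User: {user_msg}\nAssistant: {assistant_msg}"  # 1. combine
    content = store.sensory_filter(content)                    # 2. Tier 1 filter
    if content is None:
        return                                                 # skipped or duplicate
    importance, valence = store.score(content)                 # 3. score
    episode_id = store.insert_episode(content, importance, valence)  # 4. SQLite + FTS5
    store.update_graph(episode_id, content)                    # 5. knowledge graph
    store.index_vector(episode_id, content)                    # 6. HNSW embedding

def capture_async(user_msg, assistant_msg, store):
    """Run capture off the request path so the chat response is never blocked."""
    worker = threading.Thread(target=capture_episode,
                              args=(user_msg, assistant_msg, store), daemon=True)
    worker.start()
    return worker
```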

  • Manage Memory guide — step-by-step instructions for searching, configuring, and managing memory through the UI