Execution

When you send a message in the Snippbot UI, it flows through a multi-stage execution pipeline:

User message in chat UI
Context built (project, memory, user preferences)
Context window checked (summarize older messages if over budget)
Message sent to LLM (streaming response)
├──→ Text response → stream to UI
└──→ Tool calls → execute tools → feed results back to LLM
└── Loop up to 10 turns until done

The core execution engine is the streaming chat loop. When you send a message:

  1. The conversation history is loaded and any file attachments are processed (text extraction, OCR, or vision analysis for images)
  2. The Context Builder assembles additional context: project info, user preferences, and memory recall results
  3. The Context Window Manager checks if the conversation exceeds the token budget --- if so, older messages are summarized into a compact preamble
  4. The message is streamed to the LLM provider
  5. If the agent is in agentic mode (tool use enabled), tool calls are executed locally and their results are fed back to the LLM for up to 10 turns per message
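Stages 1 through 3 of this pipeline can be sketched as follows. Every function and field name here is invented for illustration and is not Snippbot's actual API; the budget and message-retention constants come from the text below.

```python
# Hypothetical sketch of pipeline stages 1-3; all names are invented.

def process_attachments(attachments):
    # Stage 1: reduce each attachment to text (OCR/vision analysis elided).
    return [f"[attachment: {name}]" for name in attachments]

def build_context(project, preferences, memories):
    # Stage 2: assemble additional context for the prompt.
    return {"project": project, "preferences": preferences, "memories": memories}

def within_budget(messages, budget=150_000, chars_per_token=4):
    # Stage 3: rough token estimate at ~4 characters per token.
    return sum(len(m) // chars_per_token for m in messages) <= budget

def prepare_message(history, attachments, project):
    messages = history + process_attachments(attachments)
    if not within_budget(messages):
        # Over budget: summarize older messages, keep the most recent four.
        messages = ["[summary of older messages]"] + messages[-4:]
    context = build_context(project, {"personality": "balanced"}, [])
    return messages, context
```

Stages 4 and 5 (streaming and the tool loop) are covered in the agentic execution section below.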

The chat UI supports several context modes that change how the agent behaves:

Mode            Purpose
default         General-purpose conversation
brainstorm      Requirements discussion; preserves confirmed requirements and decisions
output-refine   Iterative refinement; tracks changes requested and applied
browser         Web automation with Playwright
game            Tabletop RPG narrative with specialized story summarization

Each mode uses a tailored summarization prompt when the context window is compacted, so the agent retains the most relevant information for that mode.
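One plausible shape for this per-mode prompt selection is a simple lookup with a fallback; the prompt texts below are invented paraphrases of each mode's stated focus, not Snippbot's actual prompts.

```python
# Hypothetical per-mode summarization prompts (texts are illustrative).

SUMMARIZATION_PROMPTS = {
    "default": "Summarize the conversation so far.",
    "brainstorm": "Summarize, preserving every confirmed requirement and decision.",
    "output-refine": "Summarize, tracking each change requested and applied.",
    "browser": "Summarize, keeping current page state and pending automation steps.",
    "game": "Summarize the story so far, keeping characters and plot threads.",
}

def summarization_prompt(mode):
    # Unknown modes fall back to the general-purpose prompt.
    return SUMMARIZATION_PROMPTS.get(mode, SUMMARIZATION_PROMPTS["default"])
```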

Snippbot automatically manages the conversation context window to stay within model token limits. This is handled by the Context Window Manager (part of the Working Memory tier in the cognitive memory architecture).

  1. Token estimation: Messages are estimated at roughly 4 characters per token. Images count as approximately 1,000 tokens each.
  2. Budget calculation: The effective budget is determined by the model’s context limit minus the system prompt tokens and a safety margin of 20,000 tokens. The default budget is 150,000 tokens.
  3. Split point: When the conversation exceeds the budget, the manager walks backward from the most recent messages, always keeping at least the last 4 messages (2 full turns).
  4. Summarization: Older messages are summarized by Claude Haiku for speed and cost. If the LLM summary fails, a heuristic fallback truncates each message to 150 characters.
  5. Reassembly: The summary is injected as a system message, followed by the recent messages in full.
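A rough sketch of this compaction logic, with the constants taken from the steps above; the function names and message shapes are invented for illustration, and the 30,000 system-prompt tokens is an assumed figure chosen to yield the 150,000 default budget.

```python
# Illustrative compaction sketch; constants from the text, names invented.

SAFETY_MARGIN = 20_000   # tokens reserved below the model's context limit
IMAGE_TOKENS = 1_000     # flat estimate per image
KEEP_LAST = 4            # always keep the last 4 messages (2 full turns)

def estimate_tokens(message):
    # Step 1: ~4 characters per token, plus a flat cost per image.
    return len(message.get("text", "")) // 4 + IMAGE_TOKENS * message.get("images", 0)

def effective_budget(context_limit=200_000, system_tokens=30_000):
    # Step 2: budget = context limit - system prompt tokens - safety margin.
    # (system_tokens here is an assumption that reproduces the 150,000 default.)
    return context_limit - system_tokens - SAFETY_MARGIN

def compact(messages, budget=150_000):
    # Steps 3-5: find the split point, summarize older messages, reassemble.
    if sum(estimate_tokens(m) for m in messages) <= budget:
        return messages
    kept, kept_tokens = [], 0
    for m in reversed(messages):          # walk backward from the newest
        t = estimate_tokens(m)
        if len(kept) >= KEEP_LAST and kept_tokens + t > budget:
            break
        kept.append(m)
        kept_tokens += t
    kept.reverse()
    older = messages[: len(messages) - len(kept)]
    if not older:
        return messages
    # Heuristic fallback summary: truncate each older message to 150 chars
    # (the real path asks Claude Haiku for a summary first).
    summary = " / ".join(m.get("text", "")[:150] for m in older)
    return [{"role": "system", "text": f"[Summary of earlier conversation] {summary}"}] + kept
```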
Model               Context limit
Claude Sonnet 4.5   200,000 tokens
Claude Sonnet 4     200,000 tokens
Claude Haiku 4.5    200,000 tokens
Claude Opus 4       200,000 tokens
Gemini 2.0 Flash    1,000,000 tokens
Gemini 2.5 Pro      1,000,000 tokens

You can control context window behavior from the Agent Settings panel in the UI:

Strategy             Behavior
preserve (default)   Keep as much conversation history as possible; summarize only when necessary
compact              Aggressively summarize to minimize token usage
summarize            Always summarize older messages to maintain a compact context

When agentic mode is enabled in the chat UI, the agent can use tools (file operations, code execution, web browsing, etc.) to complete tasks. Each message can trigger a multi-turn loop:

  1. The LLM responds with one or more tool calls
  2. Each tool is executed locally by the Tool Executor, which runs within the project’s working directory
  3. Tool results are collected and sent back to the LLM as the next turn
  4. The loop continues until the LLM produces a final text response (no more tool calls) or the 10-turn safety limit is reached
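The loop above can be sketched minimally as follows. The 10-turn limit comes from the text; the `llm` and `execute_tool` interfaces are invented for illustration, not Snippbot's actual API.

```python
# Minimal sketch of the agentic tool loop (interfaces are hypothetical).

MAX_TURNS = 10

def run_agentic(llm, execute_tool, prompt):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(MAX_TURNS):
        reply = llm(messages)  # returns {"text": str, "tool_calls": [...]}
        messages.append({"role": "assistant", "content": reply["text"]})
        if not reply["tool_calls"]:
            return reply["text"]  # final text response: the loop is done
        for call in reply["tool_calls"]:
            # Tools run locally, inside the project's working directory.
            result = execute_tool(call)
            messages.append({"role": "tool", "content": result})
    return messages[-1]["content"]  # 10-turn safety limit reached
```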

Execution behavior is configurable from the Agent Settings page in the Snippbot UI. These settings are stored per user in the local database.

Setting                   Default    Range                                  Description
Personality               balanced   precise, balanced, creative, minimal   Agent response style
Verbosity                 normal     minimal, normal, verbose, debug        How much detail in responses
Auto-execute              false      on/off                                 Whether tasks run without manual approval
Approval threshold        50         0-100                                  Risk threshold above which approval is required
Max tokens per task       4,096      1,024-131,072                          Token limit for a single task
Max retries               3          0-10                                   Maximum retry attempts on failure
Timeout                   300s       30-3,600s                              Task timeout in seconds
Context window strategy   preserve   compact, preserve, summarize           How older messages are handled
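As one example of what the stored per-user record might look like, here are the defaults from the table as a dictionary; the field names are assumptions, not Snippbot's actual schema.

```python
# Hypothetical per-user settings record (field names assumed, defaults from the table).

DEFAULT_AGENT_SETTINGS = {
    "personality": "balanced",              # precise | balanced | creative | minimal
    "verbosity": "normal",                  # minimal | normal | verbose | debug
    "auto_execute": False,                  # tasks require manual approval by default
    "approval_threshold": 50,               # 0-100
    "max_tokens_per_task": 4096,            # 1,024-131,072
    "max_retries": 3,                       # 0-10
    "timeout_seconds": 300,                 # 30-3,600
    "context_window_strategy": "preserve",  # compact | preserve | summarize
}
```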

Failed tasks are automatically retried based on how the failure is classified:

Class         Description                                      Retry behavior
transient     Temporary error (rate limit, network timeout)    Retry with exponential backoff
recoverable   Logic error the agent might fix on retry         Retry with error context added to prompt
terminal      Unrecoverable (bad credentials, invalid input)   Mark failed immediately, no retries

When retries are enabled (default: 3 retries), the delay doubles with each subsequent attempt:

Attempt 1: immediate
Attempt 2: 30s delay
Attempt 3: 60s delay
Attempt 4: 120s delay → mark failed
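The schedule above can be expressed directly: a 30-second base delay that doubles per retry, with terminal failures excluded. The constants come from the text; the function names are illustrative.

```python
# Sketch of the retry schedule (base 30s, doubling; names are illustrative).

BASE_DELAY = 30  # seconds
MAX_RETRIES = 3  # default

def retry_delay(attempt):
    # Attempt 1 runs immediately; each retry doubles the previous delay.
    return 0 if attempt == 1 else BASE_DELAY * 2 ** (attempt - 2)

def should_retry(failure_class, attempt):
    # Terminal failures are marked failed immediately, with no retries.
    if failure_class == "terminal":
        return False
    return attempt <= MAX_RETRIES
```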

Snippbot supports sub-agents --- specialized agents that can be spawned to handle subtasks. Sub-agents have their own lifecycle, resource budgets, and concurrency controls.

Role         Purpose
researcher   Information gathering and analysis
coder        Code writing and implementation
reviewer     Code review and quality assessment
tester       Test creation and execution
analyst      Data analysis
creative     Creative writing and ideation
general      General-purpose tasks

Sub-agents move through these states:

pending → awaiting_approval → queued → running → completed
├── failed
├── cancelled
└── timed_out
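The lifecycle diagram can be written as a transition table. The diagram does not say exactly which states can branch to failed, cancelled, or timed_out; this sketch assumes they all leave from running.

```python
# The sub-agent lifecycle as a transition table (branch points assumed).

TRANSITIONS = {
    "pending": {"awaiting_approval"},
    "awaiting_approval": {"queued"},
    "queued": {"running"},
    "running": {"completed", "failed", "cancelled", "timed_out"},
}

def advance(state, new_state):
    # Reject any transition the diagram does not allow.
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```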

Sub-agent execution is governed by concurrency limits:

  • Global maximum: 8 concurrent sub-agents across all parents
  • Per-parent maximum: 5 concurrent sub-agents per parent agent
  • Priority queuing: When limits are reached, sub-agents are queued with priority-based ordering (1 = highest, 10 = lowest)
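The admission check implied by these limits can be sketched as follows; the limits come from the text, while the data shapes and function name are invented.

```python
# Sketch of sub-agent admission under the concurrency limits above.
import heapq

GLOBAL_MAX = 8      # concurrent sub-agents across all parents
PER_PARENT_MAX = 5  # concurrent sub-agents per parent

def admit(running, queue, parent, priority):
    """Start the sub-agent now if limits allow, otherwise queue it by priority."""
    per_parent = sum(1 for p in running if p == parent)
    if len(running) < GLOBAL_MAX and per_parent < PER_PARENT_MAX:
        running.append(parent)
        return "running"
    heapq.heappush(queue, (priority, parent))  # 1 = highest, 10 = lowest
    return "queued"
```

Using a min-heap keyed on priority means the lowest number (highest priority) dequeues first when capacity frees up.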

Each sub-agent has configurable resource constraints:

Limit        Default      Range
Max turns    20           1-100
Max tokens   100,000      1,000-1,000,000
Timeout      60 minutes   1-240 minutes
Priority     5            1-10

For complex development tasks, Snippbot provides team orchestration --- an autonomous multi-agent loop that follows the Architect, Executor, and Reviewer pattern:

  1. Architect (read-only): Analyzes the task, creates a plan, and identifies requirements
  2. Executor (full access): Implements the plan with full tool access (file writes, code execution)
  3. Reviewer (read-only): Reviews the output and issues a verdict: APPROVE, REQUEST_CHANGES, or BLOCK

If the reviewer requests changes, the loop iterates (up to 3 iterations by default). Each phase has its own model, turn limit, and tool access controls.
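The iterate-until-approved loop can be sketched as follows; the three phase callables and their signatures are invented for illustration, while the iteration limit and verdicts come from the text.

```python
# Sketch of the Architect -> Executor -> Reviewer loop (interfaces invented).

MAX_ITERATIONS = 3

def run_team(architect, executor, reviewer, task):
    plan = architect(task)                    # read-only phase
    output = None
    for _ in range(MAX_ITERATIONS):
        output = executor(plan)               # full tool access
        verdict, feedback = reviewer(output)  # read-only phase
        if verdict == "APPROVE":
            return output
        if verdict == "BLOCK":
            raise RuntimeError(f"run blocked by reviewer: {feedback}")
        # REQUEST_CHANGES: fold the feedback into the plan and iterate.
        plan = f"{plan}\nRequested changes: {feedback}"
    return output  # iteration limit reached
```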

Setting               Default
Max iterations        3
Architect max turns   15
Executor max turns    30
Reviewer max turns    15
Timeout               30 minutes
Total token budget    500,000

Snippbot emits events throughout the execution lifecycle that you can observe in the UI:

  • Execution lifecycle: execution.started, execution.paused, execution.resumed, execution.completed, execution.failed
  • Task lifecycle: task.queued, task.started, task.completed, task.failed, task.retrying
  • Sub-agent lifecycle: subagent.spawned, subagent.started, subagent.completed, subagent.failed
  • Team orchestration: team.run.started, team.phase.started, team.phase.completed, team.review.decision, team.run.completed
  • Context window: context.window.applied (emitted when older messages are summarized)
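A minimal observer sketch for subscribing to events with these names; the `on`/`emit` API here is invented, not Snippbot's actual interface.

```python
# Tiny event-observer sketch (API is hypothetical).

_handlers = {}

def on(event, handler):
    # Register a handler for one event name, e.g. "execution.started".
    _handlers.setdefault(event, []).append(handler)

def emit(event, payload=None):
    # Deliver the payload to every handler registered for this event.
    for handler in _handlers.get(event, []):
        handler(payload)
```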

The Snippbot UI provides several views for monitoring execution:

  • Chat panel: Shows streaming responses, tool call results, and error messages in real time
  • Activity panel: Displays execution events, tool outcomes, and sub-agent status
  • Projects page: Shows task-level status for project workflows, including retry counts and failure details