Monitor API

Base path: /api/monitoring. Exceptions: the health probes (/health, /health/ready, /health/live) and the Prometheus endpoint (/api/metrics/prometheus).

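Every endpoint under /api/monitoring requires authentication, including the detailed health report at /api/monitoring/health. The health probes and the Prometheus endpoint are public. See API Overview for auth details.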
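Because /health is public while /api/monitoring/health needs credentials, it is easy for a client to attach (or omit) auth on the wrong path. A minimal Python sketch of the rule as stated above; `requires_auth` is a hypothetical helper name and the check is purely path-based.

```python
def requires_auth(path: str) -> bool:
    """Apply the documented auth rule to a request path."""
    # Public: the health probes (/health, /health/ready, /health/live).
    if path == "/health" or path.startswith("/health/"):
        return False
    # Public: the Prometheus scrape endpoint.
    if path == "/api/metrics/prometheus":
        return False
    # Everything else, including /api/monitoring/health, needs credentials.
    return True


for p in ["/health/ready", "/api/metrics/prometheus", "/api/monitoring/health"]:
    print(p, requires_auth(p))
```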

System health

Basic health check

GET /health

Public — no authentication required. Returns immediately:

{
  "status": "ok",
  "version": "0.5.0",
  "uptime_seconds": 86400
}

Readiness probe

GET /health/ready

Returns 200 when the daemon is ready to serve requests, 503 during startup or if critical services are down. Public — use for load balancer health checks and Kubernetes readiness probes.
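A deployment script can poll the readiness probe until the daemon reports ready. The Python sketch below assumes the daemon listens on localhost:18781 (the port used in the Prometheus scrape config on this page); `wait_until_ready` and `http_status` are illustrative names, not part of any SDK. The status fetcher is injectable so the loop can be exercised offline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:18781"  # assumption: port from the scrape config example


def http_status(path: str) -> int:
    """GET BASE_URL + path and return the HTTP status code (0 if unreachable)."""
    try:
        with urllib.request.urlopen(BASE_URL + path, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the daemon is starting up
    except OSError:
        return 0  # connection refused or timed out: not listening yet


def wait_until_ready(
    get_status: Callable[[str], int] = http_status,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll /health/ready until it returns 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status("/health/ready") == 200:
            return True
        # Give up if the next attempt would land past the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)


# Offline demo with a fake status sequence: two 503s, then ready.
responses = iter([503, 503, 200])
print(wait_until_ready(lambda path: next(responses), timeout_s=5, interval_s=0))
# True
```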

Liveness probe

GET /health/live

Returns 200 as long as the process is alive. Public — use for container restart policies.

Detailed system health

GET /api/monitoring/health

Full report with overall status plus per-component status, latency, and last check time (degraded components may include a message):

{
  "overall": "healthy",
  "uptime": 604800,
  "version": "0.5.0",
  "components": [
    {"id": "api", "name": "API Server", "status": "healthy", "latency": 45, "last_check": "2026-03-02T12:00:00Z"},
    {"id": "db", "name": "Database", "status": "healthy", "latency": 12, "last_check": "2026-03-02T12:00:00Z"},
    {"id": "workers", "name": "Workers", "status": "healthy", "latency": 5, "last_check": "2026-03-02T12:00:00Z"},
    {"id": "memory", "name": "Memory Store", "status": "degraded", "latency": 150, "message": "High latency detected", "last_check": "2026-03-02T12:00:00Z"},
    {"id": "ai", "name": "AI Service", "status": "healthy", "latency": 800, "last_check": "2026-03-02T12:00:00Z"}
  ],
  "last_check": "2026-03-02T12:00:00Z"
}
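In the sample above, overall is still "healthy" while the memory component is "degraded", so a monitor that only reads overall would miss it. A minimal Python sketch that lists every component whose status is not "healthy"; `problem_components` is a hypothetical helper, and it assumes the response shape shown above.

```python
import json

# Abbreviated copy of the sample response above.
SAMPLE = """
{
  "overall": "healthy",
  "components": [
    {"id": "api", "name": "API Server", "status": "healthy", "latency": 45},
    {"id": "memory", "name": "Memory Store", "status": "degraded", "latency": 150,
     "message": "High latency detected"},
    {"id": "ai", "name": "AI Service", "status": "healthy", "latency": 800}
  ]
}
"""


def problem_components(report: dict) -> list:
    """Return one line per component whose status is not "healthy"."""
    lines = []
    for comp in report.get("components", []):
        if comp.get("status") != "healthy":
            note = comp.get("message", "no message")
            lines.append(f'{comp["id"]}: {comp["status"]} ({note})')
    return lines


report = json.loads(SAMPLE)
print(problem_components(report))
# ['memory: degraded (High latency detected)']
```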

Component detail

GET /api/monitoring/health/components/{id}

Returns status for a single component by ID (e.g., api, db, workers, memory, ai).

Trigger health check

POST /api/monitoring/health/check

Forces an immediate health re-evaluation and returns updated status.
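Both calls need credentials, and the trigger is a POST with no body, which matters with Python's standard library because `urllib.request.Request` defaults to GET when no data is passed. A sketch that builds (but does not send) the two requests; the base URL is an assumption taken from the scrape config example, and the auth header is left out because its scheme is described in API Overview rather than here.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:18781"  # assumption: port from the scrape config example


def component_request(component_id: str) -> urllib.request.Request:
    """Build GET /api/monitoring/health/components/{id}."""
    cid = urllib.parse.quote(component_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/api/monitoring/health/components/{cid}")


def trigger_check_request() -> urllib.request.Request:
    """Build POST /api/monitoring/health/check with an empty body."""
    # Without data=..., Request would default to GET, so set the method explicitly.
    return urllib.request.Request(f"{BASE_URL}/api/monitoring/health/check", method="POST")


# Attach credentials with req.add_header(...) per API Overview before sending.
req = trigger_check_request()
print(req.get_method(), req.full_url)
# POST http://localhost:18781/api/monitoring/health/check
```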

Usage statistics

Token usage summary

GET /api/monitoring/tokens

Returns current token consumption and limits:

{
  "total_used": 2500000,
  "total_limit": 5000000,
  "used_today": 150000,
  "limit_today": 500000,
  "by_model": {
    "claude-sonnet-4-6": {
      "model": "claude-sonnet-4-6",
      "input_tokens": 80000,
      "output_tokens": 40000,
      "total_tokens": 120000,
      "cost_estimate": 3.60
    }
  },
  "by_project": [
    {"project_id": "proj_001", "project_name": "API Development", "tokens_used": 100000, "percentage": 66.7}
  ],
  "trend": "stable"
}
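The summary reports raw counts, so dashboards usually derive percentages. A Python sketch that computes how much of each limit is used and a blended cost per million tokens (input and output combined) per model; `usage_report` and `pct` are hypothetical names, and the currency of cost_estimate is not stated on this page.

```python
from typing import Optional


def pct(used: float, limit: float) -> Optional[float]:
    """Percentage of limit consumed, or None when no limit is reported."""
    return round(100 * used / limit, 1) if limit else None


def usage_report(summary: dict) -> dict:
    return {
        "total_pct": pct(summary["total_used"], summary["total_limit"]),
        "today_pct": pct(summary["used_today"], summary["limit_today"]),
        # Blended estimated cost per million tokens (input + output) per model.
        "cost_per_mtok": {
            name: round(m["cost_estimate"] * 1_000_000 / m["total_tokens"], 2)
            for name, m in summary.get("by_model", {}).items()
            if m.get("total_tokens")
        },
    }


sample = {
    "total_used": 2500000, "total_limit": 5000000,
    "used_today": 150000, "limit_today": 500000,
    "by_model": {"claude-sonnet-4-6": {"total_tokens": 120000, "cost_estimate": 3.60}},
}
print(usage_report(sample))
# {'total_pct': 50.0, 'today_pct': 30.0, 'cost_per_mtok': {'claude-sonnet-4-6': 30.0}}
```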

Token usage history

GET /api/monitoring/tokens/history

Query: period (day, week, month)

Returns a time series for charting:

[
  {"date": "2026-03-01", "input_tokens": 40000, "output_tokens": 20000, "total_tokens": 60000},
  {"date": "2026-03-02", "input_tokens": 45000, "output_tokens": 22000, "total_tokens": 67000}
]
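A common use of the series is a period total plus the latest day-over-day change. A Python sketch, assuming the date field is an ISO 8601 date as in the sample (so string sorting matches chronological order); `summarize_history` is a hypothetical helper.

```python
from typing import Optional


def summarize_history(points: list) -> dict:
    """Total tokens over the series and the change between the last two points."""
    pts = sorted(points, key=lambda p: p["date"])  # ISO dates sort lexically
    total = sum(p["total_tokens"] for p in pts)
    change: Optional[int] = None
    change_pct: Optional[float] = None
    if len(pts) >= 2:
        prev, last = pts[-2]["total_tokens"], pts[-1]["total_tokens"]
        change = last - prev
        if prev:
            change_pct = round(100 * change / prev, 1)
    return {"total_tokens": total, "last_change": change, "last_change_pct": change_pct}


history = [
    {"date": "2026-03-01", "input_tokens": 40000, "output_tokens": 20000, "total_tokens": 60000},
    {"date": "2026-03-02", "input_tokens": 45000, "output_tokens": 22000, "total_tokens": 67000},
]
print(summarize_history(history))
# {'total_tokens': 127000, 'last_change': 7000, 'last_change_pct': 11.7}
```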

Usage overview

GET /api/monitoring/usage

Aggregate usage summary including agent activity.

Activity summary

GET /api/monitoring/activity-summary

Returns a summary of recent system events grouped by type.

Performance

Performance overview

GET /api/monitoring/performance

Returns current performance metrics (response times, throughput, error rates).

Performance history

GET /api/monitoring/performance/history

Query: period, granularity

Returns historical performance data for trending charts.
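Both history endpoints take their options as query parameters. A Python sketch that builds the URL and leaves out any parameter the caller did not set, so the server can apply its own defaults. The period values day, week, and month are documented only for token history; the accepted values for performance history, including granularity="hour" below, are assumptions. `history_url` is a hypothetical helper and the base URL comes from the scrape config example.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:18781"  # assumption: port from the scrape config example


def history_url(path: str, **params: Optional[str]) -> str:
    """Append only the query parameters the caller actually set."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


print(history_url("/api/monitoring/tokens/history", period="week"))
# http://localhost:18781/api/monitoring/tokens/history?period=week
print(history_url("/api/monitoring/performance/history", period="day", granularity="hour"))
# http://localhost:18781/api/monitoring/performance/history?period=day&granularity=hour
```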

System metrics

GET /api/monitoring/system

Returns system resource usage (CPU, memory, disk).

Metrics (Prometheus)

GET /api/metrics/prometheus

Public — no authentication required. Returns Prometheus text-format metrics:

# HELP snippbot_tasks_total Total tasks executed
# TYPE snippbot_tasks_total counter
snippbot_tasks_total{status="completed"} 4821
snippbot_tasks_total{status="failed"} 47

# HELP snippbot_token_usage_total Total tokens consumed
# TYPE snippbot_token_usage_total counter
snippbot_token_usage_total{model="claude-sonnet-4-6",type="input"} 12847291

# HELP snippbot_active_sessions Current active chat sessions
# TYPE snippbot_active_sessions gauge
snippbot_active_sessions 3

# HELP snippbot_scheduler_jobs_active Active scheduled jobs
# TYPE snippbot_scheduler_jobs_active gauge
snippbot_scheduler_jobs_active 10

Prometheus scrape config:

scrape_configs:
  - job_name: snippbot
    static_configs:
      - targets: ['localhost:18781']
    metrics_path: /api/metrics/prometheus
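For a quick check without running a Prometheus server, the text format above can be parsed directly. The sketch below is a minimal Python parser that handles only simple sample lines like those shown (no escaped quotes in label values, no timestamps), followed by the share of failed tasks from the snippbot_tasks_total counter. Counters are cumulative, so this ratio covers everything counted so far; for a windowed rate you would normally let Prometheus compute it with rate(). The function names are hypothetical.

```python
import re

# Excerpt of the metrics shown above.
SAMPLE = """\
# HELP snippbot_tasks_total Total tasks executed
# TYPE snippbot_tasks_total counter
snippbot_tasks_total{status="completed"} 4821
snippbot_tasks_total{status="failed"} 47
# TYPE snippbot_active_sessions gauge
snippbot_active_sessions 3
"""

# name, optional {labels}, whitespace, value (timestamps are not handled).
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)$")
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> list:
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            labels = dict(LABEL_RE.findall(m.group(2) or ""))
            samples.append((m.group(1), labels, float(m.group(3))))
    return samples


def task_failure_ratio(samples: list) -> float:
    """failed / (all statuses) for snippbot_tasks_total; 0.0 if no tasks."""
    by_status = {
        labels.get("status"): value
        for name, labels, value in samples
        if name == "snippbot_tasks_total"
    }
    total = sum(by_status.values())
    return by_status.get("failed", 0.0) / total if total else 0.0


samples = parse_metrics(SAMPLE)
print(round(task_failure_ratio(samples), 4))  # 0.0097
```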

Alerts

List alerts

GET /api/monitoring/alerts

{
  "alerts": [
    {
      "id": "alert_abc123",
      "name": "High error rate",
      "condition": "task_error_rate > 0.1",
      "threshold": 0.1,
      "period_minutes": 60,
      "delivery": {"type": "webhook", "url": "https://..."},
      "enabled": true,
      "last_triggered": null
    }
  ]
}

Get an alert

GET /api/monitoring/alerts/{id}

Acknowledge an alert

POST /api/monitoring/alerts/{id}/acknowledge

Resolve an alert

POST /api/monitoring/alerts/{id}/resolve

Snooze an alert

POST /api/monitoring/alerts/{id}/snooze

Dismiss an alert

DELETE /api/monitoring/alerts/{id}
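The alert operations differ only in HTTP method and path suffix, so a client can drive them from one table. A Python sketch using only the standard library; `alert_request` and `ALERT_ACTIONS` are hypothetical names, the base URL is taken from the scrape config example, and no request body is sent because bodies (for example a snooze duration) are not documented here. Credentials still need to be attached per API Overview.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:18781"  # assumption: port from the scrape config example

# (HTTP method, path suffix) for each alert operation documented above.
ALERT_ACTIONS = {
    "get": ("GET", ""),
    "acknowledge": ("POST", "/acknowledge"),
    "resolve": ("POST", "/resolve"),
    "snooze": ("POST", "/snooze"),
    "dismiss": ("DELETE", ""),
}


def alert_request(alert_id: str, action: str) -> urllib.request.Request:
    """Build (but do not send) the request for one alert operation."""
    method, suffix = ALERT_ACTIONS[action]  # KeyError for unknown actions
    aid = urllib.parse.quote(alert_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/monitoring/alerts/{aid}{suffix}", method=method
    )


for action in ALERT_ACTIONS:
    req = alert_request("alert_abc123", action)
    print(req.get_method(), req.full_url)
```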

Analytics

GET /api/monitoring/analytics

Aggregated analytics across all monitoring dimensions.