v1.6.0 · Stable · Apache 2.0 · Open Source

Kubernetes
that operates
itself.

KubeBolt is the open-source operations layer for Kubernetes. A streaming Copilot named Kobi answers your team in real time today. An event-driven Autopilot that investigates incidents and manages cluster lifecycles autonomously is on the 2026 roadmap.

Footprint per node

Memory: < 50 MB
CPU ceiling: < 1%
API latency p95: < 5 ms
Cold start: < 3 s

Built on a forked OpenTelemetry Collector. Hard resource ceilings.

Quickstart

$ helm install kubebolt oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt
$ kubectl port-forward svc/kubebolt 3000:80
→ Ready at http://localhost:3000

01

Surface — Operator UI

See your cluster.
In motion.

A real-time map of every pod, service, and request path. Mini-dashboards on every resource. Traffic flow you can actually watch. Built for engineers who'd rather see the topology than read another kubectl get.

01

23 resource views

Pods, Deployments, Services, Ingresses, ConfigMaps, Secrets, Jobs, Nodes — every kind gets a purpose-built view with the right columns, the right actions, the right detail panes.

02

Live, not polled

Every list, graph, and edge is driven by Watch streams over the K8s API. State changes show up in milliseconds — no F5, no stale rows, no surprise.
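To make "live, not polled" concrete: the Kubernetes API delivers a watch as a stream of newline-delimited JSON events, each carrying a `type` (such as ADDED, MODIFIED, DELETED, or BOOKMARK) and an `object`. The sketch below shows one way a client could fold such events into an in-memory view. It is an illustration only, not KubeBolt's code; the `apply_watch_events` name and the `namespace/name` key format are assumptions made for this example.

```python
import json
from typing import Dict, Iterable


def apply_watch_events(lines: Iterable[str], state: Dict[str, dict]) -> Dict[str, dict]:
    """Fold newline-delimited watch events into a view keyed by namespace/name.

    Hypothetical helper for illustration; a real client would also track
    resourceVersion and reconnect when the stream ends.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("type")
        # BOOKMARK and ERROR events carry no full object; skip them here.
        if kind not in ("ADDED", "MODIFIED", "DELETED"):
            continue
        obj = event["object"]
        meta = obj["metadata"]
        key = f'{meta.get("namespace", "")}/{meta["name"]}'
        if kind == "DELETED":
            state.pop(key, None)
        else:
            state[key] = obj
    return state
```

Because every change arrives as an event, the view stays current without re-listing the resource on a timer.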

03

⌘K everything

Type to jump to any resource across any namespace. Fuzzy match across kinds. Recent context pinned. Works the same as your editor — because that's how you already think.

02

Subsystem A — Autonomous

Roadmap · 2026 · Cloud / Enterprise

Autopilot.
Event-driven. Long-running. Auditable.

The Autopilot will wake up only when something matters: a crash loop spreads, an SLO error budget burns, a node degrades. It will open a session, gather context, decide, and act, using the Claude Agent SDK with multi-region failover across the Anthropic API, AWS Bedrock, and Google Vertex AI. Shipping with the KubeBolt Cloud / Enterprise tier in 2026.

01

Root-cause analysis

When an incident fires, Autopilot launches a long-running investigation session. It correlates events, logs, deploys, and prior incidents — then writes a verdict you can audit.

02

Guided remediation

Proposes a patch, runs it through a deterministic executor with policy guardrails, and rolls back if an SLO is breached. You approve once; it remembers.

03

Postmortems on autopilot

Generates a draft postmortem from the incident timeline. Action items linked to PRs. Five-whys included. Edit, don't write from scratch.

04

Cluster lifecycle ops

Schedules power-on / power-off across EKS, AKS, GKE, OpenShift, and generic node pools. In most environments, the savings pay for KubeBolt itself.

Why Claude Agent SDK exclusively

When KubeBolt acts on your cluster, you don't buy infrastructure flexibility — you buy outcomes. We picked the model that produces the best results for autonomous K8s operations and built three failover endpoints around it. No router. No model zoo. No ambiguity in postmortems.

03

Subsystem B — Interactive Copilot

Kobi.
Senior SRE. Streaming. Yours.

Press ⌘J and ask anything. Kobi queries your live cluster through 17+ tools, streams the answer, and proposes context-aware actions you click to execute. Available everywhere you work: UI, IDE, CLI.

Multi-model

Multi-provider. Bring your key.

Anthropic Claude or OpenAI GPT — switch per workspace, or set one as the fallback when the primary returns 429 / 5xx. Prompt caching on both sides keeps the bill predictable.

  • Anthropic · Claude Sonnet / Opus / Haiku
  • OpenAI · GPT-5 / o-series
  • Auto-failover on rate limit / outage
  • BYO API key, end-to-end
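The fallback rule described above (move to the next provider when the primary returns 429 or 5xx) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Kobi's implementation: `ask_with_failover`, `is_retryable`, and the provider callables are hypothetical names, and each provider is modeled as a function returning an HTTP status and a body.

```python
from typing import Callable, Optional, Sequence, Tuple

# A provider is modeled as a callable returning (HTTP status, response body).
# This shape is an assumption for the sketch, not a real SDK interface.
Provider = Callable[[str], Tuple[int, str]]


def is_retryable(status: int) -> bool:
    """Rate limits (429) and server errors (5xx) trigger failover."""
    return status == 429 or 500 <= status <= 599


def ask_with_failover(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider name, body) of the first success."""
    last_status: Optional[int] = None
    for name, call in providers:
        status, body = call(prompt)
        if 200 <= status < 300:
            return name, body
        if not is_retryable(status):
            # For example a 401 from a bad key: switching providers would hide it.
            raise RuntimeError(f"{name} returned non-retryable status {status}")
        last_status = status
    raise RuntimeError(f"all providers failed; last status {last_status}")
```

Failing fast on non-retryable statuses is a design choice in this sketch: a misconfigured key surfaces immediately instead of silently shifting all traffic to the fallback.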

Anywhere via MCP

Cursor. Claude Code. Your terminal.

Kobi exposes a Model Context Protocol server. Plug it into your IDE and your AI assistant gets read access to your cluster — without leaving the editor.

# claude code
/mcp add kubebolt

OSS · Self-Hosted

Available today

Free forever. No caps.

Apache 2.0. You operate the infra. KubeBolt operates nothing.

  • Clusters: unlimited
  • Nodes / pods: unlimited
  • Users: unlimited
  • Retention: your disk
  • AI Copilot: BYOK · unlimited
  • Support: GitHub · Community
Install in 60 seconds →

SaaS · Cloud

Coming 2026

Free hosted tier. Caps included.

We operate the control plane. You only deploy the agent. Lead-magnet limits.

  • Clusters: 2
  • Nodes / pods: 10 / 150
  • Active users: 3 (hard cap)
  • Retention: 15 days
  • AI credits / mo: 500 (cutoff)
  • Webhooks · custom rules: 3 · 3
Join the waitlist →

04

Module — Cluster Lifecycle

Roadmap · 2026 · Business / Enterprise

The module
that will pay for itself.

Most clusters run 24/7 even though humans don't. KubeBolt's lifecycle module will schedule power-on / power-off across your fleet — preview environments, dev clusters, weekend downtime — and track the savings. Shipping with the KubeBolt Business tier in 2026.

Typical savings

~65%

on non-production cluster spend, when scheduled to nights & weekends with smart warm-up before working hours.

The infrastructure savings often exceed the KubeBolt subscription itself.
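A rough worked example shows where a figure in that range can come from. Assume, purely for illustration, that cost scales with hours running and that a non-production cluster stays up 12 hours on each weekday and is off on weekends. That is 60 of the 168 hours in a week, so about 64% of the hours are saved. The `savings_fraction` helper below is hypothetical and ignores costs that do not stop with the nodes, such as storage or managed control-plane fees.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def savings_fraction(
    on_hours_per_weekday: float, on_hours_per_weekend_day: float = 0.0
) -> float:
    """Fraction of weekly running hours avoided by a power schedule.

    Assumes spend is proportional to hours running (a simplification).
    """
    for hours in (on_hours_per_weekday, on_hours_per_weekend_day):
        if not 0 <= hours <= 24:
            raise ValueError("hours per day must be between 0 and 24")
    on_hours = 5 * on_hours_per_weekday + 2 * on_hours_per_weekend_day
    return 1 - on_hours / HOURS_PER_WEEK


# 12 h on weekdays, off on weekends: 1 - 60/168
print(f"{savings_fraction(12):.1%}")
```

Warm-up time before working hours counts as running time, so a longer warm-up window lowers the fraction accordingly.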

Supported platforms

  • EKS
  • AKS
  • GKE
  • OpenShift
  • Generic node pools

Per-cluster schedules, weekday / weekend rules, manual override, audit log of every power transition. Designed not to interrupt CI runs in flight.
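The combination of weekday / weekend rules and manual override can be sketched as a single decision function. The lifecycle module is on the roadmap, so this is not its API: the `Schedule` fields, the default hours, and `should_be_on` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass(frozen=True)
class Schedule:
    """Hypothetical per-cluster schedule; field names are illustrative."""
    weekday_on: time = time(7, 0)
    weekday_off: time = time(19, 0)
    weekend_on: bool = False


def should_be_on(
    now: datetime, schedule: Schedule, override: Optional[bool] = None
) -> bool:
    """Decide the desired power state for a cluster at a given local time."""
    # A manual override always wins over the schedule.
    if override is not None:
        return override
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if now.weekday() >= 5:
        return schedule.weekend_on
    return schedule.weekday_on <= now.time() < schedule.weekday_off
```

A real scheduler would also need to record each transition for the audit log and check for in-flight CI runs before powering off; both are left out of this sketch.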

05

Architecture — Determinism first

L1 shipped · L2–L6 roadmap 2026

Skills.
The cheapest LLM call
is the one you don't make.

A Skill is a declarative, deterministic diagnostic routine — a recipe for a known failure pattern. The L1 Detectors layer ships today as the 12-rule Insights Engine below. The full six-layer hierarchy lands with Autopilot in 2026.
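As a sketch of what "declarative and deterministic" can mean in practice, the example below models a Skill as a name, a severity, and a pure predicate over a pod object. The `Skill` class and `evaluate` function are hypothetical, not KubeBolt's internal types. The two checks read standard pod status fields: a container waiting with reason `CrashLoopBackOff`, and a container whose last termination reason was `OOMKilled`.

```python
from dataclasses import dataclass
from typing import Callable, List


def _reasons(pod: dict, state_field: str, phase: str) -> List[str]:
    """Collect reasons from status.containerStatuses[].<state_field>.<phase>."""
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    found = []
    for cs in statuses:
        detail = (cs.get(state_field) or {}).get(phase) or {}
        if "reason" in detail:
            found.append(detail["reason"])
    return found


def crash_looping(pod: dict) -> bool:
    return "CrashLoopBackOff" in _reasons(pod, "state", "waiting")


def oom_killed(pod: dict) -> bool:
    return "OOMKilled" in _reasons(pod, "lastState", "terminated")


@dataclass(frozen=True)
class Skill:
    """Hypothetical shape: a named rule with a severity and a pure predicate."""
    name: str
    severity: str
    matches: Callable[[dict], bool]


SKILLS = [
    Skill("Crash loop detected", "Critical", crash_looping),
    Skill("OOM killed", "Critical", oom_killed),
]


def evaluate(pod: dict, skills: List[Skill] = SKILLS) -> List[str]:
    """Return one finding per matching skill; no model call involved."""
    return [f"{s.severity}: {s.name}" for s in skills if s.matches(pod)]
```

The same input always yields the same findings, which is what lets a rule of this kind run before any model is configured.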

L1 · SHIPPED · Detectors · Deterministic. No AI.
L2 · ROADMAP · Router · Haiku-class. Triage.
L3 · ROADMAP · Investigator · Sonnet-class. RCA.
L4 · ROADMAP · Planner · Sonnet / Opus.
L5 · ROADMAP · Executor · Deterministic. Guardrails.
L6 · ROADMAP · Postmortem · Sonnet / Opus.

06

Insights Engine

12 rules. Zero configuration.

Continuous evaluation against proven heuristics. Actionable recommendations, not raw PromQL. Each rule is a Skill — so the engine works even before any model is configured.

Crash loop detected · Critical
OOM killed · Critical
Zero replicas · Critical
Node not ready · Critical
Image pull backoff · Critical
CPU throttle risk · Warning
Memory pressure · Warning
HPA maxed out · Warning
PVC pending · Warning
Frequent restarts · Warning
Under-requested resources · Info
Evicted pods · Info

07

Install

One command.
Any cluster.

OCI chart on GHCR. Configurable RBAC, Ingress, auth, resources.

helm install kubebolt \
  oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt

kubectl port-forward svc/kubebolt 3000:80

Need the full reference? Read the docs →

08

Architecture

Lightweight by design.

Source

Kubernetes

  • API Server
  • Metrics Server
  • OTel Collector (forked)

Core

KubeBolt Engine

  • Go · Auth · RBAC
  • BoltDB embedded
  • Insights Engine · 12 rules
  • MCP servers

Surfaces

Where you work

  • Web UI · 23 views
  • Slack · Discord · Email
  • Cursor · Claude Code (MCP)
Go 1.25+ · client-go · BoltDB · Anthropic API · OpenAI API · OpenTelemetry · Hubble flows · Model Context Protocol · React 18 · TypeScript

Early access

Be first on the
commercial cloud.

The open-source agent is free forever. KubeBolt Cloud — hosted Autopilot, Lifecycle Management, and team SSO — launches in 2026. Join the waitlist for early access and founding-customer pricing.

No spam. Unsubscribe with one click. We share product updates only.