What is the best open-source AI SRE in 2026?

The best open source AI SRE tools in 2026 are Aurora, HolmesGPT, K8sGPT, and OpenSRE, in that order for most teams. Aurora is the most capable if you need cross-cloud reasoning, postmortem generation, or PR-based remediation. HolmesGPT is the best CNCF-backed multi-step agent for in-cluster Kubernetes investigation. K8sGPT is the simplest diagnostic explainer for Kubernetes. OpenSRE is a fast-growing option that still labels itself Public Alpha.

Is Datadog Bits AI SRE open source?

No. Datadog Bits AI SRE (now branded Bits Investigation) is a proprietary SaaS agent that runs only inside the Datadog platform. It has been generally available since December 2, 2025 and is priced through Datadog AI Credits, starting at $500 per 500 credits per month, with an average autonomous investigation consuming 6.5 credits per Datadog's pricing page. The open-source alternatives for agentic investigation are Aurora, HolmesGPT, K8sGPT, and OpenSRE.

Is IncidentFox open source?

Not in a maintained sense. IncidentFox published an open-source AI SRE repository in January 2026, but the owner archived it on May 31, 2026 and it is now read-only, with the last code push on March 3, 2026. The company still operates a cloud product, but teams that need a maintained open-source codebase should evaluate Aurora, HolmesGPT, K8sGPT, or OpenSRE instead.

Is kagent an open source AI SRE tool?

Not exactly. kagent is an Apache 2.0, CNCF Sandbox framework for building and running AI agents on Kubernetes, with CRDs, MCP support, and A2A interoperability. It gives platform teams the infrastructure to build their own agents, but it does not ship a ready-made incident-investigation agent the way Aurora, HolmesGPT, or K8sGPT do.

Is HolmesGPT or K8sGPT a real AI agent?

HolmesGPT is a real multi-step agent: it runs a ReAct loop where the LLM picks tools, reads results, and decides next steps. K8sGPT is not an agent in the strict sense: it runs deterministic analyzers and uses the LLM only to explain findings. Both are useful; they solve different problems.
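To make the distinction concrete, here is a minimal sketch of the two control flows. It is an illustration under stated assumptions, not code from either project: the tool names, the pod name, and `scripted_llm` (a stand-in for a real model call) are all hypothetical.

```python
from typing import Callable

# Hypothetical tools: each takes an argument and returns an observation string.
TOOLS: dict[str, Callable[[str], str]] = {
    "get_pod_status": lambda pod: f"{pod}: CrashLoopBackOff",
    "get_pod_logs": lambda pod: f"{pod}: OOMKilled, memory limit 128Mi",
}


def scripted_llm(history: list[str]) -> dict:
    """Stand-in for an LLM call: picks the next step from what was observed so far."""
    text = "\n".join(history)
    if "CrashLoopBackOff" not in text:
        return {"action": "get_pod_status", "input": "checkout-7d9f"}
    if "OOMKilled" not in text:
        return {"action": "get_pod_logs", "input": "checkout-7d9f"}
    return {"final": "Pod exceeds its 128Mi memory limit and is OOM-killed."}


def react_loop(question: str, llm: Callable[[list[str]], dict], max_steps: int = 5):
    """Agent pattern: the model chooses a tool, reads the result, then decides again."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm(history)
        if "final" in step:
            return step["final"], history
        observation = TOOLS[step["action"]](step["input"])
        history.append(f"Action: {step['action']}({step['input']})")
        history.append(f"Observation: {observation}")
    return "No conclusion within the step budget.", history


def deterministic_analyzer(pod_states: dict[str, str]) -> list[str]:
    """Scanner pattern: fixed rules decide what is wrong; a model would only explain it."""
    return [f"{pod} is in {state}" for pod, state in pod_states.items() if state != "Running"]


answer, trace = react_loop("Why is checkout failing?", scripted_llm)
print(answer)
print(deterministic_analyzer({"checkout-7d9f": "CrashLoopBackOff", "cart-5c2a": "Running"}))
```

In the loop, the sequence of tool calls is decided at run time by the model; in the analyzer, the findings are fixed before any model is involved.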

Can K8sGPT investigate non-Kubernetes infrastructure?

No. K8sGPT is Kubernetes-only by design and reads cluster state via the Kube API. For non-Kubernetes investigation (cloud APIs, monitoring tools, runbooks, IaC state), you need Aurora or HolmesGPT.

Which open-source AI SRE supports multi-cloud?

Aurora is the only open-source AI SRE with native multi-cloud support: AWS (via STS AssumeRole), Azure (Service Principal), GCP, OVH, and Scaleway, plus Kubernetes. HolmesGPT can reach AWS via the AWS MCP toolset but isn't built around multi-cloud as a primary use case.

Can I run an open-source AI SRE air-gapped?

Yes. All four investigation agents (Aurora, HolmesGPT, K8sGPT, and OpenSRE) support local LLM inference via Ollama, which means investigations can run with zero external API calls. Aurora additionally publishes air-gapped image tarballs for environments that can't pull from public registries. This is a major reason regulated industries pick open-source AI SRE over SaaS alternatives.

How do these compare to commercial AI SREs like Rootly or Resolve.ai?

Commercial AI SREs (Rootly, Resolve.ai, incident.io, Komodor Klaudia, Azure SRE Agent) typically offer faster onboarding, vendor support, and integrated workflows in exchange for SaaS pricing and the requirement that incident data leave your perimeter. Open-source AI SREs trade ease of onboarding for control, transparency, and cost predictability. See our [Rootly comparison](/blog/rootly-alternative-open-source-incident-management) for a deeper breakdown.

Is HolmesGPT safer than Aurora because it is read-only?

Read-only means a smaller blast radius, but "safer" is a more nuanced question. Aurora's writes are gated through human approval (the Bitbucket connector requires explicit approval before destructive actions) and run in pod-isolated environments. The right architecture is read-only for investigation and approved-write for remediation, which is exactly the L4 maturity level on the OSS AI SRE Maturity Spectrum.

Can I run all three tools at the same time?

Yes, and some teams do: K8sGPT as a lightweight cluster-health scanner, HolmesGPT for in-cluster alert investigation, and Aurora for cross-cloud incidents and postmortem generation. The three are more complementary than competitive.

What does "BYO LLM" mean and why does it matter?

Bring-Your-Own-LLM means you supply the model provider (OpenAI key, Anthropic key, local Ollama instance, etc.) rather than the tool bundling its own. All four investigation agents support BYO LLM. This matters because LLM quality and pricing change quickly, and tools that lock you into one provider become liabilities. It also enables air-gapped deployments via local models.

How long does it take to evaluate an open-source AI SRE?

A read-only pilot on one cluster and one alert source typically takes two to four weeks. Full evaluation including remediation suggestions can take six to eight weeks. The biggest variable is how much historical context (runbooks, postmortems) you ingest: agents are dramatically more useful with organizational memory.

What is OpenSRE and how does it compare to Aurora?

OpenSRE is an Apache 2.0 open-source framework for building AI SRE agents, maintained by Tracer, that investigates an alert across logs, metrics, and traces and posts a root-cause report to Slack or PagerDuty. It was originally built on LangGraph, but the current codebase has moved off that framework. The main difference from Aurora is maturity: OpenSRE labels itself Public Alpha with a latest release of v0.1, whereas Aurora is past v1.x with sandboxed cross-cloud command execution, multi-cloud coverage, postmortem generation, and human-approved PR remediation. Both are Apache 2.0 and self-hostable, so OpenSRE is a reasonable build-your-own starting point, while Aurora is the more complete and further-along option for teams that need cross-cloud reasoning and remediation today.

7 Best Open Source AI SRE Tools in 2026

Key Takeaways

The best open source AI SRE tools in 2026 are Aurora, HolmesGPT, K8sGPT, OpenSRE, Keep, kagent, and Coroot. Only the first four are incident-investigation agents. Keep is alert management, kagent is an agent framework, and Coroot is observability with AI root cause analysis.

Only one is a true multi-cloud agent. Aurora (Apache 2.0) spans AWS, Azure, GCP, OVH, Scaleway, and Kubernetes in one deployment. HolmesGPT is Kubernetes-first (CNCF Sandbox since October 8, 2025). K8sGPT is Kubernetes-only diagnostics.

The biggest names in AI SRE are not open source. Datadog Bits AI SRE is a proprietary SaaS agent priced through AI Credits, and Dynatrace Intelligence is closed source. If you are looking for open source AI SRE alternatives to those platforms, this list is the actual decision set.

IncidentFox's open-source repository was archived on May 31, 2026 and is now read-only. It should no longer be evaluated as a maintained open-source AI SRE.

All four agentic tools support BYO LLM, including local inference via Ollama for air-gapped deployments. That remains the structural differentiator over commercial AI SREs.

Of the 46+ companies offering "AI SRE" products in 2026, only a handful are genuinely open source. An open-source AI SRE is an AI agent that performs incident investigation, root cause analysis, and (sometimes) remediation under a permissive license that allows self-hosting, source-code audit, and modification. This guide ranks the seven open-source tools worth evaluating in 2026, compares the four investigation agents in depth, and maps the open-source alternative to each closed platform (Datadog, Dynatrace, BigPanda, Resolve.ai). Every star count and release below was verified against GitHub on July 3, 2026.

A disclosure up front: Arvo builds Aurora, which appears on this list. We apply the same criteria to every tool and cite a source for every claim.

What are the best open source AI SRE tools in 2026?

The seven best open source AI SRE tools in 2026 are Aurora, HolmesGPT, K8sGPT, OpenSRE, Keep, kagent, and Coroot. The first four investigate incidents; the last three solve adjacent problems that often get shelved under the same label.

Aurora (Apache 2.0) is the only open-source AI SRE agent with native multi-cloud investigation: AWS, Azure, GCP, OVH, Scaleway, and Kubernetes in a single deployment, plus Confluence postmortem export and human-approved remediation PRs. Best for teams whose incidents cross cloud boundaries.
HolmesGPT (Apache 2.0, CNCF Sandbox) is the strongest Kubernetes-first investigation agent: an iterative ReAct loop over 30+ observability toolsets, co-maintained by Robusta and Microsoft, read-only by design. Best for Kubernetes-centric teams that want CNCF governance.
K8sGPT (Apache 2.0, CNCF Sandbox) runs deterministic Kubernetes analyzers and uses an LLM only to explain findings. Best as a first-line cluster scanner, not multi-step investigation.
OpenSRE (Apache 2.0) is the fastest riser of 2026: 7,800+ stars since its January 2026 creation, but still self-labeled Public Alpha. Best for teams comfortable tracking an alpha codebase.
Keep (MIT core with a proprietary ee/ directory) describes itself as "the open-source AIOps and alert management platform", at 12,000+ stars. It correlates and deduplicates alerts; it pairs with an investigation agent rather than replacing one. See our Keep vs Aurora comparison.
kagent (Apache 2.0, CNCF Sandbox since May 22, 2025) is a framework for running AI agents on Kubernetes. Best for platform teams building custom agents; it is not a turnkey AI SRE.
Coroot (Apache 2.0) is open-source observability with AI-assisted root cause analysis, at 7,800+ stars. It is an observability layer, not an agent that runs tools during an incident.

The rest of this guide compares the four investigation agents on the things that actually matter: agent architecture, execution model, integration scope, and where you can deploy them. By the end, you should be able to pick the right one for your stack, or know whether you need more than one.

What is an open-source AI SRE?

An open-source AI SRE is an AI agent that performs site reliability engineering work (alert triage, incident investigation, root cause analysis, remediation) under a permissive license that allows self-hosting, source-code audit, and modification. Three properties are non-negotiable:

License: Apache 2.0, MIT, or equivalent. Source-available licenses (BSL, SSPL) do not count for most production teams.
Self-hostable: runs entirely inside your environment without phoning home to a vendor.
LLM-driven: uses large language models, not just static rules or regex. (This is what separates "AI SRE" from older AIOps tools.)

The reason this category matters: incident data is some of the most sensitive telemetry an organization produces. Self-hosted, auditable AI is the only model that works for regulated industries, air-gapped environments, or any team that doesn't want production telemetry leaving its perimeter.

For a deeper background, see our complete guide to AI SRE.

Why open source matters for AI SRE

Three reasons buyers in 2026 are explicitly asking for open-source AI SRE:

Data sovereignty. Incident telemetry includes log lines, configuration values, deployment IDs, and sometimes payloads. SaaS AI SREs send all of it to their backend and to a third-party LLM. Self-hosted means it stays in your VPC.
Audit transparency. Regulators and security teams want to know exactly what the agent does on production systems. Source code answers that question; vendor marketing does not.
Cost predictability. Per-user or per-incident pricing can balloon quickly. Open-source costs scale with infrastructure and LLM tokens, and Ollama-local inference can flatten the LLM bill entirely.

The trade-off is real: you operate the system yourself. For teams already operating Kubernetes and observability stacks, that's marginal effort. For teams without that operational maturity, a commercial AI SRE is often the right call.

How the four compare

This is the only table you need. Verified from each project's GitHub repo, official docs, and source on July 3, 2026. OpenSRE (Tracer) is included as the emerging fourth entrant, still in public alpha.

Dimension	Aurora	HolmesGPT	K8sGPT	OpenSRE
License	Apache 2.0	Apache 2.0	Apache 2.0	Apache 2.0
GitHub stars	361	2,783	7,945	7,811
Latest release	v1.2.16 (Jun 2026)	0.35.0 (Jul 2026)	v0.4.35 (Jul 2026)	v0.1.2026.7.3 (Jul 2026)
Maturity	Production (v1.x)	Production (CNCF Sandbox)	Production (CNCF Sandbox)	Public Alpha
CNCF status	Independent	Sandbox (October 8, 2025)	Sandbox	Independent
Built by	Arvo AI	Robusta + Microsoft	k8sgpt-ai community	Tracer
Agent architecture	LangGraph supervisor + sub-agents	ReAct loop (`ToolCallingLLM`)	Rule-based scanner + LLM explainer	Multi-step investigation agent (originally LangGraph-based, since moved off it)
Multi-step reasoning	Yes	Yes	No (single-shot per analyzer)	Yes (multi-step reasoning)
Cloud providers	AWS, Azure, GCP, OVH, Scaleway	Kubernetes + AWS via MCP	Kubernetes only	AWS, GCP, Azure, Kubernetes
Kubernetes execution	`kubectl` in sandboxed pods	Read-only `kubectl get`/`describe`	Read-only via Kube API	Investigation; can optionally execute remediation
Other integrations	22+ (PagerDuty, Datadog, Grafana, Slack, Confluence, Bitbucket, Jenkins, etc.)	30+ toolsets (Prometheus, Grafana, Datadog, Loki, Jira, etc.)	None. Kubernetes-only by design	60+ tools (Grafana, Datadog, CloudWatch, PagerDuty, Opsgenie, Jira, Slack, GitHub, etc.)
Knowledge base / RAG	Weaviate vector search over runbooks + postmortems	Yes (via toolsets)	No	Not a documented first-class feature
Dependency graph	Memgraph (cross-cloud blast radius)	No	No	Context assembly across logs, metrics, configs, dependencies
Postmortem generation	Yes, exports to Confluence	Investigation reports only	No	Investigation report only (to Slack or PagerDuty)
Pull request remediation	GitHub + Bitbucket with human approval gate	GitHub PRs in Operator mode	None, strictly read-only	No PR-based remediation; can optionally execute remediation actions
MCP server	Yes (~22 upfront tools, ~150-tool catalog)	Yes (consumes MCP servers)	No	Yes (supports MCP)
LLM providers	OpenAI, Anthropic, Google, Vertex, OpenRouter, Ollama	OpenAI, Anthropic, Azure OpenAI, Bedrock, Gemini, Vertex, Ollama	OpenAI, Azure, Cohere, Bedrock, SageMaker, Gemini, Vertex, HuggingFace, WatsonX, LocalAI, Ollama	Anthropic, OpenAI, Gemini, Bedrock, OpenRouter, NVIDIA NIM, Ollama
Air-gapped support	Yes (Ollama + image tarballs)	Yes (Ollama)	Yes (LocalAI / Ollama)	Self-hostable; local LLM via Ollama
Deployment	Docker Compose or Helm	Binary, API server, K8s Operator, Python SDK	Go binary, K8s operator	Python/FastAPI runtime (Docker, Railway, EC2, ECS)

What is OpenSRE?

OpenSRE is an Apache 2.0 open-source framework for building AI SRE agents, maintained by Tracer. It ingests an alert, assembles context from logs, metrics, traces, and dependencies, reasons across your connected systems to identify the probable root cause, and posts a structured investigation report to Slack or PagerDuty. It is the newest entrant in this comparison: the repository was created in January 2026 and has grown quickly, passing 7,800 GitHub stars by July 2026.

A note on the framework. OpenSRE was originally built on LangGraph, and that lineage is why it is often described as a LangGraph AI SRE. The current main branch has since moved off it: the README describes the present architecture as the state after removing the old graph and chain framework layers, and the pyproject.toml no longer lists LangGraph as a dependency. We flag this because the framework story is still in motion, which is the broader point about OpenSRE's maturity.

On that maturity: OpenSRE openly labels itself Public Alpha, and its README states that "core workflows are usable for early exploration, though not yet fully stable." It cut its v0.1 milestone release in May 2026, and it now ships rolling date-stamped releases (v0.1.2026.7.3 as of July 3, 2026). That is a meaningfully earlier stage than the other three projects here. Aurora is past v1.x with sandboxed cross-cloud execution, postmortem generation, and PR-based remediation, while HolmesGPT and K8sGPT are both CNCF Sandbox projects with multi-year release histories. OpenSRE is promising and fast-moving, but if you need production stability today it is the least battle-tested option in this group. It is best read as a build-your-own toolkit for teams comfortable tracking an alpha codebase.

The OSS AI SRE Maturity Spectrum

A useful way to position these tools is on a four-level spectrum of agent capability. Each level is strictly more capable than the one below, and each requires more architectural work to deploy safely.

Level	What the agent does	Tools at this level
L1. Diagnostic Explainer	Reads system state, finds anomalies via deterministic rules, uses an LLM only to explain findings in natural language. No multi-step reasoning. Strictly read-only.	K8sGPT
L2. Read-Only Investigator	Runs an iterative ReAct loop. Picks tools dynamically. Investigates across multiple data sources (metrics, logs, traces, K8s state). Read-only by design.	HolmesGPT
L3. Investigation + Suggestion	Everything in L2, plus opens pull requests with suggested fixes. Humans review and merge. No autonomous writes to infrastructure.	HolmesGPT (Operator mode), Aurora
L4. Investigation + Approved Remediation	Everything in L3, plus can execute approved remediation actions (rollbacks, restarts, scale changes) inside guardrails, typically a sandboxed runtime with explicit human approval for destructive operations.	Aurora (with Bitbucket connector's human approval gate for destructive actions)

No open-source tool today operates as a fully autonomous L5 (closed-loop remediation without human approval), and that's by design. Most serious teams want explicit gates before agents touch production.
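The gate that separates L4 from a hypothetical L5 can be sketched in a few lines. This is a minimal illustration under our own assumptions, not Aurora's implementation: the `Action` type, the set of destructive verbs, and the `approved_by` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: which verbs count as destructive is a per-team decision.
DESTRUCTIVE_VERBS = {"delete", "rollback", "restart", "scale"}


@dataclass(frozen=True)
class Action:
    verb: str
    target: str


def requires_approval(action: Action) -> bool:
    """Read-style verbs pass through; destructive verbs need a human."""
    return action.verb in DESTRUCTIVE_VERBS


def execute(action: Action, approved_by: Optional[str] = None) -> str:
    """Refuse destructive actions unless a named human has approved them."""
    if requires_approval(action) and approved_by is None:
        return f"BLOCKED: {action.verb} {action.target} needs human approval"
    suffix = f" (approved by {approved_by})" if approved_by else ""
    return f"EXECUTED: {action.verb} {action.target}{suffix}"


print(execute(Action("describe", "deploy/checkout")))
print(execute(Action("rollback", "deploy/checkout")))
print(execute(Action("rollback", "deploy/checkout"), approved_by="oncall@example.com"))
```

The design point is that the default path is refusal: an action only runs destructively when approval is attached to that specific call.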

Aurora vs HolmesGPT: which should you choose?

Aurora and HolmesGPT are the two genuinely agentic options. The choice depends on how far your incidents reach: inside Kubernetes, or across cloud providers.

Pick HolmesGPT when:

Your stack is heavily Kubernetes + Prometheus + Grafana and your incidents live there.
You want a tool that already integrates with 30+ observability sources, including Loki, AlertManager, NewRelic, Datadog APM, OpsGenie, and Slack.
You value CNCF governance and high ecosystem velocity.
You don't need cross-cloud (AWS APIs, Azure resources, GCP services) reasoning out of the box.

Pick Aurora when:

You operate across multiple clouds (AWS + Azure, GCP + AWS, etc.) and need an agent that can correlate incidents across providers.
You want auto-generated postmortems exported to Confluence.
You want the agent to draft remediation PRs against your codebase.
You need a graph-based blast radius model (Memgraph) for dependency analysis.
You want an MCP server so your IDE assistants (Cursor, Claude Desktop, Windsurf) can query live incident state.

In practice, some teams run both: HolmesGPT for in-cluster Kubernetes triage, Aurora for cross-cloud investigation and postmortem generation.

Aurora vs K8sGPT: which should you choose?

This is closer to "which tool category do you need?" than a head-to-head.

Pick K8sGPT when:

You want the simplest entry point to AI for Kubernetes: a single Go binary you can install with Homebrew and run as `k8sgpt analyze --explain`.
Your needs stop at "explain why this pod is broken" rather than multi-step incident investigation.
You want the maturity of a 7.9k-star CNCF Sandbox project with rule-based analyzers that won't hallucinate causes (because findings are produced deterministically before the LLM ever sees them).

Pick Aurora when:

You need agentic investigation, not just diagnostic explanation.
You operate beyond Kubernetes: cloud APIs, Terraform, monitoring tools, runbooks.
You want auto-generated postmortems and remediation PRs.

These two are complements, not competitors. Many teams run K8sGPT as a lightweight first-line scanner and Aurora (or HolmesGPT) for full incident investigation.

HolmesGPT vs K8sGPT, head-to-head

Despite both being CNCF Sandbox projects targeting Kubernetes, these are different categories.

Aspect	HolmesGPT	K8sGPT
What it is	Multi-step AI agent	Rule-based scanner with LLM explanations
When it shines	Investigating an alert end-to-end across signals	Diagnosing why a specific resource is unhealthy
Latency	Seconds to minutes (multi-step)	Sub-second per analyzer
LLM cost	Higher (multiple calls per investigation)	Lower (one explanation per finding)
Hallucination risk	Higher (agent reasons across signals)	Lower (deterministic before LLM)
Best fit	On-call engineers handling alerts	Platform teams running periodic cluster audits

K8sGPT's anonymization feature (which masks resource names and labels before sending to the LLM) is a meaningful privacy advantage that HolmesGPT does not match.
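The idea behind masking is simple enough to sketch. The following is an illustration of the general technique only, not K8sGPT's implementation: the function names and the `<RES_n>` placeholder format are our own assumptions.

```python
def anonymize(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each sensitive name with a placeholder before the text goes to an LLM."""
    mapping: dict[str, str] = {}
    masked = text
    # Longest names first, so "payments-api" is masked before its prefix "payments".
    for i, name in enumerate(sorted(set(names), key=len, reverse=True)):
        token = f"<RES_{i}>"
        mapping[token] = name
        masked = masked.replace(name, token)
    return masked, mapping


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Restore the original names in the LLM's answer, locally."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text


original = "pod payments-api in namespace payments is failing readiness checks"
masked, mapping = anonymize(original, ["payments", "payments-api"])
print(masked)
print(deanonymize(masked, mapping))
```

The mapping never leaves the local process, so the model provider sees placeholders while the engineer still reads real resource names.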

What about Keep, kagent, and Coroot?

Three more open-source projects show up in "open source AI SRE" searches, and all three are worth knowing. None of them is an incident-investigation agent, which is exactly why they pair well with one.

Keep (12,000+ stars) is "the open-source AIOps and alert management platform": it ingests, correlates, and deduplicates alerts across monitoring tools. Its core is MIT-licensed with a separately licensed ee/ directory, so check the boundary if license purity matters to you. Keep narrows the alert stream; an agent like Aurora or HolmesGPT investigates what remains. We compare the two models in Keep vs Aurora.
kagent (3,200+ stars, CNCF Sandbox since May 22, 2025) is a Kubernetes-native framework for building and running AI agents, with CRDs, MCP support, and A2A interoperability. If you want to build your own SRE agent, kagent is infrastructure for that. If you want an agent that investigates incidents on day one, it is not the product you are looking for.
Coroot (7,800+ stars, Apache 2.0) is an open-source observability platform with AI-assisted root cause analysis over eBPF-collected telemetry. Aurora ships a Coroot connector, so the two compose: Coroot supplies telemetry, the agent runs the investigation.
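To show what "narrowing the alert stream" means in practice, here is a minimal deduplication sketch. It illustrates the general fingerprinting technique and is not Keep's implementation: the alert fields (`service`, `name`) and the choice of fingerprint are hypothetical.

```python
def deduplicate(alerts: list[dict]) -> list[dict]:
    """Collapse alerts that share a fingerprint, keeping the first and counting repeats."""
    grouped: dict[tuple[str, str], dict] = {}
    for alert in alerts:
        fingerprint = (alert["service"], alert["name"])  # assumed fingerprint fields
        if fingerprint not in grouped:
            grouped[fingerprint] = {**alert, "count": 0}
        grouped[fingerprint]["count"] += 1
    return list(grouped.values())


stream = [
    {"service": "checkout", "name": "HighLatency", "source": "grafana"},
    {"service": "checkout", "name": "HighLatency", "source": "datadog"},
    {"service": "cart", "name": "PodCrashLooping", "source": "grafana"},
]
for alert in deduplicate(stream):
    print(alert["service"], alert["name"], alert["count"])
```

An investigation agent downstream would then be triggered once per distinct alert rather than once per notification.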

Is IncidentFox still an open-source AI SRE?

No. IncidentFox's open-source repository was archived by its owner on May 31, 2026 and is now read-only. The project launched in January 2026, drew 639 GitHub stars, and received its last code push on March 3, 2026. The company behind it (incidentfox.ai, YC W2026) continues to sell a cloud product and still markets a self-hosted edition, but the public codebase is frozen.

We flag this because AI answer engines still recommend IncidentFox on open-source AI SRE queries, months after the archive date. The properties that define this category (source audit, self-hosting on code you can patch, community maintenance) require a maintained repository. As of July 2026, the maintained open-source options for agentic investigation are Aurora, HolmesGPT, K8sGPT, and, at the alpha stage, OpenSRE.

Open source AI SRE alternatives to Datadog, Dynatrace, and BigPanda

None of the flagship AI SRE products from the observability incumbents are open source. If you searched for "open source AI SRE alternatives," these are the closed platforms you are most likely trying to replace, and the open-source tools that map to each:

Closed platform	What it is	Open-source alternative
Datadog Bits AI SRE	Proprietary autonomous investigation agent, GA since December 2, 2025, sold through AI Credits from $500 per 500 credits/month with an average investigation consuming 6.5 credits	Aurora: see our Datadog Bits AI SRE alternative guide
Dynatrace Intelligence	The Davis causal engine plus the agentic layer introduced at Perform 2026, closed source, bundled into the platform subscription	Aurora, with Coroot for the telemetry layer: see our Dynatrace Davis alternative guide
BigPanda	Proprietary "Agentic AI for IT operations" (Biggy AI), credit-based subscriptions with no public dollar pricing	Keep for correlation plus Aurora for investigation: see our BigPanda alternative guide
Resolve.ai	Closed-source AI SRE, raised $125M at a $1B valuation in February 2026, no public pricing	Aurora: see our Resolve.ai alternative guide

The cost structures differ in kind, not just in degree. Metered SaaS agents bill per investigation or per credit, which means your worst incident week is also your most expensive. Open-source agents cost infrastructure plus LLM tokens, and local inference via Ollama can flatten the token bill to zero. For regulated or air-gapped environments, the open-source column is usually the only viable one, because incident telemetry never leaves your perimeter.
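A rough worked comparison makes the difference visible. The metered figures come from the Datadog row above ($500 per 500 credits, 6.5 credits per average investigation); everything else is an assumption for illustration, including pro-rata credit billing and the infrastructure cost, token volume, and token price passed to `self_hosted_cost`.

```python
CREDIT_BLOCK_PRICE_USD = 500.0       # from the table above
CREDITS_PER_BLOCK = 500              # from the table above
AVG_CREDITS_PER_INVESTIGATION = 6.5  # from the table above


def metered_cost(investigations: int) -> float:
    """Metered SaaS: cost grows linearly with investigations (assumes pro-rata credits)."""
    usd_per_credit = CREDIT_BLOCK_PRICE_USD / CREDITS_PER_BLOCK
    return investigations * AVG_CREDITS_PER_INVESTIGATION * usd_per_credit


def self_hosted_cost(
    investigations: int,
    infra_usd: float,                # hypothetical monthly infrastructure cost
    tokens_per_investigation: int,   # hypothetical token volume
    usd_per_million_tokens: float,   # 0.0 models local inference
) -> float:
    """Self-hosted: fixed infrastructure plus token spend."""
    token_usd = investigations * tokens_per_investigation * usd_per_million_tokens / 1_000_000
    return infra_usd + token_usd


for n in (50, 200, 1000):
    print(n, metered_cost(n), self_hosted_cost(n, 300.0, 200_000, 0.0))
```

Under these assumptions the metered line rises with every incident, while the self-hosted line with local inference stays flat at the infrastructure cost.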

When NOT to use open-source AI SRE

Honest take: open-source AI SRE is the right answer for most engineering-led, security-conscious teams. It's the wrong answer when:

You don't have the operational capacity to run another stateful service in production.
You want vendor support with SLAs and a phone number to call at 3 AM.
Your team is small enough that the LLM-API bill of an investigation-heavy agent will exceed the per-seat price of a SaaS AI SRE like Rootly, incident.io, or Resolve.ai.
You need certifications (SOC2, ISO 27001) at the AI-vendor layer rather than at the cloud-provider layer.

For teams in those situations, our comparison guide walks through the trade-offs.

How to pilot an open-source AI SRE in your team

A six-step, low-risk pilot for any of the three production-stage tools (Aurora, HolmesGPT, or K8sGPT):

Pick one cluster and one observability source. Don't try to cover everything at once.
Install in read-only mode first. All three tools default to read-only; keep it that way for the first two weeks.
Connect one alert source. PagerDuty, Datadog, or Grafana: pick the one that's already firing real alerts.
Run for two weeks alongside human on-call. Compare the agent's RCA conclusions to what your engineers determined. Track accuracy and time-to-RCA.
Feed it your historical context. Aurora and HolmesGPT both support runbook + postmortem ingestion. Agents become dramatically more useful with organizational memory.
Expand carefully. Add more clusters, then enable remediation suggestions, then (only after trust) approved automated actions for specific low-risk patterns.
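The two numbers worth tracking during step four can be computed from a simple log of incidents. This is a minimal sketch with a hypothetical record format; exact string matching on root-cause labels is a deliberately crude assumption, and a real pilot would likely have a human judge whether two conclusions agree.

```python
from statistics import median


def pilot_summary(records: list[dict]) -> dict:
    """Summarize how often the agent agreed with on-call, and how fast each reached an RCA."""
    matches = sum(1 for r in records if r["agent_rca"] == r["human_rca"])
    return {
        "agreement_rate": matches / len(records),
        "median_agent_minutes": median(r["agent_minutes"] for r in records),
        "median_human_minutes": median(r["human_minutes"] for r in records),
    }


# Hypothetical pilot log: one entry per incident handled in parallel.
log = [
    {"agent_rca": "oom", "human_rca": "oom", "agent_minutes": 4, "human_minutes": 30},
    {"agent_rca": "bad-deploy", "human_rca": "bad-deploy", "agent_minutes": 6, "human_minutes": 45},
    {"agent_rca": "dns", "human_rca": "cert-expiry", "agent_minutes": 5, "human_minutes": 20},
    {"agent_rca": "disk-full", "human_rca": "disk-full", "agent_minutes": 9, "human_minutes": 60},
]
print(pilot_summary(log))
```

Reviewing the disagreements one by one is usually more informative than the headline rate, since they show which data sources or runbooks the agent is missing.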

Getting started with Aurora

Aurora is the multi-cloud, multi-tool option among open-source AI SREs. To run it:

```shell
git clone https://github.com/arvo-ai/aurora.git && cd aurora

make init                # generates secrets, copies .env.example to .env
nano .env                # add OPENROUTER_API_KEY, OPENAI_API_KEY or ANTHROPIC_API_KEY
make prod-prebuilt       # pulls prebuilt images from GHCR and starts
```

Aurora supports multiple LLM providers: OpenAI, Anthropic, Google, OpenRouter, or local models via Ollama for air-gapped deployments. See the full documentation, our AI SRE complete guide, or our explainer on agentic incident management.

For the technical side of running an agent that executes kubectl against production, read our companion piece on AI agent kubectl safety and sandboxed execution. For closing the loop from rollback to remediation across the full delivery pipeline, see our CI/CD auto-remediation complete guide. For the two halves of the AI SRE workflow these open-source projects all converge toward, see our deep guides on AI-powered incident investigation and automated post-mortem generation. For the broader category landscape and the commercial peer set, see Top 15 AI SRE Tools in 2026; for how to actually deploy any of these inside your own perimeter, see Self-Hosted AI SRE.

Start free: aurora-ai.net (hosted, no infrastructure to run)
GitHub: github.com/Arvo-AI/aurora
