Aurora

Agentic Interface to the Cloud

AWS
Azure
Google Cloud
Explore Features Below
The challenge

Root cause analysis takes too long

SREs spend hours digging through logs, metrics, and cloud consoles trying to find the root cause of incidents. Aurora gives you instant answers.

What caused this production incident at 3 AM?

Why did our API response times suddenly spike?

What's the root cause of this service degradation?

Why are we seeing intermittent timeout errors?

Which infrastructure change broke our deployment?

What triggered this security alert in our cloud environment?

Why is our database experiencing connection pool exhaustion?

How can we prevent this incident from happening again?

Why did our Kubernetes pods crash during the deployment?

What's causing the memory leak in our application?

Why is our cache hit rate dropping suddenly?

What's causing these sporadic 503 errors?

The SRE Flow

Aurora fits into your stack

Aurora sits between incident alerts and your postmortems, transforming hours of manual investigation into instant root cause analysis.

Incident (production issue detected) → Monitoring (PagerDuty, DataDog, Grafana) → Aurora (automatically investigates incidents, queries logs and metrics, identifies root causes instantly) → Resolution (postmortem & prevention)
Most SRE teams have great monitoring platforms and solid postmortem processes. But between the alert firing and understanding what happened, there's a critical gap of manual investigation. That's where Aurora lives.

See Aurora in Action

Watch how Aurora investigates infrastructure issues automatically, querying your cloud providers and analyzing logs in real-time.

Aurora Chat
I have an issue with app-server-prod-01. Can you investigate it?

I'll investigate the app-server-prod-01 VM for you. Let me start by checking its current status and configuration.

GCP: gcloud compute instances describe app-server-prod-01 --zone us-central1-a

Now let me check the firewall rules that might be affecting this instance:

GCP: gcloud compute firewall-rules list --filter="targetTags:web-server OR name~web"

Let me also check if there are any Cloud Logging errors for this instance:

GCP: gcloud logging read "resource.type=gce_instance AND resource.labels.instance_id=5428493216708587129" --limit 20 --format="table(timestamp,severity,textPayload)"

Let me test connectivity to the instance:

timeout 5 curl -v http://34.122.18.110 2>&1 | head -30

Investigation Complete - Critical Issues Found

I've identified three major problems with app-server-prod-01:


ISSUE #1: FIREWALL BLOCKING ALL WEB TRAFFIC (CRITICAL)

Problem: A deny firewall rule is actively blocking all web access to your server.

Details:

  • Rule Name: web-app-deny
  • Action: DENY (blocks traffic)
  • Priority: 900 (lower number = higher precedence, so this deny overrides the default allow rules at priority 1000)
  • Blocked Ports: 80, 443, 8080, 3000
  • Target: Instances with tag web-server (your VM has this exact tag!)
  • Source: 0.0.0.0/0 (blocks from everywhere)

Evidence: Connection timeout when testing http://34.122.18.110 - the firewall is dropping packets before they reach the VM.

Impact: Your application is completely unreachable from the internet.
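
To double-check a rule like this yourself, listing rules in precedence order makes the conflict obvious (a hedged sketch; in GCP the lower priority number wins):

  # Show rules in precedence order, then inspect the suspect rule from the transcript.
  gcloud compute firewall-rules list --sort-by=priority --format="table(name,priority,direction,disabled)"
  gcloud compute firewall-rules describe web-app-deny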


ISSUE #2: BROKEN STARTUP SCRIPT (CRITICAL)

Problem: The startup script contains multiple errors that prevent proper application startup.

Script errors:

  • apt-get install -y nonexistent-package-xyz (package doesn't exist; install will fail)
  • systemctl start application-service (service doesn't exist; will fail)
  • echo "Application started" > /root/app.log (may have permission issues)
  • python3 -m http.server 80 (requires root; conflicts with other services)

Impact: Application fails to start properly on boot. Even if the firewall was open, nothing would be listening.
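
For contrast, a minimal working replacement might look like the sketch below; nginx is an illustrative stand-in, not necessarily the application this server should run:

  #!/bin/bash
  # Hypothetical corrected startup script for a Debian 11 image.
  set -euo pipefail
  apt-get update
  apt-get install -y nginx         # a package that actually exists
  systemctl enable --now nginx     # a unit that actually exists
  echo "Application started" >> /var/log/startup.log   # a location root can write at boot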


ISSUE #3: UNDERSIZED INFRASTRUCTURE (WARNING)

Problem: VM is too small for production workloads.

Current Specs:

  • Machine Type: e2-micro (shared-core, 0.25 vCPU baseline, 1 GB RAM)
  • Disk Size: 10GB
  • Labels: environment=production, tier=frontend

Impact: Performance degradation, potential out-of-memory crashes, disk space constraints for a production frontend server.


Current Status Summary

  • VM Status: RUNNING (but not functional)
  • External IP: 34.122.18.110
  • Zone: us-central1-a
  • Network: default VPC
  • OS: Debian 11 (Bullseye)

Recommended Actions

Would you like me to fix these issues? I can:

  1. Delete the blocking firewall rule (web-app-deny)
  2. Fix the startup script (remove broken commands, add working application setup)
  3. Upgrade the VM to appropriate production specs (e.g., e2-medium or e2-standard-2)

Which would you like me to tackle first, or should I fix all three?
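
If you'd rather apply these fixes by hand, roughly equivalent gcloud commands are sketched below (resource names come from the transcript; e2-medium is a suggestion, and resizing requires stopping the VM first):

  # 1. Remove the blocking deny rule.
  gcloud compute firewall-rules delete web-app-deny --quiet

  # 2. Swap in a corrected startup script (startup.sh = a fixed local file).
  gcloud compute instances add-metadata app-server-prod-01 --zone us-central1-a \
      --metadata-from-file startup-script=startup.sh

  # 3. Resize the VM: stop, change machine type, start again.
  gcloud compute instances stop app-server-prod-01 --zone us-central1-a
  gcloud compute instances set-machine-type app-server-prod-01 --zone us-central1-a --machine-type e2-medium
  gcloud compute instances start app-server-prod-01 --zone us-central1-a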

Aurora automatically investigates incidents, queries cloud providers, and provides root cause analysis in seconds.

Root Cause Analysis Across All Clouds

Ask Aurora about incidents across AWS, Azure, and GCP in natural language. Get instant answers without switching between provider consoles.

AWS
Azure
Google Cloud
Multi-Cloud Queries
Ask questions about resources across all your cloud providers at once
Instant Incident Analysis
Quickly identify root causes across AWS, Azure, and GCP infrastructure
Cross-Cloud Context
Understand how resources interact across different cloud providers
Natural Language RCA
Simply ask Aurora what caused an issue, regardless of which cloud it's on

Complete Infrastructure Coverage

Aurora automatically investigates incidents across your infrastructure, giving you instant answers when things go wrong.

Kubernetes Support

Debug pods, check logs, and analyze resource usage across all your clusters with natural language commands.

Aurora Chat
Why did my Kubernetes pods crash?
Checking pod status and resource usage across your clusters...
Kubernetes: kubectl describe pod app-deployment-xyz --namespace production
Kubernetes: kubectl top pods --namespace production

Root Cause Found

Memory limit set to 128Mi but application requires minimum 512Mi. OOMKilled events detected in last 3 hours.

Recommendation: Update deployment to use at least 512Mi memory limit.
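
Applied by hand, that recommendation is a one-liner along these lines (the deployment name is assumed from the pod name in the transcript):

  # Hypothetical: raise memory on the deployment behind the OOMKilled pods.
  kubectl set resources deployment/app-deployment --namespace production \
      --requests=memory=512Mi --limits=memory=512Mi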

SSH Anywhere

Securely execute commands on any server through SSH. Aurora manages credentials and runs commands in isolated environments.

Aurora Chat
Check disk space on prod-server-01 and restart nginx if needed
Connecting to prod-server-01 via SSH and checking system resources...
ssh user@prod-server-01 'df -h'
ssh user@prod-server-01 'sudo rm -rf /var/log/*.old && sudo systemctl restart nginx'

Action Completed

Disk usage at 94% on /var partition. Cleared old logs (freed 12GB). Nginx restarted successfully.

Status: Service is now running normally.
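
The same check-then-act pattern, scripted directly, might look like this sketch (host, user, and the 90% threshold are assumptions):

  # Only clean up and restart when /var is actually under pressure.
  usage=$(ssh user@prod-server-01 "df --output=pcent /var | tail -1 | tr -dc '0-9'")
  if [ "$usage" -gt 90 ]; then
      ssh user@prod-server-01 'sudo find /var/log -name "*.old" -delete && sudo systemctl restart nginx'
  fi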

Multi-Cloud Intelligence

Query AWS, Azure, and GCP simultaneously. Aurora correlates data across providers to give you the complete picture.

Aurora Chat
Why is my production API returning 502 errors?
Investigating your load balancer and backend services across AWS and Azure...
AWS: aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/prod-api/abc123
Azure: az network lb show --resource-group production-rg --name prod-api-lb --query 'backendAddressPools[0].backendIPConfigurations[*].id' -o table
Azure: az network nic show-effective-nsg --resource-group production-rg --name prod-vm-1-nic

Root Cause Found

Your AWS Application Load Balancer is healthy, but the Azure backend pool has no healthy instances.

Azure VM Details:

  • All 3 VMs in the backend pool are marked unhealthy
  • Health probe failing on port 8080
  • VMs are running but application not responding

Impact: 50% of traffic routed to Azure is failing, causing intermittent 502 errors for users.
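
A natural follow-up, whether run by Aurora or by hand, is inspecting the failing probe itself:

  # Hypothetical next step: review the health probe configuration on the Azure LB.
  az network lb probe list --resource-group production-rg --lb-name prod-api-lb -o table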

Ask Aurora about any incident in natural language. It automatically queries logs, checks metrics, and correlates events to find root causes across AWS, Azure, and GCP.

AI that's built with enterprise-grade security and privacy.

We provision secure, sandboxed Kubernetes environments on-demand to execute operations with complete isolation. Network policies block internal cluster access, and all pods run with dropped capabilities.

01

Sandboxed Execution

Commands run in isolated Kubernetes pods with gVisor sandboxing

02

Complete Isolation

Every user operates with isolated credentials in ephemeral infrastructure

03

Network Policies

Strict network policies block internal cluster access from user pods
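
As a rough illustration of that isolation model (a sketch with placeholder names, not Aurora's actual manifests; it assumes a RuntimeClass named gvisor exists in the cluster and omits the DNS exception a real egress policy would need):

  kubectl apply -f - <<'EOF'
  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: deny-internal-egress
    namespace: user-sandboxes
  spec:
    podSelector: {}                  # applies to every user pod in the namespace
    policyTypes: ["Egress"]
    egress:
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0
              except:                # block private ranges, i.e. the cluster itself
                - 10.0.0.0/8
                - 172.16.0.0/12
                - 192.168.0.0/16
  ---
  apiVersion: v1
  kind: Pod
  metadata:
    name: sandbox-runner
    namespace: user-sandboxes
  spec:
    runtimeClassName: gvisor         # per-pod gVisor sandboxing
    containers:
      - name: runner
        image: example/sandbox-runner:latest   # placeholder image
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]            # run with dropped capabilities
  EOF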

Backed By

Panache Ventures
Front Row Ventures

Some of our Clients

McGill University
Harbor Fab
I-MED Pharma
Canoe Interpretation
Guzzo Nano Research
MGN Logistics