Aurora

Agentic Interface to the Cloud

AWS
Azure
Google Cloud
Explore Features Below
The challenge

Root cause analysis takes too long

SREs spend hours digging through logs, metrics, and cloud consoles trying to find the root cause of incidents. Aurora gives you instant answers.

What caused this production incident at 3 AM?

Why did our API response times suddenly spike?

What's the root cause of this service degradation?

Why are we seeing intermittent timeout errors?

Which infrastructure change broke our deployment?

What triggered this security alert in our cloud environment?

Why is our database experiencing connection pool exhaustion?

How can we prevent this incident from happening again?

Why did our Kubernetes pods crash during the deployment?

What's causing the memory leak in our application?

Why is our cache hit rate dropping suddenly?

What's causing these sporadic 503 errors?

The SRE Flow

Aurora fits into your stack

Aurora sits between incident alerts and your postmortems, transforming hours of manual investigation into instant root cause analysis.

Incident (production issue detected) → Monitoring (PagerDuty, DataDog, Grafana) → Aurora (automatically investigates incidents, queries logs and metrics, identifies root causes instantly) → Resolution (postmortem & prevention)
Most SRE teams have great monitoring platforms and solid postmortem processes. But between the alert firing and understanding what happened, there's a critical gap of manual investigation. That's where Aurora lives.

See Aurora in Action

Watch how Aurora investigates infrastructure issues automatically, querying your cloud providers and analyzing logs in real-time.

Aurora Chat
I have an issue with app-server-prod-01. Can you investigate it?

I'll investigate the app-server-prod-01 VM for you. Let me start by checking its current status and configuration.

GCP: gcloud compute instances describe app-server-prod-01 --zone us-central1-a

Now let me check the firewall rules that might be affecting this instance:

GCP: gcloud compute firewall-rules list --filter="targetTags:web-server OR name~web"

Let me also check if there are any Cloud Logging errors for this instance:

GCP: gcloud logging read "resource.type=gce_instance AND resource.labels.instance_id=5428493216708587129" --limit 20 --format="table(timestamp,severity,textPayload)"

Let me test connectivity to the instance:

timeout 5 curl -v http://34.122.18.110 2>&1 | head -30

Investigation Complete - Critical Issues Found

I've identified three major problems with app-server-prod-01:


ISSUE #1: FIREWALL BLOCKING ALL WEB TRAFFIC (CRITICAL)

Problem: A deny firewall rule is actively blocking all web access to your server.

Details:

  • Rule Name: web-app-deny
  • Action: DENY (blocks traffic)
  • Priority: 900 (lower number = higher precedence, so this deny overrides the default allow rules at priority 1000)
  • Blocked Ports: 80, 443, 8080, 3000
  • Target: Instances with tag web-server (your VM has this exact tag!)
  • Source: 0.0.0.0/0 (blocks from everywhere)

Evidence: Connection timeout when testing http://34.122.18.110 - the firewall is dropping packets before they reach the VM.

Impact: Your application is completely unreachable from the internet.
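
To double-check a rule like this yourself, listing rules in precedence order makes the conflict obvious (a hedged sketch; in GCP the lower priority number wins):

  # Show rules in precedence order, then inspect the suspect rule from the transcript.
  gcloud compute firewall-rules list --sort-by=priority --format="table(name,priority,direction,disabled)"
  gcloud compute firewall-rules describe web-app-deny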


ISSUE #2: BROKEN STARTUP SCRIPT (CRITICAL)

Problem: The startup script contains multiple errors that prevent proper application startup.

Script errors:

  • apt-get install -y nonexistent-package-xyz (package doesn't exist; install will fail)
  • systemctl start application-service (service doesn't exist; will fail)
  • echo "Application started" > /root/app.log (may have permission issues)
  • python3 -m http.server 80 (requires root; conflicts with other services)

Impact: Application fails to start properly on boot. Even if the firewall was open, nothing would be listening.
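
For contrast, a minimal working replacement might look like the sketch below; nginx is an illustrative stand-in, not necessarily the application this server should run:

  #!/bin/bash
  # Hypothetical corrected startup script for a Debian 11 image.
  set -euo pipefail
  apt-get update
  apt-get install -y nginx         # a package that actually exists
  systemctl enable --now nginx     # a unit that actually exists
  echo "Application started" >> /var/log/startup.log   # a location root can write at boot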


ISSUE #3: UNDERSIZED INFRASTRUCTURE (WARNING)

Problem: VM is too small for production workloads.

Current Specs:

  • Machine Type: e2-micro (shared-core, 0.25 vCPU baseline, 1 GB RAM)
  • Disk Size: 10GB
  • Labels: environment=production, tier=frontend

Impact: Performance degradation, potential out-of-memory crashes, disk space constraints for a production frontend server.


Current Status Summary

  • VM Status: RUNNING (but not functional)
  • External IP: 34.122.18.110
  • Zone: us-central1-a
  • Network: default VPC
  • OS: Debian 11 (Bullseye)

Recommended Actions

Would you like me to fix these issues? I can:

  1. Delete the blocking firewall rule (web-app-deny)
  2. Fix the startup script (remove broken commands, add working application setup)
  3. Upgrade the VM to appropriate production specs (e.g., e2-medium or e2-standard-2)

Which would you like me to tackle first, or should I fix all three?
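
If you'd rather apply these fixes by hand, roughly equivalent gcloud commands are sketched below (resource names come from the transcript; e2-medium is a suggestion, and resizing requires stopping the VM first):

  # 1. Remove the blocking deny rule.
  gcloud compute firewall-rules delete web-app-deny --quiet

  # 2. Swap in a corrected startup script (startup.sh = a fixed local file).
  gcloud compute instances add-metadata app-server-prod-01 --zone us-central1-a \
      --metadata-from-file startup-script=startup.sh

  # 3. Resize the VM: stop, change machine type, start again.
  gcloud compute instances stop app-server-prod-01 --zone us-central1-a
  gcloud compute instances set-machine-type app-server-prod-01 --zone us-central1-a --machine-type e2-medium
  gcloud compute instances start app-server-prod-01 --zone us-central1-a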

Aurora automatically investigates incidents, queries cloud providers, and provides root cause analysis in seconds.

Root Cause Analysis Across All Clouds

Ask Aurora about incidents across AWS, Azure, and GCP in natural language. Get instant answers without switching between provider consoles.

AWS
Azure
Google Cloud
Multi-Cloud Queries
Ask questions about resources across all your cloud providers at once
Instant Incident Analysis
Quickly identify root causes across AWS, Azure, and GCP infrastructure
Cross-Cloud Context
Understand how resources interact across different cloud providers
Natural Language RCA
Simply ask Aurora what caused an issue, regardless of which cloud it's on

Complete Infrastructure Coverage

Aurora automatically investigates incidents across your infrastructure, giving you instant answers when things go wrong.

Kubernetes Support

Debug pods, check logs, and analyze resource usage across all your clusters with natural language commands.

Aurora Chat
Why did my Kubernetes pods crash?
Checking pod status and resource usage across your clusters...
Kubernetes: kubectl describe pod app-deployment-xyz --namespace production
Kubernetes: kubectl top pods --namespace production

Root Cause Found

Memory limit set to 128Mi but application requires minimum 512Mi. OOMKilled events detected in last 3 hours.

Recommendation: Update deployment to use at least 512Mi memory limit.
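
Applied by hand, that recommendation is a one-liner along these lines (the deployment name is assumed from the pod name in the transcript):

  # Hypothetical: raise memory on the deployment behind the OOMKilled pods.
  kubectl set resources deployment/app-deployment --namespace production \
      --requests=memory=512Mi --limits=memory=512Mi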

SSH Anywhere

Securely execute commands on any server through SSH. Aurora manages credentials and runs commands in isolated environments.

Aurora Chat
Check disk space on prod-server-01 and restart nginx if needed
Connecting to prod-server-01 via SSH and checking system resources...
ssh user@prod-server-01 'df -h'
ssh user@prod-server-01 'sudo rm -rf /var/log/*.old && sudo systemctl restart nginx'

Action Completed

Disk usage at 94% on /var partition. Cleared old logs (freed 12GB). Nginx restarted successfully.

Status: Service is now running normally.
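
The same check-then-act pattern, scripted directly, might look like this sketch (host, user, and the 90% threshold are assumptions):

  # Only clean up and restart when /var is actually under pressure.
  usage=$(ssh user@prod-server-01 "df --output=pcent /var | tail -1 | tr -dc '0-9'")
  if [ "$usage" -gt 90 ]; then
      ssh user@prod-server-01 'sudo find /var/log -name "*.old" -delete && sudo systemctl restart nginx'
  fi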

Multi-Cloud Intelligence

Query AWS, Azure, and GCP simultaneously. Aurora correlates data across providers to give you the complete picture.

Aurora Chat
Why is my production API returning 502 errors?
Investigating your load balancer and backend services across AWS and Azure...
AWS: aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/prod-api/abc123
Azure: az network lb show --resource-group production-rg --name prod-api-lb --query 'backendAddressPools[0].backendIPConfigurations[*].id' -o table
Azure: az network nic show-effective-nsg --resource-group production-rg --name prod-vm-1-nic

Root Cause Found

Your AWS Application Load Balancer is healthy, but the Azure backend pool has no healthy instances.

Azure VM Details:

  • All 3 VMs in the backend pool are marked unhealthy
  • Health probe failing on port 8080
  • VMs are running but application not responding

Impact: 50% of traffic routed to Azure is failing, causing intermittent 502 errors for users.
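
A natural follow-up, whether run by Aurora or by hand, is inspecting the failing probe itself:

  # Hypothetical next step: review the health probe configuration on the Azure LB.
  az network lb probe list --resource-group production-rg --lb-name prod-api-lb -o table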

Ask Aurora about any incident in natural language. It automatically queries logs, checks metrics, and correlates events to find root causes across AWS, Azure, and GCP.

AI that's built with enterprise-grade security and privacy.

We provision secure, sandboxed Kubernetes environments on-demand to execute operations with complete isolation. Network policies block internal cluster access, and all pods run with dropped capabilities.

01

Sandboxed Execution

Commands run in isolated Kubernetes pods with gVisor sandboxing

02

Complete Isolation

Every user operates with isolated credentials in ephemeral infrastructure

03

Network Policies

Strict network policies block internal cluster access from user pods
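
As a rough illustration of that isolation model (a sketch with placeholder names, not Aurora's actual manifests; it assumes a RuntimeClass named gvisor exists in the cluster and omits the DNS exception a real egress policy would need):

  kubectl apply -f - <<'EOF'
  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: deny-internal-egress
    namespace: user-sandboxes
  spec:
    podSelector: {}                  # applies to every user pod in the namespace
    policyTypes: ["Egress"]
    egress:
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0
              except:                # block private ranges, i.e. the cluster itself
                - 10.0.0.0/8
                - 172.16.0.0/12
                - 192.168.0.0/16
  ---
  apiVersion: v1
  kind: Pod
  metadata:
    name: sandbox-runner
    namespace: user-sandboxes
  spec:
    runtimeClassName: gvisor         # per-pod gVisor sandboxing
    containers:
      - name: runner
        image: example/sandbox-runner:latest   # placeholder image
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]            # run with dropped capabilities
  EOF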

Backed By

Panache Ventures
Front Row Ventures

Some of our Clients

McGill University
Harbor Fab
I-MED Pharma
Canoe Interpretation
Guzzo Nano Research
MGN Logistics