
Alert Correlation

Overview

When a single issue causes multiple alerts, OpsWorker correlates them to avoid redundant investigations and provide a unified view of the problem.

How Correlation Works

Topology-Based Correlation

During investigation, OpsWorker discovers the dependency chain for the alerting resource. If multiple alerts fire for resources in the same chain, OpsWorker links them to a single root cause:

graph TD
A["Alert: Pod CrashLoopBackOff"] --> RC[Root Cause]
B["Alert: Service 503 errors"] --> RC
C["Alert: Ingress timeout"] --> RC
RC["Deployment misconfiguration<br/>(single root cause)"]

In this example, a misconfigured deployment makes the pod crash, the crashing pod leaves the service without endpoints, and the missing endpoints cause ingress timeouts. Three alerts, one root cause.
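The grouping idea can be sketched in a few lines of Python. This is only an illustration, not OpsWorker's implementation: the function name `correlate_by_topology`, the resource identifiers, and the shape of the `depends_on` map are all assumptions made for the example. It treats the dependency chain as an undirected graph and puts alerts into the same group when their resources are reachable from one another.

```python
from collections import defaultdict

def correlate_by_topology(alerts, depends_on):
    """Group alerts whose resources sit in the same dependency chain.

    alerts: list of (alert_name, resource) tuples (hypothetical shape).
    depends_on: dict mapping a resource to the resources it depends on.
    """
    # Build an undirected adjacency map so a chain can be walked in both directions.
    neighbors = defaultdict(set)
    for resource, deps in depends_on.items():
        for dep in deps:
            neighbors[resource].add(dep)
            neighbors[dep].add(resource)

    groups = []
    seen = set()
    for _, resource in alerts:
        if resource in seen:
            continue
        # Depth-first walk collects every resource reachable from this one.
        chain, stack = set(), [resource]
        while stack:
            node = stack.pop()
            if node in chain:
                continue
            chain.add(node)
            stack.extend(neighbors[node] - chain)
        seen |= chain
        groups.append([name for name, res in alerts if res in chain])
    return groups

# Hypothetical topology: ingress -> service -> pod -> deployment.
depends_on = {
    "ingress/web": ["service/web"],
    "service/web": ["pod/web-1"],
    "pod/web-1": ["deployment/web"],
}
alerts = [
    ("Pod CrashLoopBackOff", "pod/web-1"),
    ("Service 503 errors", "service/web"),
    ("Ingress timeout", "ingress/web"),
    ("Disk pressure", "node/batch-7"),  # unrelated resource, stays separate
]
groups = correlate_by_topology(alerts, depends_on)
```

With this input, the three web alerts end up in one group and the unrelated disk alert in its own group, mirroring the "three alerts, one root cause" case.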

Timeline Correlation

Alerts that fire close together in time for related resources are flagged as potentially linked. This helps the AI focus on a common root cause rather than investigating each symptom independently.
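A minimal sketch of time-window grouping follows, under stated assumptions. The 5-minute window, the `related` predicate, and the `namespace/name` resource format are hypothetical choices for the example; the documentation does not specify how OpsWorker defines "a similar timeframe" or relatedness.

```python
from datetime import datetime, timedelta

def correlate_by_time(alerts, related, window=timedelta(minutes=5)):
    """Group alerts that fire close together for related resources.

    alerts: list of (alert_name, resource, fired_at) tuples (hypothetical shape).
    related: callable(resource_a, resource_b) -> bool, supplied by the caller.
    window: maximum gap to the most recent alert in a group (assumed value).
    """
    groups = []
    for alert in sorted(alerts, key=lambda a: a[2]):
        _, resource, fired_at = alert
        for group in groups:
            last_fired = group[-1][2]
            close_in_time = fired_at - last_fired <= window
            if close_in_time and any(related(resource, r) for _, r, _ in group):
                group.append(alert)
                break
        else:
            # No existing group matched, so this alert starts a new one.
            groups.append([alert])
    return groups

def same_namespace(a, b):
    # Assumption for the example: resources are "namespace/name" strings.
    return a.split("/")[0] == b.split("/")[0]

t0 = datetime(2024, 1, 1, 12, 0, 0)
timed_alerts = [
    ("Pod CrashLoopBackOff", "shop/pod-web", t0),
    ("Service 503 errors", "shop/svc-web", t0 + timedelta(seconds=40)),
    ("Ingress timeout", "shop/ingress-web", t0 + timedelta(seconds=90)),
    ("Pod CrashLoopBackOff", "shop/pod-web", t0 + timedelta(hours=2)),
]
time_groups = correlate_by_time(timed_alerts, same_namespace)
```

Here the first three alerts fall into one group, while the alert two hours later starts a second group even though it concerns the same resource.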

Benefits

Without Correlation | With Correlation
3 separate alerts → 3 investigations | 3 alerts → 1 investigation covering all resources
3 Slack notifications | 1 comprehensive notification
Engineer investigates each separately | Single root cause identified with full context

Correlation in Practice

During Automatic Investigation

When an alert triggers an investigation, the topology discovery step finds related resources. If other alerts have fired for those resources, the investigation covers them all.

In the Portal

The investigation detail page shows all affected resources in the topology view, making it clear how alerts are connected.

In the Daily Digest

The daily summary groups related alerts to show actual incident count rather than raw alert count.
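The difference between raw alert count and incident count can be illustrated with a short sketch. The `summarize_digest` function and the `investigation_id` field are hypothetical; the example only assumes that correlated alerts share some identifier for the investigation that covered them.

```python
from collections import defaultdict

def summarize_digest(alerts):
    """Count incidents by grouping alerts on a shared investigation id.

    alerts: list of (alert_name, investigation_id) tuples (hypothetical shape).
    """
    incidents = defaultdict(list)
    for name, investigation_id in alerts:
        incidents[investigation_id].append(name)
    return {
        "raw_alert_count": len(alerts),
        "incident_count": len(incidents),
        "incidents": dict(incidents),
    }

digest = summarize_digest([
    ("Pod CrashLoopBackOff", "inv-1"),
    ("Service 503 errors", "inv-1"),
    ("Ingress timeout", "inv-1"),
    ("Disk pressure", "inv-2"),
])
```

For this input the digest reports 4 raw alerts but only 2 incidents, which is the number an on-call engineer actually has to act on.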
