# Alert Correlation

## Overview
When a single issue causes multiple alerts, OpsWorker correlates them to avoid redundant investigations and provide a unified view of the problem.
## How Correlation Works

### Topology-Based Correlation
During investigation, OpsWorker discovers the dependency chain for the alerting resource. If multiple alerts fire for resources in the same chain, they're connected:
```mermaid
graph TD
    A["Alert: Pod CrashLoopBackOff"] --> RC
    B["Alert: Service 503 errors"] --> RC
    C["Alert: Ingress timeout"] --> RC
    RC["Deployment misconfiguration<br/>(single root cause)"]
```
A pod crash causes service endpoint loss, which causes ingress errors. Three alerts, one root cause.
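Conceptually, topology-based correlation collapses alerts whose resources sit on the same dependency chain. The sketch below is a minimal illustration, not OpsWorker's actual implementation: the alert records, resource names, and the `child -> parent` topology map are all hypothetical.

```python
from collections import defaultdict

# Hypothetical alerts; the resource names are illustrative only.
alerts = [
    {"id": "A1", "resource": "pod/checkout-7f9", "summary": "CrashLoopBackOff"},
    {"id": "A2", "resource": "svc/checkout", "summary": "503 errors"},
    {"id": "A3", "resource": "ingress/shop", "summary": "upstream timeout"},
    {"id": "A4", "resource": "pod/billing-2c1", "summary": "OOMKilled"},
]

# Dependency edges discovered during investigation (child -> parent).
topology = {
    "pod/checkout-7f9": "svc/checkout",
    "svc/checkout": "ingress/shop",
}

def root_of(resource: str) -> str:
    """Walk the dependency chain to its topmost resource."""
    while resource in topology:
        resource = topology[resource]
    return resource

# Alerts whose resources share a chain collapse into one group.
groups = defaultdict(list)
for alert in alerts:
    groups[root_of(alert["resource"])].append(alert["id"])

print(dict(groups))
# {'ingress/shop': ['A1', 'A2', 'A3'], 'pod/billing-2c1': ['A4']}
```

The three checkout-related alerts map to one group because their resources share a chain, while the unrelated billing alert stays separate.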
### Timeline Correlation
Alerts that fire within a similar timeframe for related resources are identified as potentially linked. This helps the AI focus on a common root cause rather than investigating each symptom independently.
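A simple way to picture time-based grouping is gap-based clustering: sort alerts by fire time and start a new group whenever the gap to the previous alert exceeds a window. This is a hedged sketch only; the 5-minute window and the alert timestamps are assumptions, not OpsWorker's documented behavior.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # assumed window size, for illustration

# Hypothetical (alert_id, fired_at) pairs.
alerts = [
    ("A1", datetime(2024, 1, 1, 12, 0)),
    ("A2", datetime(2024, 1, 1, 12, 2)),
    ("A3", datetime(2024, 1, 1, 12, 4)),
    ("A4", datetime(2024, 1, 1, 13, 30)),
]

# Sort by fire time; open a new group when the gap exceeds WINDOW.
alerts.sort(key=lambda a: a[1])
groups = [[alerts[0]]]
for alert in alerts[1:]:
    if alert[1] - groups[-1][-1][1] <= WINDOW:
        groups[-1].append(alert)
    else:
        groups.append([alert])

print([[a[0] for a in g] for g in groups])
# [['A1', 'A2', 'A3'], ['A4']]
```

In practice a time window alone over-groups unrelated alerts, which is why the text pairs it with the topology check: alerts must be close in time *and* on related resources.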
## Benefits
| Without Correlation | With Correlation |
|---|---|
| 3 separate alerts → 3 investigations | 3 alerts → 1 investigation covering all resources |
| 3 Slack notifications | 1 comprehensive notification |
| Engineer investigates each separately | Single root cause identified with full context |
## Correlation in Practice

### During Automatic Investigation
When an alert triggers an investigation, the topology discovery step finds related resources. If other alerts have fired for those resources, the investigation covers them all.
### In the Portal
The investigation detail page shows all affected resources in the topology view, making it clear how alerts are connected.
### In the Daily Digest
The daily summary groups related alerts to show actual incident count rather than raw alert count.
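The incident count falls out of the grouping directly: count distinct correlation groups instead of raw alerts. A tiny illustration, with a hypothetical alert-to-group mapping:

```python
# Hypothetical mapping from alert ID to its correlation group.
correlated = {
    "A1": "incident-1",
    "A2": "incident-1",
    "A3": "incident-1",
    "A4": "incident-2",
}

raw_alert_count = len(correlated)               # every alert counted
incident_count = len(set(correlated.values()))  # one per correlation group

print(raw_alert_count, incident_count)
# 4 2
```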
## Next Steps
- How Investigations Work — Topology discovery in detail
- Noise Reduction — Reduce duplicate alerts