
Investigations

What is an Investigation

An investigation is the core unit of work in OpsWorker. It represents a complete AI-powered analysis of an alert or request, from receiving the signal to delivering a root cause analysis with remediation steps.

Each investigation contains:

  • Trigger: the alert or request that started the investigation.
  • Topology: discovered Kubernetes resources and their relationships.
  • Collected data: logs, events, and configurations gathered from the cluster.
  • Analysis: AI-generated root cause identification with a confidence level.
  • Recommendations: immediate actions and preventive measures, including specific kubectl commands.
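To make the shape of an investigation concrete, the five parts listed above can be sketched as a small data model. This is an illustrative sketch only, not the OpsWorker schema: the class names, field names, the confidence labels, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Analysis:
    root_cause: str
    confidence: str  # hypothetical labels, e.g. "low", "medium", "high"


@dataclass
class Recommendation:
    summary: str
    kind: str  # hypothetical: "immediate" or "preventive"
    kubectl_commands: List[str] = field(default_factory=list)


@dataclass
class Investigation:
    trigger: str                                              # alert or request text
    topology: List[str] = field(default_factory=list)        # discovered resources
    collected_data: List[str] = field(default_factory=list)  # logs, events, configs
    analysis: Optional[Analysis] = None                       # unset until analyzed
    recommendations: List[Recommendation] = field(default_factory=list)


# Made-up sample values, for illustration only.
example = Investigation(
    trigger="Pod checkout-7d9f is in CrashLoopBackOff",
    topology=["deployment/checkout", "pod/checkout-7d9f"],
    collected_data=["pod logs", "namespace events"],
    analysis=Analysis(root_cause="Container exceeds its memory limit", confidence="high"),
    recommendations=[
        Recommendation(
            summary="Restart the deployment after raising the memory limit",
            kind="immediate",
            kubectl_commands=["kubectl rollout restart deployment/checkout"],
        )
    ],
)
```

Keeping `analysis` optional reflects that the trigger, topology, and collected data exist before any root cause has been identified.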

Lifecycle

Investigations move through the following states:

stateDiagram-v2
[*] --> Pending: Alert received
Pending --> InProgress: Investigation starts
InProgress --> Completed: Analysis finished
InProgress --> Failed: Error occurred
Completed --> [*]
Failed --> [*]

| State | Description |
|---|---|
| Pending | Alert received and investigation queued |
| In Progress | The agent graph is actively discovering topology, collecting data, and analyzing |
| Completed | Root cause identified, recommendations generated, notification sent |
| Failed | Investigation could not complete (for example, cluster connectivity lost or agent unavailable) |

A completed investigation can also be a partial result: when the analysis cannot fully converge, OpsWorker still returns its best findings, marked as low confidence, rather than failing outright. See Investigation Lifecycle for details.
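The lifecycle above is a small state machine with two terminal states. The following is a minimal sketch of those transitions, assuming nothing about how OpsWorker implements them internally; the `State` enum, the `ALLOWED` table, and the helper functions are hypothetical names used only for this example.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    IN_PROGRESS = "InProgress"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Transitions taken directly from the state diagram above.
ALLOWED = {
    State.PENDING: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.COMPLETED, State.FAILED},
    State.COMPLETED: set(),
    State.FAILED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the target state, or raise if the diagram has no such edge."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_terminal(state: State) -> bool:
    """A state is terminal when it has no outgoing transitions."""
    return not ALLOWED[state]


# Walk the successful path: Pending -> InProgress -> Completed.
state = State.PENDING
state = advance(state, State.IN_PROGRESS)
state = advance(state, State.COMPLETED)
```

Note that a partial, low-confidence result still ends in `Completed` in this model; only an investigation that cannot finish at all ends in `Failed`.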

Most investigations complete in under 2 minutes from alert arrival to Slack notification.

Viewing Investigations

Investigations are available in two places:

  • OpsWorker Portal: full investigation detail with topology view, collected data, AI analysis, recommendations, and conversation log.
  • Slack: summary notification with root cause, recommendations, and feedback buttons.

Interacting with Investigations

After an investigation completes, you can:

  • Chat: ask follow-up questions about the investigation ("Why did this specific pod crash?", "Show me the relevant logs").
  • Provide feedback: rate the investigation from the Slack notification to help improve future investigations.
  • Share: link to the investigation in the portal for team review.

Investigation Types

| Type | Trigger | Description |
|---|---|---|
| Automatic | Alert matches an alert rule | Fully automated, no manual intervention |
| Manual | Free-text "Start investigation" | You describe a problem in your own words and OpsWorker investigates it |
| Chat-initiated | AI Chat query | Started from a question in the chat interface |
| Test | Synthetic test alert | A synthetic alert used to verify setup |
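The mapping from trigger to investigation type can be expressed as a simple lookup. This is a sketch under stated assumptions: the trigger-source keys such as `alert_rule_match` are hypothetical labels invented for the example, not identifiers used by OpsWorker.

```python
from enum import Enum


class InvestigationType(Enum):
    AUTOMATIC = "Automatic"
    MANUAL = "Manual"
    CHAT_INITIATED = "Chat-initiated"
    TEST = "Test"


# Hypothetical trigger-source labels, one per row of the table above.
TRIGGER_TO_TYPE = {
    "alert_rule_match": InvestigationType.AUTOMATIC,
    "free_text_request": InvestigationType.MANUAL,
    "ai_chat_query": InvestigationType.CHAT_INITIATED,
    "synthetic_test_alert": InvestigationType.TEST,
}


def classify(trigger_source: str) -> InvestigationType:
    """Look up the investigation type for a trigger source label."""
    try:
        return TRIGGER_TO_TYPE[trigger_source]
    except KeyError:
        raise ValueError(f"unknown trigger source: {trigger_source!r}") from None
```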

Next Steps