Investigations
What is an Investigation
An investigation is the core unit of work in OpsWorker. It represents a complete AI-powered analysis of an alert or request, from receiving the signal to delivering a root cause analysis with remediation steps.
Each investigation contains:
- Trigger: the alert or request that started the investigation.
- Topology: discovered Kubernetes resources and their relationships.
- Collected data: logs, events, and configurations gathered from the cluster.
- Analysis: AI-generated root cause identification with a confidence level.
- Recommendations: immediate actions and preventive measures, including specific kubectl commands.
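To make the five parts concrete, here is a minimal Python sketch of how an investigation record could be modeled. It is an illustration only, not OpsWorker's actual schema: every class and field name (`Investigation`, `Recommendation`, `Confidence`), the three confidence levels, and the sample alert and pod names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Confidence(Enum):
    # Assumed levels; the docs only say the analysis carries "a confidence level".
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Recommendation:
    summary: str
    kind: str                      # "immediate" or "preventive"
    command: Optional[str] = None  # for example, a kubectl command


@dataclass
class Investigation:
    # Hypothetical shape mirroring the five parts listed above.
    trigger: str                                                    # alert or request
    topology: dict[str, list[str]] = field(default_factory=dict)    # resource -> related resources
    collected_data: dict[str, list[str]] = field(default_factory=dict)  # logs, events, configs
    root_cause: Optional[str] = None
    confidence: Optional[Confidence] = None
    recommendations: list[Recommendation] = field(default_factory=list)


# Made-up sample values, for illustration only.
inv = Investigation(trigger="KubePodCrashLooping on checkout")
inv.topology["Deployment/checkout"] = ["Pod/checkout-7d9f"]
inv.collected_data["events"] = ["OOMKilled"]
inv.root_cause = "Container exceeded its memory limit"
inv.confidence = Confidence.LOW
inv.recommendations.append(
    Recommendation(
        summary="Inspect the failing pod",
        kind="immediate",
        command="kubectl describe pod checkout-7d9f",
    )
)
```

Keeping `root_cause` and `confidence` optional reflects that those fields are only known once the analysis has run.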
Lifecycle
Investigations move through the following states:
stateDiagram-v2
[*] --> Pending: Alert received
Pending --> InProgress: Investigation starts
InProgress --> Completed: Analysis finished
InProgress --> Failed: Error occurred
Completed --> [*]
Failed --> [*]
| State | Description |
|---|---|
| Pending | Alert received and investigation queued |
| In Progress | The agent graph is actively discovering topology, collecting data, and analyzing it |
| Completed | Root cause identified, recommendations generated, notification sent |
| Failed | Investigation could not complete (for example, cluster connectivity lost or agent unavailable) |
A completed investigation can also be a partial result: when the analysis cannot fully converge, OpsWorker still returns its best low-confidence findings rather than failing outright. See Investigation Lifecycle for details.
Most investigations complete in under 2 minutes from alert arrival to Slack notification.
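The lifecycle above can be read as a small state machine. The following Python sketch encodes only the transitions shown in the diagram. The names `State`, `TRANSITIONS`, `advance`, and `is_terminal` are hypothetical and chosen for this example; they are not part of any OpsWorker API.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    IN_PROGRESS = "In Progress"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Allowed transitions, taken from the state diagram above.
TRANSITIONS = {
    State.PENDING: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.COMPLETED, State.FAILED},
    State.COMPLETED: set(),
    State.FAILED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the target state, or raise if the diagram does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_terminal(state: State) -> bool:
    """Completed and Failed have no outgoing transitions."""
    return not TRANSITIONS[state]


state = advance(State.PENDING, State.IN_PROGRESS)
state = advance(state, State.COMPLETED)
```

Note that a partial, low-confidence result still ends in `Completed` in this model; only an investigation that cannot finish at all moves to `Failed`.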
Viewing Investigations
Investigations are available in two places:
- OpsWorker Portal: full investigation detail with topology view, collected data, AI analysis, recommendations, and conversation log.
- Slack: summary notification with root cause, recommendations, and feedback buttons.
Interacting with Investigations
After an investigation completes, you can:
- Chat: ask follow-up questions about the investigation ("Why did this specific pod crash?", "Show me the relevant logs").
- Provide feedback: rate the investigation from the Slack notification to help improve future investigations.
- Share: link to the investigation in the portal for team review.
Investigation Types
| Type | Trigger | Description |
|---|---|---|
| Automatic | Alert matches an alert rule | Fully automated, no manual intervention |
| Manual | Free-text "Start investigation" | You describe a problem in your own words and OpsWorker investigates it |
| Chat-initiated | AI Chat query | Started from a question in the chat interface |
| Test | Synthetic test alert | Used to verify your setup |
Next Steps
- How Investigations Work - Detailed investigation flow
- Investigation Lifecycle - Deep dive into each stage
- Investigation Chat - Follow-up conversations