Investigation Lifecycle

Overview

Every investigation moves through a defined set of stages from alert arrival to result delivery. Understanding the lifecycle helps you interpret investigation status and troubleshoot issues.

Stages

```mermaid
stateDiagram-v2
    [*] --> Pending: Alert matches rule
    Pending --> InProgress: Investigation starts
    InProgress --> Extracting: Field extraction
    Extracting --> Discovering: Topology discovery
    Discovering --> Collecting: Data collection
    Collecting --> Analyzing: AI analysis
    Analyzing --> Completed: Results ready
    InProgress --> Failed: Error occurred
    Completed --> [*]
    Failed --> [*]
```
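The transitions in the diagram can be sketched as a simple lookup table; this is an illustrative model of the lifecycle, not the product's actual implementation, and the state names are taken directly from the diagram:

```python
# Sketch of the lifecycle as an allowed-transition table (hypothetical model).
TRANSITIONS = {
    "Pending": {"InProgress"},
    "InProgress": {"Extracting", "Failed"},
    "Extracting": {"Discovering"},
    "Discovering": {"Collecting"},
    "Collecting": {"Analyzing"},
    "Analyzing": {"Completed"},
    "Completed": set(),   # terminal
    "Failed": set(),      # terminal
}

def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle permits moving from current to target."""
    return target in TRANSITIONS.get(current, set())
```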

Pending

The alert has been received and matched an alert rule. The investigation is queued for processing.

  • Duration: Seconds
  • Visible in: Portal investigation list (status: Pending)

In Progress

The investigation is actively running. AI agents are working through the investigation steps:

Field Extraction

Identifies the affected namespace, pod, severity, and other key fields from the alert payload. Uses fast regex-based extraction first, with AI-based fallback for non-standard formats.
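The regex-first approach can be sketched as below. The field names and label patterns are assumptions (a Prometheus-style `key="value"` payload is shown); the actual patterns depend on your alerting format, and the AI fallback is not shown:

```python
import re

# Hypothetical fast-path patterns for a Prometheus-style alert payload.
PATTERNS = {
    "namespace": re.compile(r'namespace="([^"]+)"'),
    "pod": re.compile(r'pod="([^"]+)"'),
    "severity": re.compile(r'severity="([^"]+)"'),
}

def extract_fields(alert_text: str) -> dict:
    """Regex fast path: return whatever key fields match. Fields that
    fail to match would be handed to the AI-based fallback (not shown)."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(alert_text)
        if match:
            fields[name] = match.group(1)
    return fields
```

For a standard payload such as `namespace="shop" pod="checkout-5d9c6" severity="critical"`, the regex path resolves all three fields without any model call, which is why it runs first.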

Topology Discovery

Starting from the identified resource, maps the dependency chain (pod → service → ingress). Discovers related resources that may contain the root cause.
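The pod-to-service hop of that chain rests on standard Kubernetes label-selector semantics, sketched below; the function names and sample data are illustrative, not part of the product's API:

```python
def selects(service_selector: dict, pod_labels: dict) -> bool:
    """A Service selects a Pod when every key/value in its selector
    appears in the pod's labels (standard Kubernetes semantics for
    equality-based, non-empty selectors)."""
    return all(pod_labels.get(k) == v for k, v in service_selector.items())

def discover_services(pod_labels: dict, services: dict) -> list:
    """Hypothetical discovery step: given the affected pod's labels and a
    map of service name -> selector, return the services that select it."""
    return [name for name, sel in services.items() if selects(sel, pod_labels)]
```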

Data Collection

Gathers data from all discovered resources:

  • Pod logs (recent container output)
  • Kubernetes events (scheduling, state changes, errors)
  • Resource configurations (specs, limits, selectors)
  • Service endpoint health
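The equivalent data could be gathered manually with standard kubectl commands; the sketch below builds those command strings for a given pod (how the product actually collects data internally is not specified here):

```python
# Illustrative kubectl equivalents of the four collection steps above.
def collection_commands(namespace: str, pod: str) -> list:
    return [
        # Recent container output
        f"kubectl logs {pod} -n {namespace} --tail=100",
        # Events scoped to the affected pod
        f"kubectl get events -n {namespace} --field-selector involvedObject.name={pod}",
        # Full resource configuration
        f"kubectl get pod {pod} -n {namespace} -o yaml",
        # Endpoint health for services in the namespace
        f"kubectl get endpoints -n {namespace}",
    ]
```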

AI Analysis

Analyzes all collected data to:

  • Validate configurations (reference integrity, port alignment, selector matching)
  • Classify the issue (configuration vs. runtime)
  • Identify the root cause with a confidence level
  • Generate specific remediation recommendations
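One of those configuration checks, port alignment, can be illustrated in isolation. This is a minimal sketch of the idea, not the product's validator:

```python
def validate_port_alignment(service_target_port: int, container_ports: list) -> list:
    """Flag a Service whose targetPort matches no port exposed by the
    pod's containers -- a classic configuration-class root cause."""
    findings = []
    if service_target_port not in container_ports:
        findings.append(
            f"targetPort {service_target_port} matches no container port in {container_ports}"
        )
    return findings
```

A finding here would push the classification toward "configuration" rather than "runtime", since the mismatch exists before any traffic flows.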

Completed

The investigation has finished. Results include:

  • Root cause analysis with confidence level
  • Affected resources and their topology
  • Immediate action recommendations with kubectl commands
  • Preventive measure recommendations
  • Complete conversation log (AI's reasoning process)

Notifications are sent to configured Slack channels and the investigation is available in the portal.

Failed

The investigation could not complete. Common reasons:

  Reason                      Resolution
  Cluster agent disconnected  Check agent pod status and connectivity
  Timeout                     Agent may be overloaded; check resource limits
  Insufficient permissions    Agent RBAC may not cover the affected namespace

Failed investigations are visible in the portal with error details.

Viewing Investigation Status

  • Portal: Navigate to Investigations — each investigation shows its current status
  • Slack: Notifications are sent only when investigations complete
  • Investigation detail page: Shows the full timeline of each stage

Data Retention

Completed investigations and their collected data are retained in the portal for historical review and trend analysis.

Next Steps