
Data Processing

Overview

All data processing happens in OpsWorker's cloud — not in your cluster. The in-cluster agent only collects raw data; analysis is performed by the investigation engine.

Processing Pipeline

1. Field Extraction

  • Regex-based (fast): Parses alert labels and annotations for namespace, pod, and severity
  • AI-based (fallback): For non-standard alert formats, an LLM extracts fields from free-text descriptions
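The fast path can be sketched roughly as follows; the label names and regex patterns are illustrative, not OpsWorker's actual implementation:

```python
import re

# Sketch of the regex fast path (patterns and field names are invented
# for illustration).
FIELD_PATTERNS = {
    "namespace": re.compile(r'namespace="([^"]+)"'),
    "pod": re.compile(r'pod="([^"]+)"'),
    "severity": re.compile(r'severity="([^"]+)"'),
}

def extract_fields(alert_text: str) -> dict:
    """Return every field the regexes can find; an empty or partial
    result is the cue to fall back to AI-based extraction."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(alert_text)
        if match:
            fields[name] = match.group(1)
    return fields

alert = 'ALERTS{namespace="payments", pod="api-7d9f", severity="critical"}'
print(extract_fields(alert))
```

A standard Prometheus-style alert matches the regexes in one pass; free-text alerts return a partial dictionary, which is why the AI fallback exists.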

2. Topology Construction

  • Builds a dependency graph from discovered resources
  • Breadth-first traversal: Pod → Service → Ingress, Pod → ReplicaSet → Deployment
  • Identifies which resources may contain the root cause
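A minimal sketch of the traversal, using an invented dependency graph (the resource names and edges are hypothetical):

```python
from collections import deque

# Toy dependency graph; in practice the edges come from discovered
# resources (selectors, owner references, ingress backends).
EDGES = {
    "Pod/api": ["Service/api", "ReplicaSet/api"],
    "Service/api": ["Ingress/api"],
    "ReplicaSet/api": ["Deployment/api"],
}

def related_resources(start: str) -> list:
    """Breadth-first traversal from the alerting resource; every
    reachable resource is a candidate location for the root cause."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in EDGES.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

print(related_resources("Pod/api"))
```

Breadth-first order means the resources closest to the alerting Pod are examined first, which tends to surface the root cause with the least work.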

3. Configuration Validation

Automated checks on resource configurations:

Check                 Description
--------------------  -------------------------------------------------
Reference integrity   Do service selectors match pod labels?
Contract matching     Do service ports align with container ports?
Fitness checks        Are resource limits reasonable for the workload?

4. Issue Classification

Categorizes the problem:

  • Configuration issue — Mismatched selectors, incorrect limits, missing env vars
  • Runtime issue — Application crash, memory leak, external dependency failure
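One way to picture the classification step is a rule over the collected signals; the signal keys below are invented for illustration and are not OpsWorker's schema:

```python
def classify(signals: dict) -> str:
    """Toy classifier: configuration issues leave fingerprints in the
    validation checks, runtime issues in crash/OOM events."""
    if signals.get("failed_config_checks"):
        return "configuration"
    if signals.get("crash_events") or signals.get("oom_killed"):
        return "runtime"
    return "unknown"

print(classify({"failed_config_checks": ["selector mismatch"]}))
print(classify({"crash_events": 3}))
```

The classification matters downstream: configuration issues get manifest-level fixes, runtime issues get operational ones.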

5. Root Cause Analysis

Multi-model AI analyzes all data:

  • Correlates signals across logs, events, and configurations
  • Identifies the underlying cause (not just the symptom)
  • Assesses confidence level based on evidence strength
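The correlation idea can be illustrated with a toy example: signals from independent sources (logs, events, configuration) are grouped by resource, and the resource corroborated by the most source types becomes the strongest candidate. All data and thresholds below are invented:

```python
from collections import defaultdict

# Invented signals: (source, resource, detail).
signals = [
    ("log", "Pod/api", "OOMKilled"),
    ("event", "Pod/api", "BackOff"),
    ("config", "Pod/api", "memory limit 64Mi"),
    ("event", "Service/api", "EndpointsMissing"),
]

# Group independent source types by resource.
by_resource = defaultdict(set)
for source, resource, _detail in signals:
    by_resource[resource].add(source)

# The resource with the most corroborating source types wins;
# confidence grows with the number of independent sources.
candidate = max(by_resource, key=lambda r: len(by_resource[r]))
confidence = {3: "high", 2: "medium"}.get(len(by_resource[candidate]), "low")
print(candidate, confidence)
```

Here Pod/api is corroborated by logs, events, and configuration, while Service/api has only a single event, so the Pod is flagged as the likely root cause with high confidence.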

6. Recommendation Generation

Produces:

  • Root cause statement with supporting evidence
  • Immediate actions with specific kubectl commands
  • Preventive measures for long-term fixes
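The output can be pictured as a structured record with those three parts; the field names and example contents below are illustrative, not OpsWorker's actual schema:

```python
from dataclasses import dataclass

# Hypothetical shape for a generated recommendation.
@dataclass
class Recommendation:
    root_cause: str            # statement with supporting evidence
    evidence: list
    immediate_actions: list    # concrete kubectl commands
    preventive_measures: list  # long-term fixes

rec = Recommendation(
    root_cause="Pod OOMKilled: memory limit is below observed usage",
    evidence=["repeated OOMKilled events", "limit 64Mi vs. 190Mi peak usage"],
    immediate_actions=[
        "kubectl -n payments set resources deploy/api --limits=memory=256Mi",
    ],
    preventive_measures=["alert when memory usage exceeds 80% of the limit"],
)
print(rec.root_cause)
```

Keeping the evidence alongside the root cause statement lets an operator verify the reasoning before running any of the suggested commands.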

Multi-Model Strategy

OpsWorker uses different AI models optimized for each stage:

Stage                       Model Type          Optimization
--------------------------  ------------------  ---------------------------------------------------------
Field extraction            Fast (Amazon Nova)  Speed — extract fields in milliseconds
Analysis & recommendations  Reasoning (Claude)  Depth — complex correlation and root cause identification
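Conceptually, this is a routing table from pipeline stage to model; the identifiers below are placeholders, not real model names or API parameters:

```python
# Hypothetical stage-to-model routing table.
MODEL_FOR_STAGE = {
    "field_extraction": "fast-model",   # latency-optimized
    "analysis": "reasoning-model",      # depth-optimized
    "recommendation": "reasoning-model",
}

def pick_model(stage: str) -> str:
    """Route each stage to its model, defaulting to the reasoning
    model for anything unrecognized."""
    return MODEL_FOR_STAGE.get(stage, "reasoning-model")

print(pick_model("field_extraction"))
```

The design trade-off: the fast model runs on every incoming alert, so it must be cheap; the reasoning model runs once per investigation, so depth matters more than latency.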

Next Steps