# Data Processing

## Overview
All data processing happens in OpsWorker's cloud — not in your cluster. The in-cluster agent only collects raw data; analysis is performed by the investigation engine.
## Processing Pipeline

### 1. Field Extraction
- Regex-based (fast): Parses alert labels and annotations for namespace, pod, severity
- AI-based (fallback): For non-standard alert formats, an LLM extracts fields from free-text descriptions
### 2. Topology Construction
- Builds a dependency graph from discovered resources
- Breadth-first traversal: Pod → Service → Ingress, Pod → Deployment → ReplicaSet
- Identifies which resources may contain the root cause
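A minimal sketch of the breadth-first traversal, using a toy adjacency map whose edges follow the relationships named above (Pod → Service → Ingress, Pod → Deployment → ReplicaSet). The graph structure and resource names are assumptions for illustration:

```python
from collections import deque

# Toy dependency graph keyed by "kind/name".
GRAPH = {
    "pod/api": ["service/api", "deployment/api"],
    "service/api": ["ingress/api"],
    "deployment/api": ["replicaset/api-7f9c"],
    "ingress/api": [],
    "replicaset/api-7f9c": [],
}

def related_resources(start: str) -> list[str]:
    """Breadth-first walk collecting every resource reachable from `start`.

    The result is the candidate set of resources that may contain
    the root cause, ordered by graph distance from the alerting pod.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order
```

Breadth-first order means directly related resources (the pod's own Service and Deployment) are examined before more distant ones (Ingress, ReplicaSet).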
### 3. Configuration Validation
Automated checks on resource configurations:
| Check | Description |
|---|---|
| Reference integrity | Do service selectors match pod labels? |
| Contract matching | Do service ports align with container ports? |
| Fitness checks | Are resource limits reasonable for the workload? |
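The first two checks in the table reduce to simple comparisons over resource fields. A sketch, assuming plain-dict representations of the relevant spec fields (these helpers are illustrative, not OpsWorker's API):

```python
def check_reference_integrity(service_selector: dict, pod_labels: dict) -> bool:
    """Reference integrity: every selector key/value must appear on the pod.

    Extra pod labels are fine; a selector key that is missing or has a
    different value means the Service selects no such pod.
    """
    return all(pod_labels.get(k) == v for k, v in service_selector.items())

def check_contract_matching(target_port: int, container_ports: list[int]) -> bool:
    """Contract matching: the Service's target port must be exposed by a container."""
    return target_port in container_ports
```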
### 4. Issue Classification
Categorizes the problem:
- Configuration issue — Mismatched selectors, incorrect limits, missing env vars
- Runtime issue — Application crash, memory leak, external dependency failure
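The two categories can be sketched as a routing decision over the evidence gathered so far. The signal names and the rule itself are illustrative heuristics, not OpsWorker's actual classifier:

```python
# Hypothetical signal vocabularies for each category.
CONFIG_SIGNALS = {"selector_mismatch", "port_mismatch", "missing_env_var"}
RUNTIME_SIGNALS = {"crash_loop", "oom_killed", "upstream_timeout"}

def classify(signals: set[str]) -> str:
    """Route to a category based on which evidence is present.

    Configuration signals take precedence: a crash loop caused by a
    missing env var is still a configuration issue at root.
    """
    if signals & CONFIG_SIGNALS:
        return "configuration"
    if signals & RUNTIME_SIGNALS:
        return "runtime"
    return "unknown"
```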
### 5. Root Cause Analysis
Multi-model AI analyzes all data:
- Correlates signals across logs, events, and configurations
- Identifies the underlying cause (not just the symptom)
- Assesses confidence level based on evidence strength
### 6. Recommendation Generation
Produces:
- Root cause statement with supporting evidence
- Immediate actions with specific kubectl commands
- Preventive measures for long-term fixes
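The three outputs above suggest a simple result shape. This dataclass and its example values are illustrative only, not OpsWorker's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Recommendation:
    """Illustrative shape of an investigation result."""
    root_cause: str
    evidence: list[str] = field(default_factory=list)
    immediate_actions: list[str] = field(default_factory=list)   # kubectl commands
    preventive_measures: list[str] = field(default_factory=list)

# Hypothetical result for a selector-mismatch investigation.
rec = Recommendation(
    root_cause="Service selector does not match pod labels",
    evidence=["service/api selects app=api", "running pods carry app=api-v2"],
    immediate_actions=["kubectl -n payments get endpoints api"],
    preventive_measures=["Validate selectors against pod templates in CI"],
)
```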
## Multi-Model Strategy
OpsWorker uses different AI models optimized for each stage:
| Stage | Model Type | Optimization |
|---|---|---|
| Field extraction | Fast (Amazon Nova) | Speed — extract fields in milliseconds |
| Analysis & recommendations | Reasoning (Claude) | Depth — complex correlation and root cause identification |
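The stage-to-model mapping in the table amounts to a small routing function. The stage keys and model identifiers below are placeholders, not real model IDs or OpsWorker's routing logic:

```python
# Latency-sensitive stages get the fast model; everything else
# defaults to the reasoning model.
MODEL_FOR_STAGE = {
    "field_extraction": "fast-model",
    "analysis": "reasoning-model",
    "recommendations": "reasoning-model",
}

def pick_model(stage: str) -> str:
    """Return the model identifier for a pipeline stage."""
    return MODEL_FOR_STAGE.get(stage, "reasoning-model")
```

Defaulting unknown stages to the reasoning model trades a little latency for accuracy, which is the safer failure mode for analysis work.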
## Next Steps
- How Investigations Work — Full investigation details
- Root Cause Analysis — Analysis depth