Investigation Analytics
Overview
Investigation Analytics reports how OpsWorker investigations are performing: which types of issues are being found, which areas of your infrastructure generate the most investigations, and how accurate the AI analysis is.
Analytics Available
Investigation Outcomes
- Root cause types: Configuration issues vs. runtime issues
- Most common alert types: Which alerts trigger the most investigations
- Recurring issues: Alerts that fire repeatedly for the same resources — candidates for permanent fixes
Performance
- Investigation duration: Distribution of completion times
- Success rate: Percentage of investigations that completed with an identified root cause, as opposed to those that failed
- Agent response time: How quickly the in-cluster agent responds to data collection requests
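As a rough illustration of how the success rate and duration figures above could be derived, here is a minimal Python sketch. It assumes investigation records have been exported as dictionaries; the field names (`status`, `duration_s`) and the sample values are hypothetical and do not reflect a documented OpsWorker schema.

```python
from statistics import median

# Hypothetical investigation records; field names and values are
# illustrative assumptions, not an OpsWorker export format.
investigations = [
    {"id": "inv-1", "status": "completed", "duration_s": 42.0},
    {"id": "inv-2", "status": "completed", "duration_s": 65.0},
    {"id": "inv-3", "status": "failed", "duration_s": 120.0},
    {"id": "inv-4", "status": "completed", "duration_s": 38.0},
]


def success_rate(records):
    """Percentage of investigations that completed with a root cause."""
    if not records:
        return 0.0
    completed = sum(1 for r in records if r["status"] == "completed")
    return 100.0 * completed / len(records)


def duration_summary(records):
    """Median and maximum completion time, in seconds."""
    durations = [r["duration_s"] for r in records]
    return {"median_s": median(durations), "max_s": max(durations)}


print(success_rate(investigations))
print(duration_summary(investigations))
```

The median is used here instead of the mean so that a single slow investigation does not dominate the summary.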
Coverage
- Top investigated namespaces: Which namespaces generate the most investigation work
- Top investigated services: Which services are most frequently involved
- Alert-to-investigation ratio: The percentage of alerts that trigger investigations
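The coverage figures above amount to simple counting. The sketch below shows one way they might be computed, assuming you have the raw alert and investigation counts plus a list of investigation records; the `namespace` and `service` field names and all sample values are hypothetical.

```python
from collections import Counter

# Hypothetical investigation records; the field names are assumptions.
investigations = [
    {"namespace": "payments", "service": "checkout"},
    {"namespace": "payments", "service": "ledger"},
    {"namespace": "search", "service": "indexer"},
]


def top_by(records, field, n=5):
    """Return the n most frequent values of `field` with their counts."""
    return Counter(r[field] for r in records).most_common(n)


def alert_to_investigation_pct(alerts_received, investigations_started):
    """Percentage of received alerts that triggered an investigation."""
    if alerts_received == 0:
        return 0.0
    return 100.0 * investigations_started / alerts_received


print(top_by(investigations, "namespace"))
print(alert_to_investigation_pct(12, len(investigations)))
```

Passing `"service"` instead of `"namespace"` to `top_by` gives the equivalent ranking for services.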
Feedback Analysis
- Accuracy by alert type: Which types of alerts get the most accurate investigations
- Accuracy trend: Whether investigation quality is improving over time
- Feedback distribution: Accurate vs. Partially Accurate vs. Needs Improvement
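To make the feedback breakdown concrete, the following sketch tallies ratings overall and per alert type. The three rating labels come from the list above; the alert type names and the `(alert_type, rating)` tuple layout are hypothetical sample data, not an OpsWorker export format.

```python
from collections import Counter, defaultdict

# Hypothetical feedback entries as (alert_type, rating) pairs.
feedback = [
    ("PodCrashLoop", "Accurate"),
    ("PodCrashLoop", "Accurate"),
    ("PodCrashLoop", "Partially Accurate"),
    ("HighLatency", "Needs Improvement"),
    ("HighLatency", "Accurate"),
]


def feedback_distribution(entries):
    """Count how many times each rating was given."""
    return Counter(rating for _, rating in entries)


def accuracy_by_alert_type(entries):
    """Fraction of 'Accurate' ratings for each alert type."""
    totals = defaultdict(int)
    accurate = defaultdict(int)
    for alert_type, rating in entries:
        totals[alert_type] += 1
        if rating == "Accurate":
            accurate[alert_type] += 1
    return {t: accurate[t] / totals[t] for t in totals}


print(feedback_distribution(feedback))
print(accuracy_by_alert_type(feedback))
```

This sketch counts only "Accurate" as a hit. Whether "Partially Accurate" should earn partial credit is a judgment call for your team.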
Using Investigation Analytics
Find Recurring Issues
If the same alert type keeps generating investigations for the same resource, the underlying issue is persistent and needs a permanent fix rather than repeated investigation.
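One way to surface such candidates is to group investigations by alert type and resource and keep the pairs that recur. The sketch below assumes exported records with hypothetical `alert_type` and `resource` fields, and the threshold of three occurrences is an arbitrary illustrative choice.

```python
from collections import Counter

# Hypothetical investigation records; field names are assumptions.
investigations = [
    {"alert_type": "PodOOMKilled", "resource": "payments/checkout"},
    {"alert_type": "PodOOMKilled", "resource": "payments/checkout"},
    {"alert_type": "HighLatency", "resource": "search/indexer"},
    {"alert_type": "PodOOMKilled", "resource": "payments/checkout"},
]


def find_recurring(records, min_count=3):
    """Return (alert_type, resource) pairs seen at least min_count times."""
    counts = Counter((r["alert_type"], r["resource"]) for r in records)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


for (alert_type, resource), n in find_recurring(investigations):
    print(f"{alert_type} on {resource}: {n} investigations")
```

In practice you would likely restrict the input to a time window, such as the last 30 days, so that old, already-fixed issues do not keep appearing.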
Optimize Alert Rules
If investigations for certain alert types consistently receive poor feedback ratings, consider:
- Adjusting the alert rule filters
- Providing additional context via alert annotations
- Tuning the monitoring threshold
Capacity Planning
High investigation volumes for specific namespaces may indicate:
- Under-provisioned resources
- Application stability issues
- A need for architectural improvements
Next Steps
- Key Metrics — Full metrics reference
- Team Impact — Measure team-level improvement