Investigation Analytics

Overview

Investigation Analytics provides detailed data on how OpsWorker investigations are performing — what types of issues are being found, which areas of your infrastructure generate the most investigations, and how accurate the AI analysis is.

Analytics Available

Investigation Outcomes

  • Root cause types: Configuration issues vs. runtime issues
  • Most common alert types: Which alerts trigger the most investigations
  • Recurring issues: Alerts that fire repeatedly for the same resources — candidates for permanent fixes
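The outcome breakdowns above are essentially group-and-count operations. The following is a minimal sketch, assuming you have investigation records with an alert name and a root cause type. The `Investigation` class, its field names, and the `"configuration"`/`"runtime"` values are hypothetical and do not describe an actual OpsWorker export format.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record shape; field names and values are illustrative assumptions.
@dataclass
class Investigation:
    alert_name: str
    root_cause_type: str  # assumed values: "configuration" or "runtime"


def summarize_outcomes(investigations, top_n=3):
    """Return the root-cause-type split and the top_n most common alert names."""
    root_causes = Counter(inv.root_cause_type for inv in investigations)
    top_alerts = Counter(inv.alert_name for inv in investigations).most_common(top_n)
    return dict(root_causes), top_alerts


sample = [
    Investigation("KubePodCrashLooping", "runtime"),
    Investigation("KubePodCrashLooping", "configuration"),
    Investigation("KubePodCrashLooping", "runtime"),
    Investigation("KubeDeploymentReplicasMismatch", "configuration"),
    Investigation("CPUThrottlingHigh", "runtime"),
    Investigation("CPUThrottlingHigh", "runtime"),
]
split, top = summarize_outcomes(sample, top_n=2)
print(split)  # {'runtime': 4, 'configuration': 2}
print(top)    # [('KubePodCrashLooping', 3), ('CPUThrottlingHigh', 2)]
```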

Performance

  • Investigation duration: Distribution of completion times
  • Success rate: Percentage of investigations that completed with an identified root cause, versus those that failed
  • Agent response time: How quickly the in-cluster agent responds to data collection requests
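To make the success rate and duration distribution concrete, here is a sketch that computes both from a list of runs. The `InvestigationRun` record and its `"root_cause_found"`/`"failed"` status values are hypothetical. The p90 uses the nearest-rank method, which is one common percentile definition and not necessarily the one the dashboard uses.

```python
import math
import statistics
from dataclasses import dataclass


# Hypothetical record shape; status values are illustrative assumptions.
@dataclass
class InvestigationRun:
    status: str  # assumed: "root_cause_found" or "failed"
    duration_seconds: float


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% of the data is <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def performance_summary(runs):
    total = len(runs)
    succeeded = sum(1 for r in runs if r.status == "root_cause_found")
    durations = [r.duration_seconds for r in runs]
    return {
        "success_rate": succeeded / total if total else 0.0,
        "median_duration_s": statistics.median(durations) if durations else None,
        "p90_duration_s": nearest_rank_percentile(durations, 90) if durations else None,
    }


runs = [InvestigationRun("root_cause_found", d) for d in (40, 55, 62, 70, 75, 90, 120, 150)]
runs += [InvestigationRun("failed", d) for d in (200, 600)]
print(performance_summary(runs))
# {'success_rate': 0.8, 'median_duration_s': 82.5, 'p90_duration_s': 200}
```

Looking at a median and a high percentile together, rather than a mean, keeps a handful of very slow investigations (like the 600-second run here) from hiding what a typical investigation takes.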

Coverage

  • Top investigated namespaces: Which namespaces generate the most investigation work
  • Top investigated services: Which services are most frequently involved
  • Alert-to-investigation ratio: What percentage of alerts trigger investigations
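The coverage metrics reduce to a ratio and two frequency rankings. This sketch assumes each investigation is represented as a hypothetical `(namespace, service)` pair and that every investigation was triggered by exactly one alert. Both are simplifying assumptions for illustration.

```python
from collections import Counter


def coverage_summary(total_alerts, investigations, top_n=3):
    """Summarize coverage from hypothetical (namespace, service) investigation pairs.

    Assumes each investigation was triggered by exactly one alert.
    """
    ratio = len(investigations) / total_alerts if total_alerts else 0.0
    top_namespaces = Counter(ns for ns, _ in investigations).most_common(top_n)
    top_services = Counter(svc for _, svc in investigations).most_common(top_n)
    return ratio, top_namespaces, top_services


sample = (
    [("payments", "checkout-api")] * 3
    + [("payments", "ledger")]
    + [("search", "indexer")] * 2
)
ratio, namespaces, services = coverage_summary(total_alerts=8, investigations=sample, top_n=2)
print(f"{ratio:.0%} of alerts investigated")  # 75% of alerts investigated
print(namespaces)  # [('payments', 4), ('search', 2)]
print(services)    # [('checkout-api', 3), ('indexer', 2)]
```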

Feedback Analysis

  • Accuracy by alert type: Which types of alerts get the most accurate investigations
  • Accuracy trend: Is investigation quality improving over time?
  • Feedback distribution: How investigation ratings split across Accurate, Partially Accurate, and Needs Improvement
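The feedback breakdowns can be sketched from hypothetical `(alert_type, rating)` pairs, using the three rating labels listed above. Treating "accuracy" as the fraction of ratings marked Accurate is an assumption made for this example. A weighted score that gives partial credit to Partially Accurate would be an equally reasonable choice.

```python
from collections import Counter, defaultdict

RATINGS = ("Accurate", "Partially Accurate", "Needs Improvement")


def feedback_distribution(feedback):
    """Share of each rating across all (alert_type, rating) pairs."""
    counts = Counter(rating for _, rating in feedback)
    total = len(feedback)
    return {r: counts[r] / total for r in RATINGS} if total else {}


def accuracy_by_alert_type(feedback):
    """Fraction of each alert type's ratings that are 'Accurate' (assumed definition)."""
    totals = defaultdict(int)
    accurate = defaultdict(int)
    for alert_type, rating in feedback:
        totals[alert_type] += 1
        if rating == "Accurate":
            accurate[alert_type] += 1
    return {a: accurate[a] / totals[a] for a in totals}


sample = [
    ("KubePodCrashLooping", "Accurate"),
    ("KubePodCrashLooping", "Accurate"),
    ("KubePodCrashLooping", "Accurate"),
    ("KubePodCrashLooping", "Partially Accurate"),
    ("CPUThrottlingHigh", "Accurate"),
    ("CPUThrottlingHigh", "Partially Accurate"),
    ("CPUThrottlingHigh", "Needs Improvement"),
    ("CPUThrottlingHigh", "Needs Improvement"),
]
print(feedback_distribution(sample))
# {'Accurate': 0.5, 'Partially Accurate': 0.25, 'Needs Improvement': 0.25}
print(accuracy_by_alert_type(sample))
# {'KubePodCrashLooping': 0.75, 'CPUThrottlingHigh': 0.25}
```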

Using Investigation Analytics

Find Recurring Issues

If the same alert type keeps generating investigations for the same resource, it indicates a persistent issue that needs a permanent fix rather than repeated investigation.
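One way to surface these candidates is to count investigations per (alert type, resource) pair over a recent window. This sketch assumes hypothetical `(alert_type, resource, started_at)` tuples. The 7-day window and the threshold of 3 are arbitrary example values, not OpsWorker defaults.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_recurring(investigations, now, window=timedelta(days=7), min_count=3):
    """Return (alert_type, resource) pairs investigated at least min_count times
    since now - window, most frequent first."""
    cutoff = now - window
    counts = Counter(
        (alert_type, resource)
        for alert_type, resource, started_at in investigations
        if started_at >= cutoff
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


sample = (
    [("KubePodCrashLooping", "payments/checkout-api", datetime(2024, 5, d)) for d in (2, 4, 6, 7)]
    + [("KubePodCrashLooping", "payments/checkout-api", datetime(2024, 4, 20))]  # outside window
    + [("KubePodNotReady", "payments/ledger", datetime(2024, 5, d)) for d in (3, 5, 7)]
    + [("CPUThrottlingHigh", "search/indexer", datetime(2024, 5, 6))]
)
for (alert_type, resource), n in find_recurring(sample, now=datetime(2024, 5, 8)):
    print(f"{alert_type} on {resource}: {n} investigations")
# KubePodCrashLooping on payments/checkout-api: 4 investigations
# KubePodNotReady on payments/ledger: 3 investigations
```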

Optimize Alert Rules

If investigations for certain alert types consistently receive poor feedback ratings, consider:

  • Adjusting the alert rule filters
  • Providing additional context via alert annotations
  • Tuning the monitoring threshold
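To decide which alert rules to revisit first, you could flag alert types whose share of Needs Improvement ratings is above some limit. The function below is a hypothetical sketch. The 30% limit and the 5-rating minimum are illustrative assumptions, and the minimum is there so that one or two bad ratings alone do not trigger a rule change.

```python
from collections import defaultdict


def alert_types_needing_review(feedback, max_poor_share=0.3, min_ratings=5):
    """Flag alert types whose 'Needs Improvement' share exceeds max_poor_share.

    feedback: hypothetical (alert_type, rating) pairs. Alert types with fewer
    than min_ratings ratings are skipped as too small a sample.
    """
    totals = defaultdict(int)
    poor = defaultdict(int)
    for alert_type, rating in feedback:
        totals[alert_type] += 1
        if rating == "Needs Improvement":
            poor[alert_type] += 1
    flagged = {}
    for alert_type, n in totals.items():
        share = poor[alert_type] / n
        if n >= min_ratings and share > max_poor_share:
            flagged[alert_type] = share
    return flagged


feedback = (
    [("CPUThrottlingHigh", "Needs Improvement")] * 3
    + [("CPUThrottlingHigh", "Accurate")] * 2
    + [("KubePodCrashLooping", "Accurate")] * 5
    + [("KubePodCrashLooping", "Needs Improvement")]
    + [("KubeNodeNotReady", "Needs Improvement")] * 2  # too few ratings to judge
)
print(alert_types_needing_review(feedback))  # {'CPUThrottlingHigh': 0.6}
```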

Capacity Planning

High investigation volumes for specific namespaces may indicate:

  • Under-provisioned resources
  • Application stability issues
  • A need for architectural improvements
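"High" is relative, so one simple heuristic is to compare each namespace's investigation count against the median across namespaces. This is a hypothetical sketch. The input is assumed to be one namespace name per investigation, and the 2× factor is an arbitrary example threshold.

```python
import statistics
from collections import Counter


def high_volume_namespaces(namespaces, factor=2.0):
    """Return namespaces whose investigation count exceeds factor x the median
    count across namespaces, highest first.

    namespaces: one namespace name per investigation (hypothetical input).
    """
    counts = Counter(namespaces)
    if not counts:
        return []
    baseline = statistics.median(counts.values())
    return [(ns, n) for ns, n in counts.most_common() if n > factor * baseline]


sample = ["payments"] * 12 + ["search"] * 3 + ["auth"] * 4 + ["billing"] * 2
print(high_volume_namespaces(sample))  # [('payments', 12)]
```

Using the median rather than the mean as the baseline keeps a single very noisy namespace from raising the bar for all the others.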

Next Steps