
Health Checks

Overview

Verify that OpsWorker is operating correctly across all components.

Verification Checklist

Agent Health

# Check agent pod status
kubectl get pods -n opsworker

# Check agent logs for errors
kubectl logs -n opsworker -l app=opsworker-agent --tail=50
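
The pod status check can be scripted for unattended use. The following is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE). The function name `check_pods` and the `MAX_RESTARTS` threshold are illustrative and not part of OpsWorker.

```shell
#!/bin/sh
# Sketch: flag agent pods that are not Running or that restart too often.
# MAX_RESTARTS is an illustrative threshold, not an OpsWorker default.
MAX_RESTARTS="${MAX_RESTARTS:-5}"

# Reads `kubectl get pods --no-headers` output on stdin.
# Expected columns: NAME READY STATUS RESTARTS AGE
# Exits non-zero if any pod looks unhealthy or if no pods are listed.
check_pods() {
  awk -v max="$MAX_RESTARTS" '
    NF == 0 { next }
    { n++ }
    $3 != "Running" { printf "UNHEALTHY %s status=%s\n", $1, $3; bad = 1; next }
    ($4 + 0) > (max + 0) { printf "UNHEALTHY %s restarts=%s\n", $1, $4; bad = 1; next }
    { printf "OK %s\n", $1 }
    END {
      if (n == 0) { print "UNHEALTHY no pods found"; exit 1 }
      exit bad
    }
  '
}
```

One way to use it is `kubectl get pods -n opsworker --no-headers | check_pods`, which prints one line per pod and returns a non-zero exit code when something needs attention.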

Portal Health

  • Cluster status: Open your cluster in the portal; its status should be Connected
  • Signal flow: Check Alerts for recent signals from your monitoring systems
  • Investigation flow: Check Investigations for recent completed investigations

End-to-End Test

Click Test Integration in cluster settings to verify the complete pipeline:

  1. Synthetic alert is created
  2. Investigation runs against your cluster
  3. Results appear in the portal
  4. Slack notification is delivered (if configured)

Integration Health

Component              How to Check
Agent connectivity     Cluster status in portal (Connected/Disconnected)
Alert ingestion        New signals appearing in Alerts timeline
Investigation engine   Investigations completing successfully
Slack                  Test notification delivery

Periodic Checks

Check                       Frequency   What to Look For
Agent pod running           Daily       Pod status is Running, no excessive restarts
Investigations completing   Daily       No stuck investigations in "In Progress"
Slack delivery              Weekly      Test Integration succeeds
Agent version               Monthly     Agent is on a recent version
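
For the monthly agent version check, the image tag the agent pods are running can be read from the cluster. This is a sketch only: it assumes the `opsworker` namespace and `app=opsworker-agent` label used in the Agent Health commands, and that the image tag reflects the agent version. The function name `agent_images` is illustrative.

```shell
#!/bin/sh
# Sketch: list the distinct container images used by the agent pods so the
# tag can be compared by hand against the latest published agent release.
# NAMESPACE and SELECTOR default to the values used elsewhere on this page.
NAMESPACE="${NAMESPACE:-opsworker}"
SELECTOR="${SELECTOR:-app=opsworker-agent}"

agent_images() {
  # jsonpath prints the images of each pod on one line, space-separated;
  # tr/sort/sed turn that into a de-duplicated list, one image per line.
  kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" \
    -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' |
    tr ' ' '\n' | sort -u | sed '/^$/d'
}
```

Calling `agent_images` prints each image reference once. More than one distinct tag in the output suggests the pods are not all on the same version.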
