Health Checks
Overview
Verify that OpsWorker is operating correctly across all components.
Verification Checklist
Agent Health
# Check agent pod status
kubectl get pods -n opsworker
# Check agent logs for errors
kubectl logs -n opsworker -l app=opsworker-agent --tail=50
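The two commands above can be wrapped into a repeatable check. The following is a minimal sketch, not an official OpsWorker tool: the function names, the `MAX_RESTARTS` threshold, and the `error|fatal` log pattern are all assumptions chosen for illustration, so adjust them to your environment and to the agent's actual log format.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, threshold, and log pattern are assumptions.
MAX_RESTARTS="${MAX_RESTARTS:-5}"   # assumed limit, not an OpsWorker default

# Reads "NAME PHASE RESTARTS" lines on stdin and prints one line per problem pod.
# Returns 0 when at least one pod was seen and all are healthy, 1 otherwise.
evaluate_pods() {
  awk -v max="$MAX_RESTARTS" '
    NF < 3 { next }                 # skip blank or incomplete lines
    { seen++ }
    $2 != "Running" { print "NOT RUNNING: " $1 " (" $2 ")"; bad = 1; next }
    $3 + 0 > max    { print "RESTARTS: " $1 " (" $3 ")"; bad = 1 }
    END {
      if (seen == 0) { print "NO PODS FOUND"; bad = 1 }
      exit bad
    }
  '
}

# Counts log lines on stdin containing "error" or "fatal" (case-insensitive).
# The pattern is a guess at what an error line looks like.
count_error_lines() {
  grep -ciE 'error|fatal' || true   # grep exits 1 on zero matches; still print 0
}

# Feeds live cluster data into the helpers above.
# Note: .status.phase can stay "Running" while a container crash-loops,
# which is why the restart count is checked as well.
agent_pod_report() {
  kubectl get pods -n opsworker --no-headers \
    -o 'custom-columns=NAME:.metadata.name,PHASE:.status.phase,RESTARTS:.status.containerStatuses[0].restartCount' \
    | evaluate_pods
}

agent_log_errors() {
  kubectl logs -n opsworker -l app=opsworker-agent --tail=50 | count_error_lines
}
```

After sourcing the file, `agent_pod_report` prints nothing and returns 0 when every pod is healthy, and `agent_log_errors` prints the number of suspicious lines in the last 50 log lines. Keeping the parsing in functions that read stdin makes it possible to try them on sample input without a cluster.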
Portal Health
- Cluster status: Open your cluster in the portal; its status should show Connected
- Signal flow: Check Alerts for recent signals from your monitoring systems
- Investigation flow: Check Investigations for recent completed investigations
End-to-End Test
Click Test Integration in cluster settings to verify the complete pipeline:
- Synthetic alert is created
- Investigation runs against your cluster
- Results appear in the portal
- Slack notification is delivered (if configured)
Integration Health
| Component | How to Check |
|---|---|
| Agent connectivity | Cluster status in portal (Connected/Disconnected) |
| Alert ingestion | New signals appearing in Alerts timeline |
| Investigation engine | Investigations completing successfully |
| Slack | Test notification delivery |
Periodic Checks
| Check | Frequency | What to Look For |
|---|---|---|
| Agent pod running | Daily | Pod status is Running, no excessive restarts |
| Investigations completing | Daily | No investigations stuck in "In Progress" |
| Slack delivery | Weekly | Test Integration succeeds and the Slack notification arrives |
| Agent version | Monthly | Agent is on a recent version |
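For the monthly version check, one option is to read the image tag of the running agent pods and compare it by hand against the latest release. The sketch below assumes the agent version is reflected in the container image tag and that the agent is the first container in the pod. Neither is confirmed by this page, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the agent version is encoded in the image tag.

# Prints the tag of an image reference, or "untagged" if it has none.
image_tag() {
  local ref="${1%%@*}"      # drop any @sha256:... digest suffix
  local name="${ref##*/}"   # last path component, so a registry port is not read as a tag
  case "$name" in
    *:*) printf '%s\n' "${name##*:}" ;;
    *)   printf 'untagged\n' ;;
  esac
}

# Prints "POD TAG" for each agent pod (first container only).
agent_versions() {
  kubectl get pods -n opsworker -l app=opsworker-agent \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[0].image}{"\n"}{end}' \
  | while read -r pod image; do
      printf '%s %s\n' "$pod" "$(image_tag "$image")"
    done
}
```

Calling `agent_versions` lists each agent pod with its image tag. Deciding whether that tag counts as recent still requires checking it against the current release yourself.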
Next Steps
- Logs & Diagnostics — Dig deeper into agent logs
- Troubleshooting — Fix common issues