Troubleshooting FAQ
My agent shows as Disconnected — what should I check?
- Check the pod is running:
  kubectl get pods -n opsworker
- Check logs for errors:
  kubectl logs -n opsworker -l app=opsworker-agent
- Verify outbound HTTPS to *.amazonaws.com is not blocked
- Confirm the cluster token is correct:
  helm get values opsworker-agent -n opsworker
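The four checks above can be run in one pass. The script below is a minimal sketch, not an OpsWorker tool. It assumes the namespace opsworker and label app=opsworker-agent used in this FAQ. It uses https://sts.amazonaws.com only as a stand-in probe target, because the exact *.amazonaws.com host your agent calls is not listed here (override it with PROBE_URL). The egress probe runs from wherever you run the script. From a laptop it only tests your own network, so run it from a node or bastion that shares the cluster's outbound path.

```shell
#!/bin/sh
# Sketch: run the "agent shows as Disconnected" checklist in one pass.
# Assumptions: namespace and label from this FAQ; PROBE_URL is a stand-in
# for whichever *.amazonaws.com endpoint your agent actually uses.

NS="${NS:-opsworker}"
SELECTOR="${SELECTOR:-app=opsworker-agent}"
PROBE_URL="${PROBE_URL:-https://sts.amazonaws.com}"

# Reads "kubectl get pods --no-headers" lines on stdin. Succeeds only if at
# least one pod is listed and every pod is Running with all containers Ready.
all_pods_ready() {
  awk '{ n++; split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) bad++ }
       END { exit (n == 0 || bad > 0) }'
}

check_agent() {
  echo "1) Agent pods in $NS:"
  if kubectl get pods -n "$NS" -l "$SELECTOR" --no-headers | all_pods_ready; then
    echo "   OK: all agent pods are Running and Ready"
  else
    echo "   PROBLEM: no agent pod found, or one is not Running/Ready"
  fi

  echo "2) Recent error lines in agent logs:"
  # With a label selector, kubectl logs shows only 10 lines per pod unless --tail is set.
  kubectl logs -n "$NS" -l "$SELECTOR" --tail=200 | grep -iE 'error|fail' | tail -n 20

  echo "3) Outbound HTTPS to $PROBE_URL:"
  # Any HTTP status (even 4xx) proves TLS egress works; 000 means no response.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$PROBE_URL")
  if [ -z "$code" ] || [ "$code" = "000" ]; then
    echo "   PROBLEM: no HTTPS response (firewall, proxy or DNS?)"
  else
    echo "   OK: got HTTP $code"
  fi

  echo "4) Helm values (compare clusterToken with the portal):"
  helm get values opsworker-agent -n "$NS"
}
```

To use it, save the functions to a file (opsworker-check.sh is a hypothetical name), source it with . ./opsworker-check.sh, then run check_agent.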
See Connectivity Troubleshooting.
Investigations aren't triggering — why?
First, check whether signals are arriving in the portal under Alerts. If they are not, your webhook URL may be incorrect. If signals arrive but no investigations start, check that an alert rule matches the signal and has auto-investigation enabled. See Configure Alert Rules.
Investigation results seem incomplete — what's wrong?
The most common cause is insufficient RBAC permissions. Verify that the agent can access the affected namespace (replace NAMESPACE with its name):
  kubectl auth can-i get pods -n NAMESPACE --as=system:serviceaccount:opsworker:opsworker-agent
See Data Collection Troubleshooting.
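The can-i check above tests one verb on one resource. To test several permissions in one pass, a loop like the one below can help. It is a sketch: the verb/resource list is an assumption about what the agent reads during data collection, not a list taken from the OpsWorker chart, so compare it with the agent's actual Role or ClusterRole and adjust.

```shell
#!/bin/sh
# Sketch: report which read permissions the agent's service account has in
# one namespace. The verb/resource list in the here-document is an
# ASSUMPTION; align it with the RBAC rules shipped with the agent.

AGENT_SA="system:serviceaccount:opsworker:opsworker-agent"

# Usage: check_agent_rbac NAMESPACE
# Prints one line per check; returns non-zero if any check is denied.
check_agent_rbac() {
  ns="$1"
  denied=0
  while read -r verb res; do
    # --quiet suppresses the yes/no text; only the exit code is used.
    if kubectl auth can-i "$verb" "$res" -n "$ns" --as="$AGENT_SA" --quiet </dev/null; then
      echo "ok     $verb $res"
    else
      echo "DENIED $verb $res"
      denied=$((denied + 1))
    fi
  done <<EOF
get pods
list pods
get pods/log
list events
list deployments
list replicasets
EOF
  [ "$denied" -eq 0 ]
}
```

Run it as check_agent_rbac NAMESPACE. The --as flag only works if your own user is allowed to impersonate service accounts, which cluster admins normally are.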
Slack notifications aren't arriving
Check that (1) the Slack integration shows as connected in the portal, (2) a notification channel is configured in notification routing, and (3) the OpsWorker bot has been invited to the target channel. If all three are in place and messages still don't arrive, try disconnecting and reconnecting the Slack integration.
AI Chat isn't responding
Ensure your cluster is connected (check status in the portal). AI Chat requires an active agent connection to query your cluster. If the cluster shows Disconnected, resolve the connectivity issue first.
An investigation is stuck in "In Progress"
This usually means the agent can't respond to data collection requests. Check agent connectivity and pod health. If the agent is healthy but the investigation is stuck, it may have timed out — check the investigation details in the portal.
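For the pod-health part of this check, restart counts are a quick signal: an agent that keeps restarting (for example after being OOMKilled) may not be able to answer data collection requests. The sketch below assumes the namespace and label used elsewhere in this FAQ and inspects only the first container of each agent pod. It is a diagnostic aid, not an OpsWorker tool.

```shell
#!/bin/sh
# Sketch: list agent pods that have restarted, with the last termination
# reason. Assumes namespace "opsworker" and label app=opsworker-agent, and
# looks only at the first container in each pod.

NS="${NS:-opsworker}"
SELECTOR="${SELECTOR:-app=opsworker-agent}"

# Reads "name<TAB>restartCount<TAB>lastTerminationReason" lines on stdin,
# prints one line per restarted pod, and returns 1 if any pod restarted.
flag_restarted_pods() {
  awk 'BEGIN { FS = "\t" }
       $2 > 0 {
         reason = ($3 == "" ? "unknown" : $3)
         printf "%s restarted %s time(s), last reason: %s\n", $1, $2, reason
         bad = 1
       }
       END { exit bad + 0 }'
}

agent_restart_report() {
  # kubectl JSONPath template emitting one tab-separated line per pod.
  fmt='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}'
  fmt="$fmt"'{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
  kubectl get pods -n "$NS" -l "$SELECTOR" -o jsonpath="$fmt" | flag_restarted_pods &&
    echo "No restarts found for pods matching $SELECTOR in $NS."
}
```

A non-zero restart count does not prove the agent caused the stuck investigation, but a recent OOMKilled or Error reason is worth resolving before retrying.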
I'm getting too many / too few investigations
Adjust your alert rules. To reduce volume: narrow the namespace filter, restrict severity to critical only, or disable auto-investigation on broad rules. To increase coverage: add rules for more namespaces or include lower severity levels. See Alert Rules.
How do I reset my cluster connection?
Regenerate the cluster token in the portal (cluster settings), then update the agent:
  helm upgrade opsworker-agent opsworker/opsworker-agent -n opsworker --set clusterToken=NEW_TOKEN
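When rotating the token it can help to script the upgrade together with a restart. This sketch assumes the release, chart and namespace from the command above, plus a Deployment named opsworker-agent (hypothetical; confirm with kubectl get deployments -n opsworker). It adds --reuse-values, a standard Helm flag: if you pass --set without it, Helm builds the release from the chart defaults plus only the values on that command line, so other settings from the original install revert to defaults. Whether the chart restarts the agent on a value change is not documented here, so the script restarts the Deployment explicitly.

```shell
#!/bin/sh
# Sketch: apply a regenerated cluster token and restart the agent.
# Assumptions: Helm release "opsworker-agent" from chart
# opsworker/opsworker-agent in namespace "opsworker" (from this FAQ), and a
# Deployment named "opsworker-agent" (HYPOTHETICAL; check yours).

NS="${NS:-opsworker}"
DEPLOYMENT="${DEPLOYMENT:-opsworker-agent}"

# Usage: rotate_cluster_token NEW_TOKEN
rotate_cluster_token() {
  if [ -z "$1" ]; then
    echo "usage: rotate_cluster_token NEW_TOKEN" >&2
    return 2
  fi
  # --reuse-values keeps the other values from the current release; without
  # it, --set starts from the chart defaults and drops earlier overrides.
  helm upgrade opsworker-agent opsworker/opsworker-agent -n "$NS" \
    --reuse-values --set clusterToken="$1" || return 1
  # Restart explicitly in case the chart does not roll pods on value changes.
  kubectl rollout restart "deployment/$DEPLOYMENT" -n "$NS" || return 1
  kubectl rollout status "deployment/$DEPLOYMENT" -n "$NS" --timeout=120s
}
```

After the rollout completes, the cluster should return to Connected in the portal; if it does not, work through the Disconnected checklist at the top of this page.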