Troubleshooting

Common Issues

Agent Not Connecting

Symptoms: The cluster shows as Disconnected in the portal.

  1. Check pod status: kubectl get pods -n opsworker
  2. Check logs: kubectl logs -n opsworker -l app=opsworker-agent
  3. Verify outbound connectivity to *.amazonaws.com (port 443)
  4. Check that the cluster token is correct: helm get values opsworker-agent -n opsworker
  5. Check for NetworkPolicy blocking outbound traffic

See Connectivity Troubleshooting for details.
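
Step 1 can be scripted. The following is a minimal sketch rather than an official OpsWorker tool: the function name agent_pods_not_ready is hypothetical, while the opsworker namespace and the app=opsworker-agent label come from the commands above. It assumes kubectl already points at the affected cluster, and it checks the Ready condition rather than the pod phase because a crash-looping pod can still report a Running phase.

```shell
# Namespace and label selector from the steps above; override via environment.
NS="${NS:-opsworker}"
SELECTOR="${SELECTOR:-app=opsworker-agent}"

# Hypothetical helper: prints every agent pod whose Ready condition is not
# "True". Returns 0 when all matching pods are Ready, non-zero otherwise
# (including when no pod matches the selector).
agent_pods_not_ready() {
  kubectl get pods -n "$NS" -l "$SELECTOR" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | awk -F'\t' '
        { seen = 1 }
        $2 != "True" { print $1 " is not Ready"; bad = 1 }
        END {
          if (!seen) { print "no agent pods found"; exit 1 }
          exit bad + 0
        }'
}
```

Running agent_pods_not_ready with no arguments prints nothing when every agent pod is Ready. Any pod it names is a good candidate for kubectl describe pod and the log check in step 2.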

Investigations Not Triggering

Symptoms: Alerts fire but no investigations start.

  1. Check Alerts in the portal: are signals arriving?
  2. If no signals arrive, verify that the webhook URL configured in your monitoring system is correct
  3. If signals arrive but no investigations start, check Alert Rules: does a matching rule exist with auto-investigation enabled?
  4. Verify that the alert's namespace, severity, and labels match the rule's filters
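
For step 4, it can help to compare an alert's labels with a rule's filters side by side. The sketch below is purely illustrative: labels_match is a hypothetical helper, and it assumes that a rule filter is a set of exact key=value pairs that must all be present on the alert. The portal's actual matching semantics may differ (for example, it may support patterns), so treat this as a way to spot obvious mismatches only.

```shell
# Hypothetical helper, assuming exact key=value matching.
# Usage: labels_match "ALERT_LABELS" "RULE_FILTERS"
# Both arguments are space-separated key=value pairs without glob characters.
# Prints the first filter the alert does not satisfy and returns 1,
# or prints nothing and returns 0 when every filter is satisfied.
labels_match() {
  alert_labels=" $1 "
  for want in $2; do
    case "$alert_labels" in
      *" $want "*) ;;  # filter satisfied, keep checking
      *) echo "missing or different: $want"; return 1 ;;
    esac
  done
}
```

For example, labels_match "namespace=prod severity=warning" "namespace=prod severity=critical" reports severity=critical as the mismatch, which would explain why a rule filtered on critical alerts did not start an investigation.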

Investigations Returning Incomplete Results

Symptoms: Investigation completes but data is missing.

  1. Check agent RBAC permissions for the affected namespace
  2. Verify with: kubectl auth can-i get pods -n NAMESPACE --as=system:serviceaccount:opsworker:opsworker-agent
  3. Check if the namespace is within the agent's scope

See Data Collection Troubleshooting.
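
The can-i check in step 2 can be repeated for several resources at once. This is a minimal sketch: check_agent_rbac is a hypothetical helper, and the default resource list (pods, events, deployments) and the get and list verbs are assumptions for illustration, not the agent's documented permission set. The service account name is the one used in step 2.

```shell
# Service account from step 2; override via environment if yours differs.
SA="${SA:-system:serviceaccount:opsworker:opsworker-agent}"

# Hypothetical helper. Usage: check_agent_rbac NAMESPACE [RESOURCE...]
# Prints one line per verb/resource pair and returns 1 if any check is
# not answered with "yes".
check_agent_rbac() {
  ns="$1"; shift
  # Assumed default resources; pass your own list to override.
  [ "$#" -gt 0 ] || set -- pods events deployments
  denied=0
  for resource in "$@"; do
    for verb in get list; do
      # "kubectl auth can-i" prints yes/no and exits non-zero on "no".
      answer="$(kubectl auth can-i "$verb" "$resource" -n "$ns" --as="$SA" 2>/dev/null || true)"
      echo "$verb $resource: ${answer:-unknown}"
      [ "$answer" = "yes" ] || denied=1
    done
  done
  return "$denied"
}
```

For example, check_agent_rbac NAMESPACE pods events prints four lines. Any line ending in no points at a Role or ClusterRole binding that is missing for that namespace. Impersonating the service account with --as requires that your own user is allowed to impersonate.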

Slack Notifications Not Arriving

Symptoms: Investigation completes but no Slack message.

  1. Check Slack integration status in Integrations
  2. Verify notification routing: is a channel configured?
  3. Check that the OpsWorker bot is in the target Slack channel
  4. Try disconnecting and reconnecting the Slack integration

Investigation Stuck in "In Progress"

Symptoms: Investigation doesn't complete.

  1. Check agent connectivity (a Disconnected agent blocks data collection)
  2. Check agent resource limits (OOM kills interrupt investigations)
  3. If the problem persists, the investigation may have timed out; check the portal for error details
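
One way to confirm the OOM kills mentioned in step 2 is to look at the reason each agent container last terminated. The sketch below assumes the same opsworker namespace and app=opsworker-agent label used elsewhere on this page; agent_oom_killed is a hypothetical helper name.

```shell
# Namespace and label selector as used elsewhere on this page.
NS="${NS:-opsworker}"
SELECTOR="${SELECTOR:-app=opsworker-agent}"

# Hypothetical helper: lists agent pods in which a container's previous
# termination reason was OOMKilled. Returns 0 if at least one such pod
# is found, 1 otherwise.
agent_oom_killed() {
  kubectl get pods -n "$NS" -l "$SELECTOR" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | awk -F'\t' '
        $2 ~ /OOMKilled/ { print $1 " was OOMKilled on its last restart"; found = 1 }
        END { exit !found }'
}
```

If agent_oom_killed names a pod, compare the memory limit shown by kubectl describe pod with the agent's actual usage before raising the limit in your Helm values.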

Portal Access Issues

  1. Clear browser cache and cookies
  2. Try a different browser
  3. If you sign in with Google or Azure AD, check the SSO configuration

Getting Help

If these steps don't resolve your issue, contact OpsWorker support with:

  • Cluster name and ID
  • Agent logs: kubectl logs -n opsworker -l app=opsworker-agent
  • Description of the issue and steps you've tried
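
The agent details can be gathered into files before contacting support. This is a minimal sketch under a few assumptions: collect_agent_diagnostics and the default directory name opsworker-support are hypothetical, and the 2000-line tail is an arbitrary choice. The --tail flag is set explicitly because kubectl logs returns only the last 10 lines per pod when a label selector is used without it.

```shell
# Namespace and label selector as used elsewhere on this page.
NS="${NS:-opsworker}"
SELECTOR="${SELECTOR:-app=opsworker-agent}"

# Hypothetical helper. Usage: collect_agent_diagnostics [OUTPUT_DIR]
# Writes pod status and recent agent logs to OUTPUT_DIR and prints its path.
collect_agent_diagnostics() {
  out_dir="${1:-opsworker-support}"
  mkdir -p "$out_dir"
  # Pod status, including node placement and restart counts.
  kubectl get pods -n "$NS" -l "$SELECTOR" -o wide \
    > "$out_dir/pods.txt" 2>&1 || true
  # Recent logs from every matching pod, prefixed with the pod name.
  kubectl logs -n "$NS" -l "$SELECTOR" --tail=2000 --prefix --timestamps \
    > "$out_dir/agent-logs.txt" 2>&1 || true
  echo "$out_dir"
}
```

Review the files before sending them. The output of helm get values is left out on purpose, since it may contain the cluster token.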