
Troubleshooting Connectivity

Agent Shows as Disconnected

Check Pod Status

kubectl get pods -n opsworker
Pod Status        Likely Cause                             Action
Running           Agent running but can't reach SQS        Check network/proxy
CrashLoopBackOff  Configuration error or resource issue    Check logs
ImagePullBackOff  Can't pull agent image                   Check image registry access
Pending           Scheduling issue                         Check node resources, tolerations
Not found         Agent not installed                      Run Helm install
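
The table above can be turned into a small lookup helper. This is a minimal sketch: the function name `suggest_action` is illustrative and not part of OpsWorker, and an empty status is assumed to mean that no agent pod was found.

```shell
# Map a pod status (the STATUS column of `kubectl get pods`) to the
# suggested action from the table. The function name is hypothetical.
suggest_action() {
  case "$1" in
    Running)          echo "Check network/proxy" ;;
    CrashLoopBackOff) echo "Check logs" ;;
    ImagePullBackOff) echo "Check image registry access" ;;
    Pending)          echo "Check node resources, tolerations" ;;
    "")               echo "Run Helm install" ;;   # no pod found
    *)                echo "Unrecognized status: $1" ;;
  esac
}

# Usage (requires cluster access):
#   status=$(kubectl get pods -n opsworker -l app=opsworker-agent \
#     --no-headers 2>/dev/null | awk 'NR==1 {print $3}')
#   suggest_action "$status"
```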

Check Pod Logs

kubectl logs -n opsworker -l app=opsworker-agent

Look for:

  • Connection errors → Network/proxy issue
  • Authentication errors → Invalid cluster token
  • Timeout errors → SQS endpoint unreachable
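
To get a quick count of each error type, the log output can be piped through a small filter. This is a sketch only: the match patterns are assumptions about what the agent's messages might contain, not its actual log format, so adjust them to the lines you see.

```shell
# Count log lines per error category. Reads log text on stdin.
# The patterns are guesses, not the agent's documented messages.
classify_logs() {
  awk '
    { line = tolower($0) }
    line ~ /timeout|timed out/                                   { timeout++; next }
    line ~ /unauthorized|forbidden|invalid token|authentication/ { auth++; next }
    line ~ /connection refused|connection reset|no route to host|dial tcp/ { conn++; next }
    END {
      printf "connection=%d authentication=%d timeout=%d\n", conn, auth, timeout
    }
  '
}

# Usage (requires cluster access):
#   kubectl logs -n opsworker -l app=opsworker-agent | classify_logs
```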

Verify Outbound Connectivity

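Test that the agent pod can reach AWS SQS over HTTPS. The example uses the us-east-1 endpoint; substitute the region your agent connects to if it differs: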

kubectl exec -n opsworker deploy/opsworker-agent -- \
  wget -q -O /dev/null https://sqs.us-east-1.amazonaws.com

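If the command times out or cannot connect, outbound HTTPS is likely blocked. An HTTP error response from the server still shows that the endpoint is reachable. Check: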

  • Security groups (EKS)
  • Firewall rules (GKE)
  • NSG rules (AKS)
  • NetworkPolicy resources
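
If the agent uses a region other than us-east-1, the endpoint in the test needs to change. The helper below is a sketch that builds the URL from a region name. It assumes a standard commercial AWS region (other partitions use a different domain suffix), and the function name is illustrative.

```shell
# Build the regional SQS endpoint URL for a given region name.
# Assumes a standard commercial AWS region.
sqs_endpoint() {
  printf 'https://sqs.%s.amazonaws.com\n' "$1"
}

# Usage (requires cluster access); -T 10 caps the wait at 10 seconds
# so that a blocked connection fails quickly instead of hanging:
#   kubectl exec -n opsworker deploy/opsworker-agent -- \
#     wget -q -T 10 -O /dev/null "$(sqs_endpoint eu-west-1)"
```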

Check Cluster Token

Verify that the correct token was used during installation by inspecting the values supplied to the release:

helm get values opsworker-agent -n opsworker

If the token is incorrect, update it:

helm upgrade opsworker-agent opsworker/opsworker-agent \
  -n opsworker \
  --set clusterToken=CORRECT_TOKEN

You can regenerate the token from the OpsWorker portal if needed.
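
When comparing tokens on a shared screen or in a ticket, it can help to show only the last few characters. The filter below is a sketch that assumes the token appears in the `helm get values` output as a top-level, unquoted `clusterToken: <value>` line; the function name is illustrative.

```shell
# Print the clusterToken from `helm get values` output with all but the
# last four characters masked. Reads the values text on stdin.
mask_token() {
  awk -F': *' '$1 == "clusterToken" {
    t = $2
    n = length(t)
    if (n > 4) printf "****%s\n", substr(t, n - 3)
    else print "****"
  }'
}

# Usage (requires cluster access):
#   helm get values opsworker-agent -n opsworker | mask_token
```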

Proxy Configuration

If your cluster reaches the internet through an HTTP proxy, configure the agent to use it:

helm upgrade opsworker-agent opsworker/opsworker-agent \
  -n opsworker \
  --set proxy.https=http://proxy.example.com:3128

Agent Connects but Goes Offline Intermittently

Resource Limits

Check if the pod is being OOM-killed:

kubectl describe pod -n opsworker -l app=opsworker-agent | grep -A5 "Last State"

If you see OOMKilled, increase memory limits:

helm upgrade opsworker-agent opsworker/opsworker-agent \
  -n opsworker \
  --set resources.limits.memory=512Mi
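
For use in a script or periodic check, the OOM test can be wrapped in a function that returns an exit status instead of printing text. This is a sketch with an illustrative function name. It looks at the "Last State" line and the five lines after it, which mirrors the `grep -A5` command above.

```shell
# Exit 0 if `kubectl describe pod` output (on stdin) shows OOMKilled
# on a "Last State" line or within the five lines after it.
was_oom_killed() {
  awk '
    /Last State/               { window = 6 }
    window > 0 && /OOMKilled/  { found = 1 }
    window > 0                 { window-- }
    END                        { exit !found }
  '
}

# Usage (requires cluster access):
#   if kubectl describe pod -n opsworker -l app=opsworker-agent | was_oom_killed; then
#     echo "Agent was OOM-killed; consider raising resources.limits.memory"
#   fi
```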

Node Stability

Check recent events in the agent's namespace for signs that the node hosting the agent is unstable, such as evictions or rescheduling:

kubectl get events -n opsworker --sort-by='.lastTimestamp'
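
On a busy namespace the event list can be long. The filter below is a sketch that keeps only lines mentioning a few Kubernetes event reasons commonly associated with node trouble. The choice of reasons is an assumption rather than an exhaustive list, and the function name is illustrative.

```shell
# Keep only event lines whose text mentions a node-related reason.
# Reads `kubectl get events` output on stdin.
filter_node_events() {
  grep -E 'NodeNotReady|Evicted|FailedScheduling|TaintManagerEviction'
}

# Usage (requires cluster access):
#   kubectl get events -n opsworker --sort-by='.lastTimestamp' | filter_node_events
```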

Network Intermittency

Intermittent SQS connectivity can cause temporary disconnections. The agent reconnects automatically once connectivity is restored.

Next Steps