Suggested Fixes
Overview
Every OpsWorker investigation includes actionable recommendations organized into two categories: immediate actions to resolve the current issue, and preventive measures to stop it from recurring.
Recommendation Categories
Immediate Actions
Steps to fix the problem right now:
- Specific to the current issue and environment
- Include resource names, namespaces, and values from your cluster
- Designed to restore service as quickly as possible
- Typically include kubectl commands you can copy and execute
Example:
Immediate Action: Restart the deployment to recover from the OOM condition:
kubectl rollout restart deployment/api-gateway -n production
Then increase the memory limit to prevent immediate recurrence:
kubectl set resources deployment/api-gateway -n production --limits=memory=512Mi
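Before and after running commands like these, it can help to preview the change and confirm that it took effect. The following is a minimal sketch, not OpsWorker output: the function names `preview_limit_change` and `verify_fix`, the `KUBECTL` override variable, and the 120-second timeout are assumptions made for illustration. The deployment name, namespace, and limit are taken from the example above.

```shell
#!/bin/sh
# Hypothetical helpers around the example commands above.
# KUBECTL can be overridden (for instance with "echo") to print commands
# instead of running them against a cluster.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="production"
DEPLOYMENT="api-gateway"

preview_limit_change() {
  # Render the modified deployment client-side without applying it.
  "$KUBECTL" set resources "deployment/$DEPLOYMENT" -n "$NAMESPACE" \
    --limits=memory=512Mi --dry-run=client -o yaml
}

verify_fix() {
  # Wait until the rollout completes, or fail after the (assumed) timeout.
  "$KUBECTL" rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE" \
    --timeout=120s || return 1
  # Print the memory limit currently set on the deployment's containers.
  "$KUBECTL" get deployment "$DEPLOYMENT" -n "$NAMESPACE" \
    -o jsonpath='{.spec.template.spec.containers[*].resources.limits.memory}'
}
```

Calling `preview_limit_change` before applying the limit shows the resulting manifest, and calling `verify_fix` afterwards reports whether the pods became ready and which limit is in effect.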
Preventive Measures
Longer-term fixes to address the root cause:
- Address the underlying issue, not just the symptom
- May involve code changes, configuration updates, or architectural adjustments
- Help break the cycle of recurring alerts
Example:
Preventive Measure: Investigate the connection pool leak identified in the logs. The maxConnections setting in the application config should be bounded. Consider adding a connection pool metrics exporter to catch this trend earlier.
Contextual Recommendations
Recommendations are tailored to your specific environment:
- Use actual resource names, namespaces, and configuration values
- Account for your cluster's topology (e.g., related services that may be affected)
- Consider the specific alert type and observed failure pattern
- Reference real data from the investigation (log lines, event details)
Viewing Recommendations
Recommendations appear in:
- Slack notification — Summary with key actions and commands
- Portal investigation detail — Full recommendations with context
- Investigation chat — Ask follow-up questions about any recommendation
Next Steps
- Command Generation — How kubectl commands are generated
- Safe Execution Model — How OpsWorker ensures safety
- PR Creation — Code-level fix suggestions