Data Collection

Overview

During an investigation, OpsWorker collects data from your Kubernetes cluster through the in-cluster agent. All collection is read-only and scoped to the resources relevant to the investigation.

What Is Collected

Data Type | Source | Purpose
Pod status | kubectl get pod | Current state, restart counts, container statuses
Pod logs | kubectl logs | Application errors, stack traces, crash output
Kubernetes events | kubectl get events | Scheduling, state transitions, errors, warnings
Deployment specs | kubectl get deployment | Replica count, strategy, resource limits
Service specs | kubectl get service | Selectors, ports, type
Endpoints | kubectl get endpoints | Healthy/unhealthy backends
Ingress rules | kubectl get ingress | Routing rules, TLS configuration
ConfigMap contents | kubectl get configmap | Application configuration
Secret metadata | kubectl get secret | Names, labels, annotations only (values are never read)
Node status | kubectl get node | Conditions, capacity, allocatable resources

How Collection Works

  1. The investigation engine determines what data is needed based on the alert and discovered topology
  2. Commands are sent to the in-cluster agent via SQS
  3. The agent executes the kubectl-equivalent query against the Kubernetes API
  4. Results are returned via SQS to the investigation engine
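
The four steps above can be sketched as a request/response loop. This is only an illustration under stated assumptions, not OpsWorker's implementation: the message fields (`investigation_id`, `kind`, `namespace`, `name`), the `fake_k8s_get` helper, the allow-list, and the in-memory queues standing in for SQS are all hypothetical.

```python
import json
import queue

# In-memory stand-ins for SQS (assumption: one queue per direction).
commands = queue.Queue()
results = queue.Queue()

# Hypothetical allow-list: the agent serves read-only lookups only.
READ_ONLY_KINDS = {"pod", "events", "deployment", "service", "endpoints",
                   "ingress", "configmap", "secret", "node"}


def fake_k8s_get(kind, namespace, name):
    """Placeholder for a read-only Kubernetes API query (hypothetical helper)."""
    return {"kind": kind, "namespace": namespace, "name": name, "status": "Running"}


def agent_handle_one(k8s_get=fake_k8s_get):
    """Steps 2-4: receive one command, run the read-only query, return the result."""
    command = json.loads(commands.get_nowait())
    if command["kind"] not in READ_ONLY_KINDS:
        payload = {"error": f"unsupported kind: {command['kind']}"}
    else:
        payload = k8s_get(command["kind"], command["namespace"], command["name"])
    results.put(json.dumps({"investigation_id": command["investigation_id"],
                            "payload": payload}))


# Step 1: the engine decides what it needs and enqueues a command.
commands.put(json.dumps({"investigation_id": "inv-1", "kind": "pod",
                         "namespace": "default", "name": "web-0"}))
agent_handle_one()
reply = json.loads(results.get_nowait())
```

Carrying an identifier such as `investigation_id` on both messages is one way the engine could match asynchronous results back to the investigation that requested them.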

What Is NOT Collected

  • Secret values: only metadata (names, labels, annotations) is accessed; values are never read
  • Container filesystem: no exec or file access inside containers
  • Network traffic: no packet capture or network monitoring
  • Metrics: Kubernetes metrics are not collected directly (use the Datadog/Grafana integration for metrics)
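
How the agent enforces the secret-values boundary is not described here. As a rough sketch of one possible safeguard, an agent could project any Secret object onto an allow-list of metadata fields before anything leaves the cluster. The function name `secret_metadata_only` and the sample manifest are hypothetical.

```python
def secret_metadata_only(secret):
    """Return a copy of a Secret manifest holding only name, namespace, labels, annotations."""
    meta = secret.get("metadata") or {}
    return {
        "kind": "Secret",
        "metadata": {
            "name": meta.get("name"),
            "namespace": meta.get("namespace"),
            "labels": dict(meta.get("labels") or {}),
            "annotations": dict(meta.get("annotations") or {}),
        },
    }


# Sample manifest for illustration only; the value is a dummy base64 string.
example_secret = {
    "kind": "Secret",
    "metadata": {"name": "db-credentials", "namespace": "shop",
                 "labels": {"app": "web"}},
    "type": "Opaque",
    "data": {"password": "ZHVtbXk="},
}

redacted = secret_metadata_only(example_secret)
```

Building the result from an allow-list, rather than deleting known sensitive keys, means any field not explicitly named (such as `data` or `stringData`) is dropped by default.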

Data Minimization

OpsWorker only collects data relevant to the specific investigation:

  • Only resources in the discovered topology are queried
  • Log collection is limited to recent entries
  • Collection stops once sufficient data is gathered for analysis
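
The three rules above could look roughly like the following. This is a sketch under assumptions: the limits (200 lines, 900 seconds), the function names, and the "sufficient" test are illustrative and not OpsWorker's actual values. `tailLines` and `sinceSeconds` are the query parameters the Kubernetes pod log endpoint accepts for bounding log output.

```python
def build_log_queries(topology_pods, tail_lines=200, since_seconds=900):
    """Build bounded log queries for pods in the discovered topology only."""
    return [
        {
            "namespace": pod["namespace"],
            "pod": pod["name"],
            # Bound the log window by line count and by age.
            "params": {"tailLines": tail_lines, "sinceSeconds": since_seconds},
        }
        for pod in topology_pods
    ]


def collect_until_sufficient(queries, run_query, is_sufficient):
    """Run queries in order and stop as soon as the collected data is enough."""
    collected = []
    for query in queries:
        collected.append(run_query(query))
        if is_sufficient(collected):
            break
    return collected


# Hypothetical topology and canned log output, for illustration only.
topology = [
    {"namespace": "shop", "name": "web-0"},
    {"namespace": "shop", "name": "web-1"},
    {"namespace": "shop", "name": "db-0"},
]
fake_logs = {"web-0": "started ok", "web-1": "ERROR: connection refused", "db-0": "ready"}

queries = build_log_queries(topology)
collected = collect_until_sufficient(
    queries,
    run_query=lambda q: fake_logs[q["pod"]],
    is_sufficient=lambda logs: any("ERROR" in text for text in logs),
)
```

In this example the third pod is never queried, because the second result already satisfies the stopping condition.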

Data in Transit

All data exchanged between the agent and the OpsWorker cloud is transmitted through Amazon SQS over TLS-encrypted connections.
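
As a small illustration of this transport requirement, a client could refuse to use any queue URL that is not HTTPS. This is a hedged sketch: `require_tls_queue_url` is a hypothetical helper, and the queue URL below is a placeholder, not an OpsWorker endpoint.

```python
from urllib.parse import urlparse


def require_tls_queue_url(queue_url):
    """Return the URL unchanged if it uses HTTPS; raise ValueError otherwise."""
    parsed = urlparse(queue_url)
    if parsed.scheme != "https":
        raise ValueError(f"queue URL must use https, got {parsed.scheme!r}")
    return queue_url


# Placeholder URL in the usual SQS queue URL shape (account ID and name are made up).
checked = require_tls_queue_url(
    "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
)
```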

Next Steps