
Datadog AI Agent

Overview

The Datadog AI Agent queries your Datadog account for metrics, traces, and monitor status during investigations and chat sessions. It enriches Kubernetes investigation data with application performance monitoring (APM) context.

Requirements

  • Datadog integration configured for the cluster (API key and Application key)
  • See Datadog Integration for setup steps
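To illustrate why both keys are required, here is a minimal sketch of how Datadog's public HTTP API authenticates a request: the API key and Application key travel as two separate headers. The function names and the `DATADOG_SITE` constant are hypothetical, the host assumes the US1 site, and this is not a description of how OpsWorker itself calls Datadog.

```python
import urllib.request

# Assumption: US1 site. Other Datadog regions use different hosts.
DATADOG_SITE = "https://api.datadoghq.com"


def datadog_headers(api_key: str, app_key: str) -> dict:
    """Return the two authentication headers Datadog's HTTP API expects."""
    if not api_key or not app_key:
        raise ValueError("both an API key and an Application key are required")
    return {
        "DD-API-KEY": api_key,
        "DD-APPLICATION-KEY": app_key,
        "Accept": "application/json",
    }


def build_validate_request(api_key: str, app_key: str) -> urllib.request.Request:
    """Build (but do not send) a request to Datadog's key-validation endpoint."""
    return urllib.request.Request(
        f"{DATADOG_SITE}/api/v1/validate",
        headers=datadog_headers(api_key, app_key),
        method="GET",
    )
```

Passing the resulting request to `urllib.request.urlopen` would perform the actual check; the sketch stops short of that so it can be run without credentials.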

Capabilities

| Capability | Description |
| --- | --- |
| Query metrics | Retrieve time-series metric data (CPU, memory, request rate, error rate) |
| List monitors | Check which Datadog monitors are in alert, warn, or OK state |
| Search traces | Find APM traces for specific services and time ranges |
| Get dashboard data | Retrieve data from Datadog dashboards |
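As a rough picture of what the metric and monitor capabilities involve, the sketch below builds a URL for Datadog's v1 timeseries query endpoint and reads results from the documented response shapes (`series` with `pointlist` for metrics, `overall_state` for monitors). The helper names are hypothetical, the host assumes the US1 site, and the agent's internal implementation may differ.

```python
import json
import urllib.parse

# Assumption: US1 site. Other Datadog regions use different hosts.
DATADOG_SITE = "https://api.datadoghq.com"


def build_metric_query_url(query: str, start: int, end: int) -> str:
    """Build the URL for the v1 timeseries query endpoint.

    start and end are Unix timestamps in seconds.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    params = urllib.parse.urlencode({"from": start, "to": end, "query": query})
    return f"{DATADOG_SITE}/api/v1/query?{params}"


def latest_values(response_body: str) -> dict:
    """Map each returned series' scope to its most recent non-null value."""
    payload = json.loads(response_body)
    result = {}
    for series in payload.get("series", []):
        # Each point is a [timestamp_ms, value] pair; value may be null.
        points = [p for p in series.get("pointlist", []) if p[1] is not None]
        if points:
            result[series.get("scope", "")] = points[-1][1]
    return result


def alerting_monitors(monitors: list) -> list:
    """Return the names of monitors whose overall_state is Alert or Warn."""
    return [
        m["name"]
        for m in monitors
        if m.get("overall_state") in ("Alert", "Warn")
    ]
```

For example, `build_metric_query_url("avg:system.cpu.user{*}", start, end)` yields a URL that can be fetched with the API key and Application key headers, and `alerting_monitors` can be applied to the JSON list returned by the monitor listing endpoint.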

Use Cases

During Investigations

The Datadog AI Agent adds metric context to Kubernetes investigations:

  • Correlates application metrics with pod failures (e.g., error rate spike → pod crash)
  • Checks Datadog monitors related to the alerting resource
  • Identifies latency patterns that Kubernetes data alone can't show
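The first bullet, correlating a metric spike with a pod failure, can be sketched as a simple threshold check: compare samples in a window before the failure against an earlier baseline. The function name, the 10-minute window, and the 3x factor are illustrative assumptions, not the agent's actual heuristics.

```python
from statistics import mean


def find_spike_before(points, event_ts, window_s=600, factor=3.0):
    """Find the first spike in the window leading up to an event.

    points: list of (unix_ts_seconds, value) pairs sorted by time.
    event_ts: time of the pod failure, in Unix seconds.
    Returns the timestamp of the first sample inside the window that
    exceeds factor times the pre-window baseline mean, or None.
    """
    window_start = event_ts - window_s
    baseline = [value for ts, value in points if ts < window_start]
    if not baseline:
        return None  # nothing to compare against
    threshold = factor * mean(baseline)
    for ts, value in points:
        if window_start <= ts <= event_ts and value > threshold:
            return ts
    return None
```

A fixed multiple of the baseline mean is deliberately crude; it ignores seasonality and noisy baselines, and a baseline of zero makes any positive sample count as a spike. It is enough to show the shape of the correlation step.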

In AI Chat

Ask metric-related questions:

What's the error rate for service api-gateway in the last hour?
Are there any Datadog monitors currently alerting?
Show me the latency trend for the checkout service
What's the request rate for namespace production?

Enriched Root Cause Analysis

When both Kubernetes and Datadog data are available, OpsWorker can provide deeper analysis:

  • "Pod OOMKilled at 14:23 UTC — Datadog shows memory usage ramping linearly from 12:00 UTC, correlating with a traffic spike visible in request rate metrics"
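A statement such as "memory usage ramping linearly" can be checked with an ordinary least-squares fit: a positive slope with a coefficient of determination close to 1 indicates a steady ramp, and the fitted line can be extrapolated to a memory limit. This is a minimal sketch with hypothetical function names; it is not presented as the analysis OpsWorker performs.

```python
def fit_line(samples):
    """Least-squares fit of value = slope * t + intercept.

    samples: list of (unix_ts_seconds, value) pairs.
    Returns (slope, intercept, r_squared).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in samples)
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    s_vv = sum((v - mean_v) ** 2 for _, v in samples)
    if s_tt == 0:
        raise ValueError("all samples share the same timestamp")
    slope = s_tv / s_tt
    intercept = mean_v - slope * mean_t
    # A perfectly flat series is treated as perfectly explained by the fit.
    r_squared = 1.0 if s_vv == 0 else (s_tv ** 2) / (s_tt * s_vv)
    return slope, intercept, r_squared


def seconds_until_limit(samples, limit):
    """Extrapolate when the fitted line reaches limit.

    Returns seconds measured from the last sample (0.0 if the line has
    already crossed the limit), or None if usage is not rising.
    """
    slope, intercept, _ = fit_line(samples)
    if slope <= 0:
        return None
    t_hit = (limit - intercept) / slope
    return max(0.0, t_hit - samples[-1][0])
```

In the example above, fitting the memory samples from 12:00 UTC onward and extrapolating to the container's memory limit would give an estimated time of exhaustion to compare against the 14:23 UTC OOMKill.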

Next Steps