Send Kubernetes metrics, logs, and events with Grafana Agent Operator

Note
Grafana Agent Operator has been deprecated. Configuration with the Grafana Kubernetes Monitoring Helm chart is the recommended method.

When you set up Kubernetes Monitoring using Grafana Agent Operator, Agent Operator deploys and configures Grafana Agent automatically using Kubernetes custom resource definition (CRD) objects. This configuration method provides you with preconfigured alerts.

The telemetry data collected includes:

Kubernetes cluster metrics
- kubelet and cAdvisor Cluster metrics
- kube-state-metrics
Container logs
Kubernetes events

Before you begin

To deploy Kubernetes Monitoring, you need:

A Kubernetes Cluster, environment, or fleet you want to monitor
Command-line tools kubectl and Helm (if you choose to install Agent Operator using Helm)

Note
Make sure you deploy the required resources in the same namespace to avoid any missing data.

Install CRDs and deploy Agent Operator

You can set up Agent Operator with or without Helm.

Set up with Helm

Run the following commands to deploy Grafana Agent Operator and its associated CRDs, replacing NAMESPACE with the namespace you want to install into:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana-agent-operator --create-namespace grafana/grafana-agent-operator -n "NAMESPACE"
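
After the install completes, it can help to confirm that the release and the CRDs are present before moving on. The following is a minimal sketch, not an official verification procedure: check_operator is a hypothetical helper name, and it assumes the release name grafana-agent-operator used above and that the Agent Operator CRDs belong to the monitoring.grafana.com API group.

```shell
#!/bin/sh
# Sketch: confirm the Helm release and the Agent Operator CRDs exist.
# check_operator is a hypothetical helper, not part of Helm or kubectl.

check_operator() {
  namespace="$1"

  # Show the release status recorded by Helm for this namespace.
  helm status grafana-agent-operator -n "$namespace" || return 1

  # List the pods in the namespace; the operator pod should reach Running.
  kubectl get pods -n "$namespace" || return 1

  # Assumption: Agent Operator CRDs live in the monitoring.grafana.com group.
  kubectl get crds -o name | grep 'monitoring\.grafana\.com' || {
    echo "no monitoring.grafana.com CRDs found" >&2
    return 1
  }
}

# Example call (commented out so that sourcing this file has no side effects):
# check_operator "NAMESPACE"
```

The function returns a non-zero status as soon as one of the checks fails, so you can use it in a script before you deploy further resources.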

Set up without Helm

If you don’t want to use Helm, you must install the CRDs and Grafana Agent Operator separately. To understand the architecture, refer to Architecture.

Install Grafana Agent Operator by following the instructions in Install the Operator.
Deploy custom resources into your Cluster by following the steps in Deploy Operator resources.

Deploy custom resources to collect cost metrics

Save the following manifest to a file, and replace these placeholders in the file:

NAMESPACE with the namespace for Grafana Agent
CLUSTER_NAME with the name of your Cluster
METRICS_HOST with the hostname for your Prometheus instance
METRICS_USERNAME with the username for your Prometheus instance
METRICS_PASSWORD with your Grafana Cloud Access Policy Token

Then deploy the file using kubectl apply -f FILE, where FILE is the path to the file you saved (for example, metrics.yaml).

---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: opencost
  name: opencost
  namespace: NAMESPACE
automountServiceAccountToken: true
---
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: opencost
  name: opencost
  namespace: NAMESPACE
stringData:
  DB_BASIC_AUTH_USERNAME: "METRICS_USERNAME"
  DB_BASIC_AUTH_PW: "METRICS_PASSWORD"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: opencost
  name: opencost
  namespace: NAMESPACE
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - deployments
      - nodes
      - pods
      - services
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - daemonsets
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - statefulsets
      - deployments
      - daemonsets
      - replicasets
    verbs:
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - cronjobs
      - jobs
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - storage.k8s.io
    resources:
      - storageclasses
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: opencost
  name: opencost
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: opencost
subjects:
  - kind: ServiceAccount
    name: opencost
    namespace: NAMESPACE
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: opencost
  name: opencost
  namespace: NAMESPACE
spec:
  selector:
    name: opencost
  type: ClusterIP
  ports:
    - name: http
      port: 9003
      targetPort: 9003
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: opencost
  name: opencost
  namespace: NAMESPACE
spec:
  replicas: 1
  selector:
    matchLabels:
      name: opencost
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: opencost
    spec:
      securityContext: {}
      serviceAccountName: opencost
      tolerations: []
      containers:
        - image: quay.io/kubecost1/kubecost-cost-model:prod-1.103.1
          name: opencost
          resources:
            limits:
              cpu: 999m
              memory: 1Gi
            requests:
              cpu: 10m
              memory: 55Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 9003
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 200
          livenessProbe:
            httpGet:
              path: /healthz
              port: 9003
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 10
          ports:
            - containerPort: 9003
              name: http
          securityContext: {}
          env:
            - name: PROMETHEUS_SERVER_ENDPOINT
              value: METRICS_HOST/api/prom
            - name: CLUSTER_ID
              value: CLUSTER_NAME
            - name: DB_BASIC_AUTH_USERNAME
              valueFrom:
                secretKeyRef:
                  name: opencost
                  key: DB_BASIC_AUTH_USERNAME
            - name: DB_BASIC_AUTH_PW
              valueFrom:
                secretKeyRef:
                  name: opencost
                  key: DB_BASIC_AUTH_PW
            - name: CLOUD_PROVIDER_API_KEY
              value: AIzaSyD29bGxmHAVEOBYtgd8sYM2gM2ekfxQX4U
            - name: EMIT_KSM_V1_METRICS
              value: "false"
            - name: EMIT_KSM_V1_METRICS_ONLY
              value: "true"
            - name: PROM_CLUSTER_ID_LABEL
              value: cluster
          imagePullPolicy: Always
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: opencost
  namespace: NAMESPACE
spec:
  endpoints:
    - honorLabels: true
      interval: 60s
      path: /metrics
      port: http
      relabelings:
        - action: replace
          replacement: integrations/kubernetes/opencost
          targetLabel: job
      metricRelabelings:
        - action: keep
          regex: container_cpu_allocation|container_gpu_allocation|container_memory_allocation_bytes|deployment_match_labels|kubecost_cluster_info|kubecost_cluster_management_cost|kubecost_cluster_memory_working_set_bytes|kubecost_http_requests_total|kubecost_http_response_size_bytes|kubecost_http_response_time_seconds|kubecost_load_balancer_cost|kubecost_network_internet_egress_cost|kubecost_network_region_egress_cost|kubecost_network_zone_egress_cost|kubecost_node_is_spot|node_cpu_hourly_cost|node_gpu_count|node_gpu_hourly_cost|node_ram_hourly_cost|node_total_hourly_cost|opencost_build_info|pod_pvc_allocation|pv_hourly_cost|service_selector_labels|statefulSet_match_labels
          sourceLabels:
            - __name__
      scheme: http
  namespaceSelector:
    matchNames:
      - NAMESPACE
  selector:
    matchLabels:
      name: opencost
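
If you prefer not to edit the placeholders by hand, you could script the substitution. The following is a minimal sketch under stated assumptions: escape_replacement and render_manifest are hypothetical helper names, metrics-template.yaml is an assumed name for the saved file, and all values in the example call are made up. It assumes that no value contains a newline or a double quote, because the Secret values sit inside double-quoted YAML strings.

```shell
#!/bin/sh
# Sketch: fill in the placeholders of the saved manifest with sed.

# Escape the characters that are special on the replacement side of a
# sed "s|...|...|" command: backslash, ampersand, and the "|" delimiter.
escape_replacement() {
  printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# Usage:
#   render_manifest TEMPLATE NAMESPACE CLUSTER_NAME METRICS_HOST \
#     METRICS_USERNAME METRICS_PASSWORD
# Writes the filled-in manifest to stdout and leaves TEMPLATE unchanged.
render_manifest() {
  sed \
    -e "s|METRICS_HOST|$(escape_replacement "$4")|g" \
    -e "s|METRICS_USERNAME|$(escape_replacement "$5")|g" \
    -e "s|METRICS_PASSWORD|$(escape_replacement "$6")|g" \
    -e "s|CLUSTER_NAME|$(escape_replacement "$3")|g" \
    -e "s|NAMESPACE|$(escape_replacement "$2")|g" \
    "$1"
}

# Example (hypothetical values; commented out so nothing runs on source):
# render_manifest metrics-template.yaml monitoring my-cluster \
#   https://prometheus.example.net 123456 "$METRICS_TOKEN" > metrics.yaml
# kubectl apply -f metrics.yaml
# kubectl -n monitoring rollout status deployment/opencost
```

Passing the token through an environment variable, as in the commented example, keeps it out of the script file. The final kubectl rollout status command waits until the opencost Deployment reports its replica as available.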

Install integrations

To install integrations, follow the steps in Set up integrations.

Done with setup

To finish up:

Navigate to Kubernetes Monitoring, and click Configuration on the main menu.
Click the Metrics status tab to view the data status. Your data becomes populated as the system components begin scraping and sending data to Grafana Cloud.
(Figure: Metrics status tab with status indicators for one Cluster)
Click Kubernetes Monitoring on the main menu to view the home page and see any issues currently highlighted. You can drill into the data from here.
Explore your Kubernetes infrastructure:
- Click Cluster navigation in the menu, then click your namespace to view the grafana-agent StatefulSet, the grafana-agent-logs DaemonSet, and the ksm-kube-state-metrics deployment. Click the kube-system namespace to see Kubernetes-specific resources.
- Click the Nodes tab, then click the Nodes of your cluster to view their condition, utilization, and pod density.