Send Kubernetes metrics, logs, and events with Grafana Agent Operator
Note
Grafana Agent Operator has been deprecated. Configuration with the Grafana Kubernetes Monitoring Helm chart is the recommended method.
When you set up Kubernetes Monitoring using Grafana Agent Operator, Agent Operator deploys and configures Grafana Agent automatically using Kubernetes custom resource definition (CRD) objects. This configuration method provides you with preconfigured alerts.
The telemetry data collected includes:
- Kubernetes cluster metrics
- kubelet and cAdvisor Cluster metrics
- kube-state-metrics
- Container logs
- Kubernetes events
Before you begin
To deploy Kubernetes Monitoring, you need:
- A Kubernetes Cluster, environment, or fleet you want to monitor
- Command-line tools kubectl and Helm (if you choose to install Agent Operator using Helm)
Note
Make sure you deploy the required resources in the same namespace to avoid any missing data.
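A quick way to confirm the command-line tools listed above are available is a small shell check. This is a minimal sketch; the helper name missing_tools is hypothetical and not part of kubectl, Helm, or Grafana tooling.

```shell
# missing_tools TOOL...: print the name of each tool that is not found on PATH.
# Hypothetical helper for illustration only.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running `missing_tools kubectl helm` prints nothing when both tools are installed, and otherwise prints the name of each missing tool.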
Install CRDs and deploy Agent Operator
You can set up Agent Operator with or without Helm.
Set up with Helm
Run the following commands to deploy Grafana Agent Operator and its associated CRDs:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana-agent-operator --create-namespace grafana/grafana-agent-operator -n "NAMESPACE"
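After the install finishes, it can help to confirm that the Operator pod and its CRDs exist. The following is a sketch under assumptions: the label selector app.kubernetes.io/name=grafana-agent-operator and the CRD API group monitoring.grafana.com are assumptions about what the chart creates, and the function names are hypothetical. Adjust them if your release differs.

```shell
# Sketch only: the label selector and CRD group below are assumptions about
# what the grafana-agent-operator chart creates; adjust them for your release.

# operator_pods NAMESPACE: list pods carrying the chart's assumed name label.
operator_pods() {
  kubectl get pods -n "$1" -l app.kubernetes.io/name=grafana-agent-operator
}

# count_agent_crds: print how many installed CRDs end in the
# monitoring.grafana.com API group.
count_agent_crds() {
  kubectl get crd -o name | grep -c 'monitoring\.grafana\.com$'
}
```

For example, `operator_pods "NAMESPACE"` should list a pod once the install completes, and a result of 0 from `count_agent_crds` suggests the CRDs were not installed.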
Set up without Helm
If you don’t want to use Helm, you must install the CRDs and Grafana Agent Operator separately. To understand the architecture, refer to Architecture.
Install Grafana Agent Operator by following the instructions in Install the Operator.
Deploy custom resources into your Cluster by following the steps in Deploy Operator resources.
Deploy custom resources to collect cost metrics
Save the following to a file, replacing these placeholders:
- NAMESPACE with the namespace for Grafana Agent
- CLUSTER_NAME with the name of your Cluster
- METRICS_HOST with the hostname for your Prometheus instance
- METRICS_USERNAME with the username for your Prometheus instance
- METRICS_PASSWORD with your Access Policy Token from earlier
Then deploy the file using kubectl apply -f <metrics.yaml>.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: opencost
  name: opencost
  namespace: NAMESPACE
automountServiceAccountToken: true
---
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: opencost
  name: opencost
  namespace: NAMESPACE
stringData:
  DB_BASIC_AUTH_USERNAME: "METRICS_USERNAME"
  DB_BASIC_AUTH_PW: "METRICS_PASSWORD"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: opencost
  name: opencost
  namespace: NAMESPACE
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - deployments
      - nodes
      - pods
      - services
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - daemonsets
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - statefulsets
      - deployments
      - daemonsets
      - replicasets
    verbs:
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - cronjobs
      - jobs
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - storage.k8s.io
    resources:
      - storageclasses
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: opencost
  name: opencost
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: opencost
subjects:
  - kind: ServiceAccount
    name: opencost
    namespace: NAMESPACE
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: opencost
  name: opencost
  namespace: NAMESPACE
spec:
  selector:
    name: opencost
  type: ClusterIP
  ports:
    - name: http
      port: 9003
      targetPort: 9003
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: opencost
  name: opencost
  namespace: NAMESPACE
spec:
  replicas: 1
  selector:
    matchLabels:
      name: opencost
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: opencost
    spec:
      securityContext: {}
      serviceAccountName: opencost
      tolerations: []
      containers:
        - image: quay.io/kubecost1/kubecost-cost-model:prod-1.103.1
          name: opencost
          resources:
            limits:
              cpu: 999m
              memory: 1Gi
            requests:
              cpu: 10m
              memory: 55Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 9003
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 200
          livenessProbe:
            httpGet:
              path: /healthz
              port: 9003
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 10
          ports:
            - containerPort: 9003
              name: http
          securityContext: {}
          env:
            - name: PROMETHEUS_SERVER_ENDPOINT
              value: METRICS_HOST/api/prom
            - name: CLUSTER_ID
              value: CLUSTER_NAME
            - name: DB_BASIC_AUTH_USERNAME
              valueFrom:
                secretKeyRef:
                  name: opencost
                  key: DB_BASIC_AUTH_USERNAME
            - name: DB_BASIC_AUTH_PW
              valueFrom:
                secretKeyRef:
                  name: opencost
                  key: DB_BASIC_AUTH_PW
            - name: CLOUD_PROVIDER_API_KEY
              value: AIzaSyD29bGxmHAVEOBYtgd8sYM2gM2ekfxQX4U
            - name: EMIT_KSM_V1_METRICS
              value: "false"
            - name: EMIT_KSM_V1_METRICS_ONLY
              value: "true"
            - name: PROM_CLUSTER_ID_LABEL
              value: cluster
          imagePullPolicy: Always
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: opencost
  namespace: NAMESPACE
spec:
  endpoints:
    - honorLabels: true
      interval: 60s
      path: /metrics
      port: http
      relabelings:
        - action: replace
          replacement: integrations/kubernetes/opencost
          targetLabel: job
      metricRelabelings:
        - action: keep
          regex: container_cpu_allocation|container_gpu_allocation|container_memory_allocation_bytes|deployment_match_labels|kubecost_cluster_info|kubecost_cluster_management_cost|kubecost_cluster_memory_working_set_bytes|kubecost_http_requests_total|kubecost_http_response_size_bytes|kubecost_http_response_time_seconds|kubecost_load_balancer_cost|kubecost_network_internet_egress_cost|kubecost_network_region_egress_cost|kubecost_network_zone_egress_cost|kubecost_node_is_spot|node_cpu_hourly_cost|node_gpu_count|node_gpu_hourly_cost|node_ram_hourly_cost|node_total_hourly_cost|opencost_build_info|pod_pvc_allocation|pv_hourly_cost|service_selector_labels|statefulSet_match_labels
          sourceLabels:
            - __name__
      scheme: http
  namespaceSelector:
    matchNames:
      - NAMESPACE
  selector:
    matchLabels:
      name: opencost
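The placeholder replacement described before the manifest can be scripted rather than done by hand. The following is a minimal sketch, not an official Grafana tool: the function name fill_placeholders and the file names in the usage note are hypothetical, and the sketch assumes none of the values contain the characters |, &, or backslash, which are special in a sed replacement.

```shell
# fill_placeholders TEMPLATE OUTPUT
# Copies TEMPLATE to OUTPUT, replacing the five placeholders with the values of
# the same-named shell variables. Hypothetical helper for illustration.
# Assumes the values contain no '|', '&', or backslash characters.
fill_placeholders() {
  # Stop early with a message if any required variable is unset or empty.
  : "${NAMESPACE:?set NAMESPACE}" "${CLUSTER_NAME:?set CLUSTER_NAME}"
  : "${METRICS_HOST:?set METRICS_HOST}" "${METRICS_USERNAME:?set METRICS_USERNAME}"
  : "${METRICS_PASSWORD:?set METRICS_PASSWORD}"
  # '|' is the sed delimiter so that slashes in METRICS_HOST need no escaping.
  sed \
    -e "s|METRICS_USERNAME|${METRICS_USERNAME}|g" \
    -e "s|METRICS_PASSWORD|${METRICS_PASSWORD}|g" \
    -e "s|METRICS_HOST|${METRICS_HOST}|g" \
    -e "s|CLUSTER_NAME|${CLUSTER_NAME}|g" \
    -e "s|NAMESPACE|${NAMESPACE}|g" \
    "$1" > "$2"
}
```

With the five variables set in the shell, `fill_placeholders metrics.template.yaml metrics.yaml` writes the filled-in manifest, which you can review before running kubectl apply -f metrics.yaml. Because the output file contains your Access Policy Token, treat it as a secret.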
Install integrations
To install integrations, follow the steps in Set up integrations.
Done with setup
To finish up:
Navigate to Kubernetes Monitoring, and click Configuration on the main menu.
Click the Metrics status tab to view the data status. Your data becomes populated as the system components begin scraping and sending data to Grafana Cloud.
[Image: Metrics status tab with status indicators for one Cluster]
Click Kubernetes Monitoring on the main menu to view the home page and see any issues currently highlighted. You can drill into the data from here.
Explore your Kubernetes infrastructure:
- Click Cluster navigation in the menu, then click your namespace to view the grafana-agent StatefulSet, the grafana-agent-logs DaemonSet, and the ksm-kube-state-metrics deployment. Click the kube-system namespace to see Kubernetes-specific resources.
- Click the Nodes tab, then click the Nodes of your cluster to view their condition, utilization, and pod density.