Improving HA and Long-Term Storage For Prometheus Using Thanos On EKS With S3 - AWS Open Source Blog
Prometheus is an open source systems monitoring and alerting toolkit that is widely adopted as a standard
monitoring tool with self-managed and provider-managed Kubernetes. Prometheus provides many useful
features, such as dynamic service discovery, powerful queries, and seamless alert notification integration. Beyond a
certain scale, however, problems arise when basic Prometheus capabilities do not meet requirements such as long-term storage, a global query view, and high availability.
Thanos was built in response to these challenges. Thanos, which is released under the Apache 2.0 license, offers
a set of components that can be composed into a highly available Prometheus setup with long-term storage
capabilities. Thanos uses the Prometheus 2.0 storage format to cost-efficiently store historical metric data in
object storage, such as Amazon Simple Storage Service (Amazon S3), while retaining fast query latencies. In
summary, Thanos is intended to provide a global query view, virtually unlimited metric retention, and high availability for Prometheus.
In this post, we’ll learn how to implement Thanos for HA and long-term storage for Prometheus metrics using
Amazon S3 on an Amazon Elastic Kubernetes Service (Amazon EKS) platform.
Overview of solution
Thanos is an open source project that is capable of integrating with a Prometheus deployment, enabling a highly
available metric system with long-term, scalable storage. For a simple setup, we can get started with three
new Thanos components:
Thanos Sidecar: The sidecar runs alongside every Prometheus instance. It uploads Prometheus data to object
storage (an S3 bucket in our case) every two hours. It also serves real-time metrics that have not yet been
uploaded to the bucket.
Thanos Store: Store serves metrics from Amazon S3 storage.
Thanos Querier: The querier has a user interface similar to that of Prometheus and implements the Prometheus
query API. It queries the Store and Sidecar components to return the relevant metrics. If multiple
Prometheus instances are set up for HA, it can also deduplicate the metrics.
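To illustrate the deduplication step, the following Python sketch merges series that differ only by a replica label. It is a simplified model, not Thanos's implementation: the real Querier is told which label identifies replicas through its --query.replica-label flag and uses a penalty-based algorithm to switch between replicas when their data overlaps. The label name prometheus_replica is an assumption here, chosen because the Prometheus Operator sets an external label with that name by default; on overlapping timestamps the sketch simply keeps the first replica's sample.

```python
from collections import defaultdict

# A series is (labels, samples): labels is a dict, samples a list of (timestamp_ms, value).
Series = tuple[dict[str, str], list[tuple[int, float]]]


def deduplicate(series: list[Series], replica_label: str = "prometheus_replica") -> list[Series]:
    """Merge series that differ only by the replica label (simplified sketch)."""
    grouped: dict[tuple, dict[int, float]] = defaultdict(dict)
    for labels, samples in series:
        # Drop the replica label so copies from HA replicas share one key.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        merged = grouped[key]
        for ts, value in samples:
            # Keep the first sample seen per timestamp; other replicas fill gaps.
            merged.setdefault(ts, value)
    return [(dict(key), sorted(merged.items())) for key, merged in grouped.items()]


if __name__ == "__main__":
    replica_a = ({"__name__": "up", "job": "node", "prometheus_replica": "a"}, [(1000, 1.0), (2000, 1.0)])
    replica_b = ({"__name__": "up", "job": "node", "prometheus_replica": "b"}, [(2000, 1.0), (3000, 0.0)])
    for labels, samples in deduplicate([replica_a, replica_b]):
        print(labels, samples)
```

Running it prints a single up series for job node that combines samples from both replicas, with the shared timestamp kept once.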
https://aws.amazon.com/blogs/opensource/improving-ha-and-long-term-storage-for-prometheus-using-thanos-on-eks-with-s3/ 1/11
12/14/21, 6:32 PM Improving HA and long-term storage for Prometheus using Thanos on EKS with S3 | AWS Open Source Blog
We can also install Thanos Compactor, which applies the compaction procedure to the Prometheus block data stored
in the S3 bucket. It is also responsible for downsampling data.
Prerequisites
An AWS account with adequate permissions to manage IAM roles, IAM policies, Amazon EKS, and Amazon
S3.
A running Amazon EKS cluster (Kubernetes 1.13 or above).
Prometheus or Prometheus Operator Helm Chart installed (v2.2.1+).
Helm 3.x.
Working knowledge of Kubernetes and using kubectl.
AWS Command Line Interface (AWS CLI) with at least version 1.18.86 or 2.0.25.
eksctl version 0.22.0 or above.
Confirm that all Thanos components are installed in the same Kubernetes namespace as Prometheus.
Clone the Kubernetes manifests for Thanos Querier and Store.
Deployment steps
All instructions in this document use Prometheus Operator chart version 8.15.6.
Deployment overview
Before beginning the Thanos deployment, we configure an S3 bucket to use as object storage and create the IAM
policy required to access this bucket.
1. To store metric data, create an S3 bucket in an AWS Region local to the Prometheus environment. Use the
appropriate console or API-based mechanisms.
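2. Create an IAM policy that grants access to the bucket. This policy is attached to the IAM role of the ServiceAccount used by the Prometheus pod. Replace <bucket-name> with the name of the bucket from step 1:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject"
      ],
      "Resource": ["arn:aws:s3:::<bucket-name>", "arn:aws:s3:::<bucket-name>/*"]
    }
  ]
}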
Next, create an Amazon EKS cluster using the configuration below. This configuration:
Provisions the cluster with Kubernetes version 1.16 and one managed node group.
Enables an IAM OIDC provider, which provides fine-grained permission management for applications running on
Amazon EKS that use other AWS services.
Creates the monitoring namespace on the provisioned Amazon EKS cluster and the
prometheus-prometheus-oper-prometheus ServiceAccount used to run the Prometheus pod.
Maps the IAM policy to the ServiceAccount's IAM role to provide the required permissions on the S3 bucket
storing Thanos metric data.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: thanosdemo
  region: us-west-2
  version: '1.16'
iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: prometheus-prometheus-oper-prometheus
      namespace: monitoring
    # Replace with the ARN of the IAM policy created in step 2 (your account ID and policy name).
    attachPolicyARNs:
    - "arn:aws:iam::454014481298:policy/thanos-metrics-s3storage-policy"
managedNodeGroups:
- name: ng0
  minSize: 1
  maxSize: 3
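Before running eksctl, the policy referenced in attachPolicyARNs has to exist in your account. If you prefer to generate the policy document from step 2 instead of editing it by hand, the following Python sketch builds it for a given bucket. The bucket name is a placeholder you supply, and s3:PutObject is included because the sidecar uploads blocks to the bucket. ListBucket applies to the bucket ARN, while the object-level actions apply to objects under bucket/*.

```python
import json
import sys


def thanos_s3_policy(bucket: str) -> dict:
    """Build an IAM policy document that lets Thanos read and write one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",    # list blocks in the bucket
                    "s3:GetObject",     # read blocks (Store, Compactor)
                    "s3:DeleteObject",  # remove blocks replaced by compaction
                    "s3:PutObject",     # upload blocks (Sidecar, Compactor)
                ],
                # ListBucket targets the bucket ARN; object actions target bucket/*.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 thanos_policy.py <bucket-name>  (script name is hypothetical)
    print(json.dumps(thanos_s3_policy(sys.argv[1]), indent=2))
```

For example, redirect the output to thanos-s3-policy.json and create the policy with aws iam create-policy --policy-name thanos-metrics-s3storage-policy --policy-document file://thanos-s3-policy.json, which matches the policy name used in the ARN above.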
1. Save the configuration above as eks-cluster-config.yaml and run eksctl create cluster -f eks-cluster-config.yaml
to create the Amazon EKS cluster.
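2. After eksctl finishes provisioning the cluster, verify the cluster health with kubectl get nodes.
3. Confirm that an IAM OIDC issuer is associated with the cluster, for example with
aws eks describe-cluster --name thanosdemo --query "cluster.identity.oidc.issuer". The output looks similar to:
"https://oidc.eks.us-west-2.amazonaws.com/id/3423376DF7D6CC41B662FC8309BXXXX"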
4. Verify the ServiceAccount created for the Prometheus pod in the monitoring namespace with
kubectl describe serviceaccount prometheus-prometheus-oper-prometheus -n monitoring:
Name: prometheus-prometheus-oper-prometheus
Namespace: monitoring
Labels: aws-usage=application
Tokens: prometheus-prometheus-oper-prometheus-token-shzqd
Events: <none>
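With IAM roles for service accounts, eksctl annotates the ServiceAccount with eks.amazonaws.com/role-arn, which is how pods using it obtain the IAM role that has the S3 policy attached. The following Python sketch checks for that annotation in the JSON output of kubectl get serviceaccount prometheus-prometheus-oper-prometheus -n monitoring -o json; the script and file names are hypothetical.

```python
import json
import sys
from typing import Optional

# Annotation that IAM roles for service accounts (IRSA) uses to bind a role.
ROLE_ARN_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_role_arn(service_account: dict) -> Optional[str]:
    """Return the IAM role ARN annotated on a ServiceAccount, or None if absent."""
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    return annotations.get(ROLE_ARN_ANNOTATION)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects a file containing `kubectl get serviceaccount ... -o json` output.
    with open(sys.argv[1]) as f:
        arn = irsa_role_arn(json.load(f))
    if arn is None:
        sys.exit("No IAM role annotation found on the ServiceAccount")
    print(f"ServiceAccount is bound to IAM role: {arn}")
```

For example, save the kubectl output to sa.json and run python3 check_irsa.py sa.json.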
Next, install the Helm CLI and configure the Helm repository. Then complete the following
steps:
1. Get the prometheus-operator chart's default configuration values by running
helm show values stable/prometheus-operator > values_default.yaml.
2. The prometheus-operator chart creates the Kubernetes resources required to run Prometheus as part of the
installation. Because the prometheus-prometheus-oper-prometheus ServiceAccount was already created during the
cluster install, we must disable ServiceAccount creation for the Prometheus pod. Set create: false and add
the ServiceAccount name under the Deploy a Prometheus Instance section of the values_default.yaml file:
##
prometheus:
  enabled: true
  ##
  annotations: {}
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-accoun
  ##
  serviceAccount:
    create: false
    name: "prometheus-prometheus-oper-prometheus"
3. Add the Thanos Sidecar configuration in place of thanos: {} in values_default.yaml:
thanos:
  baseImage: quay.io/thanos/thanos
  version: v0.12.2
  objectStorageConfig:
    key: thanos-storage-config.yaml
    name: thanos-storage-config
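The objectStorageConfig entry refers to the key thanos-storage-config.yaml in the Kubernetes secret
thanos-storage-config, which holds the Thanos object storage configuration for the S3 bucket:
type: s3
config:
  encryptsse: true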
Note: Learn more about additional object storage configuration options in the Thanos documentation.
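As a minimal sketch, the following Python script renders an object storage configuration file containing only the bucket and a regional S3 endpoint. The bucket name and Region are inputs you supply, no access keys are included on the assumption that pods obtain credentials from the IAM role bound to their ServiceAccount, and further options (such as the encryption setting shown above) can be added as described in the Thanos documentation.

```python
import sys


def thanos_s3_config(bucket: str, region: str) -> str:
    """Render a minimal Thanos S3 object storage config file as YAML text."""
    return (
        "type: s3\n"
        "config:\n"
        f"  bucket: {bucket}\n"
        f"  endpoint: s3.{region}.amazonaws.com\n"  # regional S3 endpoint
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 render_objstore.py <bucket-name> <region>  (script name is hypothetical)
    sys.stdout.write(thanos_s3_config(sys.argv[1], sys.argv[2]))
```

For example, redirect the output to thanos-storage-config.yaml, then create the secret with kubectl create secret generic thanos-storage-config --from-file=thanos-storage-config.yaml -n monitoring. With --from-file, kubectl uses the file name as the secret key, which matches the key referenced in the Helm values.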
7. Check the status of the Prometheus pod with kubectl get po -n monitoring -l app=prometheus, and inspect the
Thanos Sidecar container with kubectl describe po -n monitoring -l app=prometheus:
thanos-sidecar:
  Args:
    sidecar
    --prometheus.url=http://127.0.0.1:9090/
    --tsdb.path=/prometheus
    --grpc-address=[$(POD_IP)]:10901
    --http-address=[$(POD_IP)]:10902
    --objstore.config=$(OBJSTORE_CONFIG)
    --log.level=info
    --log.format=logfmt
  State:          Running
  Ready:          True
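To check the sidecar programmatically instead of reading the describe output, a short Python sketch can parse the output of kubectl get po -n monitoring -l app=prometheus -o json and report the ready flag of each pod's thanos-sidecar container. It relies only on the standard Pod fields status.containerStatuses[].name and .ready; the script and file names are hypothetical.

```python
import json
import sys


def sidecar_status(pod_list: dict, container: str = "thanos-sidecar") -> dict[str, bool]:
    """Map each pod name to whether its Thanos Sidecar container reports ready."""
    status = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            if cs.get("name") == container:
                status[name] = bool(cs.get("ready"))
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects a file containing: kubectl get po -n monitoring -l app=prometheus -o json
    with open(sys.argv[1]) as f:
        result = sidecar_status(json.load(f))
    for pod, ready in result.items():
        print(f"{pod}: thanos-sidecar ready={ready}")
```

Pods without a thanos-sidecar container status (for example, pods still being scheduled) are simply left out of the result.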
Thanos Querier retrieves metrics from all Prometheus instances. Because it is compatible with the original PromQL
and the Prometheus HTTP API, it can be used with Grafana.
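1. In the Querier deployment manifest, point the Querier at the Thanos Store and Prometheus gRPC endpoints:
--store=thanos-store.monitoring.svc.cluster.local:10901
--store=prometheus-operated.monitoring.svc.cluster.local:10901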
The preceding store configuration adds the Thanos Store service, which retrieves historical metric data from
object storage (the S3 bucket), and the Prometheus service, which provides the latest metric data. We deploy the
Thanos Store service in the next step.
2. Apply the Query deployment, service, and serviceMonitor manifests to create Kubernetes objects:
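Because the Querier exposes a Prometheus-compatible HTTP API, you can also query it directly. The following Python sketch builds an instant-query request against /api/v1/query and sets Thanos's dedup parameter so that results from HA replicas are deduplicated. The base URL is an assumption: you might, for example, port-forward the Querier Service locally with kubectl port-forward, using whatever Service name and HTTP port your manifests define (Thanos components listen for HTTP on port 10902 by default).

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url: str, promql: str, dedup: bool = True) -> str:
    """Build an instant-query URL for the Querier's Prometheus-compatible API."""
    params = {"query": promql, "dedup": "true" if dedup else "false"}
    return f"{base_url.rstrip('/')}/api/v1/query?{urllib.parse.urlencode(params)}"


def instant_query(base_url: str, promql: str) -> list:
    """Run an instant query and return the result vector (needs network access)."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]
```

For example, with the Querier forwarded to localhost, instant_query("http://localhost:10902", "up") returns the series in the result vector; query_url on its own works offline.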
Thanos Store works with the Querier to retrieve historical data from the configured bucket.
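In the Store deployment manifest, set serviceAccountName: prometheus-prometheus-oper-prometheus so that the Store pod runs with the IAM role that can access the S3 bucket, and provide the object storage configuration through the OBJSTORE_CONFIG environment variable: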
env:
- name: OBJSTORE_CONFIG
  valueFrom:
    secretKeyRef:
      key: thanos-storage-config.yaml
      name: thanos-storage-config
Thanos Compactor compacts and downsamples historical data. The compactor needs local disk space to store
intermediate data during processing.
serviceAccountName: prometheus-prometheus-oper-prometheus
env:
- name: OBJSTORE_CONFIG
valueFrom:
secretKeyRef:
key: thanos-storage-config.yaml
name: thanos-storage-config
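To show what downsampling means, here is a simplified Python sketch that groups raw samples into fixed 5-minute windows and keeps the count, sum, minimum, and maximum per window. Thanos itself produces 5-minute and 1-hour resolutions and stores additional aggregates (including a counter aggregate for rate-style queries), so treat this as an illustration of the idea rather than of the Compactor's block format.

```python
from dataclasses import dataclass

FIVE_MINUTES_MS = 5 * 60 * 1000  # first downsampling resolution used by Thanos


@dataclass
class Aggregate:
    count: int
    total: float
    minimum: float
    maximum: float


def downsample(samples: list[tuple[int, float]],
               window_ms: int = FIVE_MINUTES_MS) -> dict[int, Aggregate]:
    """Group raw (timestamp_ms, value) samples into fixed windows of aggregates."""
    windows: dict[int, Aggregate] = {}
    for ts, value in sorted(samples):
        start = ts - ts % window_ms  # align the sample to the start of its window
        agg = windows.get(start)
        if agg is None:
            windows[start] = Aggregate(1, value, value, value)
        else:
            agg.count += 1
            agg.total += value
            agg.minimum = min(agg.minimum, value)
            agg.maximum = max(agg.maximum, value)
    return windows


if __name__ == "__main__":
    raw = [(0, 1.0), (60_000, 3.0), (299_999, 2.0), (300_000, 10.0)]
    for start, agg in downsample(raw).items():
        print(start, agg)
```

Keeping several aggregates per window, rather than a single average, lets later queries still answer questions such as the maximum value within a window.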
To view metric data in the Grafana UI, add the Thanos Querier service as a Prometheus data source. In Grafana,
go to Configuration, Data Sources, Add data source.
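As an alternative to adding the data source through the UI, Grafana can load data sources from a provisioning file. The Python sketch below emits such a file with type prometheus, since the Querier speaks the Prometheus HTTP API. The Querier URL is a hypothetical in-cluster address; adjust it to your Service name and port. The output is JSON, which is also valid YAML, so it can be saved as a .yaml file in Grafana's provisioning/datasources directory.

```python
import json


def grafana_datasource(url: str, name: str = "Thanos") -> dict:
    """Grafana data source provisioning document pointing at the Thanos Querier."""
    return {
        "apiVersion": 1,
        "datasources": [
            {
                "name": name,
                "type": "prometheus",  # the Querier implements the Prometheus HTTP API
                "access": "proxy",     # Grafana's backend makes the requests
                "url": url,
                "isDefault": False,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical in-cluster address; use your Querier Service name and HTTP port.
    doc = grafana_datasource("http://thanos-querier.monitoring.svc.cluster.local:10902")
    print(json.dumps(doc, indent=2))  # JSON is valid YAML
```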
Cleaning up
To avoid incurring future charges, delete the resources. Use the following commands to clean up the Thanos
environment:
1. Delete the Thanos Kubernetes objects from the monitoring namespace:
kubectl get all -n monitoring --no-headers=true | awk '/thanos/{print $1}' | xargs kubectl delete -n monitoring
2. Remove the Thanos Sidecar by deleting the sidecar configuration added to values_default.yaml during the Thanos
configuration process. Finish by applying the changes with:
Costs
Thanos enables users to archive metric data from Prometheus in an object store such as Amazon S3, which provides
virtually unlimited storage for our monitoring system. In terms of cost, Thanos adds to an existing Prometheus
setup the price of storing and querying data in object storage and of running the Store node. The compute used by
queriers, compactors, and rule nodes is roughly offset by the work the Prometheus servers no longer have to do
directly.
With Thanos, data that a typical Prometheus setup would access locally travels over the network instead. Data
transfer between the Amazon S3 object store and Thanos within the same AWS Region is free.
For metric data, Prometheus uses an average of one to two bytes per sample for storage. If we store around
100,000 samples per day at two bytes each using Thanos, storage consumption is around 200,000 bytes (about
195 KiB) on Amazon S3, which costs less than 0.05 USD per day. The cost of retrievals by the Store node depends on
individual query patterns; as an estimate, add around 20% to the total storage cost to account for retrievals.
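The estimate above is simple arithmetic, sketched here in Python: 100,000 samples at two bytes each is 200,000 bytes, about 195 KiB. The price per GB-month passed in is an assumption for illustration only (check current Amazon S3 pricing for your Region), and the 20% retrieval overhead follows the rule of thumb in the text.

```python
KIB = 1024
GIB = 1024 ** 3


def storage_bytes(samples: int, bytes_per_sample: float = 2.0) -> float:
    """Approximate storage for Prometheus samples (about 1-2 bytes per sample)."""
    return samples * bytes_per_sample


def monthly_cost(total_bytes: float, price_per_gb_month: float,
                 retrieval_overhead: float = 0.20) -> float:
    """Monthly S3 storage cost plus a flat overhead for Store node retrievals."""
    storage_cost = total_bytes / GIB * price_per_gb_month
    return storage_cost * (1 + retrieval_overhead)


if __name__ == "__main__":
    daily = storage_bytes(100_000)  # 100,000 samples per day at 2 bytes each
    print(f"daily data: {daily:,.0f} bytes ({daily / KIB:.1f} KiB)")
    # Hypothetical price per GB-month, for illustration only.
    month_of_data = daily * 30
    print(f"storing 30 days of data for a month: ${monthly_cost(month_of_data, 0.023):.6f}")
```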
Applying appropriate downsampling, resolution, and retention policies on Thanos object storage allows further
optimization.
Conclusion
In this blog post, we explored how to transform Prometheus into a robust monitoring system. Using Thanos with
Prometheus enables us to scale Prometheus horizontally. By using open source Thanos components and Amazon
S3, we get a global query view, virtually unlimited retention, and the potential for highly available metrics.