12/14/21, 6:32 PM Improving HA and long-term storage for Prometheus using Thanos on EKS with S3 | AWS Open Source

AWS Open Source Blog

Improving HA and long-term storage for Prometheus using Thanos on EKS with S3
by Raj Bagwe | on 24 SEP 2020 | in Advanced (300), Amazon Elastic Kubernetes Service, Amazon Simple Storage Service (S3), AWS Identity And Access Management (IAM), Open Source, Technical How-To

Prometheus is an open source systems monitoring and alerting toolkit that is widely adopted as a standard
monitoring tool with self-managed and provider-managed Kubernetes. Prometheus provides many useful
features, such as dynamic service discovery, powerful queries, and seamless alert notification integration. Beyond
a certain scale, however, problems arise when basic Prometheus capabilities do not meet requirements such as:

Storing petabyte-scale historical data in a reliable and cost-efficient way
Accessing all metrics using a single-query API
Merging replicated data collected via Prometheus high-availability (HA) setups

Thanos was built in response to these challenges. Thanos, which is released under the Apache 2.0 license, offers
a set of components that can be composed into a highly available Prometheus setup with long-term storage
capabilities. Thanos uses the Prometheus 2.0 storage format to cost-efficiently store historical metric data in
object storage, such as Amazon Simple Storage Service (Amazon S3), while retaining fast query latencies. In
summary, Thanos is intended to provide:

Global query view of metrics
Virtually unlimited retention of metrics, including downsampling
High availability of components, including support for Prometheus HA

In this post, we’ll learn how to implement Thanos for HA and long-term storage for Prometheus metrics using
Amazon S3 on an Amazon Elastic Kubernetes Service (Amazon EKS) platform.

Overview of solution

Thanos is an open source project that integrates with a Prometheus deployment to provide a highly available
metric system with long-term, scalable storage. For a simple setup, we can get started with three new Thanos
components:

Thanos Sidecar: Sidecar runs alongside every Prometheus instance. It uploads Prometheus data to object storage (an S3 bucket in our case) every two hours, and it serves real-time metrics that have not yet been uploaded to the bucket.
Thanos Store: Store serves metrics from Amazon S3 storage.
Thanos Querier: Querier has a user interface similar to that of Prometheus and implements the Prometheus query API. Querier queries Store and Sidecar to return the relevant metrics. If multiple Prometheus instances are set up for HA, it can also de-duplicate the metrics.
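
To make the de-duplication idea concrete, here is a toy shell sketch. It is not Thanos's actual merge logic; it only shows the effect of treating series that differ solely in a replica label as one series. The label name prometheus_replica (commonly the Prometheus Operator default, but treat it as an assumption for your setup) and the sample series lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy illustration only: collapse series that differ solely in a replica label.
# The label name "prometheus_replica" and the sample lines are assumptions.
dedup_by_replica() {
  # 1) drop the replica label (and a trailing comma, if any),
  # 2) tidy up a comma left before the closing brace,
  # 3) keep only the first occurrence of each resulting series line.
  sed -E -e 's/prometheus_replica="[^"]*",?//' -e 's/,\}/}/' | awk '!seen[$0]++'
}

printf '%s\n' \
  'up{job="node",prometheus_replica="prometheus-0"} 1' \
  'up{job="node",prometheus_replica="prometheus-1"} 1' \
  'up{job="kubelet",prometheus_replica="prometheus-0"} 1' \
  | dedup_by_replica
```

The two "node" lines collapse into one because they are identical once the replica label is removed, which is the view a de-duplicating query layer presents to dashboards.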

https://aws.amazon.com/blogs/opensource/improving-ha-and-long-term-storage-for-prometheus-using-thanos-on-eks-with-s3/

Thanos basic components

We can also install Thanos Compactor, which applies a compaction procedure to the Prometheus block data stored
in an S3 bucket. It is also responsible for downsampling data.

Prerequisites

This guide has the following requirements:

An AWS account with adequate permissions to operate IAM roles, IAM policy, Amazon EKS, and Amazon
S3.
A running Amazon EKS cluster (Kubernetes 1.13 or above).
Prometheus or Prometheus Operator Helm Chart installed (v2.2.1+).
Helm 3.x.
Working knowledge of Kubernetes and kubectl.
AWS Command Line Interface (AWS CLI) version 1.18.86 or later, or 2.0.25 or later.
eksctl version 0.22.0 or above.
Confirm that all Thanos components are installed in the same Kubernetes namespace as Prometheus.
Clone the Kubernetes manifests for Thanos Querier and Store Deployment steps:

git clone -b release-0.12 https://github.com/thanos-io/kube-thanos.git

Thanos Compact manifests.

All instructions in this document use Prometheus Operator chart version 8.15.6.

Deployment overview

Before beginning the Thanos deployment, we configure an S3 bucket to use as object storage and create the IAM
policy required to access this bucket.

To deploy the Thanos components, we complete the following:


1. Enable Thanos Sidecar for Prometheus.
2. Deploy Thanos Querier with the ability to talk to Sidecar.
3. Confirm that Thanos Sidecar is able to upload Prometheus metrics to our S3 bucket.
4. Deploy Thanos Store to retrieve metrics data stored in long-term storage (in this case, our S3 bucket).
5. Set up Thanos Compactor for data compaction and downsampling.

Configure S3 bucket and IAM policy

1. To store metric data, create an S3 bucket in an AWS Region local to the Prometheus environment. Use the
appropriate console or API-based mechanisms.
2. Create an IAM policy to attach to the IAM role associated with the ServiceAccount used by the Prometheus pod, granting it access to the bucket:

"Version": "2012-10-17",
"Statement": [
    "Sid": "Statement",
    "Effect": "Allow",
    "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:DeleteObject",

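The policy listing in this step is cut off in the source. As a hedged sketch of what a complete policy could look like, the snippet below writes one to a local file. The s3:PutObject action and the two Resource entries are assumptions based on the permissions an uploader such as the sidecar generally needs, not text recovered from the listing. The bucket name thanos-metrics-s3storage is borrowed from the storage configuration used later in this post, and the file name thanos-s3-policy.json is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: write a candidate IAM policy document to a local file.
# BUCKET is taken from the storage config later in this post; s3:PutObject and
# the Resource ARNs are assumptions, not recovered from the truncated listing.
BUCKET="thanos-metrics-s3storage"

cat > thanos-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::${BUCKET}",
        "arn:aws:s3:::${BUCKET}/*"
      ]
    }
  ]
}
EOF

echo "wrote thanos-s3-policy.json for bucket ${BUCKET}"
```

A file like this could then be registered with aws iam create-policy --policy-name thanos-metrics-s3storage-policy --policy-document file://thanos-s3-policy.json, where the policy name matches the ARN referenced in the cluster configuration.
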
Create Amazon EKS cluster

Next, create an Amazon EKS cluster using the configuration below. The configuration does the following:

Provisions the cluster with Kubernetes version 1.16 and one managed node group.
Sets up an IAM OIDC provider to provide fine-grained permission management for applications running on Amazon EKS that use other AWS services.
Creates the monitoring namespace on the provisioned Amazon EKS cluster, along with the prometheus-prometheus-oper-prometheus ServiceAccount used to run the Prometheus pod.
Maps the IAM policy to the ServiceAccount role to provide the required permissions on the S3 bucket storing Thanos metric data.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: thanosdemo
  region: us-west-2
  version: '1.16'
iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: prometheus-prometheus-oper-prometheus
      namespace: monitoring
      labels: {aws-usage: "application"}
    attachPolicyARNs:
    - "arn:aws:iam::454014481298:policy/thanos-metrics-s3storage-policy"
managedNodeGroups:
- name: ng0
  minSize: 1
  maxSize: 3

Next, we complete the following steps:

1. Run the command eksctl create cluster -f eks-cluster-config.yaml to create the Amazon EKS
cluster with the configuration stored in the file eks-cluster-config.yaml.

2. After eksctl completes provisioning the cluster, verify the cluster health using the command kubectl get nodes:

NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-XX-15.us-west-2.compute.internal   Ready    <none>   6h    v1.16.13-eks-2ba888
ip-192-168-YY-20.us-west-2.compute.internal   Ready    <none>   6h    v1.16.13-eks-2ba888

3. Verify the IAM OIDC provider by running the command aws eks describe-cluster --name thanosdemo --query "cluster.identity.oidc.issuer":

"https://oidc.eks.us-west-2.amazonaws.com/id/3423376DF7D6CC41B662FC8309BXXXX"


4. Verify the ServiceAccount created for the Prometheus pod in the monitoring namespace with the command kubectl describe serviceaccount prometheus-prometheus-oper-prometheus -n monitoring:

Name:                prometheus-prometheus-oper-prometheus
Namespace:           monitoring
Labels:              aws-usage=application
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::454014481298:role/eksctl-than
Image pull secrets:  <none>
Mountable secrets:   prometheus-prometheus-oper-prometheus-token-shzqd
Tokens:              prometheus-prometheus-oper-prometheus-token-shzqd
Events:              <none>

Installing Helm CLI

Before we get started, let’s install the Helm CLI and configure the Helm chart repository. Complete the following
steps:

1. Install the Helm CLI:

curl -sSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash

2. Verify Helm version:

helm version --short

3. Configure the chart repository:

helm repo add stable https://kubernetes-charts.storage.googleapis.com/

Installing and configuring Prometheus and Thanos

1. Get the prometheus-operator chart default configuration values by running the command helm show values stable/prometheus-operator > values_default.yaml.

2. The prometheus-operator chart creates the Kubernetes resources required to run Prometheus as part of the
installation. We must disable ServiceAccount creation for the Prometheus pod, because the ServiceAccount
prometheus-prometheus-oper-prometheus was already created during the cluster install. Set create: false and add
the ServiceAccount name under the Deploy a Prometheus instance section in the values_default.yaml file:


## Deploy a Prometheus instance
##
prometheus:
  enabled: true
  ## Annotations for Prometheus
  ##
  annotations: {}
  ## Service account for Prometheuses to use.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-accoun
  ##
  serviceAccount:
    create: false
    name: "prometheus-prometheus-oper-prometheus"

3. Add the Thanos Sidecar configuration in place of the default thanos: {} entry in values_default.yaml:

thanos:
  baseImage: quay.io/thanos/thanos
  version: v0.12.2
  objectStorageConfig:
    key: thanos-storage-config.yaml
    name: thanos-storage-config

4. Define the object storage configuration referenced by objectStorageConfig in the file thanos-storage-config.yaml:

type: s3
config:
  bucket: thanos-metrics-s3storage #S3 bucket name
  endpoint: s3.us-west-2.amazonaws.com #S3 Regional endpoint
  encrypt_sse: true

Note: Learn more about additional object storage configuration options in the Thanos documentation.

5. Create Kubernetes secret:

kubectl -n monitoring create secret generic thanos-storage-config --from-file=thanos-storage

6. Install Prometheus with the Thanos Sidecar:


helm install prometheus stable/prometheus-operator -f values_sa.yaml -n monitoring

7. Check the status of the Prometheus pod and Thanos Sidecar with the command kubectl get po -n monitoring -l app=prometheus:

NAME                                                 READY   STATUS    RESTARTS   AGE
prometheus-prometheus-prometheus-oper-prometheus-0   4/4     Running   1          4h49m
prometheus-prometheus-prometheus-oper-prometheus-1   4/4     Running   1          4h49m

8. Check the status of the Thanos Sidecar container in the Prometheus pod:

kubectl describe pod prometheus-prometheus-prometheus-oper-prometheus-0 -n monitoring

In the Prometheus pod, the status of Thanos Sidecar is:

thanos-sidecar:
  Container ID:  docker://65d6ba0d1de338d671cf75a7888e982b896198eb49c6b9214d2f3004a21f
  Image:         quay.io/thanos/thanos:v0.12.2
  Image ID:      docker-pullable://quay.io/thanos/thanos@sha256:bc134406dcfb3cb235a758
  Ports:         10902/TCP, 10901/TCP
  Host Ports:    0/TCP, 0/TCP
  Args:
    sidecar
    --prometheus.url=http://127.0.0.1:9090/
    --tsdb.path=/prometheus
    --grpc-address=[$(POD_IP)]:10901
    --http-address=[$(POD_IP)]:10902
    --objstore.config=$(OBJSTORE_CONFIG)
    --log.level=info
    --log.format=logfmt
  State:         Running
    Started:     Sun, 06 Sep 2020 00:47:12 +0000
  Ready:         True

Deploy Thanos Querier

Thanos Querier retrieves metrics from all Prometheus instances. It can be used with Grafana because it is
compatible with the Prometheus PromQL and HTTP APIs.

1. Add the metric store configuration to thanos-query-deployment.yaml, in the args of the query container under spec.template.spec.containers:

--store=thanos-store.monitoring.svc.cluster.local:10901
--store=prometheus-operated.monitoring.svc.cluster.local:10901


The preceding store configuration adds Thanos Store service to retrieve historical metric data from object
storage (S3 bucket) and Prometheus service for the latest metric data. We will be deploying Thanos Store service
in the next step.

2. Apply the Query deployment, service, and serviceMonitor manifests to create Kubernetes objects:

kubectl apply -f thanos-query-deployment.yaml -f thanos-query-service.yaml -f thanos-query-

Deploy Thanos Store

Thanos Store works with Querier to retrieve historical data from the given bucket.

Make the following changes in the Thanos Store configuration files:

1. Add serviceAccountName to spec.template.spec in thanos-store-statefulSet.yaml to enable S3 bucket access:

serviceAccountName: prometheus-prometheus-oper-prometheus

2. Change the spec.template.spec.containers.env in thanos-store-statefulSet.yaml to:

env:
- name: OBJSTORE_CONFIG
  valueFrom:
    secretKeyRef:
      key: thanos-storage-config.yaml
      name: thanos-storage-config

3. Apply the Store statefulSet, service, and serviceMonitor manifests:

kubectl apply -f thanos-store-statefulSet.yaml -f thanos-store-service.yaml -f thanos-store

Deploy Thanos Compactor

Thanos Compactor performs compaction and downsampling of historical data. The compactor needs local disk space
to store intermediate data for processing.

Make the following changes in Thanos Compactor configuration files:

1. Add serviceAccountName to spec.template.spec in thanos-compact-statefulSet.yaml to enable S3 bucket access:

serviceAccountName: prometheus-prometheus-oper-prometheus

2. Change the spec.template.spec.containers.env in thanos-compact-statefulSet.yaml to:

env:
- name: OBJSTORE_CONFIG
  valueFrom:
    secretKeyRef:
      key: thanos-storage-config.yaml
      name: thanos-storage-config

3. Apply the Compact statefulSet, service, and serviceMonitor manifests:

kubectl apply -f thanos-compact-statefulSet.yaml -f thanos-compact-service.yaml -f thanos-compact-serviceMonitor.yaml

4. Check the status of all Thanos components:

kubectl get all -n monitoring

Configure Thanos as Grafana data source

To start viewing metric data in the Grafana UI, we can add the Thanos Querier service as a data source. In
Grafana, go to Configuration > Data Sources > Add data source.
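
As an alternative to clicking through the UI, Grafana can also load data sources from a provisioning file. The sketch below writes such a file. The service name thanos-query, the monitoring namespace, and HTTP port 9090 are assumptions based on the kube-thanos manifests, and the file name thanos-datasource.yaml is hypothetical, so check them against your thanos-query-service.yaml before use.

```shell
#!/usr/bin/env bash
# Sketch only: generate a Grafana data source provisioning file for Thanos Querier.
# The URL is an assumption (service name, namespace, and port may differ).
THANOS_QUERY_URL="http://thanos-query.monitoring.svc.cluster.local:9090"

cat > thanos-datasource.yaml <<EOF
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus   # Querier speaks the Prometheus HTTP API
    access: proxy
    url: ${THANOS_QUERY_URL}
    isDefault: false
EOF

cat thanos-datasource.yaml
```

Grafana reads files like this from its provisioning/datasources directory at startup. The data source type is prometheus because Querier is queried through the same HTTP API as a Prometheus server.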


Cleaning up

To avoid incurring future charges, delete the resources. Use the following commands to clean up the Thanos
environment:

1. Remove Thanos Querier, Store, and Compactor:

kubectl get all -n monitoring --no-headers=true | awk '/thanos/{print $1}' |xargs kubectl


2. Remove Thanos Sidecar by removing the sidecar configuration added during the Thanos configuration process.
Finish by applying changes with:

helm -n monitoring -f values.yaml upgrade prometheus stable/prometheus-operator

3. Delete the S3 bucket being used for storing metric data:

aws s3 rb s3://bucket-name --force

4. Delete the EKS cluster:

eksctl delete cluster --name=thanosdemo

Costs

Thanos enables users to archive metric data from Prometheus in an object store such as Amazon S3. This
provides virtually unlimited storage for our monitoring system. In terms of cost, Thanos adds the price of
storing and querying data from object storage, and of running the store node, to an existing Prometheus setup.
Queriers, compactors, and rule nodes require roughly as many compute resources as they save by not doing the
same work directly on Prometheus servers.

Data that a typical Prometheus setup accesses locally travels over the network in Thanos. Data transferred
between the Amazon S3 object store and Thanos within the same AWS Region is free.

For metric data, Prometheus uses an average of one to two bytes per sample for storage. If we store around
100,000 samples per day at two bytes each using Thanos, the storage consumption is roughly 195 KB per day on
Amazon S3. This costs less than 0.05 USD per day. The cost of retrievals by the store node depends on the
individual querying pattern; as a rough estimate, add around 20% to the total storage cost to account for retrieval.
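
The arithmetic in this estimate can be reproduced with a short shell sketch. The helper names daily_storage_kib and with_retrieval_overhead are hypothetical, the kilobyte is taken as 1,024 bytes, and the cost passed to the second helper is a placeholder rather than a real S3 price; only the sample count, bytes per sample, and the 20% retrieval allowance come from the text.

```shell
#!/usr/bin/env bash
# Sketch of the storage estimate from the text; helper names are hypothetical.

# Daily storage in KiB: $1 = samples per day, $2 = bytes per sample.
daily_storage_kib() {
  awk -v s="$1" -v b="$2" 'BEGIN { printf "%.1f\n", s * b / 1024 }'
}

# Add the ~20% retrieval allowance to a storage cost: $1 = cost in any unit.
with_retrieval_overhead() {
  awk -v c="$1" 'BEGIN { printf "%.2f\n", c * 1.2 }'
}

# 100,000 samples per day at 2 bytes each, as in the text.
echo "daily storage (KiB): $(daily_storage_kib 100000 2)"

# Placeholder storage cost of 10 units, only to show the 20% uplift.
echo "cost incl. retrieval: $(with_retrieval_overhead 10)"
```

With the figures from the text, the first helper gives about 195.3 KiB per day, which agrees with the quoted storage figure to within rounding. To turn this into money, multiply the accumulated volume by the current S3 storage price for your Region and pass the result to with_retrieval_overhead.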

Applying appropriate downsampling, resolution, and retention policies on Thanos object storage allows further
optimization.

Conclusion

In this blog post, we explored how to transform Prometheus into a robust monitoring system. Using Thanos with
Prometheus enables us to scale Prometheus horizontally. By using open source Thanos components and Amazon
S3, we get a global view, virtually unlimited retention, and potential metric high availability.

TAGS: prometheus
