
Unit 5

Prometheus is a leading tool for monitoring Kubernetes workloads, providing monitoring and alerting through a pull-based model and a powerful query language called PromQL. It integrates with Alertmanager for alert handling and with visualization tools such as Grafana for real-time dashboards. Grafana adds customizable dashboards, alerting, and programmatic access through an HTTP API; combined with automation platforms, this monitoring stack also enables auto healing.

Uploaded by

deekshith0607
Copyright
© All Rights Reserved

UNIT-5

Introduction to Prometheus
• Prometheus has become the most popular tool for monitoring Kubernetes workloads. Even though the Kubernetes ecosystem grows every day, the community keeps returning to a few established tools for specific problems, and Prometheus is one of them. The gap it fills is monitoring and alerting.
• First, Prometheus was the second project, after Kubernetes, to graduate from the Cloud Native Computing Foundation (CNCF). At the time, few projects had reached graduation. Having a graduated monitoring project confirms how crucial monitoring and alerting are, especially for distributed systems, which are the norm in Kubernetes.
Infrastructure Monitoring

• Prometheus follows a pull-based model: it scrapes metrics from targets that expose them over HTTP, conventionally at a /metrics endpoint. Each target must expose its metrics in a format Prometheus understands, usually the plain-text exposition format. Exporters expose metrics on behalf of third-party systems (e.g., node_exporter for Linux host metrics, blackbox_exporter for probing endpoints over HTTP, TCP, or ICMP). Prometheus uses a powerful query language called PromQL to select and aggregate metrics, which is essential for writing alerts and building dashboards.
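To make "a format that Prometheus understands" concrete, here is a minimal sketch of a hand-rolled exporter using only the Python standard library. It renders one counter and one gauge in the text exposition format (`# HELP` and `# TYPE` lines, then `name{label="value"} number`) and serves them on `/metrics`. The metric names `demo_requests_total` and `demo_queue_depth`, the port 8000, and the in-memory values are hypothetical; real exporters usually rely on the official `prometheus_client` library rather than formatting text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state that this exporter reports.
REQUESTS_BY_METHOD = {"GET": 42, "POST": 7}  # a counter, labelled by method
QUEUE_DEPTH = 3                              # a gauge


def render_metrics():
    """Return current values in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Total HTTP requests handled.",
        "# TYPE demo_requests_total counter",
    ]
    for method, count in sorted(REQUESTS_BY_METHOD.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {count}')
    lines += [
        "# HELP demo_queue_depth Jobs currently waiting.",
        "# TYPE demo_queue_depth gauge",
        f"demo_queue_depth {QUEUE_DEPTH}",
    ]
    return "\n".join(lines) + "\n"  # every line, including the last, ends in \n


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers Prometheus scrapes on /metrics; every other path gets a 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving http://<host>:<port>/metrics for Prometheus to pull."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and listing `<host>:8000` as a scrape target is enough for Prometheus to start pulling these values. Note the direction of the connection: the application never pushes anything, which is exactly the pull model described above.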
• The Prometheus architecture includes the Prometheus server, exporters, the Alertmanager, and visualization tools like Grafana. The Prometheus server is the core component: it scrapes targets, stores the samples in its local time-series database, and evaluates rules. The Alertmanager handles alerts generated by Prometheus alerting rules and sends notifications via email, Slack, or other channels. Visualization is usually done in Grafana, which queries Prometheus directly to build real-time dashboards.
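Grafana and other clients read data from the Prometheus server through its HTTP API, sending a PromQL expression to the `/api/v1/query` endpoint. A minimal sketch of that exchange, assuming a server at the default `http://localhost:9090`; the example expression `sum by (job) (rate(http_requests_total[5m]))` (per-job request rate over the last five minutes) uses a hypothetical metric name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: default local server


def instant_query_url(expr, base=PROMETHEUS_URL):
    """Build the URL for Prometheus's instant-query endpoint."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


def parse_vector(payload):
    """Turn an instant-vector response into {sorted label pairs: float value}."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    result = {}
    for sample in payload["data"]["result"]:
        labels = tuple(sorted(sample["metric"].items()))
        _timestamp, value = sample["value"]  # the value arrives as a string
        result[labels] = float(value)
    return result


def instant_query(expr):
    """Run a PromQL expression against a live server (needs network access)."""
    with urlopen(instant_query_url(expr)) as resp:
        return parse_vector(json.load(resp))
```

Separating URL building and parsing from the network call keeps the logic testable without a running server; `instant_query("sum by (job) (rate(http_requests_total[5m]))")` would return one entry per `job` label.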
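• Prometheus is particularly useful in cloud-native environments such as Kubernetes, where infrastructure is dynamic and container-based deployments are common. Its service discovery (for example, through the Kubernetes API) detects new services and instances automatically, so scrape targets do not have to be listed by hand. Prometheus is also designed for reliability: each server is standalone and stores data on local disk rather than depending on network storage or other remote services, so it keeps working during partial outages, which makes it suitable for monitoring mission-critical systems.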
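Kubernetes discovery needs a cluster to demonstrate, but the simplest discovery mechanism, file-based service discovery (`file_sd_configs`), shows the idea: Prometheus watches a JSON file listing target groups and picks up changes without a restart. A minimal sketch of a script that regenerates such a file, assuming `prometheus.yml` points a `file_sd_configs` entry at it; the addresses and labels are hypothetical.

```python
import json
import os
import tempfile


def build_target_groups(instances):
    """Group {"host:port": {label: value}} into file_sd target groups.

    Instances that share an identical label set end up in the same group.
    """
    groups = {}
    for target, labels in sorted(instances.items()):
        key = tuple(sorted(labels.items()))
        groups.setdefault(key, []).append(target)
    return [{"targets": targets, "labels": dict(key)} for key, targets in groups.items()]


def write_file_sd(path, instances):
    """Write via a temp file and rename, so Prometheus never reads a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(build_target_groups(instances), fh, indent=2)
    os.replace(tmp_path, path)  # atomic on POSIX when both paths share a filesystem
```

The `.tmp` suffix matters if the Prometheus configuration uses a glob such as `*.json`: the half-written temporary file never matches it.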
Alerting and Alert Receivers

• Alerting is a core Prometheus feature that enables proactive monitoring of infrastructure and systems. Prometheus uses a rule-based approach: alerting rules, defined in rule files as PromQL expressions, describe the conditions under which an alert should fire. When a rule's expression stays true for a configured duration (the rule's for clause), Prometheus fires the alert. This helps detect problems such as high CPU usage, service downtime, or memory leaks before they severely affect performance.
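The "true for a certain period" behaviour can be pictured as a small state machine: an alert is *pending* while the condition holds but the `for` duration has not yet elapsed, *firing* once it has, and *inactive* as soon as one evaluation comes back false. A simplified Python replay of that logic, assuming a rule roughly like `expr: cpu_usage_percent > 80` with `for: 2m` (the metric name is hypothetical); it ignores details such as `keep_firing_for` and evaluation jitter.

```python
def alert_states(samples, threshold, for_seconds):
    """Replay evaluations of `value > threshold` for (timestamp, value) samples.

    Returns the alert state after each evaluation: "inactive", "pending",
    or "firing", mimicking how Prometheus waits out the `for` duration.
    """
    states = []
    active_since = None  # timestamp at which the condition first became true
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # a single false evaluation resets the timer
            states.append("inactive")
    return states


# CPU evaluated every 60 s against "> 80 for 2m".
print(alert_states([(0, 70), (60, 85), (120, 90), (180, 95), (240, 60)], 80, 120))
```

The short spike at `t=60` alone would never page anyone; only a condition that persists for the full two minutes reaches "firing", which is how the `for` clause filters out noise.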
• Alerts generated by Prometheus are sent to the Alertmanager, a separate component that manages alert delivery and notifications. The Alertmanager groups, deduplicates, throttles, and routes alerts to the appropriate notification channels, so one problem does not produce a flood of repeated notifications. This makes alert handling more manageable, especially in large-scale environments.
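Grouping and deduplication are what keep a single outage from producing hundreds of messages. A simplified sketch of the idea, assuming alerts are represented only by their label dictionaries: identical label sets collapse into one alert, and the rest are bucketed by the labels named in a `group_by` list, as in Alertmanager's `group_by` route setting. The timing controls that implement throttling (`group_wait`, `group_interval`, `repeat_interval`) are left out.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts by full label set, then group them by `group_by` labels."""
    unique = {}
    for alert in alerts:
        fingerprint = tuple(sorted(alert.items()))  # identical labels = same alert
        unique[fingerprint] = alert
    groups = defaultdict(list)
    for alert in unique.values():
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "HighCPU", "cluster": "eu", "instance": "a"},
    {"alertname": "HighCPU", "cluster": "eu", "instance": "b"},
    {"alertname": "HighCPU", "cluster": "eu", "instance": "a"},  # duplicate
    {"alertname": "DiskFull", "cluster": "eu", "instance": "a"},
]
for key, members in group_alerts(alerts, ["alertname", "cluster"]).items():
    print(key, len(members))  # one notification per group
```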
• Receivers are the endpoints to which alerts are delivered. The Alertmanager supports a wide variety of receivers, including email, Slack, PagerDuty, Opsgenie, and generic webhook endpoints. Routing can be configured on alert labels, such as severity or team, to send alerts to different teams or systems. For example, critical alerts may page on-call engineers via SMS or PagerDuty, while informational alerts go to a Slack channel.
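Label-based routing can be sketched as an ordered list of matchers in which the first match wins, which is roughly how Alertmanager walks the child routes under `route:` when `continue` is not set. The receiver names and routes below are hypothetical, and the sketch supports only exact-equality matchers (no regex matchers and no nested routes).

```python
# Hypothetical routes: (label matchers, receiver name), checked in order.
ROUTES = [
    ({"severity": "critical"}, "pagerduty-oncall"),
    ({"severity": "info"}, "slack-info"),
    ({"team": "db"}, "db-team-email"),
]
DEFAULT_RECEIVER = "default-email"


def route(alert_labels, routes=ROUTES, default=DEFAULT_RECEIVER):
    """Return the receiver of the first route whose matchers all equal the alert's labels."""
    for matchers, receiver in routes:
        if all(alert_labels.get(name) == value for name, value in matchers.items()):
            return receiver
    return default  # the top-level route catches everything else


print(route({"alertname": "NodeDown", "severity": "critical"}))
```

Because order decides the outcome, a `warning` alert from the database team falls through the severity routes and reaches `db-team-email`, while a `warning` alert with no team label ends up at the default receiver.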
• The alerting system is highly configurable. It supports silencing, which temporarily mutes matching alerts (for example, during maintenance windows), and inhibition, which suppresses notifications for lower-priority alerts while a related higher-priority alert is already firing. These features help avoid alert fatigue and let teams focus on the most important issues.
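Silences and inhibition are both label matching with an extra condition: a time window for silences, and "a matching source alert is firing with the same values for the `equal` labels" for inhibition (mirroring the `source_matchers`, `target_matchers`, and `equal` fields of an Alertmanager `inhibit_rules` entry). A simplified sketch with equality-only matchers and Unix-second timestamps; all alert names and labels are hypothetical.

```python
def matches(labels, matchers):
    """True if every matcher label has exactly the given value (equality only)."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_silenced(labels, silences, now):
    """A silence mutes matching alerts while starts_at <= now < ends_at."""
    return any(
        s["starts_at"] <= now < s["ends_at"] and matches(labels, s["matchers"])
        for s in silences
    )


def is_inhibited(labels, firing_alerts, rule):
    """Mute a target alert while a source alert fires with the same `equal` labels."""
    if not matches(labels, rule["target"]):
        return False
    return any(
        matches(source, rule["source"])
        and all(source.get(name) == labels.get(name) for name in rule["equal"])
        for source in firing_alerts
    )


# A critical NodeDown on db1 hides warnings from db1, but not from other hosts.
RULE = {"source": {"severity": "critical"}, "target": {"severity": "warning"}, "equal": ["instance"]}
FIRING = [{"alertname": "NodeDown", "severity": "critical", "instance": "db1"}]
print(is_inhibited({"alertname": "HighLatency", "severity": "warning", "instance": "db1"}, FIRING, RULE))
```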
Introduction to Grafana

• Grafana is an open-source analytics and visualization platform for time-series data from many sources. It is widely used with tools such as Prometheus, InfluxDB, Graphite, and Elasticsearch. Grafana lets users build interactive, customizable dashboards that display metrics, logs, and other performance data as graphs, charts, heatmaps, and tables. These dashboards give real-time insight into the health and performance of infrastructure, applications, and services.
• Grafana supports a wide range of data sources and can query and visualize data from several systems at once, which makes it extremely versatile. It also supports templating, which lets users build dynamic dashboards with variables for better reusability and flexibility. Alerting is another key feature: users can set threshold-based alerts on visualized data and send notifications through channels such as email, Slack, or PagerDuty.
• Grafana is especially valuable in DevOps and SRE (Site Reliability Engineering) practice, where real-time monitoring and quick troubleshooting are critical. Its user-friendly interface, extensive plugin ecosystem, and role-based access control make it a common choice for both small teams and large enterprises. In short, Grafana turns raw data into visual insights, enabling better decisions and faster issue resolution.
Grafana Dashboards

• Grafana dashboards are the central feature of the Grafana monitoring and visualization tool. They provide a user-friendly, powerful interface for visualizing real-time and historical data from a wide variety of data sources. A dashboard is a collection of panels, each showing its own visualization: graphs, time-series charts, bar charts, pie charts, tables, heatmaps, gauges, and more. This variety lets users build comprehensive, interactive monitoring displays tailored to particular infrastructure, applications, or business metrics.
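Because a dashboard is stored as JSON, the "collection of panels" idea can be seen directly in its data model. A minimal sketch that builds a two-panel dashboard as a Python dictionary; it uses only a small subset of Grafana's real fields (`title`, `panels`, `type`, `gridPos` on a 24-column grid, `targets` with `refId` and `expr`, `time`, `refresh`), while real exported dashboards contain many more (ids, `schemaVersion`, field configuration, data source references). The PromQL expressions use node_exporter metric names.

```python
import json


def panel(title, expr, panel_type, x, y, w=12, h=8):
    """One panel: a visualization type, its position, and the query behind it."""
    return {
        "title": title,
        "type": panel_type,                           # e.g. "timeseries", "gauge", "table"
        "gridPos": {"x": x, "y": y, "w": w, "h": h},  # dashboards use a 24-column grid
        "targets": [{"refId": "A", "expr": expr}],    # PromQL, for a Prometheus data source
    }


def dashboard(title, panels):
    """A minimal dashboard: title, panels, default time range and auto-refresh."""
    return {
        "title": title,
        "time": {"from": "now-6h", "to": "now"},
        "refresh": "30s",
        "panels": panels,
    }


board = dashboard("Node overview", [
    panel("CPU busy %",
          '100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])))',
          "timeseries", x=0, y=0),
    panel("Memory available", "node_memory_MemAvailable_bytes", "gauge", x=12, y=0),
])
print(json.dumps(board, indent=2))
```

Two 12-unit-wide panels placed at `x=0` and `x=12` sit side by side on the first row.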
• Each panel in a dashboard is backed by a query to a connected data source. Grafana supports many data sources, including Prometheus, InfluxDB, Elasticsearch, MySQL, PostgreSQL, and Loki, among others. The query language depends on the data source: PromQL for Prometheus, InfluxQL or Flux for InfluxDB, and so on. The queries return metrics and data points, which are then drawn with the selected chart type. Panels can also apply transformations that process or combine data before it is visualized, such as computing averages, grouping by labels, or merging results from multiple queries.
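A transformation runs on query results after they come back from the data source. The sketch below reproduces the idea behind a "group by label, then average" transformation on rows shaped like simple query results; the row layout (`instance`, `value`) is an assumption for illustration, not Grafana's internal data frame format.

```python
from collections import defaultdict
from statistics import mean


def group_by_mean(rows, label):
    """Average `value` per distinct `label`, like a group-by transformation."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[label]].append(row["value"])
    return {key: mean(values) for key, values in sorted(buckets.items())}


rows = [
    {"instance": "web-1", "value": 10},
    {"instance": "web-1", "value": 20},
    {"instance": "web-2", "value": 5},
]
print(group_by_mean(rows, "instance"))
```

Doing this step in the panel, instead of in the query, is useful when the data source's query language cannot express the aggregation or when results from several queries must be combined first.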
• One of the most powerful features of Grafana dashboards is variables. Variables act as placeholders or dynamic filters within a dashboard. For example, a variable can represent server names, regions, or application names, selectable from a dropdown at the top of the dashboard, so a single dashboard template can be reused for different services or environments just by changing the variable's value. Variables are populated by queries and can be chained, meaning one variable's options depend on the value of another.
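Under the hood, a variable is text substitution into the panel query: Grafana replaces `$name` or `${name}` with the selected value before sending the query. A simplified sketch of that substitution plus a chained variable whose options depend on `$region`; the `INVENTORY` dictionary is a hypothetical stand-in for what a Grafana query variable such as `label_values(up{region="$region"}, instance)` would fetch from Prometheus. Multi-value formatting and built-ins like `$__interval` are not modelled (unknown names are left untouched).

```python
import re

# Hypothetical data that a chained query variable would normally fetch.
INVENTORY = {
    "eu": ["eu-web-1", "eu-web-2"],
    "us": ["us-web-1"],
}


def instance_options(region):
    """Chained variable: the options offered for $instance depend on $region."""
    return INVENTORY.get(region, [])


def interpolate(query, variables):
    """Replace $name and ${name} with the selected variable values."""
    def substitute(match):
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))  # leave unknown variables as-is
    return re.sub(r"\$\{(\w+)\}|\$(\w+)", substitute, query)


region = "eu"
instance = instance_options(region)[0]  # first option in the dropdown
query = 'rate(http_requests_total{region="$region", instance="${instance}"}[5m])'
print(interpolate(query, {"region": region, "instance": instance}))
```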
• Dashboards also support templating, the use of variables and dynamic content to make dashboards adaptable and scalable. Combined with the time range selector, users can drill into specific time windows or analyze long-term trends. Dashboards can auto-refresh at set intervals (e.g., every 5 seconds or every minute), which makes them well suited to watching live systems such as production environments, network performance, or service health.
• Alerting from dashboards is another key feature. Users can create alert rules from supported panel types, such as time-series graphs, where Grafana evaluates a metric against a condition (e.g., CPU usage > 80%). If the condition holds for a specified duration, Grafana fires an alert. Alerts can be routed to notification channels such as email, Slack, Microsoft Teams, or PagerDuty through Grafana's own alerting engine or through an integration with the Prometheus Alertmanager.
• Dashboards are easy to share within a team or externally: via a direct link, an embedded iframe, or a snapshot, which is a static copy of the dashboard useful for reports and presentations. Dashboards can also be exported as JSON files, which makes them easy to keep in version control systems (like Git) or migrate between Grafana instances.
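When exported JSON goes into Git, a little normalization keeps diffs readable and imports portable. A minimal sketch under one common convention (an assumption, not a Grafana requirement): set the instance-specific numeric `id` to `null` so another instance treats the dashboard as new, keep the stable `uid`, drop the `version` counter that changes on every save, and sort keys so unrelated reorderings do not show up as changes.

```python
import json


def normalize_for_git(dashboard):
    """Serialize an exported dashboard so that diffs only show real edits."""
    clean = dict(dashboard)
    clean["id"] = None          # numeric id is assigned per Grafana instance
    clean.pop("version", None)  # incremented on every save
    return json.dumps(clean, indent=2, sort_keys=True) + "\n"


exported = {"id": 17, "uid": "node-overview", "title": "Node overview", "version": 9, "panels": []}
print(normalize_for_git(exported))
```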
Grafana API and Auto Healing
• Grafana provides a RESTful HTTP API for working programmatically with dashboards, data sources, users, and alerting. Requests are authenticated with a token created in the Grafana interface (older versions use the API Keys section; recent versions replace API keys with service account tokens), passed in the header in the "Authorization: Bearer <API_KEY>" format.
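A minimal sketch of calling the API with the bearer header from the Python standard library. It assumes Grafana on the default port 3000 and uses a hypothetical token value; the endpoints shown, `GET /api/dashboards/uid/<uid>` and `POST /api/dashboards/db`, are Grafana's dashboard endpoints, but check the HTTP API reference for your version before relying on them.

```python
import json
from urllib.request import Request, urlopen

GRAFANA_URL = "http://localhost:3000"  # assumption: default Grafana port
TOKEN = "glsa_example"                 # hypothetical service account token


def grafana_request(path, token=TOKEN, payload=None, base=GRAFANA_URL):
    """Build an authenticated Grafana HTTP API request (POST if payload is given)."""
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(base + path, data=data, headers=headers)


def call(request):
    """Send the request and decode the JSON body (needs a running Grafana)."""
    with urlopen(request) as resp:
        return json.load(resp)


# Fetch one dashboard by uid, or create/overwrite one from a JSON model.
get_req = grafana_request("/api/dashboards/uid/node-overview")
save_req = grafana_request(
    "/api/dashboards/db",
    payload={"dashboard": {"id": None, "title": "Demo", "panels": []}, "overwrite": True},
)
```

Keeping the token out of source code (for example, reading it from an environment variable) is advisable in practice; it is inlined here only to keep the sketch self-contained.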
• Auto healing refers to the automatic detection and resolution of issues in a system without manual intervention. In DevOps and cloud infrastructure, it is usually implemented by combining monitoring tools (such as Prometheus and Grafana) with orchestration or automation platforms (such as Kubernetes or AWS Auto Scaling). When the system detects a failure, such as a crashed pod, an unreachable server, or a threshold breach, it automatically triggers corrective actions like restarting services, replacing instances, or sending alerts. For example, Kubernetes uses liveness probes to detect unhealthy containers and restart them automatically, while readiness probes stop routing traffic to a container until it is ready to serve again, keeping downtime to a minimum. Auto healing improves system resilience and reliability by enabling quick recovery from faults and reducing dependence on human intervention.
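The liveness-probe behaviour can be simulated in a few lines. Kubernetes restarts a container only after `failureThreshold` consecutive probe failures (3 by default), so one slow response does not cause a restart. This sketch counts the restarts a sequence of probe results would trigger; it deliberately ignores `initialDelaySeconds`, `periodSeconds`, and the restart back-off.

```python
def probe_restarts(probe_results, failure_threshold=3):
    """Count restarts a liveness probe would trigger.

    `probe_results` holds one boolean per probe period (True = probe passed).
    Only consecutive failures count; a success or a restart resets the counter.
    """
    restarts = 0
    consecutive_failures = 0
    for ok in probe_results:
        if ok:
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts += 1            # the kubelet would restart the container here
            consecutive_failures = 0
    return restarts


print(probe_results := [True, False, False, True, False, False, False])
print(probe_restarts(probe_results))
```

The two isolated failures in the middle are tolerated; only the final run of three triggers a restart.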
• In practice, auto healing works by connecting monitoring tools like Prometheus and Grafana to alerting systems and automation scripts. Alerts are configured on specific metrics or thresholds; when a breach occurs, the alert triggers an automated response such as restarting a service, scaling resources, or running a custom remediation script through tools like Ansible, Terraform, or AWS Lambda. This approach minimizes downtime, improves mean time to recovery (MTTR), and reduces the need for manual intervention during incidents. In essence, auto healing lets systems recover on their own and keep operating through unexpected disruptions, which makes it a vital component of any resilient IT architecture.
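One common way to wire alerts to automation is an Alertmanager webhook receiver: Alertmanager POSTs a JSON body containing an `alerts` list (each alert with `status` and `labels`) to a small service, which decides what to run. A minimal sketch of the decision step; the alert names and the `REMEDIATIONS` mapping are hypothetical, and the actions are returned as names rather than executed.

```python
import json

# Hypothetical mapping from alert name to a remediation step; each action
# might call kubectl, an Ansible playbook, or an AWS Lambda function.
REMEDIATIONS = {
    "PodCrashLooping": "restart-deployment",
    "DiskAlmostFull": "clean-tmp",
    "HighLoad": "scale-out",
}


def plan_remediation(body):
    """Return (action, instance) pairs for firing alerts in a webhook body."""
    payload = json.loads(body)
    actions = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved notifications need no action
        labels = alert.get("labels", {})
        action = REMEDIATIONS.get(labels.get("alertname"))
        if action:
            actions.append((action, labels.get("instance", "")))
    return actions
```

Returning a plan instead of acting directly makes the logic easy to test and gives a natural place for safeguards, such as rate-limiting repeated actions so that a flapping alert cannot trigger an endless restart loop.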
Selenium features
• Selenium is a widely adopted open-source automation testing framework for testing web applications across various browsers and platforms. It is not a single tool but a suite: Selenium WebDriver drives real browsers from code, Selenium IDE records and replays interactions in the browser, and Selenium Grid runs tests in parallel across machines and browsers. Together, these tools help testers automate web-based applications more efficiently and streamline their testing processes.
Test Driven Development
REPL driven Development
• REPL-driven development, in the context of Selenium, is an interactive programming approach in which developers write and test code in a Read-Eval-Print Loop (REPL) environment. Each line of code is evaluated and executed as soon as it is entered, giving immediate feedback, which is particularly useful in testing and debugging.
• In Selenium, REPL-driven development lets developers experiment with browser commands, locate web elements, and perform actions like clicking or typing in real time, without first writing a full test script. This speeds up learning and development by enabling quick iterations and exploration of the browser's behavior.
• For example, a developer can launch a browser with Selenium WebDriver, locate web elements, send inputs, or click a button, all directly from the REPL. This helps in quickly seeing how web elements behave, verifying locators such as XPath or CSS selectors, and debugging issues on the spot. It removes the need to write, save, and run full test scripts repeatedly, saving time and improving productivity.
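A sketch of such a session in Python. The commented lines show what might be typed one at a time at a `python` or `ipython` prompt after `pip install selenium`; the page URL and the element locators are hypothetical. The small `try_locator` helper packages the core "try a locator, see what comes back" step so it can be re-run against any WebDriver instance; it passes `"css selector"` and `"xpath"`, the string values of Selenium's `By.CSS_SELECTOR` and `By.XPATH`, to `find_elements`.

```python
# Typed at the prompt one line at a time (hypothetical page and locators):
#   >>> from selenium import webdriver
#   >>> from selenium.webdriver.common.by import By
#   >>> driver = webdriver.Chrome()
#   >>> driver.get("https://example.com/search")
#   >>> box = driver.find_element(By.NAME, "q")
#   >>> box.send_keys("selenium")
#   >>> driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
#   >>> driver.quit()

CSS = "css selector"  # string value of selenium's By.CSS_SELECTOR
XPATH = "xpath"       # string value of selenium's By.XPATH


def try_locator(driver, by, value):
    """Report how many elements a locator matches and the first element's text."""
    elements = driver.find_elements(by, value)  # returns [] instead of raising
    return len(elements), (elements[0].text if elements else None)
```

At the prompt, `try_locator(driver, XPATH, "//h1")` immediately shows whether a locator is too broad, too narrow, or wrong, before it is copied into a full test script.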
• This approach is particularly effective in dynamic programming languages such as Python and Ruby, both of which provide REPL environments natively: Python has its built-in interactive shell (IPython is a popular enhanced alternative), and Ruby ships with IRB (Interactive Ruby Shell). These tools give immediate feedback, which is very useful while developing and testing Selenium scripts. Overall, REPL-driven development makes Selenium easier to learn, speeds up test creation, and gives a better understanding of how Selenium interacts with web browsers.
