
Elastic Assignment

Elasticsearch is an open-source, distributed, RESTful search and analytics engine. It allows storage and analysis of structured and unstructured data for use cases like log analysis, application monitoring, and security. Elasticsearch uses an inverted index and sharding to provide fast search of large amounts of heterogeneous data across clusters of nodes.


ELASTICSEARCH

Presented by
Eugene Boadu Anang - 1695817055
Luqman Mahama - 1703243729

GHANA COMMUNICATION TECHNOLOGY UNIVERSITY

CIET 575 ADVANCED BIG DATA MANAGEMENT


APRIL 2024
WHAT IS ELASTICSEARCH

01
Elasticsearch is a distributed, open-source, RESTful, highly scalable search and analytics engine based on the Apache Lucene library. It works for all types of data, including textual, numerical, geospatial, structured and unstructured.[1]

02
Elasticsearch was originally open-source software and was licensed under the Apache License v2 until v7. It was developed in 2009 by Shay Banon, after he was inspired to build a recipe application for his wife.

03
Elasticsearch was developed in Java, making it highly portable and allowing it to run across different platforms, from Unix mainframes to Windows laptops.
ELASTICSEARCH USE CASES

Application and Infrastructure Monitoring
• Easily store and analyse log data and set automated alerts for underperformance. Elasticsearch enables the storage and analysis of log and machine data and allows administrators to set thresholds for automated alerts on over-utilisation of resources or underperformance.

Fast, Scalable Full-text Search
• Elasticsearch can be leveraged to provide search functionality in applications, websites, or data lake catalogues.

Security and Event Information Management
• Elasticsearch enables the centralization of logs for real-time security monitoring and forensic analysis.

Operational Health Tracking
• Administrators can leverage observability logs, metrics, and traces to monitor applications and electronic transactions in real time.
COMPONENTS OF ELASTICSEARCH

DOCUMENT
• Elasticsearch organizes data into documents, which are JSON-based units of information representing entities. [2]

INDEX
• Groups of documents with shared characteristics are referred to as indices (singular: index), which are a close equivalent to databases in traditional systems. [2]

SHARD
• Indexes can be horizontally partitioned into shards. Each shard is a piece of an index and has all the properties of an index, but it holds fewer JSON objects than the whole index and can be stored on any node in a clustered system. [2]

REPLICA
• These are identical copies of a shard or an index that are stored on a node other than the one where the primary shard/index is stored. They provide fault tolerance. [2]

CLUSTER
• This refers to a networked group of Elasticsearch servers which are configured to store, index and search indices and shards. [2]

NODE
• Each server in the cluster is referred to as a node. Master nodes manage cluster operations such as adding, deleting and tracking cluster and shard allocations. Data nodes hold shards. [2]
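The relationship between documents, shards, replicas and nodes can be sketched in a few lines of Python. This is a simplified illustration, not Elasticsearch's actual allocator: the function names, the document IDs and the round-robin replica placement are hypothetical, and CRC32 stands in for the Murmur3 hash that Elasticsearch uses internally, as far as we understand it. The core idea it shows is that a document is routed to a primary shard by hashing its routing key (by default its ID) modulo the number of primary shards, and that a replica must live on a different node from its primary.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard number."""
    # Assumption: CRC32 is only a stand-in for the hash used by Elasticsearch.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def place_replica(primary_node: int, num_nodes: int) -> int:
    """Pick a node for a replica that differs from the node holding the primary.

    Hypothetical round-robin rule; the real allocation logic is more involved.
    """
    if num_nodes < 2:
        raise ValueError("a replica needs at least two nodes")
    return (primary_node + 1) % num_nodes


# Hypothetical document IDs routed across an index with 3 primary shards.
doc_ids = ["doc-1", "doc-2", "doc-3", "doc-4"]
placement = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}

for doc_id, shard in placement.items():
    print(f"{doc_id} -> shard {shard}")
```

Because the shard number depends on the number of primary shards, that number cannot be changed freely after an index is created without re-routing its documents.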
INVERTED INDEX

The inverted index is a central component of search engines. It is a data structure that maps words to their locations within documents, enabling fast indexing and searching of heterogeneous data.[9]

Document 1: GCTU is the premier technology university in Ghana since independence and graduates thousands each year.
Document 2: The premier league in Ghana is adopting technology.
Document 3: GCTU graduates have helped Ghana become the premier economy in Africa, putting it in a league of its own with thousands of wealthy citizens.

Stop List: …, is, the, in, and, a, it, of, …

Inverted Index
Term            Document ID
GCTU            1, 3
premier         1, 2, 3
technology      1, 2
university      1
Ghana           1, 2, 3
independence    1
graduates       1, 3
thousands       1, 3
year            1
league          2, 3
adopting        2
…
economy         3
Africa          3
wealthy         3
citizens        3
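The construction on this slide can be reproduced with a short Python sketch. It is a toy illustration under stated assumptions, not how Lucene builds its index: the tokenizer is a plain lowercase regular expression, the stop list contains only the entries visible on the slide, and the function names `build_inverted_index` and `search` are our own.

```python
import re
from collections import defaultdict

# Only the stop words visible on the slide; the slide's list is elided with "…".
STOP_WORDS = {"is", "the", "in", "and", "a", "it", "of"}

documents = {
    1: "GCTU is the premier technology university in Ghana since "
       "independence and graduates thousands each year.",
    2: "The premier league in Ghana is adopting technology.",
    3: "GCTU graduates have helped Ghana become the premier economy in "
       "Africa, putting it in a league of its own with thousands of "
       "wealthy citizens.",
}


def build_inverted_index(docs):
    """Map each lowercased, non-stop-word term to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            if term not in STOP_WORDS:
                index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, *terms):
    """Return the IDs of documents that contain every query term (AND search)."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index(documents)
print(index["gctu"])                        # [1, 3]
print(search(index, "premier", "league"))   # [2, 3]
```

A query never scans the document text: it looks up each term's posting list and intersects the lists, which is why search stays fast as the collection grows.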
HIGH LEVEL ARCHITECTURE

[Diagram: data sources (stream, application, network, file store, database) feed an ingestion engine (Logstash, Filebeat), which sends data to the Elasticsearch cluster, made up of nodes holding JSON documents. Kibana sends query requests to the cluster and receives query responses.]

• Clients issue requests and get responses from the server.
• A custom ingestion service ensures data is normalized before it is indexed into Elasticsearch.
• Data is analyzed and stored by the engine during indexing.
• Data flows into Elasticsearch from multiple sources.
USE CASE 1 – WELLS FARGO BANK
WELLS FARGO DEPLOYS ELASTIC OBSERVABILITY TO MONITOR APPLICATION HEALTH AND PERFORMANCE ACROSS ITS COMPLEX TECHNOLOGY INFRASTRUCTURE.[4]

As the bank's infrastructure evolves, so must the observability solutions that gather and monitor this information. The Wells Fargo IT team selected Elastic Observability for distributed tracing: the ability to follow a user request at every stage of its journey across complex and distributed services within an application and its dependent sub-systems.

Elastic APM helped Wells Fargo analyze application flows in near real time while giving development and operations teams the visibility to quickly identify root causes and slow-performing code, and to resolve issues faster.

Elastic's flexible deployment model, data lifecycle management capabilities, and ability to do federated search across clusters in the data center or cloud deliver a solid long-term solution given the bank's plans to migrate to a multi-cloud environment over the next decade.

(Wells Fargo Deploys Elastic Observability for Distributed Tracing and APM, n.d.)
USE CASE 2 – DISH MEDIA (SATELLITE TV PROVIDER)

DISH MEDIA BOOSTS AD REVENUE, ENGINEERING EFFICIENCY, AND CUSTOMER SATISFACTION WITH ELASTIC OBSERVABILITY[5]

Elastic Security helps safeguard systems from internal breaches and external threats such as zero-day attacks. This came with no additional implementation cost because of the Elastic single agent. Machine learning plays a key role here too, enabling the engineering team to identify and rectify anomalous behaviour before it impacts the business.

Anomalies across millions of systems and customer devices are spotted much more quickly, accelerating root cause analysis and remediation from hours to near real time.

Manual analysis and toil that previously took hours have been eliminated; dashboards and data are now available in a single pane of glass across the organisation for quick analysis.
USE CASE 3 – CISCO
Cisco built its search platform using Elasticsearch. It supports internally and externally facing search applications and helps increase employee productivity, leading to customer satisfaction.[6]

Cisco's re-imagined Enterprise Search Platform, powered by AI and Elasticsearch, ensures cisco.com users receive detailed, easy-to-consume results with direct links to where relevant content appears, to keep them engaged.

Using Elasticsearch, Cisco engineers can quickly find similar case information, product bugs, and knowledge articles to accelerate the resolution of customer issues.
ELASTIC TERMINOLOGY AS COMPARED TO TRADITIONAL SQL

MySQL                       Elasticsearch
Database                    Index
Table                       Type (removed in v8)
Row                         Document
Column                      Field
Schema                      Mapping
Index                       Everything is indexed
SQL                         Query DSL
SELECT * FROM table ...     GET http://...
UPDATE table SET ...        PUT http://...
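To make the last three rows of the comparison concrete, the sketch below builds the HTTP method, path and JSON body that roughly correspond to a simple SQL statement. It only constructs the requests and sends nothing. The helper names, the index name `logs` and the field `status` are hypothetical; the `_search` and `_doc` endpoints and the `term` query are standard parts of the Elasticsearch REST API.

```python
import json


def term_search_request(index, field, value):
    """Roughly: SELECT * FROM <index> WHERE <field> = <value>."""
    body = {"query": {"term": {field: value}}}
    return "GET", f"/{index}/_search", json.dumps(body)


def put_document_request(index, doc_id, document):
    """Roughly: an UPDATE that replaces the whole row with ID <doc_id>."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(document)


# Hypothetical index and field names, for illustration only.
method, path, body = term_search_request("logs", "status", "error")
print(method, path)
print(body)

put_method, put_path, put_body = put_document_request(
    "logs", "42", {"status": "resolved"}
)
print(put_method, put_path)
```

Note that a PUT to `_doc` replaces the stored document as a whole, so it is closer to rewriting a row than to a column-level UPDATE.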
THE ELASTICSEARCH, LOGSTASH AND KIBANA STACK

Elasticsearch is often installed as part of a stack of applications known as ELK, which stands for Elasticsearch, Logstash and Kibana.

Kibana is a powerful visualisation front-end for the data indexed in an Elasticsearch cluster. It provides an interface for searching indexed data in Elasticsearch and represents data as time series in the form of histograms, graphs, etc.

Logstash is a log management engine with 3 components:
• input: collects logs and parses them into a machine-readable format
• filters: normalize logs after ingestion using conditionals
• output: forwards normalized logs to Elasticsearch for indexing
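The three Logstash stages can be mimicked in plain Python to show what each one contributes. This is an analogy, not Logstash configuration: the log line format, the sample lines, the regular expression and the function names are all made up for illustration. The output stage serialises events in the newline-delimited JSON layout that the Elasticsearch `_bulk` API accepts, using the index name `net-6`.

```python
import json
import re

# Hypothetical log format: "<timestamp> <host> <LEVEL>: <message>"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\S+) (?P<host>\S+) (?P<level>\w+): (?P<message>.*)"
)


def input_stage(raw_line):
    """Input: parse a raw log line into a dict, or return None if it does not match."""
    match = LINE_PATTERN.match(raw_line.strip())
    return match.groupdict() if match else None


def filter_stage(event):
    """Filter: normalise field values using simple conditionals."""
    event = dict(event)
    event["host"] = event["host"].lower()
    event["level"] = event["level"].lower()
    if event["level"] in {"warn", "warning"}:
        event["level"] = "warning"
    return event


def output_stage(events, index_name):
    """Output: serialise events as a _bulk request body (one action line, one source line each)."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


# Made-up sample lines; the second one is deliberately unparseable.
raw_logs = [
    "2024-03-17T10:00:00Z Router1 WARN: interface ether1 link down",
    "not a log line",
    "2024-03-17T10:00:05Z Router1 INFO: interface ether1 link up",
]

parsed = [event for event in map(input_stage, raw_logs) if event is not None]
normalised = [filter_stage(event) for event in parsed]
bulk_body = output_stage(normalised, "net-6")
print(bulk_body)
```

Keeping the stages separate means a new log source only needs a new input parser, while the filters and output stay unchanged.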


BENEFITS OF ELASTICSEARCH

IT IS EASY TO DEPLOY
• Elasticsearch is cross-platform and can be deployed on different types of operating systems and computer architectures.
• It can be deployed on premises and in the cloud, and supports federated clustering between cloud and on-premises nodes.

RESTFUL API
• Elasticsearch can be queried using a variety of programming languages, hence integration into applications is made easy.
• Interacting with an Elasticsearch server over the network is made easy by the exposed API endpoint. It supports Create, Read, Update and Delete operations via the API.

HIGHLY SCALABLE ARCHITECTURE
• Elasticsearch can be scaled vertically by applying very powerful resources to "fat" machines to increase processing and storage capacity.
• Elasticsearch can be scaled horizontally by clustering modestly powered commodity servers.

FAULT TOLERANCE
• Multiple nodes in a cluster hold replicas of shards or indexes, allowing nodes to go offline without impact on the data processing capability of an Elasticsearch cluster.

IT IS FREE
• The core search, observability and security monitoring functionality of Elasticsearch is provided free of charge.
• However, it should be noted that since v8 of Elasticsearch, managed service providers may not modify Elasticsearch code for their own usage.

MACHINE LEARNING
• The subscription version of Elasticsearch exposes machine learning capabilities for diagnostic insights and alerting.
DRAWBACKS OF ELASTICSEARCH
Elasticsearch is computationally expensive, especially when doing resource-intensive tasks such as indexing, searching and aggregating data.

Elasticsearch is not as good a data store as other options such as MongoDB or Hadoop. It performs well for small use cases, but when streaming terabytes of data per day it can either choke or lose data.

Elasticsearch is susceptible to the “split brain” problem, where an interruption in communications between two master-eligible nodes in a cluster leads both to elect themselves as master in order to meet the quorum requirement for the cluster to come online.
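The arithmetic behind split brain can be sketched as follows. This is a toy model under stated assumptions, not Elasticsearch's election algorithm: a network partition is represented only by the number of master-eligible nodes on each side, and the function names are hypothetical. It shows why a vote threshold of 1 lets both sides of a two-node split elect a master, while a majority threshold allows at most one side to do so.

```python
def majority(master_eligible_nodes: int) -> int:
    """Smallest number of votes that is more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def partitions_electing_master(partition_sizes, required_votes):
    """Return the sizes of the network partitions that can gather enough votes."""
    return [size for size in partition_sizes if size >= required_votes]


# Two master-eligible nodes that can no longer reach each other: partitions of 1 and 1.
two_node_split = [1, 1]

# A threshold of 1 lets each node elect itself: two masters, i.e. split brain.
misconfigured = partitions_electing_master(two_node_split, required_votes=1)

# With a majority threshold (2 of 2) neither side can elect a master.
safe = partitions_electing_master(two_node_split, required_votes=majority(2))

# Three master-eligible nodes split 2 | 1: only the larger side reaches the majority.
three_node_split = partitions_electing_master([2, 1], required_votes=majority(3))

print(len(misconfigured), safe, three_node_split)
```

This is why an odd number of master-eligible nodes, typically three, is commonly recommended: one side of any split can still form a majority and keep the cluster available.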

OpenSearch, which is a fork of Elasticsearch, offers machine learning capabilities for free, whereas Elasticsearch offers them under a paid subscription model.
SIMULATION DESIGN AND METHODOLOGY

OBJECTIVES
• Simulate an enterprise network consisting of an IP core network, a virtualized datacenter and a wireless access network with clients.
• Ingest and analyze data generated from network flows between clients, servers and destinations on the internet.

EXPERIMENTAL SETUP
• Dell MicroPC: bare-metal virtualization host running Proxmox Virtual Environment to host 3 virtual machines:
  • Logstash: log ingestion and normalization
  • Kibana: visualization
  • Elasticsearch: indexing and storage of logs
• Mikrotik RB5009: routing network traffic, exporting network flows.
• Raspberry Pi 4B: simulating internet traffic by running automated speedtests.

OUTCOMES
• We were able to export our “big data”, in this instance network traffic logs and flows, from the router to the Logstash virtual server.
• The Logstash virtual server was able to ingest and normalize the logs, forwarding them in turn to the Elasticsearch server.
• Logs were successfully stored in Elasticsearch as the indices “net-6” and “ds-filebeat-8.12.2-2024.03.17-000001”.
• We successfully queried the Elasticsearch server via the command line using cURL and via the Kibana front-end, and saw the JSON output.
• We successfully visualized data in the “ds-filebeat-8.12.2-2024.03.17-000001” index to gain insight into traffic flows on our simulated IP network.
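The cURL query mentioned in the outcomes can also be expressed in Python using only the standard library. The sketch below is an assumption-laden illustration rather than the commands used in the simulation: the address `http://localhost:9200` is a placeholder, authentication and TLS settings are omitted, and the helper names are our own. Building the request works offline; `run_search` needs a reachable cluster.

```python
import json
import urllib.request

# Assumption: placeholder address; a real deployment will differ and may
# require HTTPS and credentials, which this sketch leaves out.
ES_URL = "http://localhost:9200"


def build_search_request(index, query, size=5):
    """Build (but do not send) a _search request for the given index."""
    body = json.dumps({"size": size, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Match every document in the "net-6" index, returning at most 5 hits.
request = build_search_request("net-6", {"match_all": {}})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `run_search(request)` against a live cluster would return the same kind of JSON output that the cURL query shows, with the matching documents under `hits`.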
SIMULATION NETWORK TO TEST INFRASTRUCTURE MONITORING USE CASE
QUERYING OUR ELASTICSEARCH CLUSTER FOR THE FIRST TIME
CONFIGURE FILEBEAT TO RECEIVE NETFLOW FROM ROUTER
CONFIGURE LOGSTASH TO RECEIVE LOGS FROM NETWORK DEVICES
QUERY AN INDEX IN ELASTICSEARCH
INDEX CREATED FROM LOGSTASH OUTPUT
INDEX CREATED FROM FILEBEAT OUTPUT
VIEWING AN INDEX IN KIBANA
VIEWING AN INDEX IN KIBANA
VIEWING DOCUMENTS IN KIBANA
VISUALIZATION OF TRAFFIC DESTINATION AND SOURCES IN SIM NETWORK
VISUALIZATION OF GEOGRAPHIC SOURCES AND DESTINATIONS OF TRAFFIC IN SIM NETWORK
VISUALIZATION OF DESTINATION AND SOURCE ASN IN SIM NETWORK
REFERENCES
• [1] Elastic, “GitHub - elastic/elasticsearch: Free and Open, Distributed, RESTful Search Engine,” GitHub. [Online]. Available: https://github.com/elastic/elasticsearch
• [2] R. Kuć and M. Rogoziński, Mastering Elasticsearch - Second Edition. Packt Publishing Ltd, 2015. [Online]. Available: http://books.google.ie/books?id=8kLfBgAAQBAJ&printsec=frontcover&dq=978-1-78328-143-5&hl=&cd=3&source=gbs_api
• [3] M. Konda, Elasticsearch in Action, Second Edition. Simon and Schuster, 2024. [Online]. Available: http://books.google.ie/books?id=qfjYEAAAQBAJ&pg=PR6&dq=9781617299858&hl=&cd=1&source=gbs_api
• [4] “Wells Fargo deploys Elastic Observability for distributed tracing and APM,” Elastic Customers. [Online]. Available: https://www.elastic.co/customers/wells-fargo
• [5] “DISH Media boosts targeted ad revenue with Elastic Observability,” Elastic Customers. [Online]. Available: https://www.elastic.co/customers/dish-media
• [6] “Cisco Transforms Enterprise Search with AI and Elastic,” Elastic Customers. [Online]. Available: https://www.elastic.co/customers/cisco
• [7] “Machine learning,” OpenSearch Documentation, Apr. 03, 2024. [Online]. Available: https://opensearch.org/docs/latest/ml-commons-plugin/
• [8] “Official Elasticsearch Pricing: Elastic Cloud, Managed Elasticsearch,” Elastic. [Online]. Available: https://www.elastic.co/pricing
• [9] “A first take at building an inverted index.” [Online]. Available: https://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html
