ElasticSearch IEEE Format1
Shrasti gupta,[email protected]
Abstract: Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. Elasticsearch is best known for the expansive and versatile REST API experience it provides, including efficient wrappers for full-text search, sorting, and aggregation tasks, making it much easier to implement such capabilities in existing backends without the need for complex re-engineering.

Keywords: Elasticsearch; Kibana; Logstash; Beats; Amazon Kinesis Firehose

I. INTRODUCTION

Elasticsearch was released in 2010 and has quickly become the most popular search engine. It is commonly used for log analytics, full-text search, and operational intelligence use cases.

When coupled with Kibana, a visualization tool, Elasticsearch can be used to provide near-real-time analytics using large volumes of log data. Elasticsearch is also popular because of its easy-to-use search APIs, which allow you to easily add powerful search capabilities to your applications.

II. BASIC CONCEPT

A. LOGICAL CONCEPTS

To better understand how Elasticsearch works, let's cover some basic concepts of how it organizes data and its backend components.

B. DOCUMENTS

Documents are the basic unit of information that can be indexed in Elasticsearch. They are expressed in JSON, a widely used internet data interchange format. You can think of a document like a row in a relational database, representing a given entity, that is, the thing you're searching for. In Elasticsearch, a document can be more than just text; it can be any structured data encoded in JSON. That data can be things like numbers, strings, and dates. Each document has a unique ID and a given data type, which describes what kind of entity the document is. For example, a document can represent an encyclopedia article or log entries from a web server.

C. INDICES

An index is a collection of documents that have similar characteristics. An index is the highest-level entity that you can query against in Elasticsearch. You can think of the index as being similar to a database in a relational database schema. Any documents in an index are typically logically related. In the context of an e-commerce website, for example, you can have an index for Customers, one for Products, one for Orders, and so on. An index is identified by a name that is used to refer to the index while performing indexing, search, update, and delete operations against the documents in it.

D. INVERTED INDEX

An index in Elasticsearch is actually what's called an inverted index, which is the mechanism by which all search engines work. It is a data structure that stores a mapping from content, such as words or numbers, to its locations in a document or a set of documents. Basically, it is a hashmap-like data structure that directs you from a word to a document.

An inverted index doesn't store strings directly. Instead, it splits each document up into individual search terms (i.e., each word) and then maps each search term to the documents in which it occurs. For example, in the image below, the term "best" occurs in document 2, so it is mapped to that document. This serves as a quick look-up of where to find search terms in a given document. By using distributed inverted indices, Elasticsearch quickly finds the best matches for full-text searches, even in very large data sets.

E. BACKEND COMPONENTS

Cluster
An Elasticsearch cluster is a group of one or more node instances that are connected together. The power of an Elasticsearch cluster lies in the distribution of tasks, such as searching and indexing, across all the nodes in the cluster.

Node
A node is a single server that is part of a cluster. A node stores data and participates in the cluster's indexing and search capabilities. An Elasticsearch node can be configured in different ways:

Master Node
Controls the Elasticsearch cluster and is responsible for all cluster-wide operations, like creating or deleting an index and adding or removing nodes.

Data Node
Stores data and executes data-related operations such as search and aggregation.

Client Node
Forwards cluster requests to the master node and data-related requests to data nodes.

III. HOW DOES IT WORK

You can send new data, called documents, to Elasticsearch using the API or ingestion tools such as Logstash and Amazon Kinesis Firehose. Elasticsearch automatically stores the original document and adds a searchable reference to the document in the cluster's index. You can then search and retrieve the document using the Elasticsearch API, which is very easy to use. You can also use Kibana, an open-source analytics and visualization tool, to search, analyze, and build dashboards on your data.

IV. THE ELASTIC STACK

Elasticsearch is the central component of the Elastic Stack, a set of open-source tools for data ingestion, enrichment, storage, analysis, and visualization. It is commonly referred to as the ELK stack after its components (Elasticsearch, Logstash, and Kibana) and now also includes Beats. Although Elasticsearch is a search engine at its core, users started using it for log data and wanted a way to easily ingest and visualize that data.

A. KIBANA

Kibana is a data visualization and management tool for Elasticsearch that provides real-time histograms, line graphs, pie charts, and maps. It lets you visualize your Elasticsearch data and navigate the Elastic Stack. You can choose how to give shape to your data by starting with one question and seeing where the interactive visualization leads you. For example, since Kibana is often used for log analysis, it allows you to answer questions about where your web hits are coming from, your distribution URLs, and so on. If you're not building your own application on top of Elasticsearch, Kibana is a great way to search and visualize your index with a powerful and flexible UI. However, a major drawback is that every visualization can only work against a single index or index pattern. So if you have indices with strictly different data, you'll have to create separate visualizations for each. For more advanced use cases, Knowi is a good option. It allows you to join your Elasticsearch data across multiple indices and blend it with other SQL, NoSQL, and REST API data sources, then create visualizations from it in a business-user-friendly UI.

B. LOGSTASH

Logstash is used to aggregate and process data and send it to Elasticsearch. It is an open-source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it on to a destination for collection. It also transforms and prepares data regardless of format by identifying named fields to build structure and transforming them to converge on a common format. For example, since data is often scattered across different systems in various formats, Logstash allows you to tie together different systems, such as web servers, databases, and Amazon services, and publish data to wherever it needs to go in a continuous streaming fashion.

C. BEATS

Beats is a collection of lightweight, single-purpose data shipping agents used to send data from hundreds or thousands of machines and systems to Logstash or Elasticsearch. Beats are great for gathering data, as they can sit on your servers, run with your containers, or be deployed as functions, and then centralize data in Elasticsearch. For example, Filebeat can sit on your server, monitor log files as they come in, parse them, and import them into Elasticsearch in near real time.

VI. BENEFITS

Query and Analyze

a. QUERY
BE CURIOUS. ASK YOUR DATA QUESTIONS OF ALL KINDS.
Perform and combine many types of searches (structured, unstructured, geo, metric) any way you want.
All data types are welcome.

b. ANALYZE
STEP BACK AND UNDERSTAND THE BIGGER PICTURE.
It's one thing to find the 10 best documents to match your query. But how do you make sense of, say, a billion log lines?
Elasticsearch aggregations let you zoom out to explore trends and patterns in your data.

Speed and Scalability

c. SPEED
ELASTICSEARCH IS FAST. REALLY, REALLY FAST.
When you get answers instantly, your relationship with your data changes. You can afford to iterate and cover more ground.
And since everything is indexed, you're never left with index envy. You can leverage and access all of your data at very high speed.

d. SCALABILITY
RUN IT ON YOUR LAPTOP. OR HUNDREDS OF SERVERS WITH PETABYTES OF DATA.
Go from prototype to production seamlessly; you talk to Elasticsearch running on a single node the same way you would in a 300-node cluster.
It scales horizontally to handle very large volumes of events per second, while automatically managing how indices and queries are distributed across the cluster for smooth operations.

VII. USES

Amazon Web Services
Adobe Systems
Center for Open Science
CERN
Facebook
Foursquare
GitHub
Lichess
Mozilla
Netflix
Oracle Corporation
Pixabay
Quizlet
Quora
Reverb
SeatGeek
Slurm Workload Manager
SoundCloud
Stack Exchange
StumbleUpon
Team Foundation Server
Vimeo
Wikimedia Foundation
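
As a closing illustration of the inverted index described in Section II-D, the following is a minimal Python sketch of the idea, not the data structure Elasticsearch or Lucene actually uses. The sample documents, the simple lowercase tokenizer, and the function names (tokenize, build_inverted_index, search) are all assumptions made for this example. A real engine would also store term positions and frequencies and apply configurable text analysis.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word-like search terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's documents, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "Elasticsearch is a distributed search engine",
    2: "The best search engine for log analytics",
    3: "Kibana visualizes log data",
}

index = build_inverted_index(documents)
print(sorted(index["best"]))                    # [2]
print(sorted(search(index, "search engine")))   # [1, 2]
print(sorted(search(index, "log data")))        # [3]
```

Because each term points directly at the documents that contain it, a query only touches the entries for its own terms instead of scanning every document, which is the property the text credits for fast full-text search.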