Elasticsearch: Getting Started With Elasticsearch
What is Elasticsearch?
Elasticsearch Features
Elasticsearch Architecture
Advantages of Elasticsearch
Elasticsearch Use-cases
Elasticsearch Vs. RDBMS
Elasticsearch Vs. MongoDB
Elasticsearch Vs. Solr
Current Demand and Future of Elasticsearch
What is Elasticsearch?
First, let us understand why Elasticsearch was created. Consider customers searching for product
information in a very large product catalog. If the system takes too long to retrieve results because
of the data volume, the user experience suffers and potential customers may be lost. A relational
database (RDBMS) can be slow at searching large amounts of data. Elasticsearch was built to
address this problem.
Elasticsearch is a document-based system that stores, manages and retrieves document-oriented or
semi-structured data. Data is stored as JSON documents. It is also schema-less, in the sense that
documents can be indexed without defining a schema up front. It is a NoSQL data store built on the
Lucene search engine.
Elasticsearch uses Query DSL (Domain Specific Language) to interact with data. Queries are written in
JSON format. Query DSL is designed so that complex real-world logic can be expressed in a single
query.
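As a rough illustration of combining several conditions in one Query DSL request, the sketch below
builds a search body as a Python dictionary and serializes it to JSON. The field names (name, price,
in_stock) are hypothetical and not taken from any real index.

```python
import json

# Hypothetical product search: the field names are assumptions for illustration.
query = {
    "query": {
        "bool": {
            # Full-text condition that contributes to relevance scoring.
            "must": [{"match": {"name": "wireless headphones"}}],
            # Exact conditions that only include or exclude documents.
            "filter": [
                {"range": {"price": {"lte": 100}}},
                {"term": {"in_stock": True}},
            ],
        }
    },
    "sort": [{"price": {"order": "asc"}}],
    "size": 10,
}

# Serialize to the JSON body that would be sent with a search request.
body = json.dumps(query, indent=2)
print(body)
```

The text match, the price range, the stock filter, the sorting and the result size all travel in a
single JSON document.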
Elasticsearch Features
Below are features offered by Elasticsearch:
Elasticsearch Architecture
Elasticsearch is not primarily a data store, although it can technically be used as one. Elasticsearch
stores documents and tracks a version for each of them. If two processes write to the same document
at the same time, the latest version is kept. It does not support ACID properties the way a
relational database does.
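The versioning behaviour described above can be modelled with a toy in-memory store. This is only a
sketch in plain Python, not Elasticsearch's implementation: the names ToyStore, VersionConflict and
expected_version are hypothetical. An unconditional write simply overwrites the document (the latest
write wins), while a write that states which version it was based on is rejected if that version is
stale.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class ToyStore:
    """Toy model of versioned documents; not Elasticsearch's real mechanism."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # A conditional write fails if someone else updated the document first.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, body)
        return new_version


store = ToyStore()
store.put("1", {"name": "Asha", "standard": 8})
seen_version, _ = store.get("1")

# Two writers both read the same version; the first conditional write succeeds.
store.put("1", {"name": "Asha", "standard": 9}, expected_version=seen_version)
try:
    store.put("1", {"name": "Asha", "standard": 10}, expected_version=seen_version)
except VersionConflict as err:
    print("second write rejected:", err)
```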
Each node in a cluster contributes to the searching and indexing capabilities of the cluster. For
example, when we run a search query, each node executes it against the data that node stores.
Each node supports searching, indexing and manipulating existing data.
Documents and Indices
Every data item we store in the cluster is a document. A document is a JSON object, comparable
to a row in database terminology. For example, to store a student you would add one object
with name and standard as its properties. Since the data is spread across all the nodes, how
is it organized? Documents are stored under indices. An index is a collection of documents
that have similar properties or are logically related. For instance, there could be one index
for orders, one for products and one for customers.
Each document has a unique ID, which can be assigned by Elasticsearch or supplied by the user
when the document is added to an index. A document is uniquely identified by its ID together
with its index. There is no fixed limit on the number of documents that can be added to an index.
Indices are identified by name, and the index name is used when searching for documents.
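To make the index-plus-ID addressing concrete, the sketch below builds the REST requests that would
store and fetch the student document from the example above. It only constructs the requests and
does not contact a server. The helper names index_request and get_request, the index name students
and the sample values are hypothetical.

```python
import json


def index_request(index, doc_id, document):
    """Build (method, path, body) for storing a document under a chosen ID."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(document)


def get_request(index, doc_id):
    """Build (method, path, body) for fetching a document by index and ID."""
    return "GET", f"/{index}/_doc/{doc_id}", None


# Hypothetical student document with the two properties named in the text.
student = {"name": "Asha", "standard": 8}

method, path, body = index_request("students", 1, student)
print(method, path, body)
print(*get_request("students", 1)[:2])
```

The index name and the document ID both appear in the path, which reflects how the pair identifies
a single document.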
Shards and replicas
Elasticsearch uses Lucene for fast data retrieval, applying the power of Lucene indexes in a
distributed system. Each shard is an individual instance of a Lucene index. As data volume
grows, the performance of a single index degrades. To overcome this, Elasticsearch uses
shards to divide an index into multiple pieces. Shards are important for the following two
reasons.
1. Shards enable us to divide the content horizontally.
2. Shards allow parallel operations across multiple nodes, which in turn increases performance.
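The idea of dividing content horizontally can be sketched as hashing each document's ID and taking
the result modulo the number of primary shards. This is a simplified model: zlib.crc32 is used here
only as a deterministic stand-in hash and is an assumption, not the hash function Elasticsearch
uses, and the names pick_shard and distribute are hypothetical.

```python
import zlib
from collections import defaultdict


def pick_shard(routing_key, num_primary_shards):
    """Map a routing key (here the document ID) to a shard number."""
    # crc32 is a stand-in hash chosen for determinism, not Elasticsearch's own.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard they would land on."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return dict(shards)


layout = distribute([f"order-{i}" for i in range(12)], 3)
for shard, ids in sorted(layout.items()):
    print(shard, ids)
```

Because placement depends on the shard count, changing the number of primary shards in a scheme
like this would move documents to different shards.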
Replicas exist to guard against unexpected failures, such as a network failure. Replica shards,
as the name implies, are copies of an index's shards. Replicas are important in the Elasticsearch
architecture for the following two reasons.
1. If a shard or node fails, a replica keeps the data available. A replica shard is never
allocated on the same node as its primary shard.
2. Replica shards increase throughput and performance, because searches can run in parallel
on replica shards as well.
When creating an index, we choose the number of shards and replicas. The number of replicas
can be changed dynamically at any time.
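A minimal sketch of how these two settings are typically expressed follows. It builds the request
bodies only and does not contact a cluster. The helper names, the index name products and the
chosen counts are hypothetical.

```python
import json


def create_index_request(index, shards, replicas):
    """Build (method, path, body) for creating an index with explicit sizing."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{index}", json.dumps(body)


def update_replicas_request(index, replicas):
    """Build (method, path, body) for changing the replica count later on."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


# Shard count is chosen at creation time; replica count is adjusted afterwards.
print(*create_index_request("products", shards=3, replicas=1))
print(*update_replicas_request("products", replicas=2))
```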
Elasticsearch Advantages
Below are a few advantages of Elasticsearch:
- Elasticsearch is built on Lucene, a full-featured information retrieval library, so it offers
  some of the most efficient and powerful full-text search capabilities of any open-source
  product. Lucene is also widely known among developers.
- Elasticsearch implements many features such as faceted search, customized stemming and
  customized tokenization (splitting text into words).
- Elasticsearch supports fuzzy search, so relevant results can be found even when the search
  text contains spelling mistakes.
- Elasticsearch supports an autocomplete (IntelliSense-like) feature that completes your search
  text by predicting it from your search history or from existing tags, similar to Google
  search.
- As Elasticsearch is API driven, any action can be performed using a RESTful API.
- Elasticsearch records data changes in transaction logs, which reduces the risk of data loss.
- As Elasticsearch is distributed in nature, it is easy to scale and to integrate into an
  organization.
- Elasticsearch supports faceted search, which is like applying multiple filters to the data
  together with a classification system over them. This kind of search is more robust than
  plain text search.
- Elasticsearch handles multi-tenancy well, for example by serving many tenants from one large
  Elasticsearch index.
- Using Elasticsearch's Query DSL, it is easy to prepare complex queries and tune them
  precisely. Query DSL also provides a way to rank and group the results.
- As Elasticsearch uses JSON objects, it is easy to work with from many programming
  languages.
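Two of the advantages listed above, fuzzy search and faceted search, can be sketched as Query DSL
request bodies. The field names (name, brand, price), the misspelled search term and the price
buckets are hypothetical; the example only builds and prints the JSON.

```python
import json

# Fuzzy search: tolerate spelling mistakes in the search text.
fuzzy_query = {
    "query": {
        "match": {
            "name": {"query": "headfones", "fuzziness": "AUTO"}
        }
    }
}

# Faceted search: return matches plus counts grouped by brand and price band.
faceted_query = {
    "query": {"match": {"name": "headphones"}},
    "aggs": {
        "by_brand": {"terms": {"field": "brand"}},
        "by_price": {
            "range": {
                "field": "price",
                "ranges": [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}],
            }
        },
    },
}

for request_body in (fuzzy_query, faceted_query):
    print(json.dumps(request_body, indent=2))
```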
Elasticsearch Use-cases
Below are a few use-cases for Elasticsearch:
- An online store lets its customers explore all the products it sells. In this case, you
  can use Elasticsearch to store the whole product inventory and catalog, and to provide
  search and autocomplete to users.
- Consider a scenario where you need to store logs or transactions so that you can analyze
  trends, summaries, anomalies or statistics. In this case, you can use Logstash, part of the
  ELK Stack (Elasticsearch, Logstash, Kibana), to collect and parse your data and feed it into
  Elasticsearch.
- Have you seen buttons such as "Notify me if item in stock" or "Notify me if price of this
  item falls" on e-commerce sites? This feature can be built with Elasticsearch. Using
  reverse search, you can watch price or stock movements and send alerts to customers once
  their conditions are satisfied.
- Consider a requirement to analyze data quickly and visualize it. In this case, Kibana works
  well with Elasticsearch: Elasticsearch stores the data and Kibana visualizes it in custom
  dashboards. Kibana is also part of the ELK Stack.
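The reverse-search idea in the notification use-case above can be illustrated with a toy model:
instead of running a query against stored documents, stored conditions are matched against each
incoming document. This is plain Python under stated assumptions, not Elasticsearch's own
mechanism; the names PriceAlert and matching_alerts and all sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PriceAlert:
    """A stored customer condition: notify when the price is at or below max_price."""
    customer: str
    product_id: str
    max_price: float


def matching_alerts(alerts, product_update):
    """Return the stored alerts whose conditions the incoming update satisfies."""
    return [
        alert
        for alert in alerts
        if alert.product_id == product_update["product_id"]
        and product_update["in_stock"]
        and product_update["price"] <= alert.max_price
    ]


alerts = [
    PriceAlert("asha", "p1", 50.0),
    PriceAlert("ravi", "p1", 30.0),
    PriceAlert("mei", "p2", 20.0),
]

# An incoming product change is checked against every stored alert.
update = {"product_id": "p1", "price": 40.0, "in_stock": True}
for alert in matching_alerts(alerts, update):
    print("notify", alert.customer)
```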
Elasticsearch Vs. RDBMS
Elasticsearch                         RDBMS
Semi-structured or unorganized data   Structured and organized data
Eventual consistency                  Tight consistency
BASE transactions                     ACID transactions
No pre-defined schema                 Data and relationships stored in tables
Index                                 Database
Shard                                 Partition
Type                                  Table
Document                              Row
Field                                 Column
Mapping                               Schema
Everything is indexed                 Index
Query DSL                             SQL
Supported programming languages:
Elasticsearch: .NET, Java, JavaScript, Perl, Scala, PHP, Python, Ruby, Erlang
RDBMS: .NET, Java, JavaScript, Perl, Scala, PHP, Python, Ruby, Erlang, XML
Conclusion
Elasticsearch stands out among its competitors because it is highly scalable and distributed by nature.
If you have a large volume of data and need fast search over it, Elasticsearch is a strong choice.