Elasticsearch


What is Elasticsearch?

1. Elasticsearch is a free and open-source distributed search engine built on Apache Lucene; it stores data in an inverted index. It was created by Shay Banon
2. Its first public release was in 2010
3. It is written in Java, so it runs on any platform with a Java runtime
4. Used by:
a. LinkedIn
b. Netflix
c. Facebook
d. eBay
e. GitHub and many more...
Why Elasticsearch?

1. Easy to scale
2. Everything is one JSON call away [RESTful API]
3. Minimizes the chance of data loss [data is replicated across nodes]
4. Multi-tenancy
5. Document oriented
6. Schema free [mappings are generated automatically when not defined]
7. Widely used, with a large community
Fundamentals
Cluster:

1. A cluster consists of one or more nodes which share the same cluster name.
2. Each cluster has a single master node, which is elected automatically by the cluster
3. If the current master node fails, another master-eligible node is elected to replace it

Node:

1. A node is a running instance of ES which belongs to a cluster
2. Multiple nodes can be started on a single server for testing purposes, but usually you should run one node per server
3. At startup, a node discovers an existing cluster with the same cluster name and tries to join that cluster

Index:

1. An index is like a database in a relational DB
2. It has a mapping which defines the types in the index
Type:

1. A type is like a table in a relational DB [since ES 6.0 an index can hold only one type; types were deprecated in 7.x and removed in 8.0]
2. Each type has a list of fields that can be specified for documents of that type

Document:

1. A document is a JSON document which is stored in ES
2. It is like a row in a table in a relational DB
3. Each document is stored in an index and has a type and an ID

Field:

1. A document contains a list of fields [key-value pairs]
2. The value can be simple [such as a string or a number] or a nested structure like an array or an object
Mapping:

1. A mapping is like a “schema definition” in a relational DB
2. Each index has a mapping
3. It is either defined explicitly or generated automatically when a document is indexed
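For the explicit option, the mapping is sent in the body of the create-index request. The sketch below only builds that JSON body as a string; it assumes a 7.x/8.x cluster, where `mappings` has no type level, and the index name `products` and the field names are hypothetical examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the JSON body for a create-index request (e.g. PUT /products)
// with an explicit mapping. Field names and types are hypothetical.
class ExplicitMapping {

    static String mappingBody(Map<String, String> fieldTypes) {
        StringBuilder props = new StringBuilder();
        for (Map.Entry<String, String> e : fieldTypes.entrySet()) {
            if (props.length() > 0) {
                props.append(',');
            }
            // Each field becomes "<name>":{"type":"<type>"}
            props.append('"').append(e.getKey()).append("\":{\"type\":\"")
                 .append(e.getValue()).append("\"}");
        }
        return "{\"mappings\":{\"properties\":{" + props + "}}}";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("name", "text");
        fields.put("sku", "keyword");
        fields.put("price", "double");
        fields.put("created", "date");
        System.out.println(mappingBody(fields));
    }
}
```

The string would be sent as the body of `PUT /products`. Fields that are not listed are still added by dynamic mapping when documents contain them, unless dynamic mapping is turned off.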

Shard:

1. A shard is a low-level “worker” unit which is managed automatically by ES
2. An index is a logical namespace which points to primary and replica shards
3. ES distributes shards among all nodes in the cluster and can move shards automatically from one node to another [in case of a failure or when new nodes are added]
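When a node fails and shards have to be reallocated, ES summarises the allocation state as the cluster health colour returned by `GET /_cluster/health`. The rule can be sketched as follows; here the counts of unassigned shards are passed in by hand rather than read from a real cluster.

```java
// Sketch of how the cluster health colour follows from shard allocation.
// The unassigned-shard counts are supplied by the caller.
class ClusterHealth {

    enum Status { GREEN, YELLOW, RED }

    static Status status(int unassignedPrimaries, int unassignedReplicas) {
        if (unassignedPrimaries > 0) {
            return Status.RED;    // some data is not available
        }
        if (unassignedReplicas > 0) {
            return Status.YELLOW; // all data available, but redundancy is reduced
        }
        return Status.GREEN;      // every primary and replica is allocated
    }

    public static void main(String[] args) {
        // e.g. a single-node cluster whose index has replicas configured
        System.out.println(status(0, 5)); // prints YELLOW
    }
}
```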
Primary shard:

1. Each document is stored in a single primary shard
2. When we index a document, it is indexed first on the primary shard, then on all replicas
3. The number of primary shards is set when the index is created [it can be configured for scaling]. Before ES 7.0 the default was 5; since 7.0 it is 1
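Which primary shard receives a document is decided by routing; in simplified form, `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID. The sketch below illustrates that idea. It uses `String.hashCode()` as a stand-in for the Murmur3 hash ES actually uses, so the shard numbers it prints will not match a real cluster.

```java
// Simplified illustration of document-to-shard routing:
//   shard = hash(_routing) % number_of_primary_shards
// String.hashCode() stands in for the Murmur3 hash used by ES.
class ShardRouting {

    static int shardFor(String routing, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("need at least one primary shard");
        }
        // floorMod keeps the result in [0, numberOfPrimaryShards) even for negative hashes
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int primaries = 5;
        for (String id : new String[] {"1", "2", "user-42"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, primaries));
        }
    }
}
```

Because the result depends on the number of primary shards, that number cannot simply be changed on an existing index; it takes the split/shrink APIs or a reindex.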

Replica shard:

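1. Each primary shard can have 0 or more replicas
2. A replica is a copy of the primary shard and has two purposes:
a. Failover: if the node holding a primary fails, a replica is promoted to primary
b. Performance: search requests can be served by replicas as well as by primaries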
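The number of primaries and replicas is set in the index settings. This sketch builds such a settings body and does the shard arithmetic: every primary has `number_of_replicas` copies, and ES never places a replica on the same node as its primary. The target request (`PUT /my_index`) and index name are hypothetical.

```java
// Sketch: settings body for a create-index request, plus shard arithmetic.
class ShardLayout {

    static String settingsBody(int primaries, int replicas) {
        return "{\"settings\":{\"number_of_shards\":" + primaries
                + ",\"number_of_replicas\":" + replicas + "}}";
    }

    // Each primary plus its replicas: primaries * (1 + replicas) shard copies
    static int totalShardCopies(int primaries, int replicas) {
        return primaries * (1 + replicas);
    }

    // Copies of the same shard must live on different nodes,
    // so assigning every replica needs at least 1 + replicas nodes
    static int minNodesForAllReplicas(int replicas) {
        return 1 + replicas;
    }

    public static void main(String[] args) {
        System.out.println(settingsBody(5, 1));
        System.out.println(totalShardCopies(5, 1));
        System.out.println(minNodesForAllReplicas(1));
    }
}
```

With 5 primaries and 1 replica the cluster has 10 shard copies to place, and a single-node cluster leaves the replicas unassigned. `number_of_replicas` can be changed later on a live index; `number_of_shards` cannot.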
Elasticsearch vs MySQL

1. Indices -> Databases
2. Types -> Tables
3. Documents -> Rows
4. Fields [keys] -> Columns

Performance comparison [Core i7 2 GHz, 8 GB RAM, 128 GB SSD]:

1. Insert [10 million records]: 23/56 min
2. Select [100 full entries]: 5/9 ms
3. Select [next 100 full entries]: 4/18 ms
Hands-on development starts...
Installation and setup

1. Set the Java paths [JAVA_HOME; ES 7.0 and later ship with a bundled JDK]
2. Install ES
3. ES will be started at http://localhost:9200/
4. Install Kibana
5. Edit kibana.yml with the needed configs [e.g. the Elasticsearch URL]
6. Kibana will be started at http://localhost:5601/
7. The Dev Tools console can be accessed at http://localhost:5601/app/dev_tools#/console
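Once ES is running, a request to the root URL is a quick way to check the node. The sketch uses the JDK's built-in `java.net.http.HttpClient` (Java 11+), not an ES client library, and assumes security is disabled. ES 8.x enables TLS and authentication by default, in which case the URL scheme and credentials must be adjusted.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Smoke test: does a local, unsecured ES node answer on port 9200?
class NodeCheck {

    static final String ES_URL = "http://localhost:9200/";

    static HttpRequest rootRequest() {
        return HttpRequest.newBuilder(URI.create(ES_URL)).GET().build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(rootRequest(), HttpResponse.BodyHandlers.ofString());
        // The root endpoint returns JSON with the node name, cluster name and version
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```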
Queries

1. List all indices
2. Create the index
3. Add a document to the index
4. Search all documents in the index
5. Search documents with a filter in the index
6. Find the mappings of the index
7. Update a document in the index
8. Delete a document in the index
9. Delete the index
10. Handle multiple indices
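Each of the queries above is a plain REST call, so it can be typed in the Dev Tools console or sent from any HTTP client. The sketch below lists one possible request per item, built with `java.net.http.HttpRequest`. The indices `my_index` and `other_index`, the document ID `1` and the `name`/`status` fields are hypothetical. Searches use POST here; ES accepts both GET and POST for `_search`, and console examples usually show GET.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

// The console queries as REST requests (built, not sent).
class ConsoleQueries {

    static final String BASE = "http://localhost:9200";

    static HttpRequest request(String method, String path, String jsonBody) {
        HttpRequest.BodyPublisher body = (jsonBody == null)
                ? HttpRequest.BodyPublishers.noBody()
                : HttpRequest.BodyPublishers.ofString(jsonBody);
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json")
                .method(method, body)
                .build();
    }

    static Map<String, HttpRequest> all() {
        String matchAll = "{\"query\":{\"match_all\":{}}}";
        Map<String, HttpRequest> q = new LinkedHashMap<>();
        q.put("list indices", request("GET", "/_cat/indices?v", null));
        q.put("create index", request("PUT", "/my_index", null));
        q.put("add doc", request("PUT", "/my_index/_doc/1",
                "{\"name\":\"Alice\",\"status\":\"active\"}"));
        q.put("search all", request("POST", "/my_index/_search", matchAll));
        q.put("search with filter", request("POST", "/my_index/_search",
                "{\"query\":{\"bool\":{\"filter\":[{\"term\":{\"status\":\"active\"}}]}}}"));
        q.put("get mapping", request("GET", "/my_index/_mapping", null));
        // Partial update: only the fields under "doc" are changed
        q.put("update doc", request("POST", "/my_index/_update/1",
                "{\"doc\":{\"status\":\"inactive\"}}"));
        q.put("delete doc", request("DELETE", "/my_index/_doc/1", null));
        q.put("delete index", request("DELETE", "/my_index", null));
        // Comma-separated names (or wildcards such as my_index*) target several indices
        q.put("multiple indices", request("POST", "/my_index,other_index/_search", matchAll));
        return q;
    }

    public static void main(String[] args) {
        all().forEach((name, r) -> System.out.println(name + ": " + r.method() + " " + r.uri()));
    }
}
```

A built request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.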
Java code for Elasticsearch

1. Create the model
2. Define the connection parameters
3. Make the connection
4. Close the connection
5. Insert the data
6. Get the data
7. Update the data
8. Delete the data
9. Put everything together
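The steps above can be sketched without any ES client library, using the JDK HTTP client (Java 11+) and hand-written JSON. This is a stand-in, not the client the original code used: the `persons` index, the `Person` model and the connection constants are hypothetical, and no TLS or authentication is configured.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical CRUD walkthrough against the ES REST API using only the JDK.
class EsCrudSketch {

    // 1. The model (hypothetical)
    static final class Person {
        final String id;
        final String name;
        final int age;

        Person(String id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }

        // Hand-written JSON; assumes the name contains no quotes or backslashes
        String toJson() {
            return "{\"name\":\"" + name + "\",\"age\":" + age + "}";
        }
    }

    // 2. Connection parameters (hypothetical local, unsecured node)
    static final String SCHEME = "http";
    static final String HOST = "localhost";
    static final int PORT = 9200;
    static final String INDEX = "persons";

    // 3. The connection: one reusable client
    private final HttpClient client = HttpClient.newHttpClient();

    static URI uri(String endpoint, String id) {
        return URI.create(SCHEME + "://" + HOST + ":" + PORT + "/" + INDEX + "/" + endpoint + "/" + id);
    }

    private static HttpRequest.Builder json(URI uri) {
        return HttpRequest.newBuilder(uri).header("Content-Type", "application/json");
    }

    private String send(HttpRequest request) throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // 5. Insert: PUT /persons/_doc/{id}
    String insert(Person p) throws IOException, InterruptedException {
        return send(json(uri("_doc", p.id)).PUT(HttpRequest.BodyPublishers.ofString(p.toJson())).build());
    }

    // 6. Get: GET /persons/_doc/{id}
    String get(String id) throws IOException, InterruptedException {
        return send(HttpRequest.newBuilder(uri("_doc", id)).GET().build());
    }

    // 7. Partial update: POST /persons/_update/{id}
    String updateAge(String id, int age) throws IOException, InterruptedException {
        String body = "{\"doc\":{\"age\":" + age + "}}";
        return send(json(uri("_update", id)).POST(HttpRequest.BodyPublishers.ofString(body)).build());
    }

    // 8. Delete: DELETE /persons/_doc/{id}
    String delete(String id) throws IOException, InterruptedException {
        return send(HttpRequest.newBuilder(uri("_doc", id)).DELETE().build());
    }

    // 9. Putting it together (requires a running node)
    public static void main(String[] args) throws IOException, InterruptedException {
        EsCrudSketch es = new EsCrudSketch();
        System.out.println(es.insert(new Person("1", "Alice", 30)));
        System.out.println(es.get("1"));
        System.out.println(es.updateAge("1", 31));
        System.out.println(es.delete("1"));
        // 4. Closing: before Java 21 HttpClient has no close() and is released when
        //    unreferenced; from Java 21 it implements AutoCloseable.
    }
}
```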
Thank you
