Introduction to Elasticsearch
Praveen Manvi July 2016
Agenda
• Overview
– History, Product overview
– ES Vocabulary
– Feature set
• Demo
– Setup/Configuration
– Ecosystem
– APIs for indexing, search & monitoring
What is Elasticsearch?
– Document (JSON) oriented search engine
– Distributed
– Horizontally scalable and highly available
– Multi-tenancy enabled
– API centric & RESTful
– Built on the Lucene search engine library
Used for:
– full-text search, structured search, analytics, or all three in combination
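Because Elasticsearch is document oriented and RESTful, indexing a document amounts to sending a JSON body to a URL. The sketch below only builds such a request and never talks to a server. It assumes the 2.x-era `PUT /{index}/{type}/{id}` URL layout; the index name `library`, the type `book`, the id and the document are hypothetical.

```python
import json


def index_request(index, doc_type, doc_id, document):
    """Build (method, path, body) for indexing one JSON document.

    Assumes the 2.x-era URL layout PUT /{index}/{type}/{id}.
    The request is only constructed here, never sent.
    """
    path = "/{}/{}/{}".format(index, doc_type, doc_id)
    body = json.dumps(document, sort_keys=True)
    return "PUT", path, body


# Hypothetical index, type, id and document, purely for illustration.
method, path, body = index_request(
    "library", "book", "1",
    {"title": "Introduction to Elasticsearch", "tags": ["search", "lucene"]},
)
print(method, path)  # PUT /library/book/1
print(body)
```

Against a local node on the default HTTP port 9200, the equivalent would be `curl -XPUT 'localhost:9200/library/book/1' -d '<body>'`.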
• Elasticsearch has become a de facto search solution
• A few popular examples:
– GitHub uses Elasticsearch to query 130 billion lines of code.
– Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions.
– Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.
History
Shay Banon @kimchy
– Released Elasticsearch in February 2010
– Had worked on search for about 6 years before that (starting with Compass)
– Elasticsearch is now part of the commercial offerings of Elastic (http://elastic.co)
Doug Cutting @cutting
– Started Lucene in 1999; it became a top-level Apache project in 2005
– Now at Cloudera, which supports the rival Solr search solution and commercial offerings
Building Blocks
Term (~analogy with a relational database): description
• Cluster (~database cluster): a group of nodes
• Node (~database instance): a JVM process, usually one per machine
• Index (~database schema): hosts mapping types and their definitions; contains many shards
• Mapping type (~database table): field descriptions and indexing requirements
• Document (~database row): a JSON document
• Shard: a Lucene index; the unit of scaling and the heart of the search engine (primary or replica)
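To decide which shard stores a document, Elasticsearch applies the documented rule `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. A minimal sketch of that rule follows; `zlib.crc32` is only a stand-in for the Murmur3 hash Elasticsearch actually uses, and the ids are hypothetical.

```python
import zlib


def route_to_shard(routing_key, num_primary_shards):
    """Pick the primary shard for a document.

    shard = hash(_routing) % number_of_primary_shards, with _routing
    defaulting to the document _id. crc32 stands in for Murmur3 here.
    """
    h = zlib.crc32(routing_key.encode("utf-8"))
    return h % num_primary_shards


# Hypothetical document ids spread over 5 primary shards (the 2.x default).
placement = {doc_id: route_to_shard(doc_id, 5) for doc_id in ["1", "2", "3", "42"]}
print(placement)
```

Because the modulus is the number of primary shards, changing that number would send existing documents to different shards; this is why the primary shard count is fixed when an index is created, while replicas can be added later.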
Physical Layout
Logical Layout
Lucene Inverted Index
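An inverted index maps each term to the documents that contain it, so a query looks up terms directly instead of scanning every document. The toy version below assumes a very simple analyzer (lowercase, split on non-alphanumerics) and a boolean AND query; real Lucene also stores frequencies and positions and scores the matches.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Toy analyzer: lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene uses an inverted index",
    3: "Solr is also built on Lucene",
}
index = build_inverted_index(docs)
print(index["lucene"])                       # [1, 2, 3]
print(search_all(index, "built on lucene"))  # [1, 3]
```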
Value add over Lucene
• Distributed
– Fans a search out to multiple shards (Lucene indexes) and combines their results fork-join style, using the building blocks above
• Transaction Log
– The transaction log guarantees durability: operations not yet committed are automatically replayed when a shard is reopened
– It also simplifies shard relocation and recovery: when a shard moves from one node to another, changes made while the committed segments are being transferred can be replayed
• Flush/Refresh/Monitor APIs
– For managing and inspecting cluster, node and index status
• Query DSL
– Provides a rich JSON grammar for expressing searches
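The transaction log works like a write-ahead log: each operation is recorded before it is applied, a commit (in Elasticsearch, a flush performs a Lucene commit and clears the translog) makes the state durable, and on reopen everything after the last commit is replayed. The class below is a hypothetical in-memory toy that shows only this replay logic, not Elasticsearch's on-disk format.

```python
import json


class ToyShard:
    """In-memory shard with a write-ahead transaction log (toy model).

    Every write is appended to the translog before it is applied. commit()
    snapshots the state and clears the log; reopen() discards in-memory
    state, loads the last commit and replays the log on top of it.
    """

    def __init__(self):
        self.committed = {}  # durable state as of the last commit
        self.translog = []   # serialized operations since the last commit
        self.live = {}       # current in-memory view

    def index(self, doc_id, source):
        self.translog.append(json.dumps({"op": "index", "id": doc_id, "source": source}))
        self.live[doc_id] = source

    def delete(self, doc_id):
        self.translog.append(json.dumps({"op": "delete", "id": doc_id}))
        self.live.pop(doc_id, None)

    def commit(self):
        self.committed = dict(self.live)
        self.translog.clear()

    def reopen(self):
        """Simulate a restart: recover from the last commit plus the log."""
        self.live = dict(self.committed)
        for line in self.translog:
            entry = json.loads(line)
            if entry["op"] == "index":
                self.live[entry["id"]] = entry["source"]
            else:
                self.live.pop(entry["id"], None)


shard = ToyShard()
shard.index("1", {"title": "first"})
shard.commit()
shard.index("2", {"title": "second"})
shard.delete("1")
shard.reopen()
print(sorted(shard.live))  # ['2']
```

The same replay is what lets a relocating shard catch up: the target receives the committed segments and then replays the operations logged in the meantime.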
Mapping, Indexing and Searching Documents
Document Metadata Fields
• _id: the ID of the document
• _type: the document (mapping) type
• _source (enabled by default): stores the original JSON document that was indexed
• _all (enabled by default): indexes the values of all document fields in one catch-all field
• _timestamp (disabled by default): a timestamp associated with the document
• _ttl (disabled by default): optionally defines an expiration time
• _size (disabled by default): indexes the size of the uncompressed original document
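The `_all` field is a catch-all: the values of every field are combined into one field so that a query without a field name can match any of them. This conceptual sketch only shows which values end up in it (in document order, non-strings written in JSON form); the real field is then analyzed like any other full-text field. The function name and document are hypothetical.

```python
import json


def all_field(source):
    """Concatenate every leaf value of a JSON document into one string.

    Conceptual sketch of the _all field: values are collected in document
    order; non-string values are written in JSON form (true, 2016, ...).
    """
    values = []

    def walk(node):
        if isinstance(node, dict):
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif node is not None:
            values.append(node if isinstance(node, str) else json.dumps(node))

    walk(source)
    return " ".join(values)


# Hypothetical document.
doc = {
    "title": "Introduction to Elasticsearch",
    "author": {"name": "Praveen Manvi"},
    "tags": ["search", "lucene"],
    "year": 2016,
}
print(all_field(doc))
# Introduction to Elasticsearch Praveen Manvi search lucene 2016
```

Since every value is indexed a second time, disabling `_all` when it is not needed saves storage.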
Search Controller
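A distributed search fans the query out to every shard, and the coordinating node then merges the partial results. Below is a toy version of that gather step, assuming each shard returns its own top hits as `(score, doc_id)` pairs; the shard results are hypothetical. In the default query-then-fetch flow, only ids and scores are merged first and the full documents are fetched afterwards.

```python
import heapq


def merge_shard_hits(shard_results, size):
    """Gather step of a distributed search (toy model).

    Each shard contributes its local top hits as (score, doc_id) pairs;
    the coordinating node keeps the global top `size`.
    """
    all_hits = (hit for hits in shard_results for hit in hits)
    # Highest score first; ties broken by doc_id for deterministic output.
    return heapq.nsmallest(size, all_hits, key=lambda hit: (-hit[0], hit[1]))


# Hypothetical per-shard results for a query with size=3.
shard_results = [
    [(2.4, "a"), (1.1, "b"), (0.3, "c")],
    [(3.0, "d"), (0.9, "e")],
    [(1.7, "f"), (1.1, "g")],
]
print(merge_shard_hits(shard_results, 3))  # [(3.0, 'd'), (2.4, 'a'), (1.7, 'f')]
```

For page `from` with `size` hits, each shard must return `from + size` candidates so the merge is correct, which is why very deep pagination becomes expensive.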
Query DSL
Search request in place
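A Query DSL request is plain JSON. The sketch below builds a body that combines full-text search (`match`) with structured search (a `range` filter inside a `bool` query) and asks for highlighting. The `bool`/`match`/`range`/`highlight` structure is standard Query DSL (the `filter` clause of `bool` assumes 2.0 or later); the field names `title` and `year` are hypothetical.

```python
import json


def build_search(text, min_year, size=10):
    """Build a Query DSL body: a scored full-text match plus a
    non-scoring structured filter, with highlighting on the title.

    Field names `title` and `year` are hypothetical.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"range": {"year": {"gte": min_year}}}],
            }
        },
        "highlight": {"fields": {"title": {}}},
    }


body = build_search("distributed search", 2010)
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a `GET` or `POST` to `/{index}/_search`.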
Search Types
• COUNT
– Returns no hits, only the total count of documents matching the query, so it executes in a single round trip to the shards
• SCAN
– Iterates over large amounts of data using a cursor to paginate, which keeps memory use low; helpful for re-indexing and for enriching data outside Elasticsearch
• SEARCH
– General search
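SCAN-style iteration is a cursor loop: each response carries a cursor that the client sends back to get the next batch, and the client stops when a batch comes back empty. In the real API the cursor is the opaque `_scroll_id`; here it is just an offset, and `make_fake_server` is a hypothetical in-memory stand-in for the cluster.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A fetch function takes the cursor from the previous call (None at first)
# and returns (batch_of_hits, next_cursor).
Fetch = Callable[[Optional[int]], Tuple[List[str], int]]


def make_fake_server(docs: List[str], batch_size: int) -> Fetch:
    """Hypothetical stand-in for the server; the cursor is just an offset."""
    def fetch(cursor: Optional[int]) -> Tuple[List[str], int]:
        start = 0 if cursor is None else cursor
        batch = docs[start:start + batch_size]
        return batch, start + len(batch)
    return fetch


def scan(fetch: Fetch) -> Iterator[str]:
    """Client loop: pass the cursor back until an empty batch arrives.

    Only one batch is held at a time, which keeps memory use low.
    """
    cursor: Optional[int] = None
    while True:
        batch, cursor = fetch(cursor)
        if not batch:
            break
        yield from batch


docs = ["doc-%d" % i for i in range(7)]
print(list(scan(make_fake_server(docs, batch_size=3))))
```

Because `scan` is a generator, a re-indexing job can consume documents one batch at a time instead of loading the whole result set.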
Aggregation
Aggregations
Nested Aggregations
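A nested aggregation groups documents into buckets and computes a metric inside each bucket. The request below is standard aggregation DSL (a `terms` aggregation with a nested `avg`). To make the result shape concrete, `terms_with_avg` imitates the computation locally on a hypothetical list of documents; the field names `category` and `price` are also hypothetical.

```python
from collections import defaultdict

# Aggregation request: buckets by category, average price inside each bucket.
AGG_REQUEST = {
    "size": 0,
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}


def terms_with_avg(docs, group_field, metric_field):
    """Local imitation of a terms aggregation with a nested avg.

    Returns buckets shaped like the response (key, doc_count, nested
    metric), ordered by doc_count descending, then by key.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(vals),
         "avg_" + metric_field: {"value": sum(vals) / len(vals)}}
        for key, vals in groups.items()
    ]
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


sales = [
    {"category": "books", "price": 10},
    {"category": "music", "price": 15},
    {"category": "books", "price": 20},
]
print(terms_with_avg(sales, "category", "price"))
```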
A Few Interesting Features
• Bulk Indexing
– Send multiple documents to ES in a single request
• Multi Get API
– Get multiple documents in a single API call
• Percolator
– Stores queries (filters) instead of documents; percolating new content returns the stored queries it matches, so your application learns about new matches instead of having to constantly poll the search engine. Great for building alerts
• Pagination
• Highlighting
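Bulk indexing and multi get both batch work into one request. The `_bulk` body is newline-delimited JSON (one action line, then one source line per document, ending with a newline), and `_mget` takes a `docs` array. The sketch below only builds both bodies; it assumes the 2.x-era `_type` metadata, and the index, type and documents are hypothetical.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited _bulk body: for each document, one action
    line followed by one source line; the body must end with a newline."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def mget_body(index, doc_type, ids):
    """Build a _mget body that fetches several documents in one call."""
    return {"docs": [{"_index": index, "_type": doc_type, "_id": i} for i in ids]}


# Hypothetical index, type and documents.
books = [("1", {"title": "Lucene basics"}), ("2", {"title": "Scaling search"})]
print(bulk_body("library", "book", books), end="")
print(json.dumps(mget_body("library", "book", ["1", "2"])))
```

The bulk body would be POSTed to `/_bulk` and the multi get body sent to `/_mget`.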
Ecosystem
(debugging and development tools)
Client SDKs
Plugins
• head
• ElasticHQ
• Marvel
• BigDesk
[ES_HOME/bin] ./plugin install mobz/elasticsearch-head
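Configuration
• Enabling store compression (LZF/Snappy) uses about 55% less storage
• Disabling the _all field saves about 13% of storage
• Disabling _source saves ~26% of disk storage, but gives up features that need the original document, such as the update API and reindexing
• Set ES_HEAP_SIZE to ½ of the machine's memory, leaving the rest to the OS file cache
• Set bootstrap.mlockall to true to lock the heap in RAM and avoid swapping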
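The half-of-RAM rule for ES_HEAP_SIZE can be written as a small calculation. The cap is an added assumption: widely cited Elasticsearch guidance says to keep the heap below about 32 GB so the JVM can keep using compressed object pointers, and 31 GB is used here as a conservative stand-in. The function name is hypothetical.

```python
def recommended_heap_gb(ram_gb, cap_gb=31):
    """Half of RAM for the JVM heap, leaving the rest to the OS file cache.

    cap_gb is an assumption reflecting the ~32 GB compressed-oops limit.
    """
    return min(ram_gb // 2, cap_gb)


for ram in (8, 16, 64, 128):
    print("RAM %3d GB -> ES_HEAP_SIZE=%dg" % (ram, recommended_heap_gb(ram)))
```

On large machines the memory left over is not wasted: Lucene relies on the OS file cache for its segment files.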
References
• https://www.youtube.com/watch?v=5444z-L2V2A&spfreload=1 - "Lucene now and then" from Lucene creator Doug Cutting @ Twitter; covers the history of Lucene and how it evolved.
• https://www.youtube.com/watch?v=lpZ6ZajygDY - talk by Elasticsearch creator Shay Banon (about 3 years old, but very good content on data design patterns)
• https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html - official Elasticsearch documentation
• https://www.manning.com/books/elasticsearch-in-action - Elasticsearch in Action (Manning), the source of the diagrams in this presentation
