elastic.co/training
An Elastic Training Course

Elasticsearch Engineer II
Course: Elasticsearch Engineer II

Version 6.6.0

© 2015-2019 Elasticsearch BV. All rights reserved. Decompiling, copying, publishing and/or distribution without written consent of Elasticsearch BV is
strictly prohibited.

Welcome to This Virtual Training
• We will start momentarily
• The training will start with an audio/video test, to make sure
that everyone can hear and see the instructors
• To prevent any audio/video issues, please:
‒ disable any ad blockers or script blockers
‒ use a supported web browser: Chrome or Firefox

• In case of problems, try the following steps in order:
‒ refresh this web page
‒ open this page in an "incognito" or "private" window
‒ try another web browser
‒ as a last resort, restarting your computer sometimes helps too
Welcome to This Training
• Visit training.elastic.co and log in
‒ follow instructions from registration email to get access

• Go to "My Account" and click on today's training

• Download the PDF file (this contains all the slides)

• Click on "Virtual Link" to access the Lab Environment
‒ create an account
‒ you will need an access token, which the instructor will provide
Agenda and Introductions
About This Training
• Environment
• Introductions
• Agenda...

Course Agenda
1 Elasticsearch Internals

2 Field Modeling

3 Fixing Data

4 Advanced Search & Aggregations

5 Cluster Management

6 Capacity Planning

7 Document Modeling

8 Monitoring and Alerting

9 From Dev to Production
Datasets
Static Data vs. Time Series Data
• In general, we can categorize most data in our customers’
use cases as one of the following:
‒ (relatively) static data: a large (or small) dataset that may grow
or change slowly, like a catalog or inventory of items
‒ time series data: event data associated with a moment in time
that typically grows rapidly, like log files or metrics
• Elasticsearch works great for both types of data
‒ and therefore we will use two datasets in the course…
Static Dataset
• Our static dataset is a collection of Elastic blog posts:

Our Blogs Dataset
• The blogs index contains blog posts from elastic.co/blog:

{
  "publish_date": "2017-11-10T07:00:00.000Z",
  "seo_title": "Apply for an Elastic{ON} Opportunity Grant Today!",
  "category": "News",
  "locales": "de-de,fr-fr",
  "title": "Apply for an Elastic{ON} Opportunity Grant Today!",
  "content": " For the past few years, our developer relations team has been running an informal scholarship program of sorts to help folks from underrepresented groups in technology attend Elastic{ON}. ...",
  "author": "Anna Ossowski",
  "url": "/blog/apply-for-an-elasticon-opportunity-grant-today"
}
What do we want to build with our data?
• We want users to be able to search our blogs
‒ and get relevant and meaningful search results

Our Logging Dataset
• We also have the web access logs for elastic.co/blog:
{
  "@timestamp": "2017-07-30T03:51:05.551Z",
  "language": {
    "url": "/blog/2011/08/05/0.17.4-released.html",
    "code": "en-us"
  },
  "method": "GET",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
  "response_size": 37857,
  "host": "server1",
  "http_version": "1.1",
  "status_code": 404,
  "runtime_ms": 142,
  "geoip": {
    "country_code3": "US",
    "location": {
      "lon": -122.1206,
      "lat": 47.6801
    },
    "region_name": "Washington",
    "city_name": "Redmond",
    "country_code2": "US",
    "country_name": "United States",
    "continent_code": "NA"
  },
  "originalUrl": "/blog/2011/08/05/0.17.4-released.html",
  "level": "info"
}
What do we want from our log data?
• To be able to answer questions about web traffic:

Default Mappings

Blogs Dataset

"mappings": {
  "_doc": {
    "properties": {
      "author": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "category": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "publish_date": {
        "type": "date"
      },
      ...

Logs Dataset

"mappings": {
  "doc": {
    "properties": {
      "geoip": {
        "properties": {
          "city_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "continent_code": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          ...
Lab Environment
Lab Environment
• Visit Strigo using the link that was shared with you, and log
in if you haven't already done so
• Click on "My Lab" on the left

Lab Environment
• Click on the gear icon next to "My Lab" and select
"Machine Info"

Lab Environment
• Copy the hostname that is shown under "Public DNS"

Lab Environment
• From here you can access lab instructions and guides
‒ You also have them in your .zip file, but it is easier to access and
use the lab instructions from here:

Lab 0
Setup Elasticsearch Cluster
Chapter 1
Elasticsearch Internals
Topics covered:
• Lucene Indexing
• Understanding Segments
• Segment Merges
• Elasticsearch Indexing
• Doc Values
• Caching
Lucene Indexing
How Data Gets Written
• Node-level overview (covered in detail in Engineer I)
• Next, let's focus on the shard level...

PUT blogs/_doc/551
{
  "title": "A History of Logstash Output Workers",
  "category": "Engineering",
  ...
}

[Diagram: the client sends the request to node1, which holds primary shards P0 to P4; hash("551") % 5 = 3 routes the document to P3]
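The routing step on this slide, hash(id) % number of primary shards, can be sketched in a few lines. This is only an illustration: `zlib.crc32` is a stand-in hash chosen because it ships with Python, and `route_to_shard` is a hypothetical helper name. Elasticsearch applies its own hash to the routing value, so the shard numbers printed here will not match a real cluster (or the 3 shown on the slide).

```python
import zlib


def route_to_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a document from its routing key.

    zlib.crc32 is only a stand-in hash for this sketch; it is not the
    hash function Elasticsearch uses, so shard numbers will differ.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


# The same ID always lands on the same shard of a 5-shard index.
for doc_id in ("551", "213", "614"):
    print(doc_id, "->", "P%d" % route_to_shard(doc_id, 5))
```

Because the result depends on the number of primary shards, the same ID would generally map to a different shard if that number changed.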
Understanding Shards
• A shard is a single instance of Lucene
‒ each shard is a complete search engine on its own
‒ max # of documents in a shard is Integer.MAX_VALUE-128
‒ clients do not refer to shards directly (use the index instead)

[Diagram: node1 holding primary shards P0 to P4, zooming in on a single shard]

What’s in a shard?
Lucene Indexing
[Diagram: each indexed document (PUT blogs/_doc/551, PUT blogs/_doc/213, PUT blogs/_doc/614) is analyzed and added to the in-memory buffer; when the buffer is full, a Lucene flush writes its contents to a new segment, segment1]
Lucene Indexing
[Diagram: with segment1 already in the shard, PUT blogs/_doc/5117 and PUT blogs/_doc/31 go into the buffer; when the refresh_interval limit (default 1 second) is reached, a Lucene flush writes the buffer to a new segment, segment2]
Indexing Buffer
• Documents are analyzed, then go into a memory buffer
• The indexing buffer defaults to 10% of the node heap and is shared by all shards allocated to that node
• You can change the buffer size using this static setting, which must be configured on every data node in the cluster:

indices.memory.index_buffer_size: 5%
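To get a feel for what the percentage means in bytes, here is a small arithmetic sketch. The 8 GiB heap is a made-up example value and `index_buffer_bytes` is a hypothetical helper; it only handles percentage values like the ones on the slide.

```python
def index_buffer_bytes(heap_bytes: int, setting: str = "10%") -> int:
    """Resolve a percentage-style index_buffer_size against the node heap."""
    if not setting.endswith("%"):
        raise ValueError("this sketch only handles percentage values")
    percent = float(setting[:-1])
    if not 0 < percent <= 100:
        raise ValueError("percentage must be in (0, 100]")
    return int(heap_bytes * percent / 100)


GIB = 1024 ** 3
MIB = 1024 ** 2
heap = 8 * GIB                                   # hypothetical node with an 8 GiB heap
default_buffer = index_buffer_bytes(heap)        # the default of 10%
tuned_buffer = index_buffer_bytes(heap, "5%")    # the value shown on the slide

print(default_buffer // MIB, "MiB with the default 10%")
print(tuned_buffer // MIB, "MiB with 5%")
```

Whatever the resulting size, it is a single pool shared by every shard on the node, not an amount reserved per shard.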
Lucene Flush
• A Lucene flush creates new segments from the documents in the memory buffer
• A Lucene flush happens:
‒ when the memory buffer is full
‒ when Elasticsearch flushes (Lucene commit), which we will see soon
‒ when Elasticsearch refreshes...

[Diagram: shard0, shard1 and shard2, each holding its own set of segments (segment1, segment2, ...)]
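The buffer-to-segment flow can be modelled with a toy class. This is a sketch under simplifying assumptions: `ShardSketch` is a hypothetical name, the buffer capacity is counted in documents rather than bytes, and a segment is just a tuple of document IDs.

```python
class ShardSketch:
    """Toy model of one shard: an in-memory buffer plus immutable segments."""

    def __init__(self, buffer_capacity: int):
        self.buffer_capacity = buffer_capacity  # max docs in memory (hypothetical unit)
        self.buffer = []                        # analyzed docs not yet in a segment
        self.segments = []                      # each segment is a tuple of doc IDs

    def index(self, doc_id: str) -> None:
        self.buffer.append(doc_id)
        if len(self.buffer) >= self.buffer_capacity:
            self.lucene_flush()                 # trigger: the buffer is full

    def refresh(self) -> None:
        self.lucene_flush()                     # trigger: a refresh happens

    def lucene_flush(self) -> None:
        if self.buffer:                         # nothing buffered, no new segment
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def searchable_ids(self) -> set:
        # Only documents that already live in a segment can be found.
        return {doc_id for segment in self.segments for doc_id in segment}


shard = ShardSketch(buffer_capacity=3)
for doc_id in ("551", "213", "614"):
    shard.index(doc_id)            # the third document fills the buffer
shard.index("5117")
shard.index("31")
print(shard.searchable_ids())      # 5117 and 31 are still only in the buffer
shard.refresh()                    # the refresh writes a second segment
print(shard.segments)
```

The point of the model is that every trigger produces another small segment, which is why segment counts grow quickly under steady indexing.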
Elasticsearch Refresh
• Newly indexed documents are not searchable until a refresh occurs
• By default, every shard is refreshed once every second
‒ defined by a dynamic index-level setting named refresh_interval

PUT my_index/_settings
{
  "refresh_interval": "30s"
}

Increase the refresh_interval in index-heavy scenarios to achieve better indexing performance

• The Document APIs that create, update or delete documents have an optional refresh parameter...
The refresh Parameter
• Controls when changes made by this request are made
visible to searches:
‒ false: the default value, any changes to the document are not
visible immediately
‒ true: forces a refresh in the affected primary and replica shards
so that the changes are visible immediately
‒ wait_for: synchronous request that waits for a refresh to happen

PUT my_index/_doc/102/?refresh=wait_for
{
  "firstname" : "James",
  "lastname" : "Brown",
  "address" : "6011 Downtown Lane",
  "city" : "Detroit"
}
Understanding Segments
• A shard == Lucene instance == collection of segments
‒ Think of a segment as an immutable “mini-index”
• Each segment is a fully independent index
‒ Each segment is composed of a number of files
‒ Segment files are written to disk
‒ Segment files are never updated (read only)
• A search request is distributed to the shards, and on each shard it is performed sequentially over the segments
Understanding Segments
• A shard == Lucene instance == collection of segments
‒ A segment is a package of many different data structures
representing an inverted index for each field

[Diagram: shard3 contains segment_0, segment_1 and segment_2, each holding several documents; segments contain the inverted indices of multiple documents]
Understanding Segments
• To understand the internals of segments, let’s see what the following two documents look like in a segment
‒ we will assume their IDs route to the same shard and that they also appear in the same segment

PUT my_index/_doc/27
{
  "author": "Uri",
  "category": "Releases",
  "title": "Elastic Cloud Enterprise Beta"
}

PUT my_index/_doc/14
{
  "author": "Rasmus",
  "category": "Releases",
  "title": "Elastic APM enters beta-1.1"
}
Term Dictionary and Frequency Data

segment_0

term dictionary and frequency: a dictionary of all terms in all indexed fields, along with the number of docs

field      term         doc freq   docs
author     rasmus       1          14
author     uri          1          27
category   releases     2          14,27
title      1            1          14
title      apm          1          14
title      beta         2          14,27
title      cloud        1          27
title      elastic      2          14,27
title      enterprise   1          27
title      enters       1          14
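A term dictionary like the one on this slide can be rebuilt from the two example documents with a short script. This is a sketch, not how Lucene stores the data: `analyze` is a hypothetical, simplified stand-in for the analyzer (lowercase, then split on anything that is not a letter or digit), chosen because it reproduces the terms shown on the slide.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list:
    """Lowercase and split on non-alphanumerics (simplified analysis)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_term_dictionary(docs: dict) -> dict:
    """Map field -> term -> sorted list of IDs of the docs containing it."""
    postings = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            for term in analyze(value):
                postings[field][term].add(doc_id)
    return {
        field: {term: sorted(ids) for term, ids in sorted(terms.items())}
        for field, terms in postings.items()
    }


docs = {
    27: {"author": "Uri", "category": "Releases",
         "title": "Elastic Cloud Enterprise Beta"},
    14: {"author": "Rasmus", "category": "Releases",
         "title": "Elastic APM enters beta-1.1"},
}

for field, terms in build_term_dictionary(docs).items():
    print(field)
    for term, doc_ids in terms.items():
        # columns: term, document frequency, documents
        print("  %-12s %d  %s" % (term, len(doc_ids), ",".join(map(str, doc_ids))))
```

The document frequency is simply the length of each postings list, which is why it can be stored alongside the term at no extra lookup cost.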
Field Names

segment_0

field names: set of field names used in the index
‒ author
‒ category
‒ title
Term Proximity

segment_0

term proximity: the position that the term occurs in each document

title term   (doc: positions)
1            (14: 4,5)
apm          (14: 1)
beta         (14: 3) (27: 3)
cloud        (27: 1)
elastic      (14: 0) (27: 0)
enterprise   (27: 2)
enters       (14: 2)
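Position data is what makes phrase-style matching possible. The sketch below rebuilds the positions from the two example titles and checks whether terms sit next to each other. The function names are hypothetical and the tokenization is a simplification (lowercase, split on non-alphanumerics) that happens to reproduce the positions on the slide.

```python
import re


def positions_index(titles: dict) -> dict:
    """Map term -> {doc_id: [positions]} for one field."""
    index = {}
    for doc_id, text in titles.items():
        tokens = re.findall(r"[a-z0-9]+", text.lower())  # simplified analysis
        for position, term in enumerate(tokens):
            index.setdefault(term, {}).setdefault(doc_id, []).append(position)
    return index


def matches_phrase(index: dict, phrase: list, doc_id: int) -> bool:
    """True if the terms occur at consecutive positions in the document."""
    first = index.get(phrase[0], {}).get(doc_id, [])
    return any(
        all(start + offset in index.get(term, {}).get(doc_id, [])
            for offset, term in enumerate(phrase))
        for start in first
    )


titles = {27: "Elastic Cloud Enterprise Beta", 14: "Elastic APM enters beta-1.1"}
index = positions_index(titles)
print(index["1"])       # positions of the term "1" per document
print(index["beta"])
print(matches_phrase(index, ["cloud", "enterprise"], 27))
print(matches_phrase(index, ["enterprise", "cloud"], 27))
```

Without positions, the index could only say that both terms appear in document 27, not that they appear in that order.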
Deleted Documents

segment_0

deleted documents: an optional file that indicates which documents are deleted
Stored Field Values

segment_0

stored field values: _source is stored here
BKD Trees

segment_0

BKD trees: single- and multi-dimensional numerics
Normalization Factors

segment_0

normalization factors: for each field in each document, a value is stored that is multiplied into the score for hits on that field
Doc Values

segment_0

doc_values: used for sorting and other operations that require an index to be “uninverted”
Segment Merges
Too Many Segments
• One segment per second can be too much
• Queries run sequentially on all segments
‒ How can a search request be fast?
• Deletes are "soft" (segments are read only)
• Updates also do "soft" deletes
‒ How do we get rid of deleted documents?

[Diagram: shard0 holding a large number of small segments]
Segments Can Be Merged
• Segments are periodically merged into larger segments
‒ this is an automatic task on the shard level
• Keeps the index size and number of segments manageable
• Deleted documents also get expunged during a merge
[Diagram: when a merge happens, several small segments in shard0 are combined into a new, larger merged segment]
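A merge can be pictured as copying the live documents of several segments into one new segment. The sketch below assumes a toy representation: each segment is a dict with its documents and its own set of soft-deleted IDs, and `merge_segments` is a hypothetical helper, not a Lucene API.

```python
def merge_segments(segments: list) -> dict:
    """Combine segments into one new segment without the soft-deleted docs."""
    merged_docs = {}
    for segment in segments:                       # oldest segment first
        for doc_id, doc in segment["docs"].items():
            if doc_id not in segment["deleted"]:   # skip docs marked as deleted
                merged_docs[doc_id] = doc
    return {"docs": merged_docs, "deleted": set()}


# 551 was updated later, so its old version is soft-deleted in segment1.
segment1 = {"docs": {"551": "v1", "213": "v1"}, "deleted": {"551"}}
# 31 was deleted after it had been written to segment2.
segment2 = {"docs": {"614": "v1", "31": "v1"}, "deleted": {"31"}}
# The update of 551 wrote a new version into a newer segment.
segment3 = {"docs": {"551": "v2"}, "deleted": set()}

merged = merge_segments([segment1, segment2, segment3])
print(sorted(merged["docs"].items()))
```

The input segments are never modified, which mirrors the read-only nature of segment files: the merge writes a new segment, and the old ones can be dropped afterwards.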
Merging with Indexing Only
• http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
Merging with Updates
• http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
The Force Merge API
• In general, Elasticsearch and Lucene do a good job of deciding when to merge segments
• However, you can force an index to merge its segments:

POST blogs/_forcemerge

• Keep in mind that you should rarely need to invoke _forcemerge manually
• If you use it, make sure to only use _forcemerge on indices that will never have write operations executed in the future
Elasticsearch Indexing
Index -> Shard -> Segments
• An index consists of one or more shards
• A shard consists of segments

[Diagram: my_index is made up of shards spread across node1 and node2; each shard is made up of segments]

What happens to a new segment if the node fails?
Elasticsearch Indexing
• Segments are fsynced to disk during a Lucene commit:
‒ a relatively heavy operation
‒ should not be performed after every index or delete operation
• Until a commit, segment data is susceptible to loss
• To prevent this data loss, Elasticsearch implements a
transaction log for each shard...

The Transaction Log
[Diagram: PUT blogs/_doc/976 and PUT blogs/_doc/801 are written to both the buffer and the transaction log while the existing segments (segment1, segment2) have not been fsynced; after an Elasticsearch flush (Lucene commit), segment3 has been written and now all segments have been fsynced]
Elasticsearch Indexing
• Any write operation is written to the transaction log
• By default, the translog is committed to disk
‒ after each write request (index/reindex/delete operations)
‒ at the end of a bulk request
• In the event of a crash or hardware failure
‒ during a shard recovery, recent transactions can be replayed

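The recovery idea can be sketched with a toy shard that keeps three pieces of state. This is a simplified model, not Elasticsearch's implementation: `ShardWithTranslog` is a hypothetical class, committed data and the translog are treated as durable, the buffer as lost on a crash, and a flush is assumed to start a fresh translog.

```python
class ShardWithTranslog:
    """Toy shard: committed data survives a crash, the buffer does not."""

    def __init__(self):
        self.committed = {}   # docs in fsynced segments (durable)
        self.buffer = {}      # docs not yet committed (volatile)
        self.translog = []    # operations recorded on every write (durable)

    def index(self, doc_id: str, doc: dict) -> None:
        self.translog.append(("index", doc_id, doc))
        self.buffer[doc_id] = doc

    def delete(self, doc_id: str) -> None:
        self.translog.append(("delete", doc_id, None))
        self.buffer[doc_id] = None                 # None marks a pending delete

    def flush(self) -> None:
        """Commit everything buffered, then start a fresh translog (assumption)."""
        for doc_id, doc in self.buffer.items():
            if doc is None:
                self.committed.pop(doc_id, None)
            else:
                self.committed[doc_id] = doc
        self.buffer = {}
        self.translog = []

    def crash_and_recover(self) -> None:
        self.buffer = {}                           # volatile state is lost
        for op, doc_id, doc in self.translog:      # replay since the last commit
            self.buffer[doc_id] = doc if op == "index" else None

    def visible_docs(self) -> dict:
        docs = dict(self.committed)
        for doc_id, doc in self.buffer.items():
            if doc is None:
                docs.pop(doc_id, None)
            else:
                docs[doc_id] = doc
        return docs


shard = ShardWithTranslog()
shard.index("976", {"title": "first"})
shard.flush()
shard.index("801", {"title": "second"})
shard.delete("976")
shard.crash_and_recover()
print(shard.visible_docs())
```

Replaying is cheap here because only operations since the last flush are in the log, which is also why a flush keeps recovery times short.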
The Flush API
• You can force an Elasticsearch flush:

POST my_index/_flush

• Do you need to call _flush?
‒ Typically, no!
‒ Flush happens automatically depending on how many operations get added to the transaction log, how big they are, and when the last flush happened
Synced Flush
• What is a synced flush?
‒ A synced flush performs a normal flush, then adds a generated
unique marker (sync_id) to all shards
‒ sync_id provides a quick way to check if two shards are identical

POST my_index/_flush/synced

[Diagram: node1, node2 and node3 hold P0, R0 and R0, each with sync_id = AU2VU0meX; this primary shard and its two replicas are in sync]
Doc Values
Why do I need to care about doc values?
• Let’s try something simple, like sorting blog posts by author name:

GET blogs/_search
{
  "query": {
    "match": {
      "content": "new releases"
    }
  },
  "sort": {
    "author": {
      "order": "asc"
    }
  }
}

This simple-looking query actually fails. Why?
The Error Mentions Fielddata
• Here is the error message from the previous query
‒ We are being told to enable fielddata, but also we are being
warned that it is expensive and potentially hazardous

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [author] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    ...
Uninverting an Inverted Index
• We are asking Elasticsearch to sort blogs by the “author”
field, which is analyzed text in an inverted index:

author
‒ aaron
‒ alexander
‒ baiera
‒ banon
‒ boness
‒ cam
‒ clint
‒ cohen
‒ …and so on

Not an ideal format for sorting. The inverted index needs to be uninverted somehow if we want to sort by authors’ names
Fielddata is probably not the best option
• Enabling fielddata allows Elasticsearch to uninvert the inverted index of a field by loading its values into memory
‒ This is done in the JVM heap “on the fly” (at _search time)
‒ Old versions of ES had fielddata enabled by default, which made them prone to out-of-memory issues
• Notice the error message mentions an alternative to enabling fielddata:

        ... fielddata in memory by uninverting the inverted index.
        Note that this can however use significant memory.
        Alternatively use a keyword field instead."
      }
    ],
    ...

Why does “keyword” not have the same issue as “text”?

Doc Values
• An inverted index is great if you are searching for documents that contain a certain term
‒ but not great when you are looking up the terms that are within a document
• Doc values are a data structure that stores the values of a document on disk in a column-oriented fashion
‒ which makes sorting and aggregations much more efficient
• Doc values are fast and awesome
‒ BUT, they do not exist for analyzed string fields

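The contrast between the two structures can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: the three documents and the helper names (`build_inverted_index`, `build_doc_values`, `uninvert`) are hypothetical, the "analyzer" is just lowercasing plus whitespace splitting, and nothing here reflects how Lucene actually stores data.

```python
from collections import defaultdict

# Hypothetical documents: doc id -> value of the "author" field.
docs = {1: "Clint Cohen", 2: "Aaron Banon", 3: "Cam Boness"}


def build_inverted_index(docs):
    """Map each analyzed term to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in value.lower().split():
            index[term].add(doc_id)
    return dict(index)


def build_doc_values(docs):
    """Column-oriented view: doc id -> the whole, unanalyzed value."""
    return dict(docs)


def uninvert(inverted):
    """Rebuild doc id -> terms from the inverted index (the work fielddata implies)."""
    per_doc = defaultdict(list)
    for term in sorted(inverted):
        for doc_id in inverted[term]:
            per_doc[doc_id].append(term)
    return dict(per_doc)


inverted = build_inverted_index(docs)
doc_values = build_doc_values(docs)

# Sorting is a direct lookup per document when values are stored by column.
sorted_ids = sorted(doc_values, key=doc_values.get)
```

In this sketch the term lookup (`inverted["aaron"]`) is cheap, while getting from a document back to its terms requires walking the whole index, which is why sorting wants the column-oriented view.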
So how do I sort by analyzed text?
• …or perform an aggregation or use it in a script?
‒ Well, you should avoid that scenario when possible
• Consider indexing the text field as a “keyword”, which is unanalyzed text that has doc values

"author": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

In blogs, we indexed “author” twice: as both “text” and “keyword”

Sort by a keyword Field
• Notice that sorting by “author.keyword” works as expected:

GET blogs/_search
{
  "query": {
    "match": {
      "content": "new releases"
    }
  },
  "sort": {
    "author.keyword": {
      "order": "asc"
    }
  }
}

Sorted results:

"author": "",
"author": "A.J. Angus",
"author": "Aaron Aldrich",
"author": "Aaron Katz",
"author": "Aaron Mildenstein",

Caching
Node Query Cache
• One cache per node that is shared by all shards
‒ Uses the LRU (Least Recently Used) eviction policy
‒ It only caches queries inside a filter context
• One static node-level setting:

indices.queries.cache.size: "5%"

The default is 10% of the heap size. It can also be set with an exact value, like 512mb

• The segment-level cache uses bitsets…

Bitsets
• Array of bits, each position represents a document
‒ super efficient
• Only built for segments that are "big enough"

GET blogs_csv/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "publish_date": {
            "gte": 2017,
            "lte": 2018
          }
        }
      }
    }
  }
}

Bitset for a segment with 3 docs (1, 2, 3): 0 1 1

better-query-execution-coming-elasticsearch-2-0
frame-of-reference-and-roaring-bitmaps

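As a rough illustration of the idea, the following Python sketch builds one bitset per filter for a tiny in-memory "segment" and combines two filters with a bitwise AND. The segment contents, the `publish_year` field, and the helper names are hypothetical, and a plain integer stands in for the real bitset implementation.

```python
def build_bitset(segment_docs, predicate):
    """Return an int whose bit i is 1 when segment_docs[i] matches the filter."""
    bits = 0
    for position, doc in enumerate(segment_docs):
        if predicate(doc):
            bits |= 1 << position
    return bits


def matching_positions(bits):
    """List the document positions whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Hypothetical segment with 3 docs; only the last two fall in 2017..2018.
segment = [{"publish_year": 2016}, {"publish_year": 2017}, {"publish_year": 2018}]

in_range = build_bitset(segment, lambda d: 2017 <= d["publish_year"] <= 2018)
is_2018 = build_bitset(segment, lambda d: d["publish_year"] == 2018)

# Two cached filters are combined with a single bitwise AND.
both = in_range & is_2018
```

Once a filter's bitset exists, re-running that filter on the same segment costs a lookup rather than a re-evaluation of every document.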
Shard Request Cache
• One cache per node that is shared by all shards
• LRU (Least Recently Used) eviction policy
• Data modification invalidates the cache
• Good fit for indices that you no longer write to
• Cached search requests return results almost instantly
• Static setting that must be configured on every node:

indices.request.cache.size: "5%"

The default is 1% of the heap size. It can also be set with an exact value, like 256mb.

• Shard-level request cache
• The cache is enabled on every index by default

Shard Request Cache
• By default, it only caches the results of search requests where size=0
‒ hits.total, aggregations, and suggestions
• Use the request_cache query-string parameter to force the caching of a request

GET /blogs/_search?request_cache=true
{
  "query": {
    "query_string": {
      "query": "*_source*"
    }
  }
}

With request_cache=true, the request is cached even when size > 0

The whole JSON body is used as the cache key

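The combination of "whole body as cache key" and LRU eviction can be sketched in a few lines of Python. This is a toy model only: the `RequestCache` class is hypothetical, it bounds the cache by entry count rather than by a share of the heap, and `run_search` stands in for whatever actually executes the request.

```python
from collections import OrderedDict


class RequestCache:
    """Toy LRU cache keyed by the raw request body (a string)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def search(self, body, run_search):
        if body in self._entries:
            self.hits += 1
            self._entries.move_to_end(body)  # mark as most recently used
            return self._entries[body]
        self.misses += 1
        result = run_search(body)
        self._entries[body] = result
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return result

    def invalidate(self):
        """Data modification invalidates the cache."""
        self._entries.clear()


calls = []


def fake_search(body):
    # Stand-in for a real search; records how often it actually runs.
    calls.append(body)
    return {"total": len(body)}


cache = RequestCache(max_entries=2)
cache.search('{"size":0}', fake_search)
cache.search('{"size":0}', fake_search)   # identical body: served from cache
cache.search('{"size": 0}', fake_search)  # extra space: a different key
```

Because the key is the body itself, two requests that differ only in whitespace or key order are cached separately in this sketch, which is a reason to send frequently repeated requests in a byte-identical form.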
Chapter Review
Summary
• A shard is a single instance of Lucene that consists of segments
• A segment is a package of many different data structures representing an inverted index for each field
• Any document write operation is written to the transaction log after being processed by the internal Lucene index
• During indexing, a segment is created every 1 second (by default)
• Sorting and aggregating strings should be performed on keyword fields (which use doc_values)
• A query in a filter context can be cached by Elasticsearch to improve performance

Quiz
1. True or False: Increasing the refresh_interval value (ex: 30s) is a good practice before indexing a lot of documents.
2. Name three things that a segment stores.
3. True or False: You should occasionally invoke _forcemerge if you are indexing a lot of documents continuously.
4. True or False: After an index operation, documents might not be searchable for up to 1 second.
5. What happens if you use ?refresh=wait_for in an index request? Explain one use case that benefits from it.
6. True or False: Every write operation is recorded in the translog and the translog is fsynced to disk.
7. When should you run a synced flush?

Lab 1
Elasticsearch Internals
Chapter 2
Field Modeling

1 Elasticsearch Internals
2 Field Modeling
3 Fixing Data
4 Advanced Search & Aggregations
5 Cluster Management
6 Capacity Planning
7 Document Modeling
8 Monitoring and Alerting
9 From Dev to Production

Topics covered:
• The Need for Modeling
• Modeling Granular Fields
• Modeling Ranges
• Mapping Parameters
• Dynamic Templates
• Controlling Dynamic Fields

The Need for Modeling
The Blogs Mapping

"mappings": {
  "_doc": {
    "properties": {
      "author": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "category": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "publish_date": {
        "type": "date"
      },
      ...

Most of the fields are the default “text” and “keyword”, which does not make sense for some fields

The Logs Mapping

"mappings": {
  "doc": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "geoip": {
        "properties": {
          "city_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "continent_code": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          ...

The @timestamp field is a date, which is great…
…but most of the other fields are text/keyword

Some Fixes are Easy

Choose a more appropriate data type:

Field in the document:
"status_code": 200

Mapping:
"status_code": {
  "type": "short"
}

Clean up some of the string fields:

Field in the document:
"language": {
  "code": "fr-fr"
}

Mapping:
"code": {
  "type": "keyword"
}

Some Fixes Require More Design
• For example, the “locale” field in the blogs dataset is a comma-separated list of values
‒ Lists are easier to search if they are indexed as an array:

Before:
"locales": "de-de,fr-fr"

After:
"locales": ["de-de","fr-fr"]

Mapping:
"locales": {
  "type": "keyword"
}

• The “user-agent” field in the logging dataset is difficult to search in its current format:

"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"

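The locales fix amounts to a small transformation applied before indexing. A minimal Python sketch follows, assuming the transformation happens client-side; the function name `split_locales` is hypothetical, and the slides do not say where in the pipeline this step is performed.

```python
def split_locales(doc, field="locales"):
    """Return a copy of doc with a comma-separated string field turned into a list."""
    fixed = dict(doc)
    value = fixed.get(field)
    if isinstance(value, str):
        # Drop surrounding whitespace and empty fragments such as a trailing comma.
        fixed[field] = [part.strip() for part in value.split(",") if part.strip()]
    return fixed


before = {"locales": "de-de,fr-fr"}
after = split_locales(before)
```

With the field mapped as “keyword”, each array element is then indexed as its own exact term.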
Modeling Granular Fields
Use Case for Granular Fields
• Suppose we include an Elastic Stack version in each blog post:

"version": "6.2.1"

• And we want to add a facet so users can filter blogs by a desired version of the Elastic Stack
‒ Searching an exact version like 6.2.1 would be easy enough
‒ But how would you query the “version” field for “5.4” or “6.x”?
• We could use some complicated regular expressions
‒ or we could map and index the field better!

Modeling Granular Fields
• Adding a level of granularity can make it easier to answer questions about a field:

PUT blogs
{
  "mappings": {
    "_doc": {
      "properties": {
        "version": {
          "properties": {
            "display_name": {
              "type": "keyword"
            },
            "major": {
              "type": "byte"
            },
            "minor": {
              "type": "byte"
            },
            "bugfix": {
              "type": "byte"
            }
          }
        }
        ...

PUT blogs/_doc/1
{
  "version": {
    "display_name": "6.2.1",
    "major": 6,
    "minor": 2,
    "bugfix": 1
  }
}

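Producing the granular object from the original string is a simple parsing step. The Python sketch below is one way it could be done before indexing; the helper name `to_granular_version` is hypothetical, and it assumes the version always has exactly three numeric parts.

```python
def to_granular_version(version):
    """Split a 'major.minor.bugfix' string into the granular fields of the mapping."""
    # Assumes exactly three dot-separated integers; anything else raises ValueError.
    major, minor, bugfix = (int(part) for part in version.split("."))
    return {
        "display_name": version,
        "major": major,
        "minor": minor,
        "bugfix": bugfix,
    }


doc = {"version": to_granular_version("6.2.1")}
```

Keeping `display_name` alongside the numeric parts preserves the original string for display and exact matching.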
Searching Granular Fields
• Now we can search a version number easily at any desired level:

“I’m searching for blogs about version 5.4”

GET blogs/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "version.major": 5
          }
        },
        {
          "match": {
            "version.minor": 4
          }
        }
      ]
    }
  }
}

Modeling Ranges
Use Case for Ranges
• The blog posts currently have a publish_date field
• Suppose we want to add an end_date as well:

"publish_date": "2017-11-10",
"end_date": "2018-11-10"

• The granular approach above would work fine
‒ but there is a datatype (introduced in Elasticsearch 5.2) that is a nice solution for ranges like this…

Range Data Types
• Range data types allow you to define a field using a lower and upper bound
• Works with different types:
‒ integer_range
‒ float_range
‒ long_range
‒ double_range
‒ date_range
‒ ip_range

Defining a Range Type
• Let’s test a date_range field for our blog posts:

PUT test_ranges
{
  "mappings": {
    "_doc": {
      "properties": {
        "publish_range": {
          "type": "date_range"
        }
      }
    }
  }
}

Indexing a Range Type
• Range data types are defined using a lower and/or upper bound:

PUT test_ranges/_doc/1
{
  "publish_range": {
    "gte": "2017-11-10",
    "lt": "2018-11-10"
  }
}

This document defines both an upper and lower bound for “publish_range”

Querying a Range Type
• Use a range query to search a range field:

GET test_ranges/_search
{
  "query": {
    "range": {
      "publish_range": {
        "gte": "2018-04-01"
      }
    }
  }
}

"hits": {
  "total": 1,
  "max_score": 1,
  "hits": [
    {
      "_index": "test_ranges",
      "_type": "_doc",
      "_id": "1",
      "_score": 1,
      "_source": {
        "publish_range": {
          "gte": "2017-11-10",
          "lt": "2018-11-10"
        }
      }
    }
  ]
}

The relation Parameter
Q - Range Defined in the Query
D - Range Defined in the Document

GET my_index/_search
{
  "query": {
    "range": {
      "FIELD": {
        "gte": 23,
        "lte": 43,
        "relation": "intersects"
      }
    }
  }
}

intersects (default)
“Give me all docs where the range defined in the doc intersects with the range defined in query”
Example: D is 12 to 28, Q is 24 to 34

contains
“Give me all docs where the range defined in the doc contains the range defined in query”
Example: D is 12 to 28, Q is 14 to 26

within
“Give me all docs where the range defined in the doc is within the range defined in query”
Example: D is 16 to 27, Q is 12 to 28

The relation Parameter (Example)
GET my_range_index/_search
{
  "query": {
    "range": {
      "author_age_range": {
        "gte": 23,
        "lte": 43,
        "relation": "intersects"
      }
    }
  }
}

The relation can be one of: intersects | within | contains

_id  blog_title             author_age_range
1    “Hello, Kibana”        15 to 24
2    “Where is my log?”     25 to 34
3    “Ingestion problems?”  35 to 44
4    “Aggregate this!”      45 to 54

query_age_range  relation    hits
23 to 43         intersects  1,2,3
25 to 45         within      2,3
35 to 40         contains    3

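The three relations reduce to simple comparisons of the bounds. The Python sketch below models each range as an inclusive `(low, high)` tuple (matching `gte`/`lte`) and replays the example table; the function names and the `search` helper are hypothetical, not part of any Elasticsearch client.

```python
def intersects(doc, query):
    """The document range and the query range overlap somewhere."""
    return doc[0] <= query[1] and query[0] <= doc[1]


def within(doc, query):
    """The document range lies entirely inside the query range."""
    return query[0] <= doc[0] and doc[1] <= query[1]


def contains(doc, query):
    """The document range entirely covers the query range."""
    return doc[0] <= query[0] and query[1] <= doc[1]


RELATIONS = {"intersects": intersects, "within": within, "contains": contains}

# author_age_range per document id, taken from the table above.
docs = {1: (15, 24), 2: (25, 34), 3: (35, 44), 4: (45, 54)}


def search(docs, query, relation="intersects"):
    """Return the ids of the documents whose range satisfies the relation."""
    check = RELATIONS[relation]
    return [doc_id for doc_id, doc_range in sorted(docs.items()) if check(doc_range, query)]
```

Note that `within` and `contains` are mirror images of each other: swapping the document and query arguments turns one into the other.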
Mapping Parameters
Mapping Parameters
• There are various parameters that can be applied to fields when defining your mappings:

"mappings": {
  "doc": {
    "properties": {
      "originalUrl": {
        "type": "text",
        "analyzer": "my_url_analyzer"
      }
    }
    ...

“analyzer” is an example of a mapping parameter

• We will discuss a few mapping parameters now. See the docs for a complete list:
‒ https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-params.html

Not Indexing a Field
• Suppose we know that we will not run any queries on the http_version field of our access logs
‒ it is possible to not index a specific field
‒ the field is still returned in the _source and can be used in aggregations, but the field will not be queryable

"mappings": {
  "doc": {
    "properties": {
      "http_version": {
        "type": "keyword",
        "index": false
      }
      ...

“http_version” will not be indexed

Disabling Doc Values
• Or maybe we will not run any aggregations on the http_version field of our access logs
‒ it is also possible to disable doc_values for a specific field
‒ the field is still returned in the _source and can be used in queries, but the field cannot be used in aggregations

"mappings": {
  "doc": {
    "properties": {
      "http_version": {
        "type": "keyword",
        "doc_values": false
      }
      ...

“http_version” will not have doc_values

Disabling a Field
• When a field is not indexed, its value is still stored and available for aggregations
• Another option is to completely disable a field:
‒ you cannot query or aggregate on this field
‒ but this field is still returned in the _source

PUT my_logs/doc/_mapping
{
  "properties": {
    "url": {
      "enabled": false
    }
  }
}

“url” will be neither indexed nor stored in doc_values. It will still be stored in _source

Disabling an Object
• Setting “enabled” to false is useful when you want to skip the indexing of an entire JSON object in your document
‒ suppose you will never perform any queries or aggregations on the fields of the “language” object in the access logs:

"mappings": {
  "doc": {
    "properties": {
      "language": {
        "enabled": false
      }
      ...

{
  "@timestamp": "2017-05-19T00:47:44.633Z",
  "user_agent": "Amazon CloudFront",
  "language": {
    "code": "en-us",
    "url": "/blog/category/releases"
  },
  "runtime": "454ms"
}

The “language” object is disabled

Disabling _all
• _all is a special catch-all field which concatenates the values of all of the other fields into one big string
‒ the _all field is being removed in future versions of Elasticsearch
‒ it cannot be used for new indices in Elasticsearch 6
• You can disable it to save disk space if your application is not using _all

PUT blogs
{
  "mappings": {
    "_doc": {
      "_all": {
        "enabled": false
      },
      "properties": {
        ...

Only relevant for Elasticsearch 5.x and earlier

Use Case for copy_to
• The access logs dataset has several fields representing the location of the event:

"region_name": "Victoria",
"country_name": "Australia",
"city_name": "Surrey Hills"

• Suppose we want to frequently search all three of these fields:
‒ we could run a bool query with must or should clauses,
‒ or we could copy all three values to a single field during indexing using copy_to…

The copy_to Parameter

"mappings": {
  "_doc": {
    "properties": {
      "region_name": {
        "type": "keyword",
        "copy_to": "locations_combined"
      },
      "country_name": {
        "type": "keyword",
        "copy_to": "locations_combined"
      },
      "city_name": {
        "type": "keyword",
        "copy_to": "locations_combined"
      },
      "locations_combined": {
        "type": "text"
      }
      ...

During indexing, the values will be copied to the “locations_combined” field

The copy_to Parameter
• The locations_combined field is not a part of _source, but it is indexed
‒ so you can query on it:

I am searching for events in “Victoria Australia”

GET weblogs/_search
{
  "query": {
    "match": {
      "locations_combined": "victoria australia"
    }
  }
}

"hits": [
  {
    "_index": "weblogs",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.5753642,
    "_source": {
      "region_name": "Victoria",
      "country_name": "Australia",
      "city_name": "Surrey Hills"
    }
  }
]

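The behaviour described above (values copied at index time, `_source` left untouched) can be mimicked with a toy model in Python. Everything here is a simplification: `COPY_TO`, `index_document`, and `match` are hypothetical names, analysis is reduced to lowercasing and whitespace splitting, and the match is a plain "any term" check with no scoring.

```python
# Source field -> target field, mirroring the copy_to entries in the mapping.
COPY_TO = {
    "region_name": "locations_combined",
    "country_name": "locations_combined",
    "city_name": "locations_combined",
}


def index_document(source):
    """Return (source, indexed_terms); the source is returned unchanged."""
    indexed = {}
    for field, value in source.items():
        target = COPY_TO.get(field)
        if target:
            # Analyze the copied value into terms of the combined text field.
            indexed.setdefault(target, set()).update(value.lower().split())
    return source, indexed


def match(indexed, field, text):
    """True if any query term appears among the field's indexed terms."""
    terms = indexed.get(field, set())
    return any(term in terms for term in text.lower().split())


source, indexed = index_document(
    {"region_name": "Victoria", "country_name": "Australia", "city_name": "Surrey Hills"}
)
```

The combined field exists only on the index side in this sketch, which matches the slide's point that it is searchable but never appears in `_source`.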
Null Values
• When a field is set to null, it is treated as though that field has no value:

PUT ratings/_doc/1
{
  "rating": null
}

PUT ratings/_doc/2
{
  "rating": 5.0
}

GET ratings/_search?size=0
{
  "aggs": {
    "average_rating": {
      "avg": {
        "field": "rating"
      }
    }
  }
}

"aggregations": {
  "average_rating": {
    "value": 5
  }
}

Specifying a Default Value for nulls
• Use the null_value parameter to assign a value to a field if it is null
‒ The _source is not altered, but the value of null_value is indexed

PUT ratings
{
  "mappings": {
    "_doc": {
      "properties": {
        "rating": {
          "type": "float",
          "null_value": 1.0
        }
      }
    }
  }
}

If “rating” is null, then 1.0 will be indexed for that field

Specifying a Default Value for nulls
• Notice the average changes now:

PUT ratings/_doc/1
{
  "rating": null
}

PUT ratings/_doc/2
{
  "rating": 5.0
}

GET ratings/_search?size=0
{
  "aggs": {
    "average_rating": {
      "avg": {
        "field": "rating"
      }
    }
  }
}

"aggregations": {
  "average_rating": {
    "value": 3
  }
}

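The arithmetic behind the two averages can be checked with a short Python sketch. The function `average_rating` is a hypothetical stand-in for the avg aggregation: a null rating contributes nothing unless a `null_value` is supplied, in which case that value is counted instead.

```python
def average_rating(ratings, null_value=None):
    """Average the ratings, substituting null_value for None when one is given."""
    values = []
    for rating in ratings:
        if rating is None:
            if null_value is None:
                continue  # no value is indexed, so the document is skipped
            rating = null_value
        values.append(rating)
    return sum(values) / len(values) if values else None


ratings = [None, 5.0]

without_default = average_rating(ratings)               # 5.0 / 1
with_default = average_rating(ratings, null_value=1.0)  # (1.0 + 5.0) / 2
```

The choice of `null_value` therefore directly shifts aggregation results, so it should be a value that makes sense for the field's meaning.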
Coercing Data
• By default, Elasticsearch attempts to coerce data to match the data type of the field
‒ For example, suppose the “rating” field is a “long”:

PUT ratings/_doc/1
{
  "rating": 4
}

PUT ratings/_doc/2
{
  "rating": "3"
}

PUT ratings/_doc/3
{
  "rating": 4.5
}

All three PUT commands work fine

A “sum” aggregation on “rating” returns “11”

Disabling Coercion
• You can disable coercion if you do not want Elasticsearch to try and clean up your dirty fields:

"mappings": {
  "_doc": {
    "properties": {
      "rating": {
        "type": "long",
        "coerce": false
      }
    }
  }
}

Set the “coerce” parameter to false

PUT ratings/_doc/1
{
  "rating": 4
}

Works fine

PUT ratings/_doc/2
{
  "rating": "3"
}

Fails

PUT ratings/_doc/3
{
  "rating": 4.5
}

Fails

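The effect of the “coerce” parameter on a “long” field can be approximated in Python. This is a sketch of the behaviour shown on the slides, not Elasticsearch's implementation: `index_long` is a hypothetical helper, strings are parsed as numbers, and fractional parts are truncated.

```python
def index_long(value, coerce=True):
    """Return the integer that would be indexed for a 'long' field, or raise ValueError."""
    if isinstance(value, bool):
        raise ValueError(f"not a number: {value!r}")
    if isinstance(value, int):
        return value
    if not coerce:
        raise ValueError(f"coercion is disabled, rejecting {value!r}")
    if isinstance(value, str):
        value = float(value)  # "3" -> 3.0
    return int(value)         # 4.5 -> 4 (fraction truncated)


# With coercion: 4 + 3 + 4, matching the sum of 11 on the slide.
total = sum(index_long(v) for v in [4, "3", 4.5])


def is_rejected(value):
    """True when the value fails to index with coercion disabled."""
    try:
        index_long(value, coerce=False)
    except ValueError:
        return True
    return False
```

The truncation of 4.5 to 4 is why the sum is 11 rather than 11.5, which is worth remembering when silently coerced data feeds into aggregations.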
The _meta Field
• Using the “_meta” field, you can put any custom metadata you want in your mapping
‒ Any “_meta” data is associated with the type (not your individual documents)

PUT blogs/_mapping/_doc
{
  "_meta": {
    "blog_mapping_version": "2.1"
  }
}

Store any application-specific JSON here…

Dynamic Templates
Use Case for Dynamic Templates
• Suppose you have documents with a large number of fields,
• or documents with dynamic field names not known at the time of your mapping definition
‒ and nested key/value pairs are not a good solution for your use case
• Using dynamic templates, you can define a field’s mapping based on:
‒ the field’s datatype,
‒ the name of the field, or
‒ the path to the field

Defining a Dynamic Template
• Suppose you want any unmapped string fields to be mapped as type “keyword” by default:

PUT test2
{
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "my_string_fields": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}

If the field is a string, map it as “keyword”

Test Our Template

POST test2/_doc
{
  "blog_reaction": ":thumbsup:"
}

GET test2/_mapping

"properties": {
  "blog_reaction": {
    "type": "keyword"
  }
}

Matching Field Names
• Use the “match” parameter to match a field name to a mapping:

PUT test2/_doc/_mapping
{
  "dynamic_templates": [
    {
      "my_float_fields": {
        "match": "f_*",
        "mapping": {
          "type": "float"
        }
      }
    }
  ]
}

If an unmapped field name starts with “f_”, then it will be mapped as a float

Matching Field Names

POST test2/_doc/
{
  "f_avg_response_time": "34.8"
}

"properties": {
  "blog_reaction": {
    "type": "keyword"
  },
  "f_avg_response_time": {
    "type": "float"
  }
}

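How a template is chosen for a new field can be sketched as a first-match lookup in Python. This is an illustrative model only: the `TEMPLATES` list combines the two templates from the slides in an assumed order, `detected_type` and `resolve_mapping` are hypothetical helpers, and `fnmatchcase` is used as a stand-in for the simple `*` wildcard matching of field names.

```python
from fnmatch import fnmatchcase

# The two templates from the slides; the ordering here is an assumption.
TEMPLATES = [
    {"name": "my_float_fields", "match": "f_*", "mapping": {"type": "float"}},
    {"name": "my_string_fields", "match_mapping_type": "string", "mapping": {"type": "keyword"}},
]


def detected_type(value):
    """Rough JSON type detection for a new, unmapped field."""
    if isinstance(value, str):
        return "string"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "object"


def resolve_mapping(field, value, templates=TEMPLATES):
    """Return the mapping of the first template whose conditions all hold, else None."""
    for template in templates:
        if "match" in template and not fnmatchcase(field, template["match"]):
            continue
        if "match_mapping_type" in template and template["match_mapping_type"] != detected_type(value):
            continue
        return template["mapping"]
    return None  # no template applies; default dynamic mapping would take over
```

In this sketch the name-based template is listed first, so `"f_avg_response_time": "34.8"` resolves to a float even though its JSON value is a string.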
Controlling Dynamic Fields
Strict Mappings
• By default, if a document is indexed with an unexpected field, the mapping is dynamically modified
• In production, you will likely define your mappings prior to any indexing
‒ and you probably do not want your mappings to change

POST blogs/_doc/
{
  "some_new_field": "This is quite unexpected"
}

“I will simply add this field to your mapping.”

Controlling the dynamic feature…
• You can control the effect of new fields added to a mapping using the “dynamic” property (three options):

           doc indexed?   fields indexed?   mapping updated?
“true”     ✔              ✔                 ✔
“false”    ✔              𝙓                 𝙓
“strict”   𝙓

PUT blogs/_doc/_mapping
{
  "dynamic": "strict"
}

Do not index a document that contains fields not already defined in the mapping

If dynamic is set to “strict”…
• …then indexing a document with undefined fields generates an error:

POST blogs/_doc/
{
  "some_other_field": "This won't work"
}

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [some_other_field] within [_doc] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [some_other_field] within [_doc] is not allowed"
  },
  "status": 400
}

Adding Fields to a Mapping
• If “dynamic” is “strict”, you can still add a field by modifying
the mapping directly
‒ you just can’t add a field to the mapping dynamically

PUT blogs/_doc/_mapping
{
  "properties": {
    "some_other_field": {
      "type": "text"
    }
  }
}

Add a new field named “some_other_field” to the mapping
The POST command on the previous slide will work now
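The three “dynamic” options can be modeled in a few lines of Python. This is only a sketch of the behavior described in the table, not Elasticsearch code: the function name `index_document`, the exception class, and the choice of always mapping new fields as "text" are assumptions made for illustration.

```python
class StrictDynamicMappingError(Exception):
    """Stand-in for Elasticsearch's strict_dynamic_mapping_exception."""


def index_document(mapping, doc, dynamic="true"):
    """Model how an index reacts to fields that are missing from its mapping.

    `mapping` is a plain dict of field name -> type; it is updated in place
    only when dynamic == "true". Returns the sorted names of the fields that
    end up indexed (searchable) for this document.
    """
    unknown = [name for name in doc if name not in mapping]
    if unknown and dynamic == "strict":
        # "strict": the whole document is rejected
        raise StrictDynamicMappingError(
            f"dynamic introduction of {unknown} is not allowed")
    if dynamic == "true":
        # "true": the mapping grows (real type detection is richer than this)
        for name in unknown:
            mapping[name] = "text"
    # "false": the document is accepted, but unknown fields are not indexed
    return sorted(name for name in doc if name in mapping)
```

With `dynamic="false"` the call succeeds and the mapping is left untouched, which is the middle row of the table: the document is indexed, the new field is not.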
Chapter Review
Summary
• Adding a level of granularity can make it easier to answer
questions about a field
• Range data types allow you to define a field using a lower and
upper bound
• Setting “index” to false disables indexing, but its value is still
stored and available for aggregations
• Setting “enabled” to false completely disables a field so that it is
not indexed or available for searches or aggs
• Use the “null_value” parameter to assign a value to a field if it is
null
• Using dynamic templates, you can define a field’s mapping based
on its name or datatype
• You can control the effect of new fields added to a mapping using
the “dynamic” property
Quiz
1. How might the following field be modeled more effectively
for searching and aggregating dev environments?
"dev_environment": "Emacs; Notepad++; PyCharm"
2. How might the following field be modeled more effectively?
"interview_likelihood": "90%"
3. How would you map a field that you never need to use for
searches or aggregations?
4. How would you configure an index so that it rejects
documents that contain fields not defined in its mapping?
5. What is the default value of the “relation” parameter in a
range query?
Lab 2
Field Modeling
1 Elasticsearch Internals
2 Field Modeling
3 Fixing Data
4 Advanced Search & Aggregations
5 Cluster Management
6 Capacity Planning
7 Document Modeling
8 Monitoring and Alerting
9 From Dev to Production

Chapter 3
Fixing Data
Topics covered:
• Tools for Fixing Data
• Fixing Mappings
• Reindexing Tips
• Picking Up Mapping Changes
• Fixing Fields

Fixing Data
• In the previous chapter we discussed
‒ different problems that a dataset can have
‒ best practices for field modeling
• Now we will discuss techniques to fix data issues, like:
‒ fields with incorrect types
‒ comma-separated strings that should be arrays
‒ changing an existing mapping
• First, let's look into tools that can help you with the task...
Tools for Fixing Data
Overview of Painless
• Painless is a scripting language designed specifically for
use with Elasticsearch
‒ It’s fast, secure, and has a Groovy-like syntax
• Supports all of Java’s data types and a subset of the Java
API:
‒ https://www.elastic.co/guide/en/elasticsearch/reference/current/painless-api-reference.html
• Languages like Groovy, JavaScript, and Python are no
longer available for scripting as of Elasticsearch 6.0
Use Cases for Painless
• Accessing document fields for various reasons:
‒ update/delete a field in a document
‒ perform computations on fields before returning them
‒ customize the score of a document
‒ work with aggregations
• Ingest Node
‒ execute a script within an ingest pipeline
• Reindex API
‒ manipulate data as it is getting reindexed
• Lots of use cases, as long as you use it wisely!
Simple Use Case for a Script
• We will start with a simple use case of updating a field
• Be warned!
‒ Updating a field results in the entire document getting deleted
and reindexed
‒ We will discuss a better solution later in the course if you need to
update a field frequently
• Let’s add a field for the number of views of a blog post:

PUT my_index/_doc/1
{
  "blog_id": "h81CKmIBCLh5xF6i7Y2f",
  "num_of_views": 3
}

Let’s write a script to increment “num_of_views”
An Inline Script
• There are two ways to run a Painless script: inline or
stored
‒ The following example demonstrates an inline script that updates
the “num_of_views” field:

POST my_index/_doc/1/_update
{
  "script": {
    "source": "ctx._source.num_of_views += params.new_views",
    "params": {
      "new_views": 2
    }
  }
}

Handy for short scripts: just write the code inline
“source” is the code
“params” is for optional parameters
A Stored Script
• Scripts can be stored in the cluster state
‒ and invoked later using the script’s ID:
POST _scripts/add_new_views
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.num_of_views += params.new_views"
  }
}

POST my_index/_doc/1/_update
{
  "script": {
    "id": "add_new_views",
    "params": {
      "new_views": 2
    }
  }
}

“add_new_views” is the id of the script
the params are passed to the script
Accessing Fields
• The syntax in Painless for accessing a field’s value depends
on the context:

Context                                            Syntax for accessing fields
Ingest node: access fields using ctx               ctx.field_name
Updates: use the _source field                     ctx._source.field_name
Search and aggs: if doc values are enabled,        doc['field_name'].value
doc is a very efficient way to access a field      (if you expect an array, use doc['field_name'])

https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-contexts.html
Script Caching
• The first time Elasticsearch sees a new script, it compiles it
and stores the compiled version in a cache
‒ both inline and stored scripts are stored in the cache
• A new script can evict a cached script
‒ the default size of the cache is 100 scripts
‒ configurable using script.cache.max_size
‒ or set a timeout using script.cache.expire
Compilation can be Expensive
• If you compile too many unique scripts within a small
amount of time, Elasticsearch will reject the new dynamic
scripts
‒ and throw a circuit_breaking_exception error
• By default, up to 75 scripts can be compiled per 5-minute
window (“75/5m”)
‒ configured by script.max_compilations_rate
‒ you may be able to avoid this issue by using parameters…
Parameters in Scripts
• Scripts have to match exactly to take advantage of the
cache
‒ Be careful with literals in your scripts:

"script": {
  "source": "ctx._source.num_of_views += 2"
}

"script": {
  "source": "ctx._source.num_of_views += 3"
}

Two different scripts, so two compilations needed

• Use parameters instead…

"script": {
  "source": "ctx._source.num_of_views += params.new_views"
}
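The effect of literals on the script cache can be illustrated with a toy cache keyed by the exact source text. This is a sketch under stated assumptions, not how Elasticsearch implements its cache: the class `ScriptCache` is hypothetical, there is no eviction or rate limit, and Python source stands in for Painless.

```python
class ScriptCache:
    """Toy compile cache keyed by the exact script source text."""

    def __init__(self):
        self.compiled = {}      # source text -> compiled code object
        self.compilations = 0   # how many times we had to compile

    def run(self, source, doc, params=None):
        if source not in self.compiled:
            # Cache miss: any difference in the text forces a new compilation
            self.compilations += 1
            self.compiled[source] = compile(source, "<script>", "exec")
        # The script sees `doc` and `params`, loosely like ctx._source and params
        exec(self.compiled[source], {}, {"doc": doc, "params": params or {}})
        return doc


# Literals: two different source strings, so two compilations
literal_cache = ScriptCache()
literal_doc = {"num_of_views": 3}
for n in (2, 3):
    literal_cache.run(f"doc['num_of_views'] += {n}", literal_doc)

# Parameters: one source string, compiled once and reused
param_cache = ScriptCache()
param_doc = {"num_of_views": 3}
for n in (2, 3):
    param_cache.run("doc['num_of_views'] += params['new_views']",
                    param_doc, {"new_views": n})
```

Both documents end with the same view count, but `literal_cache.compilations` is 2 while `param_cache.compilations` is 1, which is why parameterized scripts help stay under the compilation rate limit.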
Reindexing APIs
• We will also use the Reindex and Update By Query APIs
in this chapter to fix data
‒ The _reindex endpoint is for reindexing from one index to
another
‒ The _update_by_query endpoint is for reindexing into the same
index
• And for demonstration purposes, we will use the
_delete_by_query endpoint as well
‒ deletes all documents that are hits for a given query
Fixing Mappings
Fixing Mappings
• The fields of our blogs index are not mapped very well
‒ In particular, the default “text”/“keyword” mapping of our string
fields needs some cleaning up:

"category": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},
"content": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},
...

“category” only has 10 distinct values, making it a good candidate for “keyword” only
“content” is a large amount of text, so “keyword” seems unnecessary
Fixing Mappings
• You cannot change the data type of a field in a mapping
‒ We will have to define a new index for our blogs with the desired
data types:

PUT blogs_fixed
{
  "mappings": {
    "doc": {
      "properties": {
        "author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "category": {
          "type": "keyword"
        },
        "content": {
          "type": "text"
        },
        "locales": {
          "type": "keyword"
        },
        ...

We will index “author” as both “text” and “keyword”
…but clean up the other string fields to fit our dataset and use case better
Applying the New Mapping
• Now we need to copy the old blogs into our new index
‒ which can be easily accomplished using the Reindex API
• The _reindex command indexes the documents from the
“source” index into the “dest” index
‒ done in batches, with a default batch size of 1,000

POST _reindex
{
  "source": {
    "index": "my_source_index",
    "query": {
      ...
    }
  },
  "dest": {
    "index": "my_destination_index"
  }
}

Optional “query” to specify which documents to reindex
Let’s Reindex the Blogs
• All documents from “blogs” will be reindexed into
“blogs_fixed”:

POST _reindex
{
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blogs_fixed"
  }
}
Test the New Mapping
• We should be able to aggregate on “category”, since it is a
“keyword” now
GET blogs_fixed/_search
{
  "size": 0,
  "aggs": {
    "top_categories": {
      "terms": {
        "field": "category",
        "size": 5
      }
    }
  }
}

"aggregations": {
  "top_categories": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 305,
    "buckets": [
      {
        "key": "Engineering",
        "doc_count": 440
      },
      {
        "key": "",
        "doc_count": 333
      },
      {
        "key": "Releases",
        "doc_count": 238
      },
      ...
Reindexing Tips
Dealing with Versions
• Suppose something went wrong while we were reindexing
the blogs index
‒ we do not want to have to start over!
• By default, _reindex blindly overwrites existing documents
in the destination
‒ this is the behavior of setting the “version_type” parameter to
“internal” (the default value)

• Let’s discuss a few tips for dealing with document versions
and reindexing…
Behavior of “internal”
• When “version_type” equals “internal”, version numbers
are ignored
‒ existing documents in the destination index are overwritten

source index             dest index                       “internal” behavior
_id: 456, _version: 4    _id: 456, _version: 2            overwritten, _version: 3
_id: 123, _version: 1    _id: 123, _version: 3            overwritten, _version: 4
_id: abc, _version: 2    _id: abc, _version: 2            overwritten, _version: 3
_id: 789, _version: 6    (no doc with id “789” exists)    created, _version: 1
Behavior of “external”
• When “version_type” equals “external”, the version
number of the document in the source index is preserved
‒ a “newer” document overwrites an “older” document

source index             dest index                       “external” behavior
_id: 456, _version: 4    _id: 456, _version: 2            overwritten, _version: 4
_id: 123, _version: 1    _id: 123, _version: 3            exception, _version: 3
_id: abc, _version: 2    _id: abc, _version: 2            exception, _version: 2
_id: 789, _version: 6    (no doc with id “789” exists)    created, _version: 6
“external” Version Type
Let’s run the same reindex again, but change “version_type” to “external”:

POST _reindex
{
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blogs_fixed",
    "version_type": "external"
  }
}

"failures": [
  {
    "index": "blogs_fixed",
    "type": "doc",
    "id": "Cc1CKmIBCLh5xF6i7Y2b",
    "cause": {
      "type": "version_conflict_engine_exception",
      "reason": "[doc][Cc1CKmIBCLh5xF6i7Y2b]: version conflict, current version [1] is higher or equal to the one provided [1]",
      "index_uuid": "HUkf-mfCQfOghnqSIi5kBQ",
      "shard": "0",
      "index": "blogs_fixed"
    },
    "status": 409
  },
  ...

We get 1,000 failures because the documents already exist in the destination
Version Conflicts
• Notice in the previous example that only the first batch of 1,000
documents was processed, because the version conflicts aborted the request
‒ set “conflicts” equal to “proceed” to ignore conflicts
‒ Elasticsearch keeps a running count of them instead

POST _reindex
{
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blogs_fixed",
    "version_type": "external"
  },
  "conflicts": "proceed"
}

{
  "took": 65,
  "timed_out": false,
  "total": 1594,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 2,
  "version_conflicts": 1594,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
Updating a Document
• To better demonstrate “external”, let’s change some of the
documents in the “blogs” index
‒ we will use an _update_by_query to modify every blog whose
“category” is an empty string
• This will increase the “_version” number of 333 blogs:

POST blogs/_update_by_query
{
  "query": {
    "match": {
      "category.keyword": ""
    }
  },
  "script": {
    "source": "ctx._source.category = \"None\""
  }
}

The query matches “category.keyword” against an empty string
333 documents are updated
Reindex Again with “external”
• This time just the changed “blogs” are reindexed into
“blogs_fixed”

POST _reindex
{
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blogs_fixed",
    "version_type": "external"
  },
  "conflicts": "proceed"
}

{
  "took": 130,
  "timed_out": false,
  "total": 1594,
  "updated": 333,
  "created": 0,
  "deleted": 0,
  "batches": 2,
  "version_conflicts": 1261,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  ...

333 blogs were “updated”
Using “op_type”
• We can set “op_type” to “create” if we are not worried
about version numbers
‒ only missing documents get reindexed

source index             dest index                       “create” behavior
_id: 456, _version: 4    _id: 456, _version: 2            exception, _version: 2
_id: 123, _version: 1    _id: 123, _version: 3            exception, _version: 3
_id: abc, _version: 2    _id: abc, _version: 2            exception, _version: 2
_id: 789, _version: 6    (no doc with id “789” exists)    created, _version: 1
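The three destination behaviors in this section (“internal”, “external”, and “op_type” set to “create”) can be compared side by side with a small Python model. It is a simplified sketch: indices are plain dicts of `_id` to `_version`, the names `reindex_one`, `reindex`, and `VersionConflict` are invented for illustration, and real reindexing works in batches rather than one document at a time.

```python
class VersionConflict(Exception):
    """Stand-in for version_conflict_engine_exception."""


def reindex_one(dest, doc_id, source_version,
                version_type="internal", op_type="index"):
    """Write one document into `dest` (a dict of _id -> _version)."""
    existing = dest.get(doc_id)
    if op_type == "create":
        # Only missing documents are written
        if existing is not None:
            raise VersionConflict(doc_id)
        dest[doc_id] = 1
        return "created"
    if version_type == "external":
        # The source version is kept, and must be strictly newer
        if existing is not None and existing >= source_version:
            raise VersionConflict(doc_id)
        dest[doc_id] = source_version
    else:
        # "internal": source version ignored, destination version increments
        dest[doc_id] = 1 if existing is None else existing + 1
    return "created" if existing is None else "updated"


def reindex(source, dest, conflicts="abort", **options):
    """Copy every document, counting conflicts when conflicts == 'proceed'."""
    stats = {"created": 0, "updated": 0, "version_conflicts": 0}
    for doc_id, version in source.items():
        try:
            stats[reindex_one(dest, doc_id, version, **options)] += 1
        except VersionConflict:
            if conflicts != "proceed":
                raise
            stats["version_conflicts"] += 1
    return stats
```

Running `reindex` with the four documents from the tables (ids 456, 123, abc, 789) reproduces the version numbers shown in each “behavior” column.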
Delete By Query API
• Let’s do another demo, by first deleting some documents
‒ The _delete_by_query endpoint will delete all the documents
from an index that hit the provided query

POST blogs_fixed/_delete_by_query
{
  "query": {
    "range": {
      "publish_date": {
        "lte": "2016"
      }
    }
  }
}

Deletes 812 blogs from 2016 and earlier
Setting “op_type” to “create”
POST _reindex
{
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blogs_fixed",
    "op_type": "create"
  },
  "conflicts": "proceed"
}

{
  "took": 257,
  "timed_out": false,
  "total": 1594,
  "updated": 0,
  "created": 812,
  "deleted": 0,
  "batches": 2,
  "version_conflicts": 782,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  ...

812 documents were reindexed
Notice that an “op_type” of “create” throws an exception if the document already exists in the target
Picking Up Mapping Changes
Adding a Multi-Field to a Mapping
• Suppose we want to add a multi-field to the mapping of the
blogs_fixed index
‒ Specifically, suppose we want to analyze the “content” field in a
new way:

PUT blogs_fixed/_mapping/_doc
{
  "properties": {
    "content": {
      "type": "text",
      "fields": {
        "english": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}

Add a new multi-field named “english” that uses the english analyzer
Picking Up a New Multi-Field
• The mapping has changed, but existing documents already
in the index do not have this new multi-field:

GET blogs_fixed/_search
{
  "query": {
    "match": {
      "content.english": "performance tips"
    }
  }
}

No hits, because no documents have this field yet
Update By Query
• Run an _update_by_query to have existing documents
pick up the new “content.english” field:

POST blogs_fixed/_update_by_query

GET blogs_fixed/_search
{
  "query": {
    "match": {
      "content.english": "performance tips"
    }
  }
}

426 hits
Adding Your Own Reindex Batch Field
• What happens if something fails during the reindexing?
‒ We probably do not want to start over from the beginning!
• You can add a field to each document just for a reindex job
‒ a simple numeric field that is set to 1 if a document has been
reindexed successfully:

PUT blogs_fixed/_doc/_mapping
{
  "properties": {
    "reindexBatch": {
      "type": "short"
    }
  }
}

Add a field to our mapping just to track the reindexing
Updating Our Reindex Batch Field
POST blogs_fixed/_update_by_query
{
  "query": {
    "bool": {
      "must_not": [
        {
          "range": {
            "reindexBatch": {
              "gte": 1
            }
          }
        }
      ]
    }
  },
  "script": {
    "source": """
      if(ctx._source.containsKey("content")) {
        ctx._source.content_length = ctx._source.content.length();
      } else {
        ctx._source.content_length = 0;
      }
      ctx._source.reindexBatch=1;
    """
  }
}

The query finds only the documents that need updating
The last line of the script updates the batch number
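Why the batch field makes the job resumable can be seen in a small Python model. This is a hedged sketch of the idea only: documents are plain dicts, and `update_pending` and its `fail_after` argument (used to simulate a crash midway) are hypothetical names, not part of any Elasticsearch API.

```python
def update_pending(docs, batch=1, fail_after=None):
    """Update only documents not yet marked with this batch number.

    Returns how many documents were updated in this run. `fail_after`
    simulates a failure after that many successful updates.
    """
    processed = 0
    for doc in docs:
        if doc.get("reindexBatch", 0) >= batch:
            continue  # same effect as the must_not range query on reindexBatch
        if fail_after is not None and processed == fail_after:
            raise RuntimeError("simulated failure during the update")
        # Same logic as the script: compute content_length, then mark the doc
        doc["content_length"] = len(doc["content"]) if "content" in doc else 0
        doc["reindexBatch"] = batch
        processed += 1
    return processed
```

If a first run fails partway through, a second call skips the documents that were already marked and only touches the remaining ones, so no work is repeated.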
Fixing Fields
Splitting a Field
• Let’s fix the “locales” field by splitting it into an array of
“keyword” values
‒ We could figure this out with a script, but there is an easier
way…

Before: "locales": "de-de,fr-fr"
After:  "locales": ["de-de","fr-fr"]

Mapping:
"locales": {
  "type": "keyword"
}
Ingest Node
• Ingest nodes provide the ability to pre-process a document
right before it gets indexed
‒ an ingest node intercepts an index or bulk API request,
‒ applies transformations,
‒ passes the documents back to the index or bulk API

[Diagram: a Client sends the document below to node1, where a pipeline of processors (grok, date, set) is applied]

{
  "field1": "value1",
  "field2": "value2",
  "field3": "value3"
}

When indexing a doc, you can specify a pipeline
Dedicated Ingest Nodes
• All nodes are ingest nodes by default
‒ set node.ingest to false to disable
‒ larger clusters will likely have dedicated ingest nodes (as
discussed later in the course)

[Diagram: my_cluster, with a Client, two dedicated ingest nodes (ingest_node1, ingest_node2), and four data nodes (data_node1 to data_node4)]
What is a Pipeline?
• A pipeline is a set of processors
‒ a processor is similar to a filter in Logstash
‒ has read and write access to documents that pass through the
pipeline

[Diagram: on ingest_node1, pipeline P1 (processors grok, date, set) transforms a document before it reaches data_node1; pipeline = set of processors]

Incoming document:
{
  "field1": "value1",
  "field2": "value2",
  "field3": "value3"
}

Document after pipeline P1:
{
  "a": "value4",
  "b": "value5",
  "c": "value6",
  "d": "value7"
}
Defining Pipelines
• Pipelines are defined using a PUT with the Ingest API
‒ stored in the cluster state

PUT _ingest/pipeline/my-pipeline-id
{
  "description" : "DESCRIPTION",
  "processors" : [
    {
      ...
    }
  ],
  "on_failure" : [
    {
      ...
    }
  ]
}

“my-pipeline-id” is a unique id for your pipeline
“processors” is an array of processors
“on_failure” is an optional array of processors to run if an error occurs
Example of a Pipeline
• The following pipeline adds a field to a document using the
“set” processor

PUT _ingest/pipeline/my_pipeline
{
  "processors": [
    {
      "set": {
        "field": "number_of_views",
        "value": 0
      }
    }
  ]
}

Adds “number_of_views” if it does not exist, or sets it to 0 if it already does
Testing a Pipeline
• Use the _simulate endpoint to test a pipeline
POST _ingest/pipeline/my_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "author": "Shay Banon",
        "blog_title": "You know, for Search!"
      }
    }
  ]
}

"_source": {
  "number_of_views": 0,
  "blog_title": "You know, for Search!",
  "author": "Shay Banon"
}
Using a Pipeline
• Indexing new documents

PUT my_index/_doc/1?pipeline=my_pipeline
{
  "author": "Monica Sarbu",
  "category": "Brewing in Beats"
}

"_source": {
  "number_of_views": 0,
  "author": "Monica Sarbu",
  "category": "Brewing in Beats"
}

POST _bulk
{"index": {"_index": "my_index", "_type": "_doc", "_id" : "1", "pipeline": "my_pipeline"}}
{"author": "Monica Sarbu", "category": "Brewing in Beats"}
Using a Pipeline
POST _reindex
{
  "source": {
    "index": "my_index"
  },
  "dest": {
    "index": "new_index",
    "pipeline": "my_pipeline"
  }
}

The pipeline is applied to every document

PUT test_index
{
  "settings": {
    "default_pipeline": "my_pipeline"
  }
}
The split Processor
• The built-in split processor is perfect for splitting the
“locales” field into an array:

PUT _ingest/pipeline/blogs_pipeline
{
  "processors": [
    {
      "split": {
        "field": "locales",
        "separator": ","
      }
    }
  ]
}

“separator” can be a regular expression as well
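What the split processor does to a document, including the regular-expression form of the separator, can be approximated in Python. This is a sketch only: `split_processor` is a hypothetical helper, and Python's `re.split` may treat edge cases such as trailing separators differently from the Java-based processor.

```python
import re


def split_processor(doc, field, separator):
    """Replace doc[field] (a string) with a list, splitting on a regex."""
    doc[field] = re.split(separator, doc[field])
    return doc


# A plain comma separator, as in blogs_pipeline
simple = split_processor({"locales": "de-de,fr-fr"}, "locales", ",")

# A regex separator that tolerates commas or semicolons with optional spaces
messy = split_processor({"locales": "de-de, fr-fr;ja-jp"},
                        "locales", r"\s*[,;]\s*")
```

In both cases the “locales” string becomes an array of values, which matches the “keyword” mapping defined for that field.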
Adding a Useful Field
• Suppose we want to analyze the web traffic to our blogs
based on the length of the blog
‒ using the length of the “content” field
• We do not have this value in our blogs index
‒ We could compute it at request time, but that is expensive
‒ A better solution would be to compute the length of the blog
once, at index time
The script Processor
• The script processor allows scripts to be executed within
ingest pipelines:
PUT _ingest/pipeline/blogs_pipeline
{
  "processors": [
    {
      "split": {
        "field": "locales",
        "separator": ","
      }
    },
    {
      "script": {
        "source": """
          if(ctx.containsKey("content")) {
            ctx.content_length = ctx.content.length();
          } else {
            ctx.content_length = 0;
          }
        """
      }
    }
  ]
}

“ctx” is a map, and we check to see if it contains a “content” field
Assigning “ctx.content_length” adds the field to the doc if it is not already defined
The script Processor
POST _ingest/pipeline/blogs_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "locales": "de-de,fr-fr,ja-jp,ko-kr",
        "content": "This is a test."
      }
    },
    {
      "_source": {
        "locales": "en-en"
      }
    }
  ]
}

Response (excerpt):

  "locales": [
    "de-de",
    "fr-fr",
    "ja-jp",
    "ko-kr"
  ],
  "content": "This is a test.",
  "content_length": 15
  ...
  "locales": [
    "en-en"
  ],
  "content_length": 0
  ...
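The two processors can be mimicked outside Elasticsearch to make the data flow easier to follow. The following is a minimal Python sketch, not the ingest node implementation; the function names are hypothetical, and the two input documents are the ones from the _simulate request above.

```python
def split_processor(doc, field, separator):
    """Mimic the split processor: turn a delimited string into a list."""
    out = dict(doc)
    out[field] = out[field].split(separator)
    return out


def content_length_script(ctx):
    """Mimic the Painless script: add content_length, defaulting to 0."""
    out = dict(ctx)
    out["content_length"] = len(out["content"]) if "content" in out else 0
    return out


def run_pipeline(doc):
    # Processors run in the order they are listed in the pipeline.
    return content_length_script(split_processor(doc, "locales", ","))


docs = [
    {"locales": "de-de,fr-fr,ja-jp,ko-kr", "content": "This is a test."},
    {"locales": "en-en"},
]
results = [run_pipeline(d) for d in docs]
```

The sketch assumes every document has a “locales” field, as the simulated documents do.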
Fix the Blogs Index
• Let’s run our new pipeline on all of the existing indexed blogs:

POST blogs_fixed/_update_by_query?pipeline=blogs_pipeline

GET blogs_fixed/_search

{
  "locales": [
    "de-de",
    "fr-fr",
    "ja-jp",
    "ko-kr",
    "zh-chs"
  ],
  "author": "Steve Dodson",
  "category": "News",
  "title": "Introducing Machine Learning for the Elastic Stack",
  "publish_date": "2017-05-04T06:00:00.000Z",
  "seo_title": "",
  "content": "...",
  "url": "/blog/introducing-machine-learning-for-the-elastic-stack",
  "content_length": 5861
}
Chapter Review
Summary
• Painless is a scripting language designed specifically for use
with Elasticsearch
• Painless scripts can be defined inline or stored in the cluster
• The first time Elasticsearch sees a new script, it compiles it and
stores the compiled version in a cache
• You can copy documents from one index to another using the
Reindex API
• The Update By Query API allows you to reindex a collection of
documents into the same index
• Ingest nodes provide the ability to pre-process a document right
before it gets indexed
• A pipeline is a set of processors that are executed by an ingest
node when a document is indexed
Quiz
1. True or False: The Painless scripting language was developed
just for Elasticsearch.
2. True or False: Adding a field to a mapping automatically adds
that field to all the documents already indexed.
3. In a _reindex request, what is the effect of setting
“version_type” to “external”?
4. How many documents are updated in the following request?

POST messages/_update_by_query

5. What is the effect of the following pipeline?

"processors" : [
  {
    "script" : {
      "source" : "ctx._index=ctx.clientip.country_iso_code.toLowerCase()"
    }
  }
]
Lab 3
Fixing Data
Chapter 4
Advanced Search & Aggregations

1 Elasticsearch Internals
2 Field Modeling
3 Fixing Data
4 Advanced Search & Aggregations
5 Cluster Management
6 Capacity Planning
7 Document Modeling
8 Monitoring and Alerting
9 From Dev to Production
Topics covered:
• Searching for Patterns
• Dealing with null Values
• Scripted Searches
• Search Templates
• Aggregations
• Pipeline Aggregations
Searching for Patterns
Wildcard Query
• The wildcard query is a term-based search that contains a
wildcard:
‒ * = anything
‒ ? = any single character
• Wildcard queries can be expensive, especially if the pattern has a
leading wildcard

“I'm looking for all blog posts about 5.x releases.”

GET blogs/_search
{
  "query": {
    "wildcard": {
      "title.keyword": {
        "value": "* 5.*"
      }
    }
  }
}
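To see what the two wildcard characters mean for a keyword term, the matching rule can be sketched in Python by translating the pattern into a regular expression that must cover the whole term. This is only an illustration of the semantics, not how Lucene executes the query; the helper names and the sample titles are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into a regex; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # * = anything (including nothing)
        elif ch == "?":
            parts.append(".")    # ? = any single character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    # The pattern is compared against the entire term, not a substring.
    return wildcard_to_regex(pattern).fullmatch(term) is not None


titles = [
    "Elasticsearch 5.4.0 released",
    "Kibana 4.6.1 released",
    "Logstash 5.x preview",
]
matches = [t for t in titles if wildcard_match("* 5.*", t)]
```

Because the whole term must match, the pattern "5.*" alone would not match any of these titles; the leading "* " is what makes the query expensive.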
Regexp Query
• The regexp query is a term-based search that offers even
more power to match patterns
‒ Like wildcard queries, regexp queries can be very expensive
‒ Syntax is based on the Lucene regular expression engine
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax

“I'm looking for all blog posts about 5.0, 5.1 or 5.2 releases.”

GET blogs/_search
{
  "query": {
    "regexp": {
      "title.keyword": ".*5\\.[0-2]\\.[0-9].*"
    }
  }
}

‒ The pattern provided must match the entire string (always anchored)
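The effect of anchoring can be illustrated with Python's re module, whose fullmatch behaves like an always-anchored pattern. Lucene's regular expression syntax differs from Python's in general, but for this particular pattern the two agree, so this is a reasonable sketch; the sample titles are hypothetical. Note that the doubled backslashes in the JSON request become single backslashes once the JSON string is decoded.

```python
import re

# The pattern from the regexp query, after JSON decoding.
PATTERN = r".*5\.[0-2]\.[0-9].*"


def regexp_query_match(pattern, term):
    """Anchored match: the pattern must cover the entire term."""
    return re.fullmatch(pattern, term) is not None


title = "Elasticsearch 5.1.2 released"

anchored_full = regexp_query_match(PATTERN, title)
# Without the leading and trailing .* the anchored match fails,
# even though the fragment occurs inside the title.
anchored_fragment = regexp_query_match(r"5\.[0-2]\.[0-9]", title)
unanchored_fragment = re.search(r"5\.[0-2]\.[0-9]", title) is not None
```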
Dealing with null Values
• Your documents will likely have missing or null fields
‒ If you want to know if a field exists and is not null, use the exists
query
• For example, the “locales” field in blogs is non-null if the
blog is written in multiple languages

“I am looking for blogs written in multiple languages.”

GET blogs/_search
{
  "query": {
    "exists": {
      "field": "locales"
    }
  }
}
Finding Missing Fields
• The opposite of exists is finding fields that are missing
‒ You can wrap exists in a must_not clause to find documents
that are missing a field

“I am looking for log events missing the region name.”

GET logs_server*/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "geoip.region_name"
        }
      }
    }
  }
}
Using exists on Objects
• The exists query works on objects as well
‒ For example, geoip is an object in our logs indices

“Which log events are missing the geoip details?”

GET logs_server*/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "geoip"
        }
      }
    }
  }
}
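The rule behind exists (a field counts as present only if it holds at least one non-null value) can be sketched in plain Python. This is a simplified model that looks at the document source; Elasticsearch evaluates exists against indexed values, so mapping settings can change the outcome. The helper names and sample documents are hypothetical.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'geoip.region_name'; None if absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def has_value(value):
    """True if the value holds at least one non-null leaf."""
    if value is None:
        return False
    if isinstance(value, list):
        return any(has_value(v) for v in value)
    if isinstance(value, dict):
        return any(has_value(v) for v in value.values())
    return True  # note: an empty string still counts as a value


def exists(doc, field):
    return has_value(get_path(doc, field))


full = {"geoip": {"region_name": "Texas", "city_name": "Austin"}}
no_region = {"geoip": {"city_name": "Austin"}}
null_region = {"geoip": {"region_name": None}}
no_geoip = {"clientip": "10.0.0.1"}

missing_region = [d for d in (full, no_region, null_region, no_geoip)
                  if not exists(d, "geoip.region_name")]
```

Wrapping exists in must_not corresponds to the `not exists(...)` filter on the last line.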
Scripted Queries
The Script Query
• Suppose we want blogs that have more than one “locale”
GET blogs_fixed/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "doc['locales'].size() > 1"
          }
        }
      }
    }
  }
}

‒ Script queries must compute a Boolean value

Response (excerpt):

"hits": {
  "total": 54,
  "max_score": 0,
  "hits": [
    {
      "_source": {
        "locales": [
          "de-de",
          "fr-fr",
          "ja-jp",
          "ko-kr",
          "zh-chs"
        ],
        "author": "Steve Dodson",
        "category": "News",
        "title": "Introducing Machine Learning for the Elastic Stack",
The Execute API
• It is possible to test a script before executing it in a query
using the execute API
• The API supports multiple contexts depending on how the
script is being executed
POST /_scripts/painless/_execute
{
  "script": {
    "source": "doc['locales'].size() > 1"
  },
  "context": "filter",
  "context_setup": {
    "index": "blogs_fixed",
    "document": {
      "locales": ["fr-fr", "de-de"]
    }
  }
}

‒ “script”: the script to test
‒ “context”: the context in which the script is applied
‒ “context_setup”: specifies the document on which the script will be
applied

Response:

{
  "result": true
}
Scripted Fields
Script Fields
• You can add fields to a query response that are
generated in a script
‒ Use script_fields to return a script evaluation for each hit:

GET blogs_fixed/_search
{
  "script_fields": {
    "day_of_week": {
      "script": {
        "source": """
          def d = new Date(doc['publish_date'].value.millis);
          return d.toString().substring(0,3);
        """
      }
    }
  }
}

‒ “day_of_week” is the name of the field being added to the response
‒ In Painless, use def to declare variables
‒ The value returned from the script becomes the value of the field
Script Fields
• The response from the previous query includes the day of
the week
‒ but notice that the _source was not returned

"hits": [
  {
    "_index": "blogs_fixed",
    "_type": "doc",
    "_id": "Cc1CKmIBCLh5xF6i7Y2b",
    "_score": 1,
    "fields": {
      "day_of_week": [
        "Fri"
      ]
    }
  },
  {
    "_index": "blogs_fixed",
    "_type": "doc",
    "_id": "g81CKmIBCLh5xF6i7Y-v",
    "_score": 1,
    "fields": {
      "day_of_week": [
        "Tue"
      ]
    }
  },
Adding _source to Script Fields
• If you add a script_fields section, the query no longer
returns the _source by default
‒ but it is easy to add _source to the response

GET blogs_fixed/_search
{
  "_source": [],
  "script_fields": {
    "day_of_week": {
      "script": {
        "source": """
          def d = new Date(doc['publish_date'].value.millis);
          return d.toString().substring(0,3);
        """
      }
    }
  }
}

‒ Now you will get the _source plus the defined “script_fields”
Performance Concerns
• It is important to understand the cost of using a script in
some use cases
‒ For example, the script in a script_fields clause needs to be
executed on each hit
• It may be much more efficient to calculate these needed
values once, at index time
‒ perhaps using an ingest pipeline
‒ or modeling your data in a way where scripting can be avoided
Search Templates
Use Case for Search Templates
• Our users write some very lengthy and complex queries
‒ In many cases, the same query is executed over and over, but
with a few different values
• Search templates allow you to define a query with
parameters that can be defined at execution time
• Benefits include:
‒ avoid repeating code in multiple places
‒ minimize mistakes
‒ easier to test and execute your queries
‒ share queries between applications
‒ allow users to only execute a few predefined queries
Defining a Template
• Templates are stored in the cluster state using the _scripts
endpoint
‒ the language for search templates is “mustache”

POST _scripts/my_search_template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "{{my_field}}": "{{my_value}}"
        }
      }
    }
  }
}

‒ “my_search_template” is the name of the search template
‒ {{my_field}} and {{my_value}} are the parameters
Using a Stored Template
• Use the _search/template endpoint to execute a stored
template
‒ passing in the necessary parameter values

“I am looking for blogs that have shard in the title.”

GET blogs/_search/template
{
  "id": "my_search_template",
  "params": {
    "my_field": "title",
    "my_value": "shard"
  }
}
Template for the Blogs Web App
POST _scripts/blogs_webform_search
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "bool": {
          "must": {
            "multi_match": {
              "query": "{{blog_query}}",
              "fields": ["title","title.*","content","content.*"],
              "operator": "and",
              "type": "most_fields"
            }
          },
          "should": {
            "multi_match": {
              "query": "{{blog_query}}",
              "fields": ["title","title.*","content","content.*"],
              "type": "phrase"
            }
          }
        }
      }
    }
  }
}
Using the Blogs Templates
• When a user searches for something on our website, we
can simply execute the search template with the input:

GET blogs_fixed/_search/template
{
  "id": "blogs_webform_search",
  "params": {
    "blog_query": "shard allocation"
  }
}
Conditionals
• Mustache does not have if/else logic
‒ But you can define a section that gets skipped if the parameter
is false or not defined

{{#param1}}
"This section is skipped if param1 is null or false"
{{/param1}}

Example with a Conditional
POST _scripts/blogs_with_date_search
{
  "script": {
    "lang": "mustache",
    "source": """
      {
        "query": {
          "bool": {
            "must": {
              "match": {"content": "{{search_term}}"}
            }
            {{#search_date}}
            ,
            "filter": {
              "range": {
                "publish_date": {"gte": "{{search_date}}"}
              }
            }
            {{/search_date}}
          }
        }
      }
    """
  }
}

‒ Add a filter clause if the “search_date” parameter is defined
Test the Conditional Script

GET blogs_fixed/_search/template
{
  "id": "blogs_with_date_search",
  "params": {
    "search_term": "shay banon"
  }
}

‒ 47 hits

GET blogs_fixed/_search/template
{
  "id": "blogs_with_date_search",
  "params": {
    "search_term": "shay banon",
    "search_date": "2017-07-01"
  }
}

‒ 5 hits
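Why the template has to be stored as a string, and why the comma sits inside the section, becomes clearer when the rendering is done by hand. The sketch below is a tiny, hypothetical subset of Mustache written in Python (sections and plain variables only, with no escaping), not the renderer Elasticsearch uses; it renders a condensed copy of the template above with and without “search_date”.

```python
import json
import re

SECTION = re.compile(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render(template, params):
    """Render sections first, then substitute plain variables."""
    def section(match):
        name, body = match.group(1), match.group(2)
        # Keep the body only if the parameter is defined and truthy.
        return body if params.get(name) else ""

    without_sections = SECTION.sub(section, template)
    return VARIABLE.sub(lambda m: str(params.get(m.group(1), "")),
                        without_sections)


TEMPLATE = """
{
  "query": {
    "bool": {
      "must": {"match": {"content": "{{search_term}}"}}
      {{#search_date}}
      ,
      "filter": {"range": {"publish_date": {"gte": "{{search_date}}"}}}
      {{/search_date}}
    }
  }
}
"""

no_date = json.loads(render(TEMPLATE, {"search_term": "shay banon"}))
with_date = json.loads(render(
    TEMPLATE, {"search_term": "shay banon", "search_date": "2017-07-01"}))
```

Both renderings parse as valid JSON: when the section is skipped, the comma disappears with it, so the “must” clause is not left with a trailing comma.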
Aggregations
The Percentiles Aggregation
• The percentiles metrics aggregation calculates percentiles
over a numeric field
‒ percentiles show the point at which a certain percentage of
observed values occur

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "runtime_percentiles": {
      "percentiles": {
        "field": "runtime_ms"
      }
    }
  }
}

Response (excerpt):

"runtime_percentiles": {
  "values": {
    "1.0": 0,
    "5.0": 88.00109639047503,
    "25.0": 95.00000000000001,
    "50.0": 103.37306961911929,
    "75.0": 159.88916204500126,
    "95.0": 685.1015874756147,
    "99.0": 4198.930939937213
  }
}

‒ 95% of logs have a runtime less than 685.1 ms
Percentiles Example
• You can specify the percentiles to be computed
‒ For example, the following agg computes the quintiles (20%
intervals) for the runtime of the log events

GET logs_server*/_search
{
  "size" : 0,
  "aggs": {
    "runtime_quintiles": {
      "percentiles": {
        "field": "runtime_ms",
        "percents": [
          20,
          40,
          60,
          80,
          100
        ]
      }
    }
  }
}

Response (excerpt):

"runtime_quintiles": {
  "values": {
    "20.0": 93.83826098733662,
    "40.0": 99,
    "60.0": 111.35144881124744,
    "80.0": 236.679414908546,
    "100.0": 59756
  }
}

‒ 80% of the responses have a runtime of less than 236.7 ms
Percentile Ranks
• Instead of providing percents and getting back values, you
can pass in a value and get back its percentile
‒ percentile_ranks is essentially the inverse of percentiles

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "runtime_ranks": {
      "percentile_ranks": {
        "field": "runtime_ms",
        "values": [100, 500]
      }
    }
  }
}

‒ You can pass in multiple values

Response (excerpt):

"runtime_ranks": {
  "values": {
    "100.0": 43.74582092577736,
    "500.0": 94.35230182775403
  }
}

‒ 43.7% of responses occurred in less than 100 ms
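The relationship between the two aggregations can be sketched with exact arithmetic on a small list. This is only an illustration: Elasticsearch computes percentiles with an approximation algorithm, so its numbers on a large dataset will differ slightly from an exact calculation. The function names and the sample runtimes are hypothetical.

```python
def percentile(sorted_values, percent):
    """Exact percentile with linear interpolation between neighbours."""
    if not sorted_values:
        raise ValueError("no values")
    rank = (percent / 100) * (len(sorted_values) - 1)
    low = int(rank)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = rank - low
    return sorted_values[low] + fraction * (sorted_values[high] - sorted_values[low])


def percentile_rank(sorted_values, value):
    """Percentage of observations less than or equal to value."""
    at_or_below = sum(1 for v in sorted_values if v <= value)
    return 100 * at_or_below / len(sorted_values)


# Hypothetical runtimes in milliseconds.
runtimes = sorted([90, 95, 100, 110, 120, 150, 200, 400, 700, 4000])

median = percentile(runtimes, 50)             # percent in, value out
rank_of_100 = percentile_rank(runtimes, 100)  # value in, percent out
```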
Motivation for top_hits
• Suppose we do a search for blogs about Logstash filters,
and we want to bucket them by author:
GET blogs/_search
{
  "size": 0,
  "query": {
    "match": {
      "content": "logstash filters"
    }
  },
  "aggs": {
    "blogs_by_author": {
      "terms": {
        "field": "author.keyword"
      }
    }
  }
}

Response (excerpt):

"buckets": [
  {
    "key": "Suyog Rao",
    "doc_count": 69
  },
  {
    "key": "Alexander Reelsen",
    "doc_count": 67
  },
  {
    "key": "Megan Wieling",
    "doc_count": 31
  },
  {
    "key": "Leslie Hawthorn",
    "doc_count": 29
  },

‒ Suyog wrote 69 blogs about Logstash filters, but which blogs are the
most relevant?
Example of top_hits
GET blogs/_search
{
  "size": 0,
  "query": {
    "match": {
      "content": "logstash filters"
    }
  },
  "aggs": {
    "blogs_by_author": {
      "terms": {
        "field": "author.keyword"
      },
      "aggs": {
        "logstash_top_hits": {
          "top_hits": {
            "size": 5
          }
        }
      }
    }
  }
}

‒ Returns the top 5 blogs from each author (based on the _score from
the “match” query)
The output of top_hits:
• Notice the top 5 hits from each bucket are returned in the
“aggregations” clause of the response

"buckets": [
  {
    "key": "Suyog Rao",
    "doc_count": 69,
    "logstash_top_hits": {
      "hits": {
        "total": 69,
        "max_score": 6.6510196,
        "hits": [
          {
            "_index": "blogs",
            "_type": "doc",
            "_id": "TM1CKmIBCLh5xF6i7Y2b",
            "_score": 6.6510196,
            "_source": {
              "publish_date": "2016-06-27T06:00:00.000Z",
              "seo_title": "",
              "category": "The Logstash Lines",
              "locales": "",
              "title": "Logstash Lines: More Monitoring Info, Beats Input Improvements",
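What top_hits adds to a terms bucket can be sketched in Python as "group, then keep the best-scoring few per group". This is an illustration on already-scored hits, not how Elasticsearch collects them across shards; the function name and the sample hits are hypothetical.

```python
from collections import defaultdict


def top_hits_by(hits, key, size):
    """Bucket hits by a field, keeping the `size` highest-scoring per bucket."""
    buckets = defaultdict(list)
    for hit in hits:
        buckets[hit[key]].append(hit)
    result = {}
    for bucket_key, bucket_hits in buckets.items():
        ranked = sorted(bucket_hits, key=lambda h: h["_score"], reverse=True)
        result[bucket_key] = {
            "doc_count": len(bucket_hits),   # what the terms agg reports
            "top_hits": ranked[:size],       # what top_hits adds
        }
    return result


# Hypothetical scored hits for the "logstash filters" query.
hits = [
    {"author": "Suyog Rao", "title": "A", "_score": 6.65},
    {"author": "Suyog Rao", "title": "B", "_score": 2.10},
    {"author": "Suyog Rao", "title": "C", "_score": 4.30},
    {"author": "Alexander Reelsen", "title": "D", "_score": 5.00},
]
by_author = top_hits_by(hits, "author", size=2)
```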
The missing Aggregation

“How many log events are missing the IP location?”

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "missing_latitude": {
      "missing": {
        "field": "geoip.location.lat"
      }
    },
    "missing_longitude": {
      "missing": {
        "field": "geoip.location.lon"
      }
    }
  }
}

Response (excerpt):

"aggregations": {
  "missing_latitude": {
    "doc_count": 6397
  },
  "missing_longitude": {
    "doc_count": 6397
  }
}
Scripted Aggregations
• Typically, the values used in an aggregation are from a field
within the document:

GET blogs/_search
{
  "size": 0,
  "aggs": {
    "author_terms": {
      "terms": {
        "field": "author.keyword"
      }
    }
  }
}

‒ A simple terms agg on a field in the document
Scripted Aggregations
• It is also possible to define a script which will generate the
values used in an aggregation
‒ The script generates a value per document
GET blogs/_search
{
  "size": 0,
  "aggs": {
    "blogs_by_day_of_week": {
      "terms": {
        "script": {
          "source": "doc['publish_date'].value.dayOfWeek"
        }
      }
    }
  }
}

‒ A terms agg that buckets by the day of the week

Response (excerpt):

"buckets": [
  {
    "key": "2",
    "doc_count": 415
  },
  {
    "key": "1",
    "doc_count": 385
  },

‒ 1 = Monday, 2 = Tuesday, etc.
Scripted Aggregations
• Another common use case for scripts in aggs is to combine
multiple fields for a terms agg
‒ We can view the days of the week that authors publish blogs:
GET blogs/_search
{
  "size": 0,
  "aggs": {
    "blogs_by_author_and_day_of_week": {
      "terms": {
        "script": {
          "source": "doc['author.keyword'].value + '_' + doc['publish_date'].value.dayOfWeek"
        }
      }
    }
  }
}

Response (excerpt):

"buckets": [
  {
    "key": "Alexander Reelsen_3",
    "doc_count": 67
  },
  {
    "key": "Livia Froelicher_1",
    "doc_count": 52
  },
  {
    "key": "Clinton Gormley_1",
    "doc_count": 42
  },
  {
    "key": "Clinton Gormley_2",
    "doc_count": 38
  },
The Significant Terms Aggregation
• Terms Aggregation + Noise Filter
‒ Discards commonly common terms that terms agg would
return
• Low frequency terms in the background data pop out as
high frequency terms in the foreground data
‒ Finds uncommonly common terms in your dataset
Some Use Cases:
- Recommendation
- Fraud Detection
- Defect Detection
And more…

[Figure: a query selects the foreground (FG) set from the background (BG) set]

https://www.elastic.co/blog/significant-terms-aggregation
Let’s Start with a Terms Agg

“Let’s look for a relationship between authors and content.”

GET blogs/_search
{
  "size": 0,
  "aggs": {
    "author_buckets": {
      "terms": {
        "field": "author.keyword",
        "size": 10
      },
      "aggs": {
        "content_terms": {
          "terms": {
            "field": "content",
            "size": 10
          }
        }
      }
    }
  }
}
The Results of the Terms Agg
• These would be considered “commonly common” terms
that our authors use in their blogs:
{
  "key": "Monica Sarbu",
  "doc_count": 89,
  "content_terms": {
    "doc_count_error_upper_bound": 66,
    "sum_other_doc_count": 15096,
    "buckets": [
      {
        "key": "and",
        "doc_count": 89
      },
      {
        "key": "the",
        "doc_count": 89
      },
      {
        "key": "to",
        "doc_count": 88
      },
      {
        "key": "in",
        "doc_count": 86
      },
      {
        "key": "is",
        "doc_count": 86
      },
      {
        "key": "this",

‒ Monica likes to blog about “and”, “the”, “to”, “in” and so on.
Using significant_terms
• Now let’s try the same agg with significant_terms:

“Let’s look for the “uncommonly common” relationship between author
and content.”

GET blogs/_search
{
  "size": 0,
  "aggs": {
    "author_buckets": {
      "terms": {
        "field": "author.keyword",
        "size": 10
      },
      "aggs": {
        "content_significant_terms": {
          "significant_terms": {
            "field": "content",
            "size": 10
          }
        }
      }
    }
  }
}
The Output of significant_terms
"key": "Monica Sarbu",
"doc_count": 89,
"content_significant_terms": {
  "buckets": [
    {
      "key": "metricbeat",
      "doc_count": 66,
      "score": 8.295430260419582,
      "bg_count": 97
    },
    {
      "key": "filebeat",
      "doc_count": 66,
      "score": 5.849324105618168,
      "bg_count": 133
    },
    {
      "key": "beat",
      "doc_count": 56,
      "score": 5.3810714135420605,
      "bg_count": 105
    },
    {
      "key": "beats",
      "doc_count": 84,
      "score": 4.668550553314773,
      "bg_count": 253
    },..

‒ It appears Monica is an expert on Beats!
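How a score can separate “metricbeat” from “and” is easier to see with numbers. The sketch below uses the JLH heuristic described in the Elasticsearch documentation for significant_terms (absolute change in frequency multiplied by relative change). The foreground and background counts for the Beats terms come from the output above; the background size of 1594 documents and the counts for “and” are assumptions made for illustration, since the slides do not state them.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """JLH heuristic: (fg% - bg%) * (fg% / bg%); 0 if not more frequent."""
    fg_pct = fg_count / fg_total
    bg_pct = bg_count / bg_total
    if fg_pct <= bg_pct:
        return 0.0
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


FG_TOTAL = 89     # Monica's blogs (doc_count of her bucket)
BG_TOTAL = 1594   # assumed size of the whole blogs index

# term -> (doc_count in foreground, bg_count in background)
terms = {
    "metricbeat": (66, 97),
    "filebeat": (66, 133),
    "beat": (56, 105),
    "beats": (84, 253),
    "and": (89, 1590),   # hypothetical counts for a very common word
}

scores = {
    term: jlh_score(fg, FG_TOTAL, bg, BG_TOTAL)
    for term, (fg, bg) in terms.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions a word that appears in nearly every document, such as “and”, scores close to zero even though it appears in all of the author's blogs, while terms concentrated in the foreground set rise to the top.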
Pipeline Aggregations
Use Case for Pipeline Aggregations
• The following aggregations calculate the monthly response
size of the logs dataset:
‒ but suppose we want the cumulative sum as well…
GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "logs_by_month": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month"
      },
      "aggs": {
        "monthly_sum_response": {
          "sum": {
            "field": "response_size"
          }
        }
      }
    }
  }
}

‒ This gives us the sum from each month, but suppose we want the
cumulative sum
Pipeline Aggregations
• A pipeline aggregation performs computations on the
results of other aggregations
‒ The input of a pipeline aggregation is typically the output of
another aggregation
‒ Use the buckets_path parameter to reference the values in the
other aggregations
• Pipeline aggregations are executed on the coordinating
node, after the results of the input agg are collected
Example of a Pipeline Aggregation
GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "logs_by_month": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month"
      },
      "aggs": {
        "monthly_sum_response": {
          "sum": {
            "field": "response_size"
          }
        },
        "cumulative_sum_response": {
          "cumulative_sum": {
            "buckets_path": "monthly_sum_response"
          }
        }
      }
    }
  }
}

‒ “cumulative_sum” is a pipeline agg
‒ The input of “cumulative_sum” is the result of
“monthly_sum_response”
Example of a Pipeline Aggregation
• The output from the previous search looks like:

"buckets": [
  {
    "key_as_string": "2017-03-01T00:00:00.000Z",
    "key": 1488326400000,
    "doc_count": 255,
    "monthly_sum_response": {
      "value": 15860968
    },
    "cumulative_sum_response": {
      "value": 15860968
    }
  },
  {
    "key_as_string": "2017-04-01T00:00:00.000Z",
    "key": 1491004800000,
    "doc_count": 467961,
    "monthly_sum_response": {
      "value": 25446117219
    },
    "cumulative_sum_response": {
      "value": 25461978187
    }
  },
  ...

‒ “monthly_sum_response” holds the monthly sum
‒ “cumulative_sum_response” holds the cumulative sum
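The work done by cumulative_sum can be reproduced from the bucket values shown above, which makes it clear that the pipeline agg only needs the output of the sum agg and never looks at the documents. The following Python sketch is an illustration under that reading; the function name is hypothetical, and the bucket values are copied from the response.

```python
from itertools import accumulate

# Output of the date_histogram + sum aggregations (values from the response).
buckets = [
    {"key_as_string": "2017-03-01T00:00:00.000Z",
     "monthly_sum_response": {"value": 15860968}},
    {"key_as_string": "2017-04-01T00:00:00.000Z",
     "monthly_sum_response": {"value": 25446117219}},
]


def add_cumulative_sum(buckets, buckets_path, name):
    """Attach a running total of the metric at buckets_path to each bucket."""
    running = accumulate(b[buckets_path]["value"] for b in buckets)
    return [dict(b, **{name: {"value": total}})
            for b, total in zip(buckets, running)]


result = add_cumulative_sum(
    buckets, "monthly_sum_response", "cumulative_sum_response")
```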
Specifying buckets_path
• When referring to an aggregation at the same level,
buckets_path can simply be the name of the agg:
GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "logs_by_month": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month"
      },
      "aggs": {
        "monthly_sum_response": {
          "sum": {
            "field": "response_size"
          }
        },
        "monthly_max_response": {
          "max": {
            "field": "response_size"
          }
        },
        "cumulative_sum_response": {
          "cumulative_sum": {
            "buckets_path": "monthly_sum_response"
          }
        }
      }
    }
  }
}

‒ In this example, the input agg and cumulative agg are at the same
level
Aggregation Separator ‘>’
• If a pipeline aggregation is at a parent level, use the >
symbol to reference the desired sub-level aggregation
being used for the input:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "logs_by_month": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month"
      },
      "aggs": {
        "monthly_sum_response": {
          "sum": {
            "field": "response_size"
          }
        }
      }
    },
    "max_monthly_sum": {
      "max_bucket": {
        "buckets_path": "logs_by_month>monthly_sum_response"
      }
    }
  }
}
• “max_monthly_sum” is a single output over all buckets and the max value of “monthly_sum_response”
• Use a ‘>’ symbol as a separator between aggregations
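To make the role of the ‘>’ separator concrete, here is a small Python sketch that mimics what max_bucket does with a path such as logs_by_month>monthly_sum_response. It is an illustration under assumptions: response_aggs is a hypothetical, trimmed-down response that reuses the two monthly sums from the earlier sample output, and max_bucket is a helper written for this example, not an Elasticsearch client call.

```python
# Hypothetical, trimmed-down aggregation response (only the fields needed here).
response_aggs = {
    "logs_by_month": {
        "buckets": [
            {"key_as_string": "2017-03-01T00:00:00.000Z",
             "monthly_sum_response": {"value": 15860968}},
            {"key_as_string": "2017-04-01T00:00:00.000Z",
             "monthly_sum_response": {"value": 25446117219}},
        ]
    }
}


def max_bucket(aggs, buckets_path):
    """Return (keys, value) for the largest metric found along buckets_path."""
    # The part before '>' names the bucket agg, the part after it the metric.
    bucket_agg, metric = buckets_path.split(">")
    buckets = aggs[bucket_agg]["buckets"]
    best = max(bucket[metric]["value"] for bucket in buckets)
    keys = [b["key_as_string"] for b in buckets if b[metric]["value"] == best]
    return keys, best


keys, value = max_bucket(response_aggs, "logs_by_month>monthly_sum_response")
print(keys, value)
```

The helper produces one result for the whole set of buckets, which is why max_monthly_sum sits next to logs_by_month rather than inside it.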
Examples of Pipeline Aggs
• There are about a dozen pipeline aggs available, including:
‒ avg_bucket: calculates the average of a specified metric
‒ sum_bucket: calculates the sum of a specified metric
‒ min_bucket and max_bucket: for finding the min or max
‒ moving_avg: calculates a moving average over a specified
window

‒ bucket_script: executes a script
• View the documentation for a complete list of pipeline
aggregations:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html
Chapter Review
Summary
• Use script_fields to add fields to a query response that
are generated in a script
• Search templates allow you to define a query with
parameters that can be defined at execution time
• The percentiles aggregation calculates percentiles over a
numeric field
• The top_hits aggregation keeps track of the most relevant
documents being aggregated over
• A pipeline aggregation performs computations on the
results of other aggregations
Quiz
1. True or False: Wildcards are powerful and should be used
often.
2. How would you check how many documents do not have a
specific field?
3. True or False: Scripts can be used to query, to aggregate
and to return new calculated values.
4. Why would you use search templates?
5. In pipeline aggregations, when do you use the ‘>’ symbol?
6. True or False: The input of a pipeline aggregation is the
output of another aggregation.
7. In our logs_server* indices, how could you verify that 95%
of web requests are executed in less than 100ms?
Lab 4
Advanced Search & Aggregations
Chapter 5
Cluster Management
1 Elasticsearch Internals
2 Field Modeling
3 Fixing Data
4 Advanced Search & Aggregations
5 Cluster Management
6 Capacity Planning
7 Document Modeling
8 Monitoring and Alerting
9 From Dev to Production
Topics covered:
• Elasticsearch Architecture Recap
• Dedicated Nodes
• Hot/Warm Architecture
• Shard Filtering
• Shard Filtering for Hardware
• Shard Allocation Awareness
• Forced Awareness
Elasticsearch
Architecture Recap
An Elasticsearch Cluster
• The largest unit of scale in Elasticsearch is a cluster
• A cluster is made up of one or more nodes
• Each node is a running Elasticsearch process, and there is
typically a one-to-one relationship between a server and a node
[Diagram: my_cluster containing node1, node2 and node3]
Indexes and Shards
• In your cluster, you create one or more indices
• Each index is sharded and its shards are distributed over
the nodes
• Elasticsearch automatically distributes the shards evenly
across the nodes

[Diagram: my_cluster with node1, node2 and node3; the primaries P0 to P4 and the replicas R0 to R4 of my_index are spread across the three nodes]
More Control
• Maybe you want more control over:
‒ your servers' specifications
‒ where the shards of your indices are allocated
[Diagram: my_cluster with node1, node2 and node3. The shards of my_index1 (P0, P1) live on node1. The shards of my_index2 (P0 to P3 and R0 to R3) live on node2 and node3]
Dedicated Nodes
Node Roles
• There are several roles a node can have:
‒ Master eligible
‒ Data
‒ Ingest
‒ Machine Learning
‒ Coordinating

• Nodes can be dedicated nodes that only take on a single
role...
Configuring Node Roles
• By default, a node is a master-eligible, data, and ingest
node:

Node type          Configuration parameter   Default value
master eligible    node.master               true
data               node.data                 true
ingest             node.ingest               true
Dedicated Nodes
• You can configure a node to have a single role.

node1: “I am a dedicated data node.”
node.master: false
node.data: true
node.ingest: false

node2: “I am a dedicated master eligible node.”
node.master: true
node.data: false
node.ingest: false

node3: “I am a dedicated ingest node.”
node.master: false
node.data: false
node.ingest: true

node4: “What am I?”
node.master: false
node.data: false
node.ingest: false
Dedicated Nodes Example
• Add data nodes to scale the cluster
[Diagram: my_cluster with node1 to node3 (master eligible, minimum_master_nodes: 2), ten data nodes (node4 to node13) and three ingest nodes (node14 to node16)]
Why Use Dedicated Nodes?
• Machines can be selected for specific purposes
• Dedicated master eligible nodes
‒ can focus on the cluster state
‒ machines with low CPU, RAM, and disk resources
• Dedicated data nodes
‒ can focus on data storage and processing client requests
‒ machines with high CPU, RAM, and disk resources
• Dedicated ingest nodes
‒ can focus on data processing
‒ machines with low disk, medium RAM, and high CPU resources
Coordinating Only Node
• A coordinating only node is specifically configured not to
be a master, data or ingest node
‒ machines with low disk, medium/high RAM, and medium/high
CPU resources (medium/high depends on use cases)
• Useful in specific situations in large clusters:
‒ behave like smart load balancers
‒ perform the gather/reduce phase of search requests
‒ lighten the load on data nodes

node4: “What am I? Oh, I am a coordinating-only node.”
node.master: false
node.data: false
node.ingest: false
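The mapping from the three settings to a node's role can be written down in a few lines. The Python sketch below is only an illustration of the slides: node_roles is a hypothetical helper, it treats a missing setting as true (the defaults in the configuration table), and it ignores the Machine Learning role because the slides show no setting for it.

```python
def node_roles(settings):
    """Return the roles implied by node.master, node.data and node.ingest."""
    flags = {
        "node.master": "master eligible",
        "node.data": "data",
        "node.ingest": "ingest",
    }
    # A setting that is absent keeps its default value, which is true.
    roles = [role for key, role in flags.items() if settings.get(key, True)]
    # A node with every role switched off only coordinates requests.
    return roles or ["coordinating only"]


node1 = {"node.master": False, "node.data": True, "node.ingest": False}
node4 = {"node.master": False, "node.data": False, "node.ingest": False}

print(node_roles(node1))  # dedicated data node
print(node_roles(node4))  # coordinating only node
print(node_roles({}))     # defaults: all three roles
```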
Coordinating Only Node
[Diagram: in my_cluster, a client app sends its requests to the coordinating nodes (node14 and node15), which sit in front of the data nodes (node4 to node13); node1 to node3 are shown separately]
Dedicated Nodes Architecture

[Diagram: my_cluster with node1 to node3, the data nodes node4 to node13, the coordinating nodes node14 and node15 receiving READ requests from a client app, and the ingest nodes node16 and node17 receiving WRITE requests from a client app]
Hot/Warm
Architecture
Hot/Warm Architecture
• You can configure the dedicated data nodes in your
cluster to use a hot/warm architecture
‒ useful for scenarios where you want to control which nodes
perform indexing vs. query handling
• Fine-grained control over data allocation
• Dedicated data nodes can be used as:
‒ Hot nodes
  ‒ for supporting the indices with new documents being written to
‒ Warm nodes
  ‒ for handling read-only indices that are not as likely to be queried
    frequently
Hot Nodes
• Use hot nodes for indexing
‒ indexing is a CPU and IO intensive operation, so hot nodes
should be powerful servers
‒ faster storage than the warm nodes
[Diagram: new documents are indexed into hot_node1, hot_node2 and hot_node3 of my_cluster]
Warm Nodes
• Use warm nodes for older, read-only indices
‒ tend to utilize large attached disks (usually spinning disks)
‒ larger amounts of data may require additional nodes to meet
performance requirements

GET tweets*/_search
{
  "query": {
    "match": {
      "tweet": "elastic"
    }
  }
}
[Diagram: several copies of this search request are sent to my_cluster, which contains hot_node1 to hot_node3 and warm_node1 to warm_node6]
Shard Filtering
• How is a hot/warm architecture deployed?
‒ you configure shard filtering, which we discuss next…

Shard Filtering
Shard Filtering
• Shard filtering refers to the ability to control which nodes
the shards of an index are allocated to:
‒ use node.attr to tag your nodes
‒ use index.routing.allocation to assign indexes to nodes
• Three types of rules for assigning indexes to nodes:

Dynamic setting                            Assign the index to a node whose {attr} has:
index.routing.allocation.include.{attr}    at least one of the values
index.routing.allocation.exclude.{attr}    none of the values
index.routing.allocation.require.{attr}    all of the values
1. Tag the Nodes
• Step 1: tag your nodes using the node.attr property:
‒ a node attribute can be any name and any value
‒ either in elasticsearch.yml or the -E command line option

hot_node1, hot_node2, hot_node3:
node.attr.my_temp: hot

warm_node1 to warm_node6:
node.attr.my_temp: warm

• my_temp is the name of the attribute and can be any name you choose
• “hot” and “warm” are arbitrary (but descriptive) values
2. Configure the Hot Data
• Step 2: configure your indexes to be allocated to the
tagged nodes
‒ suppose we want the logs from March, 2017, to be allocated to a
“hot” node:
PUT logs-2017-03
{
  "settings": {
    "index.routing.allocation.require.my_temp" : "hot"
  }
}
(The shards of logs-2017-03 will only be on “hot” nodes.)
‒ use index templates to automatically create all new indexes on
hot nodes
3. Move Older Shards to Warm
• Let’s move the log index from the previous month to “warm”
nodes
‒ index.routing.allocation is a dynamic setting, so we can
change it using the API:

PUT logs-2017-02/_settings
{
  "index.routing.allocation.require.my_temp" : "warm"
}
(Move the shards from February, 2017 to the warm nodes.)
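Because the setting is dynamic, moving older indices to warm nodes is easy to script. The Python sketch below only builds the request lines and bodies; it does not contact a cluster. The helper name warm_requests and the third index name (logs-2017-01) are assumptions made for illustration, and the comparison relies on the logs-YYYY-MM naming sorting chronologically.

```python
import json


def warm_requests(index_names, current_index, attr="my_temp"):
    """Build one (request line, body) pair per index older than current_index."""
    setting = f"index.routing.allocation.require.{attr}"
    return [
        (f"PUT {name}/_settings", json.dumps({setting: "warm"}))
        for name in sorted(index_names)
        if name < current_index  # logs-YYYY-MM names sort chronologically
    ]


requests_to_send = warm_requests(
    ["logs-2017-03", "logs-2017-02", "logs-2017-01"],
    current_index="logs-2017-03",
)
for request_line, body in requests_to_send:
    print(request_line)
    print(body)
```

Each pair can be pasted into Console or sent with any HTTP client; the current month's index is left untouched so it stays on the hot nodes.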
Shard Filtering
for Hardware
Shard Filtering for Hardware
• Suppose you are implementing a hot/warm architecture
‒ and your nodes are tagged accordingly with “my_temp”
• But you also have different sizes of hardware:
‒ so you tag nodes using “my_server” as “small”, “medium”,
“large”

Node     node.attr.my_temp   node.attr.my_server
node1    hot                 medium
node2    warm                medium
node3    hot                 small
node4    warm                small
Configure Your Indices
• Put my_index1 on a medium server that is also “hot”:
PUT my_index1
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1,
    "index.routing.allocation.include.my_server" : "medium",
    "index.routing.allocation.require.my_temp" : "hot"
  }
}
• Put my_index2 on any server that is not “hot”:
PUT my_index2
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index.routing.allocation.include.my_server" : "medium,small",
    "index.routing.allocation.exclude.my_temp" : "hot"
  }
}
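To see which nodes each of these index settings selects, the include, exclude and require rules can be modelled in a few lines of Python. This is a simplified sketch, not Elasticsearch's allocation code: the node tags are the ones read from the four-node diagram (node1 hot/medium, node2 warm/medium, node3 hot/small, node4 warm/small), values are compared by plain equality, and eligible_nodes is a hypothetical helper.

```python
NODES = {
    "node1": {"my_temp": "hot", "my_server": "medium"},
    "node2": {"my_temp": "warm", "my_server": "medium"},
    "node3": {"my_temp": "hot", "my_server": "small"},
    "node4": {"my_temp": "warm", "my_server": "small"},
}


def split(setting):
    """Turn a comma-separated setting such as "medium,small" into a set."""
    return {value.strip() for value in setting.split(",")}


def eligible_nodes(nodes, include=None, exclude=None, require=None):
    """Names of the nodes allowed by the include/exclude/require filters."""

    def matches_any(attrs, rules):
        return any(attrs.get(attr) in split(v) for attr, v in rules.items())

    def matches_all(attrs, rules):
        return all(attrs.get(attr) == value
                   for attr, v in rules.items() for value in split(v))

    return [
        name for name, attrs in nodes.items()
        if (not include or matches_any(attrs, include))       # at least one value
        and not (exclude and matches_any(attrs, exclude))     # none of the values
        and matches_all(attrs, require or {})                 # all of the values
    ]


my_index1 = eligible_nodes(NODES, include={"my_server": "medium"},
                           require={"my_temp": "hot"})
my_index2 = eligible_nodes(NODES, include={"my_server": "medium,small"},
                           exclude={"my_temp": "hot"})
print(my_index1, my_index2)
```

Under these assumptions only node1 qualifies for my_index1, while node2 and node4 qualify for my_index2.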
Shard Allocation
Awareness
Awareness Example
• Suppose your hardware is spread across two racks (or
zones, or VMs, or any grouping you have):
rack1: node1 (R0), node2 (P0)
rack2: node3 (P1), node4 (R1)
• If a rack fails, my_index would not be fully available
Shard Allocation Awareness
• You can make Elasticsearch aware of the physical
configuration of your hardware
‒ using cluster.routing.allocation.awareness, which is a cluster
level setting
‒ referred to as shard allocation awareness
• Useful when more than one node shares the same
resource (disk, host machine, network switch, rack, etc.)

Step 1: Label Your Nodes
• Use node.attr to label your nodes:

rack1 (node1, node2): node.attr.my_rack_id=rack1
rack2 (node3, node4): node.attr.my_rack_id=rack2
• We will use “my_rack_id” as the awareness attribute
Step 2: Configure Your Cluster
• You have to tell Elasticsearch which attribute (or attributes)
are being used for awareness:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "my_rack_id"
  }
}
• The value is the name of the node attribute you defined for awareness
Awareness Example
• Now you are guaranteed that at least one copy of all shards
will exist in each rack for each index:
rack1: node1 (P1), node2 (P0)
rack2: node3 (R1), node4 (R0)
• “my_index” will now be fully available if a rack fails
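The difference between the two layouts of my_index can be checked mechanically. The Python sketch below is an illustration only: the two dictionaries transcribe the shard placements from the "Awareness Example" diagrams (before and after awareness is configured), and survives_rack_failure is a hypothetical helper that tests whether every shard keeps at least one copy outside the failed rack.

```python
RACK_OF = {"node1": "rack1", "node2": "rack1", "node3": "rack2", "node4": "rack2"}

# Shard number -> nodes holding a copy of it (primary or replica).
WITHOUT_AWARENESS = {0: ["node1", "node2"], 1: ["node3", "node4"]}
WITH_AWARENESS = {0: ["node2", "node4"], 1: ["node1", "node3"]}


def survives_rack_failure(copies, rack_of, failed_rack):
    """True if every shard keeps at least one copy outside failed_rack."""
    return all(
        any(rack_of[node] != failed_rack for node in nodes)
        for nodes in copies.values()
    )


for rack in ("rack1", "rack2"):
    print(rack,
          survives_rack_failure(WITHOUT_AWARENESS, RACK_OF, rack),
          survives_rack_failure(WITH_AWARENESS, RACK_OF, rack))
```

In the first layout both copies of each shard share a rack, so losing either rack loses a shard; in the second layout each rack holds one copy of every shard.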
Forced Awareness
Why Forced Awareness?
• Suppose you have a rack (or zone) that fails
‒ the remaining rack will try to reassign all the missing replicas
‒ the single rack might not be able to handle that type of volume

rack1 (failed): node1 (P1), node2 (P0)
rack2: node3 (R1), node4 (R0)
• If rack1 fails, rack2 could become overwhelmed: “Oh no! We need a lot of new replicas.”
Forced Awareness
• You can configure forced awareness to avoid
overwhelming a zone
‒ never allows copies of the same shard to be in the same zone

[Diagram: rack1 (node1, node2) has failed; rack2 (node3, node4) only promotes its replicas to primaries: “Never mind. I just need to promote some replicas.”]
Configure Forced Awareness
• You have to tell Elasticsearch which attributes and values to
use for forced awareness:
‒ In this example, two copies of the same shard will never be
allocated to the same rack (rack1 or rack2)

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation.awareness.attributes": "my_rack_id",
        "allocation.awareness.force.my_rack_id.values": "rack1,rack2"
      }
    }
  }
}
• allocation.awareness.attributes is the name of the attribute
• allocation.awareness.force.my_rack_id.values contains the values of the attribute
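One way to reason about the effect of the forced values is as a cap on how many copies of a shard a single rack may hold. The Python sketch below is a simplified model under that assumption, not Elasticsearch's allocation code: max_copies_per_zone is a hypothetical helper that divides the copies of a shard evenly over every zone the cluster knows about, counting forced values even when their nodes are down.

```python
import math


def max_copies_per_zone(number_of_replicas, live_zones, forced_zones=()):
    """Upper bound on the copies of one shard that a single zone may hold."""
    total_copies = 1 + number_of_replicas  # the primary plus its replicas
    # Forced values count as zones even if none of their nodes are running.
    known_zones = set(live_zones) | set(forced_zones)
    return math.ceil(total_copies / len(known_zones))


# rack1 has failed, so only rack2 has live nodes; my_index has 1 replica.
plain_awareness = max_copies_per_zone(1, live_zones=["rack2"])
forced_awareness = max_copies_per_zone(1, live_zones=["rack2"],
                                       forced_zones=["rack1", "rack2"])
print(plain_awareness, forced_awareness)
```

With plain awareness the surviving rack may take both copies of every shard, which is the "lot of new replicas" case; with the forced values it stays at one copy per shard and the missing replicas wait until rack1 returns.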
Chapter Review
Summary
• Dedicated nodes can help you to better utilize hardware
• Coordinating-only nodes lighten the load on data nodes in
some use cases
• You can use shard filtering to configure a hot/warm
architecture for your cluster
• Shard filtering refers to the ability to control to which nodes an
index is allocated
• You can make Elasticsearch aware of the physical configuration
of your hardware using cluster.routing.allocation.awareness
• You can configure forced awareness to avoid overwhelming a
rack or zone of servers
Quiz
1. Suppose you created a new index every day for that day’s log
files. How could this scenario benefit from a hot/warm
architecture?
2. What happens if you configure an index’s shard filtering with
a scenario that is impossible for the cluster to implement?
3. Why configure shard allocation awareness if you have
already configured shard allocation filtering?
4. What is the benefit of forced awareness over simply
configuring shard allocation awareness?
Lab Cluster Architecture

Node     Server     Type                         Tags
node1    server1    dedicated master-eligible    none
node2    server2    data and ingest              hot, rack1
node3    server3    dedicated data               warm, rack1
node4    server4    data and ingest              hot, rack2
node5    server5    dedicated data               warm, rack2
Lab 5
Cluster Management
Chapter 6
Capacity Planning
1 Elasticsearch Internals
2 Field Modeling
3 Fixing Data
4 Advanced Search & Aggregations
5 Cluster Management
6 Capacity Planning
7 Document Modeling
8 Monitoring and Alerting
9 From Dev to Production
Topics covered:
• Designing for Scale
• Capacity Planning
• Scaling with Replicas
• Scaling with Indices
• Capacity Planning Use Cases

Designing for Scale
Designing for Scale
• Elasticsearch is built to scale
‒ and the default settings can take you a long way
• Proper design can make scaling easier
[Diagram: a 1 node cluster grows into a 10 node cluster (master node, ingest node, data nodes) and then into a 100+ node cluster (master nodes, ingest nodes, hot data nodes, warm data nodes): “Growing from 1 to 10 can be easy… …and so can growing from 10 to 100”]
One Shard…
• …does not scale very well:

PUT my_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
[Diagram: node1 holds P0]
• Adding a second node would not provide any scaling for my_index
Two Shards…
• …can scale if we add a node:

PUT my_index
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}
[Diagram: node1 holds P0 and P1]
• We plan ahead by overallocating my_index
Balancing of Shards
• Elasticsearch automatically balances the shards:

[Diagram: node1 holds P0; P1 moves from node1 to node2]
• Adding a node causes the cluster to rebalance
Shard Overallocation
• If you are expecting your cluster to grow, then it is good to
plan for that by overallocating shards:
‒ #shards > #nodes
• Shards can move between nodes quickly as the cluster
grows
‒ and there is no downtime during shard relocation

PUT my_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}
[Diagram: the five primaries P0 to P4 of my_index are spread over node1 and node2]
Be careful
• The scaling unit is the shard
• But you should NOT have too many shards
‒ especially underutilized shards
• Each shard uses resources from the machine
‒ if you have too many, it will seriously slow things down
• And if you have too many shards, Elasticsearch will be slow
“Too Much” Overallocation
• A little overallocation is good
• A “kagillion” shards is not:
‒ each shard comes at a cost (Lucene indices, file descriptors,
memory, CPU)
‒ also, a search request needs to hit every shard in the index
• A shard can optimally hold from 10 to 40 gigabytes
‒ optimal depends on the use case
‒ a 1GB shard is underutilized
[Diagram: node1 and node2 each hold a pile of hundreds of shards (up to P500 and P1000): “This is not a good plan”]
Do not Overshard
• Business Requirements
‒ 1GB per day
‒ 6 months retention
‒ ~180GB: we could easily have this data in 10 shards
• Common Scenario
‒ 3 different logs
‒ 1 index per day each
‒ 5 shards (default)
‒ 6 months retention
‒ ~2700 shards: too many shards for no good reason!
[Diagram: cluster my_cluster with the daily indices access-2017.10.01, my_app-2017.10.01 and sql-2017.10.01]
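The numbers on this slide follow from simple multiplication. The Python sketch below redoes the arithmetic under two stated assumptions: six months is counted as 180 days, and the 1GB per day covers all three logs together. It counts primary shards only, as the slide does.

```python
DAYS_RETAINED = 180          # assumption: 6 months counted as 180 days
GB_PER_DAY = 1               # assumption: total volume across all three logs

total_gb = GB_PER_DAY * DAYS_RETAINED

log_types = 3
indices_per_day_per_log = 1
shards_per_index = 5         # the default mentioned on the slide
total_shards = (log_types * indices_per_day_per_log
                * shards_per_index * DAYS_RETAINED)

gb_per_shard_daily_indices = total_gb / total_shards
gb_per_shard_ten_shards = total_gb / 10

print(total_gb, total_shards)
print(round(gb_per_shard_daily_indices, 3), gb_per_shard_ten_shards)
```

With daily indices each shard holds well under 0.1GB on average, while ten shards would hold 18GB each, which falls inside the 10 to 40 gigabyte range quoted earlier.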
Capacity Planning
Capacity Planning
• So how many shards should I configure for my index?
‒ there is no simple formula for it
‒ “It depends!”
• Too many factors are involved:
‒ use case: metrics, logging, search, APM, etc.
‒ hardware
‒ # of documents
‒ size and complexity of your documents
‒ how you index the data
‒ how you search and aggregate the data
‒ how many indices the data will be spread across
Capacity Planning
• Before trying to determine your capacity, you need to determine your SLA(s):
‒ How many docs/second do you need to index?
‒ How many queries/second do you need to process?
‒ What is the maximum response time for queries?
• Get some production data
‒ actual documents you are going to index
‒ actual queries you are going to run in production
‒ actual mappings you are going to use

Maximum Shard Capacity
• You can evaluate the maximum shard size for your particular use case

1. Create a 1-node cluster using your production hardware
2. Create a single index with 1 shard and no replicas
‒ use the same settings and analyzers that you plan to use in production
3. Incrementally index documents and run your searches and aggregations. Push this shard and index docs until it “breaks” and you will find its max capacity
‒ “breaks” depends on your own definition: ingestion rates, search rates, latency, etc.

GET my_index/_search
{
  ...
}

[Diagram: cluster my_cluster with a single node, node1, holding my_index with one primary shard, P0]

Number of Primary Shards
• It is not an exact science, but you can:
‒ estimate the total amount of data for your index
‒ leave room for growth (if applicable)
‒ divide by the maximum capacity of a single shard
• The number of primary shards is usually determined by indexing speed
• For searching, the total number of shards is the measurement (agnostic of index)
‒ note we are not talking about replicas here, just number of primary shards
‒ we will talk about replicas next…

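The three estimation steps above can be sketched as a small calculation. The function name, the 25% growth allowance, and the 30 GB shard capacity are hypothetical values chosen for illustration; the real capacity should come from your own single-shard test.

```python
import math


def estimate_primary_shards(total_data_gb: float, growth_factor: float,
                            max_shard_gb: float) -> int:
    """Expected data (including room for growth) divided by the capacity
    of one shard, rounded up to a whole number of shards."""
    if max_shard_gb <= 0:
        raise ValueError("max_shard_gb must be positive")
    return max(1, math.ceil(total_data_gb * growth_factor / max_shard_gb))


# Hypothetical inputs: 180 GB of data, 25% headroom, 30 GB per shard.
print(estimate_primary_shards(180, 1.25, 30))  # 8
```

Rounding up is a deliberate choice here: a fractional result means the last shard is partly filled, which is safer than exceeding the measured capacity.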
Measure Indexing Capacity of a Node
• You can calculate the indexing capacity of a single node
‒ index documents in a similar way as your application will
‒ index in parallel until 429 responses begin to come back
‒ use this to calculate the number of docs/second
‒ you can now calculate how to scale your nodes for ingestion based on your SLA

[Diagram: a stream of stock-trade documents being indexed into node1, for example:]

{
  "volume": 46965,
  "high": 31.56,
  "stock_symbol": "ALL",
  "low": 30.68,
  "close": 30.91,
  "trade_date": "2010-01-15T07:00:00.000Z"
}

Index docs until a 429 response, which means ES cannot keep up with indexing

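Once a per-node rate has been measured, the scaling step is simple arithmetic. The sketch below assumes ingestion scales roughly linearly with the number of nodes, which is a simplification; the function names and all numbers are hypothetical.

```python
import math


def measured_docs_per_second(docs_indexed: int, elapsed_seconds: float) -> float:
    """Sustained indexing rate observed just before 429 responses appeared."""
    return docs_indexed / elapsed_seconds


def nodes_needed(required_docs_per_second: float,
                 per_node_docs_per_second: float) -> int:
    """Nodes required to meet the SLA, assuming linear scaling (an assumption)."""
    return math.ceil(required_docs_per_second / per_node_docs_per_second)


# Hypothetical test run: 1.2 million documents indexed in 60 seconds.
per_node = measured_docs_per_second(docs_indexed=1_200_000, elapsed_seconds=60)
print(per_node)  # 20000.0

# Hypothetical SLA of 50,000 docs/second.
print(nodes_needed(50_000, per_node))  # 3
```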
Scaling with Replicas
• Adding replicas provides an additional level of scaling
‒ We know replicas provide high availability
‒ They also provide scaling for reads and searches (if you add additional hardware)

Suppose our index has one shard on one node:

PUT my_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

[Diagram: node1 holds P0]

• If you add another node, you can increase the number of replicas:
‒ “number_of_replicas” is a dynamic setting

PUT my_index/_settings
{
  "number_of_replicas": 1
}

[Diagram: node1 holds P0, node2 holds R0]

Adding a node and replicas adds compute power to the cluster

The Cost of Replicas
• Replicas have a cost associated with them as well, including:
‒ slower indexing speed (although replicas are indexed in parallel, so it is not a cumulative cost)
‒ more storage on disk
‒ larger associated heap memory footprint of the extra shard(s)

Scaling with Indices
• Using multiple indices also provides scaling
‒ If you need to add capacity, consider just creating a new index
• Then search across both indices to search “new” and “old” data
‒ you could even define a single alias for the multiple indices
• Searching 1 index with 50 shards is equivalent to searching 50 indices with 1 shard each

PUT my_index
{
  "settings": {
    "number_of_shards": 4
  }
}

[Diagram: node1 holds P0 and P2, node2 holds P1 and P3]

Searching my_index hits 4 shards. So does a search over my_index1,my_index2:

PUT my_index1
{
  "settings": {
    "number_of_shards": 2
  }
}

PUT my_index2
{
  "settings": {
    "number_of_shards": 2
  }
}

[Diagram: node1 holds my_index2 P0 and my_index1 P1, node2 holds my_index1 P0 and my_index2 P1]

Capacity Planning Use Cases
• When planning your cluster and designing indices, it is important to understand:
‒ what your data looks like
‒ how that data is going to be searched
• Two very common use cases are:
‒ searching fixed-size data: searching a large dataset that may grow slowly
‒ time-based data: data that grows rapidly, like log files

Fixed-size Data
• Your use case may involve searching a large dataset that only gradually grows and/or changes
‒ relatively fixed-size collection of documents
‒ search for relevant documents, no matter their age

PUT hotels
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}

[Diagram: node1 holds P0 and R2, node2 holds P1 and R3, node3 holds R0 and P2, node4 holds R1 and P3]

Planning for Fixed-size Data
• In this case, size your indices based on the maximum shard capacity vs. the amount of data you anticipate
‒ If you need to increase capacity, either reindex or add multiple indices (but preferably not very often)
‒ It is easy to increase throughput by adding more nodes and replicas

PUT hotels/_settings
{
  "number_of_replicas": 2
}

[Diagram: node1 holds P0 and R2, node2 holds P1 and R3, node3 holds R0 and P2, node4 holds R1 and P3, node5 holds R0 and R2, node6 holds R1 and R3]

Time-based Data
• Time-based data includes:
‒ logs
‒ social media streams
‒ time-based events
• These documents have a timestamp, and likely do not change

PUT tweets-2017-02-05
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}

[Diagram: node1 holds P0 and R2, node2 holds P1 and R3, node3 holds R0 and P2, node4 holds R1 and P3]

Planning for Time-based Data
• Searching on time-based data usually involves a timestamp
‒ you typically search for recent events
‒ older documents become less important
• Data ingestion is another key factor to consider
‒ you typically have a lot of data coming in
‒ you do not want indexing to become a bottleneck (it has to keep up)

Index per Time Frame
• Time-based data is best organized using time-based indices
‒ create a new index each day, week, month, year (whatever is appropriate for the amount of data being ingested)
‒ add the date to the name of the index

PUT tweets-2017-02-05
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}

PUT tweets-2017-02-06
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}

PUT tweets-2017-02-07
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}

Searching Time-based Indices
• Searches can use wildcards or aliases to search over multiple timeframes:

I want to search all tweets in February, 2017:

GET tweets-2017-02*/_search

Data Ingestion for Time-based Indices
• Hardcoding the index name in the application is not scalable
• There are three main options to define the index name for data ingestion in time-based indices:
‒ date library that calculates the correct index name
‒ date math in index names
‒ aliases

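The first option, a date library that calculates the index name, might look like the following minimal sketch. The function name is hypothetical; the `tweets-YYYY-MM-DD` pattern follows the index names used in this chapter.

```python
from datetime import date


def daily_index_name(prefix: str, day: date) -> str:
    """Build a daily index name such as tweets-2017-02-05."""
    return f"{prefix}-{day:%Y-%m-%d}"


print(daily_index_name("tweets", date(2017, 2, 5)))  # tweets-2017-02-05

# In an application you would normally pass today's date:
print(daily_index_name("tweets", date.today()))
```

The trade-off is that the naming logic lives in application code, so changing from daily to monthly indices means changing and redeploying that code.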
Data Ingestion for Time-based Indices
• Date math in index names:
‒ simple to define (e.g. <tweets-{now/d}>)
‒ less work on a daily basis
‒ redeploy the application when the unit changes (e.g. from monthly indices to daily indices)

If now = 2018-03-22T11:56:22

<logstash-{now/d}>          resolves to  logstash-2018.03.22
<logstash-{now{YYYY.MM}}>   resolves to  logstash-2018.03
<logstash-{now/w}>          resolves to  logstash-2018.03.19

Special characters should be URI encoded

# GET /<logstash-{now/d}>/_search
GET /%3Clogstash-%7Bnow%2Fd%7D%3E/_search

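The URI encoding shown above does not have to be done by hand. As a small sketch, Python's standard `urllib.parse.quote` produces the same encoded path; the helper name is hypothetical.

```python
from urllib.parse import quote


def encode_index_expression(expression: str) -> str:
    """Percent-encode a date math index expression for use in a URL path.

    safe="" makes quote() encode "/" as well, which is needed here because
    the slash is part of the expression rather than a path separator.
    """
    return quote(expression, safe="")


path = "/" + encode_index_expression("<logstash-{now/d}>") + "/_search"
print(path)  # /%3Clogstash-%7Bnow%2Fd%7D%3E/_search
```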
Data Ingestion for Time-based Indices
• Alias:
‒ very simple to define (e.g. tweets_write)
‒ needs an external tool to update the alias (e.g. every midnight)
‒ no need to redeploy your application

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "<tweets-{now/d}>",
        "alias": "tweets_write"
      }
    },
    {
      "remove": {
        "index": "<tweets-{now/d-1d}>",
        "alias": "tweets_write"
      }
    }
  ]
}

Managing Time-based Indices
• For optimal ingest rates:
‒ spread the shards of your active index over as many nodes as possible
‒ for example, 20 nodes = 20 primary shards on the active index
• For optimal search and low resource usage:
‒ shrink the older indices down to the optimal number of shards
‒ close indices that are no longer being searched
• Use a hot/warm architecture
‒ use hot nodes for indexing and warm nodes for querying

Chapter Review
Summary
• If you are expecting your cluster to grow, then it is good to plan for that by overallocating shards
• A little overallocation is good. A “kagillion” shards is not
• You can attempt to calculate the maximum shard size for your particular use case by pushing the limits of your index using one primary shard on a one-node cluster
• You can scale the query workload of your cluster by adding more nodes and increasing the number of replicas of your indices
• You can similarly provide scaling by distributing your documents across multiple indices

Quiz
1. True or False: The number of primary shards of an index is fixed at the time the index is created.
2. Is it more optimal to search over 1 index with 10 primary shards, or 10 indices with 1 primary shard each?
3. If you have a two-node cluster, why would you ever create an index with more than two primary shards?
4. True or False: Creating an index with only one primary shard is not a good design.
5. Suppose you calculated the max shard size for your dataset to be about 100,000 documents. How many shards should you use for a relatively fixed-size dataset of 900,000 documents?

Lab 6
Capacity Planning

Chapter 7
Document Modeling

1 Elasticsearch Internals
2 Field Modeling
3 Fixing Data
4 Advanced Search & Aggregations
5 Cluster Management
6 Capacity Planning
7 Document Modeling
8 Monitoring and Alerting
9 From Dev to Production

Topics covered:
• The Need for Document Modeling
• Denormalization
• The Need for Nested Types
• Nested Types
• Querying a Nested Type
• The Nested Aggregation
• Parent/Child Relationship
• The has_child Query
• The has_parent Query

The Need for Document Modeling
It’s all about relationships…
• If you come from the SQL world, you will likely need to change your thought process for modeling data for Elasticsearch
‒ In SQL, you typically normalize your data
• Search requires different considerations
‒ In Elasticsearch, you typically denormalize your data!
• A flat world has its advantages
‒ Indexing and searching are fast
‒ No need to join tables or lock rows

…and sometimes relationships matter
• There are times when relationships matter
‒ We need to bridge the gap between normal and flat
• Four common techniques for managing relational data in Elasticsearch
‒ Denormalizing: flatten your data (typically the best solution)
‒ Application-side joins: run multiple queries on normalized data
‒ Nested objects: for working with arrays of objects
‒ Parent/child relationships

Denormalization
• Denormalizing your data refers to “flattening” your data
‒ storing redundant copies of data in each document, instead of using some type of relationship
‒ _source is compressed, which reduces the disk "waste"
• Denormalization provides the best performance out of Elasticsearch
‒ no need to perform expensive joins

Example of Denormalization
• Suppose you are indexing users and tweets
‒ users in one index, and tweets in another
• When searching for tweets, suppose you want to search by the username also

users

{
  "username" : "harrison",
  "userid" : 1,
  "city" : "Los Angeles",
  "state" : "California"
}

tweets

{
  "body" : "My favorite movie is Star Wars",
  "time" : "2017-01-24T02:32:27",
  "userid" : 1
}

Example of Denormalization
• Duplicate the desired users data in the tweets document:

tweets

PUT tweets/_doc/123
{
  "body" : "My favorite movie is Star Wars",
  "time" : "2017-09-24T02:32:27",
  "user": {
    "userid" : 1,
    "username" : "harrison",
    "city" : "Los Angeles",
    "state" : "California"
  }
}

PUT tweets/_doc/456
{
  "body" : "Laugh it up, fuzzball.",
  "time" : "1980-06-20T00:00:00",
  "user": {
    "userid" : 1,
    "username" : "harrison",
    "city" : "Los Angeles",
    "state" : "California"
  }
}

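Denormalization usually happens in the application before indexing. The following is a minimal sketch of that step, assuming the users are available as an in-memory lookup keyed by `userid`; the function and variable names are hypothetical.

```python
def denormalize_tweet(tweet: dict, users_by_id: dict) -> dict:
    """Replace the tweet's userid reference with a full copy of the user."""
    user = users_by_id[tweet["userid"]]
    doc = {field: value for field, value in tweet.items() if field != "userid"}
    doc["user"] = dict(user)  # copy, so later edits do not alias the lookup
    return doc


users_by_id = {
    1: {"userid": 1, "username": "harrison",
        "city": "Los Angeles", "state": "California"},
}
tweet = {"body": "My favorite movie is Star Wars",
         "time": "2017-09-24T02:32:27",
         "userid": 1}

doc = denormalize_tweet(tweet, users_by_id)
print(doc["user"]["username"])  # harrison
```

The resulting `doc` has the same shape as the body of `PUT tweets/_doc/123` above and is what the application would send to Elasticsearch.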
Example of Denormalization
• Now you can search tweets and username all in a single query:

GET tweets/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"body": "movie"}},
        {"match": {"user.username": "harrison"}}
      ]
    }
  }
}

The matching document:

"_source": {
  "body": "My favorite movie is Star Wars",
  "time": "2017-09-24T02:32:27",
  "user": {
    "userid": 1,
    "username": "harrison",
    "city": "Los Angeles",
    "state": "California"
  }
}

Tips for Denormalizing Data
• Denormalize data to optimize for reads
• There is no mechanism to keep the denormalized data consistent with the original document
‒ avoid denormalizing data that is likely to change frequently
• When denormalizing data that changes, you should ensure that there is 1 and only 1 authoritative source for the data
‒ If you do denormalize data that might change, it can help to also denormalize a field that does not change, like an _id

users

PUT users/_doc/1
{
  "userid" : 1,
  ...
}

tweets

PUT tweets/_doc/123
{
  "userid" : 1,
  ...
}

Our example uses “userid” as an authoritative source

The Need for Nested Types
Inner Objects
• Suppose we have a scenario where denormalization is not an option (or would be difficult to implement)
• For example, suppose we are indexing metadata of image files, and users can tag an image with any tags they want
‒ To maintain strict mappings, we use a key/value pair design:

PUT photos/_doc/1
{
  "filename": "img1.jpg",
  "tags": [
    {"key": "event", "value": "Christmas"},
    {"key": "folder", "value": "December2017"}
  ]
}

A photo can be tagged with any key/value pair…

PUT photos/_doc/2
{
  "filename": "img2.jpg",
  "tags": [
    {"key": "event", "value": "vacation"},
    {"key": "holiday", "value": "Christmas"}
  ]
}

…and can have any number of tags, so we use an array

Searching inner objects…
• There is an interesting (and confusing) scenario that can arise with arrays of JSON inner objects
‒ Which of the two documents on the previous slide is a hit for the following query?

GET photos/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tags.key": "event"
          }
        },
        {
          "match": {
            "tags.value": "Christmas"
          }
        }
      ]
    }
  }
}

…can have surprising results
• Both documents are a hit?

GET photos/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tags.key": "event"
          }
        },
        {
          "match": {
            "tags.value": "Christmas"
          }
        }
      ]
    }
  }
}

Hits:

"filename": "img1.jpg",
"tags": [
  {"key": "event", "value": "Christmas"},
  {"key": "folder", "value": "December2017"}
]

"filename": "img2.jpg",
"tags": [
  {"key": "event", "value": "vacation"},
  {"key": "holiday", "value": "Christmas"}
]

Why the confusing results?
• The “key” and “value” fields are JSON inner objects of the “tags” array field
‒ When the JSON object got flattened in the array, we lost the relationship between “key” and “value”

All keys and values are put into respective arrays:

{
  "filename" : "img2.jpg",
  "tags.key" : [ "event", "holiday" ],
  "tags.value" : [ "vacation", "Christmas" ]
}

A query for “event = Christmas” is a hit (so is “holiday = vacation”)

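The flattening described above can be imitated in a few lines to see why the unexpected hit occurs. This is only a model of the behavior: the function names are hypothetical, and exact string comparison stands in for Elasticsearch's analyzed `match` query.

```python
def flatten_tags(doc: dict) -> dict:
    """Mimic how an array of inner objects becomes one array per field."""
    return {
        "filename": doc["filename"],
        "tags.key": [tag["key"] for tag in doc["tags"]],
        "tags.value": [tag["value"] for tag in doc["tags"]],
    }


def flattened_match(flat: dict, key: str, value: str) -> bool:
    """Each condition is checked against its own array, independently."""
    return key in flat["tags.key"] and value in flat["tags.value"]


img2 = {
    "filename": "img2.jpg",
    "tags": [
        {"key": "event", "value": "vacation"},
        {"key": "holiday", "value": "Christmas"},
    ],
}

flat = flatten_tags(img2)
print(flat["tags.key"])    # ['event', 'holiday']
print(flat["tags.value"])  # ['vacation', 'Christmas']
print(flattened_match(flat, "event", "Christmas"))  # True
```

No single tag in img2.jpg has key "event" and value "Christmas", yet the check succeeds because the pairing between keys and values is gone after flattening.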
Nested Types
Nested Data Type
• The nested type allows arrays of objects to be indexed and queried independently of each other
‒ Use nested if you need to maintain the relationship of each object in the array
• The mapping syntax looks like:

"mappings": {
  "_doc": {
    "properties": {
      "outer_object": {
        "type": "nested",
        "properties": {
          "inner_field": { "type": "TYPE" },
          ...
        }
      }
    }
  }
}

The Photo Metadata Example
• Define tags as nested to maintain the relationship between the key/value fields within tags:

PUT photos
{
  "mappings": {
    "_doc": {
      "properties": {
        "filename": {
          "type": "keyword"
        },
        "tags": {
          "type": "nested",
          "properties": {
            "key": {
              "type": "keyword"
            },
            "value": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}

Making an array field nested maintains the relationship between its nested fields

Nested Objects are Stored Separately
• Internally, nested documents are stored as separate Lucene documents that are joined at query time
‒ Joins always come at a performance cost, so use nested types only when denormalization is not an option

[Diagram: the root document references its two nested documents, which are stored separately]

"_id": 1,
"filename": "img1.jpg",
"tags": ["12", "74"]

// id=12
"key": "event",
"value": "Christmas"

// id=74
"key": "folder",
"value": "December2017"

Querying a nested Type
• To query a nested data type, you have to use the “nested” query:
‒ specify the “path” to the nested object

GET photos/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "bool": {
          "must": [
            {"match": {"tags.key": "event"}},
            {"match": {"tags.value": "Christmas"}}
          ]
        }
      }
    }
  }
}

The hit:

"filename": "img1.jpg",
"tags": [
  {"key": "event", "value": "Christmas"},
  {"key": "folder", "value": "December2017"}
]

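The effect of the nested query can be modeled by evaluating both conditions against each tag object on its own. This is an illustrative sketch rather than how Lucene executes the query; the function name is hypothetical, and exact string comparison stands in for the analyzed `match` query.

```python
def nested_match(doc: dict, key: str, value: str) -> bool:
    """A document matches only if one single tag satisfies both conditions."""
    return any(tag["key"] == key and tag["value"] == value
               for tag in doc["tags"])


photos = [
    {"filename": "img1.jpg",
     "tags": [{"key": "event", "value": "Christmas"},
              {"key": "folder", "value": "December2017"}]},
    {"filename": "img2.jpg",
     "tags": [{"key": "event", "value": "vacation"},
              {"key": "holiday", "value": "Christmas"}]},
]

hits = [photo["filename"] for photo in photos
        if nested_match(photo, "event", "Christmas")]
print(hits)  # ['img1.jpg']
```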
Finding the Cause of a Hit
• Suppose we search for “tags” that have a “value” named “Christmas”
‒ both photos are a hit, but the response does not reveal specifically which “key” has the value “Christmas”

GET photos/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "match": {
          "tags.value": "Christmas"
        }
      }
    }
  }
}

Hits: “img1.jpg” and “img2.jpg”

The inner_hits Query
• The inner_hits query reveals which “key” generated the hit for the value “Christmas”
‒ returned in a separate “inner_hits” clause in the response
‒ inner_hits is added to your nested query:

GET photos/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "match": {
          "tags.value": "Christmas"
        }
      },
      "inner_hits": {}
    }
  }
}

You can add “inner_hits” to a “nested” query

Inner hits:
‒ “img1.jpg”: “key”: “event”, “value”: “Christmas”
‒ “img2.jpg”: “key”: “holiday”, “value”: “Christmas”

The nested Aggregation
• The nested bucket aggregation puts nested objects into a bucket
‒ Useful for performing sub-aggregations
• The following simple example puts all tags into a bucket:

GET photos/_search
{
  "size": 0,
  "aggs": {
    "my_tags": {
      "nested": {
        "path": "tags"
      }
    }
  }
}

Response:

"aggregations": {
  "my_tags": {
    "doc_count": 4
  }
}

Example of a nested Aggregation
• Suppose we want to put all tags with the same key into the same bucket:

GET photos/_search
{
  "size": 0,
  "aggs": {
    "my_tags": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "tag_terms": {
          "terms": {
            "field": "tags.key",
            "size": 10
          }
        }
      }
    }
  }
}

Response:

"buckets": [
  {
    "key": "event",
    "doc_count": 2
  },
  {
    "key": "folder",
    "doc_count": 1
  },
  {
    "key": "holiday",
    "doc_count": 1
  }
]

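The bucket counts in the response can be reproduced by counting tag keys across the two example photos. This sketch only mirrors the arithmetic of the aggregation using the standard library; it does not call Elasticsearch.

```python
from collections import Counter

photos = [
    {"filename": "img1.jpg",
     "tags": [{"key": "event", "value": "Christmas"},
              {"key": "folder", "value": "December2017"}]},
    {"filename": "img2.jpg",
     "tags": [{"key": "event", "value": "vacation"},
              {"key": "holiday", "value": "Christmas"}]},
]

# One count per nested tag object, grouped by its key (like the terms agg).
key_counts = Counter(tag["key"] for photo in photos for tag in photo["tags"])

# Total number of nested objects (like doc_count of the nested agg).
print(sum(key_counts.values()))  # 4
print(key_counts["event"])       # 2
```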
Parent/Child Relationship
The Need for Parent/Child Relationships
• Updating a nested object requires:
‒ a complete reindexing of the root object, AND
‒ a complete reindexing of all its nested objects
• Using a join datatype, you can completely separate two
objects while maintaining their relationship
‒ the parent and children are completely separate documents

‒ the parent can be updated without reindexing the children
‒ children can be added/changed/deleted without affecting the
parent or any other children
• Configured in your mappings using the join datatype
Parent/Child and Shards
• Behind the scenes, the parent and all its children must
live on the same shard
‒ This makes the query-time join faster
• Remember how documents are routed using the hash
function?
‒ That default routing will not work for a child document - it must be
routed to the same shard as its parent
‒ The parent’s id is used as the routing value for the child
document
• Therefore, every time you refer to a child document, you
must specify its parent’s id
‒ We will see how this is implemented next…
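A minimal sketch of why the child needs its parent’s id as the routing value. It assumes the usual "hash of the routing value, modulo the number of primary shards" rule; zlib.crc32 is only a deterministic placeholder hash for illustration (not the hash Elasticsearch uses), and the shard count of 5 is an assumption.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Toy shard selection: hash(routing value) % number of primary shards."""
    # crc32 is a stand-in hash chosen only because it is deterministic.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # assumed number of primary shards

# By default a document is routed by its own id.
parent_shard = shard_for("c1", NUM_SHARDS)
child_default_shard = shard_for("emp1", NUM_SHARDS)

# With routing=c1 the child is hashed on the parent's id instead,
# so it always lands on the parent's shard.
child_routed_shard = shard_for("c1", NUM_SHARDS)

print(parent_shard, child_default_shard, child_routed_shard)
```

Routing by the child’s own id may or may not hit the parent’s shard; routing by the parent’s id always does.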
Defining a Parent/Child Relationship
• Let’s go through the steps of defining a parent/child
relationship
1. Define the mapping
2. Index some parent documents
3. Index some child documents
4. Query the documents

1. Define the mapping
• The parent and child relationship is defined using a field of
type join
‒ The following “my_join_relation” field defines “company” as a
parent of “employee”:
PUT companies
{
  "mappings": {
    "_doc" : {
      "properties": {
        "my_join_relation": {
          "type": "join",
          "relations": {
            "company": "employee"
          }
        },
        ...
      }
    }
  }
}

“company” is a parent of “employee”

A mapping can only define one join field, but that join field can define multiple relations
2. Index some parent documents
• When indexing a parent, specify its relation
‒ In our join field, we used “company” as the parent relation name:

PUT companies/_doc/c1
{
  "company_name" : "Stark Enterprises",
  "my_join_relation": {
    "name": "company"
  }
}

PUT companies/_doc/c2
{
  "company_name" : "NBC Universal",
  "my_join_relation": {
    "name": "company"
  }
}

"my_join_relation": name of the join field
"name": "company": This document is a “company” document
3. Index some child documents
• Specify the relation name of the join field
• Use the “routing” parameter to ensure the child document
is indexed on the same shard as its parent

PUT companies/_doc/emp1?routing=c1
{
  "first_name" : "Tony",
  "last_name" : "Stark",
  "my_join_relation": {
    "name": "employee",
    "parent": "c1"
  }
}

routing=c1: A child document has to be on the same shard as its parent
"my_join_relation": name of the join field
"name": "employee": This document is an “employee” document
"parent": "c1": This employee works for “Stark Enterprises”
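The two indexing steps can be sketched as small helpers that only build the request (method, path, body) without sending it. The helper names are hypothetical; the join field name, relation names and ids are the ones used on the slides.

```python
JOIN_FIELD = "my_join_relation"  # join field name from the mapping


def parent_request(index, doc_id, source, relation):
    """Build the indexing request for a parent document."""
    body = dict(source)
    body[JOIN_FIELD] = {"name": relation}
    return "PUT", f"{index}/_doc/{doc_id}", body


def child_request(index, doc_id, source, relation, parent_id):
    """Build the indexing request for a child document.

    The parent's id appears twice: in the join field and as the routing value.
    """
    body = dict(source)
    body[JOIN_FIELD] = {"name": relation, "parent": parent_id}
    return "PUT", f"{index}/_doc/{doc_id}?routing={parent_id}", body


parent = parent_request(
    "companies", "c1", {"company_name": "Stark Enterprises"}, "company")
child = child_request(
    "companies", "emp1",
    {"first_name": "Tony", "last_name": "Stark"}, "employee", "c1")

print(parent)
print(child)
```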
4. Query the documents
• Notice the parent and child documents are in the same
index:
GET companies/_search

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      ...
    ]
  }
}

• Typical parent/child queries involve has_child and
has_parent…
The has_child Query
• The has_child query is a filter that accepts a query and the
child relation name to run it against
‒ It returns parent documents that have child docs matching the
query
‒ Parent and child documents are routed to the same shard, so
joining them at search time is in-memory and efficient
• The syntax looks like:

GET my_index/_search
{
  "query": {
    "has_child": {
      "type": "relation_name",
      "query": {}
    }
  }
}

"type": the child relation name
Example of has_child

I want all companies that have an employee named “Stark”

GET companies/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "query": {
        "match": {
          "last_name": "Stark"
        }
      }
    }
  }
}

Response (excerpt):

"hits": [
  {
    "_index": "companies",
    "_type": "_doc",
    "_id": "c1",
    "_score": 1,
    "_source": {
      "company_name": "Stark Enterprises",
      "my_join_relation": {
        "name": "company"
      }
    }
  }
]
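If you build searches from application code, the has_child body is just a nested dictionary. A minimal sketch, with a hypothetical helper name, that reproduces the request body from this example:

```python
import json


def has_child_query(child_relation, inner_query):
    """Build a search body that returns parents whose children match inner_query."""
    return {
        "query": {
            "has_child": {
                "type": child_relation,  # the child relation name
                "query": inner_query,    # runs against the child documents
            }
        }
    }


body = has_child_query("employee", {"match": {"last_name": "Stark"}})
print(json.dumps(body, indent=2))
```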
The inner_hits Query
• Notice in the previous query that “Stark Enterprises” has
an employee with the last name “Stark”, but we do not
know which employee caused the hit
‒ Add inner_hits to the has_child query to get the matching
children:

GET companies/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "query": {
        "match": {
          "last_name": "Stark"
        }
      },
      "inner_hits" : {}
    }
  }
}

Response (excerpt):

...
"inner_hits": {
  "employee": {
    "hits": {
      "total": 1,
      "max_score": 0.6931472,
      "hits": [
        {
          "_type": "_doc",
          "_id": "emp1",
          "_score": 0.6931472,
          "_routing": "c1",
          "_source": {
            "first_name": "Tony",
            "last_name": "Stark",
            "my_join_relation": {
              "name": "employee",
              "parent": "c1"
            }
          }
        }
      ]
    }
  }
}
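A small sketch of reading the matching children back out of a parent hit. It assumes the response shape shown on this slide (inner_hits keyed by the child relation name); the helper name and the trimmed sample hit are illustrative only.

```python
def inner_hit_ids(parent_hit, relation):
    """Return the ids of the child documents that caused this parent hit."""
    inner = parent_hit.get("inner_hits", {}).get(relation, {})
    return [hit["_id"] for hit in inner.get("hits", {}).get("hits", [])]


# Trimmed-down version of the response on the slide.
sample_hit = {
    "_id": "c1",
    "inner_hits": {
        "employee": {
            "hits": {
                "total": 1,
                "hits": [{"_id": "emp1", "_routing": "c1"}],
            }
        }
    },
}

print(inner_hit_ids(sample_hit, "employee"))
```

A hit without an inner_hits section simply yields an empty list.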
The has_parent Query
• The has_parent query accepts a query and the parent relation
name to run it against
‒ It returns child documents whose associated parent
matches the query
• The syntax looks like:

GET my_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "relation_name",
      "query": {}
    }
  }
}

"parent_type": the parent relation name
Example of has_parent

I want all employees that work for “NBC”

GET companies/_search
{
  "query": {
    "has_parent": {
      "parent_type": "company",
      "query": {
        "match": {
          "company_name": "NBC"
        }
      }
    }
  }
}

Response (excerpt):

"hits": [
  {
    "_index": "companies",
    "_type": "_doc",
    "_id": "emp3",
    "_score": 1,
    "_routing": "c2",
    "_source": {
      "first_name": "Tony",
      "last_name": "Potts",
      "my_join_relation": {
        "name": "employee",
        "parent": "c2"
      }
    }
  }
]
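To make the direction of the two queries concrete, here is a toy in-memory model of the join. It is not how Elasticsearch executes these queries; it only mimics which documents come back. The documents mirror the slides; the function names and predicate style are assumptions for illustration.

```python
JOIN_FIELD = "my_join_relation"

DOCS = {
    "c1": {"company_name": "Stark Enterprises", JOIN_FIELD: {"name": "company"}},
    "c2": {"company_name": "NBC Universal", JOIN_FIELD: {"name": "company"}},
    "emp1": {"first_name": "Tony", "last_name": "Stark",
             JOIN_FIELD: {"name": "employee", "parent": "c1"}},
    "emp3": {"first_name": "Tony", "last_name": "Potts",
             JOIN_FIELD: {"name": "employee", "parent": "c2"}},
}


def has_parent(docs, parent_relation, parent_matches):
    """Ids of children whose parent has the given relation and matches."""
    result = []
    for doc_id, source in docs.items():
        parent_id = source[JOIN_FIELD].get("parent")
        parent = docs.get(parent_id) if parent_id else None
        if (parent is not None
                and parent[JOIN_FIELD]["name"] == parent_relation
                and parent_matches(parent)):
            result.append(doc_id)
    return sorted(result)


def has_child(docs, child_relation, child_matches):
    """Ids of parents that have at least one matching child."""
    parents = set()
    for source in docs.values():
        join = source[JOIN_FIELD]
        if (join["name"] == child_relation and "parent" in join
                and child_matches(source)):
            parents.add(join["parent"])
    return sorted(parents)


nbc_employees = has_parent(DOCS, "company", lambda p: "NBC" in p["company_name"])
stark_companies = has_child(DOCS, "employee", lambda c: c["last_name"] == "Stark")
print(nbc_employees, stark_companies)
```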
Accessing a Child Document
• The Document APIs require the routing property when
working with child documents
‒ The APIs need to know the routing id of the parent to find the
child
Without the routing value:

GET companies/_doc/emp1

{
  "_index": "companies",
  "_type": "_doc",
  "_id": "emp1",
  "found": false
}

With the routing value:

GET companies/_doc/emp1?routing=c1

{
  "_index": "companies",
  "_type": "_doc",
  "_id": "emp1",
  "_version": 1,
  "_routing": "c1",
  "found": true,
  "_source": {
    "first_name": "Tony",
    "last_name": "Stark",
    "my_join_relation": {
      "name": "employee",
      "parent": "c1"
    }
  }
}
Updating Child Documents
• One of the key benefits of a parent/child relationship is the
ability to modify a child object independent of the parent
‒ For example, changing an employee has no effect on its parent
(or any of its siblings either)

POST companies/_doc/emp1/_update?routing=c1
{
  "doc" : {
    "first_name" : "Anthony"
  }
}

Change “Tony” to “Anthony”
Choosing a Technique
Choosing a Technique for Relationships
• Denormalize your data!
‒ unless your docs contain arrays of objects and your queries test more
than 1 property when matching objects in these arrays: consider nested docs
‒ unless all related items do not fit in a single JSON doc, or your docs
are updated frequently: consider parent/child
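One possible reading of this flowchart, written as a tiny decision function. The parameter names and the order of the checks are assumptions made for this sketch; treat it as a memory aid rather than a rule.

```python
def suggest_model(arrays_of_objects, queries_test_multiple_properties,
                  fits_in_single_doc, updated_frequently):
    """Suggest a modeling technique following one reading of the flowchart."""
    # Related items that cannot live in one JSON doc, or docs that change
    # often, point towards separate parent and child documents.
    if not fits_in_single_doc or updated_frequently:
        return "parent/child"
    # Arrays of objects only need the nested type when queries must match
    # more than one property of the same object.
    if arrays_of_objects and queries_test_multiple_properties:
        return "nested"
    # Default recommendation.
    return "denormalize"


print(suggest_model(False, False, True, False))
print(suggest_model(True, True, True, False))
print(suggest_model(True, True, True, True))
```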
Kibana Considerations
• Kibana currently has very limited support for nested types
or parent/child relationships
‒ important to consider this limitation if you are using your indices
for dashboards and visualizations in Kibana
• Kibana may better support these relationships in the future
‒ but for now, you will have to decide which is more important:
using nested or parent/child relationships, or being able to

visualize your data in Kibana
Chapter Review
Summary
• Denormalizing your data refers to “flattening” your data,
and typically provides the best performance in terms of how
your data is modeled
• The nested type allows arrays of objects to be indexed and
queried independently of each other
• Updating a nested object requires a complete reindexing of
the root object AND all of its other nested objects
• Using the join datatype (parent/child), you can completely
separate two objects while maintaining their relationship
• One of the key benefits of a parent/child relationship is the
ability to modify a child object independently of the parent
• When modeling your documents, prefer denormalization
whenever possible
Quiz
1. True or False: Updating a nested inner object causes the
root object and all other nested objects to be reindexed.
2. True or False: Deleting a child object causes the parent
object and all other siblings to be reindexed.
3. Why not just use a parent/child relationship all the time (as
opposed to nested types) when dealing with relational
objects?

4. True or False: A child object must be routed to the same
shard as its parent object.
5. True or False: Deleting a parent object causes all child
objects to be deleted.
6. True or False: You can index a child object whose parent
does not exist.
Lab 7
Document Modeling
Chapter 8
Monitoring and Alerting

1 Elasticsearch Internals
2 Field Modeling
3 Fixing Data
4 Advanced Search & Aggregations
5 Cluster Management
6 Capacity Planning
7 Document Modeling
8 Monitoring and Alerting
9 From Dev to Production
Topics covered:
• Monitoring Options
• The Stats APIs
• Task Monitoring
• The cat API
• Diagnosing Performance Issues

• The Elastic Monitoring Component
• The Monitoring UI
• Alerting
Monitoring Options
• Elasticsearch has several APIs for monitoring:
‒ Node Stats: _nodes/stats
‒ Cluster Stats: _cluster/stats
‒ Index Stats: my_index/_stats
‒ Pending Cluster Tasks API: _cluster/pending_tasks

• The above APIs return JSON objects (no surprise)
‒ cat API: a human-readable alternative to the JSON APIs above
‒ similar results, just formatted differently
Monitoring (the Component)
• While useful, the stats APIs are point-in-time
‒ That is where Monitoring comes into play (not the verb
“monitoring”, but the Elastic component called “Monitoring”)
‒ Monitoring collects stats and creates visualizations
The Stats APIs
Cluster Stats
• The Cluster Stats API provides stats at the cluster level:

GET _cluster/stats

{
  "_nodes": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "cluster_name": "my_cluster",
  "timestamp": 1486701112018,
  "status": "red",
  "indices": {
    "count": 26,
    "shards": {
      "total": 162,
      "primaries": 87,
      "replication": 0.8620689655172413,
      "index": {
        "shards": {
          "min": 2,
          "max": 10,
          "avg": 6.230769230769231
        },
        "primaries": {
          "min": 1,
          "max": 6,
          "avg": 3.3461538461538463
        },
        "replication": {
        ...

Only a very small subset of the large amount of stats
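The shard numbers in this sample response are consistent with simple ratios: replication looks like replica shards divided by primary shards, and each avg like a total divided by the index count. A short sketch that checks this arithmetic against the values on the slide (the function names are made up for the example):

```python
import math

# Values copied from the sample response.
indices = {
    "count": 26,
    "shards": {"total": 162, "primaries": 87,
               "replication": 0.8620689655172413},
}


def replication_ratio(total_shards, primary_shards):
    """Replica shards per primary shard."""
    return (total_shards - primary_shards) / primary_shards


def average_per_index(value, index_count):
    """Average of a cluster-wide total across all indices."""
    return value / index_count


shards = indices["shards"]
replication = replication_ratio(shards["total"], shards["primaries"])
avg_shards = average_per_index(shards["total"], indices["count"])
avg_primaries = average_per_index(shards["primaries"], indices["count"])

print(replication, avg_shards, avg_primaries)
```

A replication value below 1 therefore means that, on average, not every primary has a replica.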
Node Stats
• The Node Stats API provides stats at the node level:

GET _nodes/stats

{
  "_nodes": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "cluster_name": "my_cluster",
  "nodes": {
    "OmWJPhToQ0iyNfz-Qd9i8g": {
      "timestamp": 1486701705225,
      "name": "node1",
      "transport_address": "192.168.1.6:9300",
      "host": "192.168.1.6",
      "ip": "192.168.1.6:9300",
      "roles": [
        "master",
        "data"
      ],
      "attributes": {
        "temp": "hot",
        "server_size": "small",
        "zone": "zoneA"
      },
      ...

You can also specify a list of nodes:

GET _nodes/node1/stats
Indices Stats
• The Indices Stats API provides stats at the index level:

GET my_index/_stats

{
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "docs": {
        "count": 2,
        "deleted": 0
      },
      "store": {
        "size_in_bytes": 7399,
        "throttle_time_in_millis": 0
      },
      "indexing": {
        "index_total": 0,
        "index_time_in_millis": 0,
        "index_current": 0,
        "index_failed": 0,
        "delete_total": 0,
        "delete_time_in_millis": 0,
        ...

You can get the stats from all indices in one request:

GET _stats
Task Monitoring
Pending Tasks
• The Pending Tasks API shows cluster-level changes that
have not been executed yet:

GET _cluster/pending_tasks

{
  "tasks": [
    {
      "insert_order": 101,
      "priority": "URGENT",
      "source": "create-index [my_index], cause [api]",
      "time_in_queue_millis": 86,
      "time_in_queue": "86ms"
    }
  ]
}

The response is often empty because cluster-level changes are fast
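Because the list is usually empty, a simple check is to flag only tasks that have been waiting longer than some limit. A minimal sketch over the response shape shown above; the helper name and the limits are arbitrary choices for the example.

```python
def slow_pending_tasks(response, max_millis):
    """Return the sources of pending tasks queued longer than max_millis."""
    return [
        task["source"]
        for task in response.get("tasks", [])
        if task["time_in_queue_millis"] > max_millis
    ]


# Sample response copied from the slide.
sample = {
    "tasks": [
        {
            "insert_order": 101,
            "priority": "URGENT",
            "source": "create-index [my_index], cause [api]",
            "time_in_queue_millis": 86,
            "time_in_queue": "86ms",
        }
    ]
}

print(slow_pending_tasks(sample, max_millis=50))
print(slow_pending_tasks({"tasks": []}, max_millis=50))
```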
Task Management API
• The Task Management API shows tasks currently
executing on the nodes
‒ provides a nice view of how busy the cluster is

GET _tasks

{
  "nodes": {
    "OmWJPhToQ0iyNfz-Qd9i8g": {
      "name": "node1",
      "transport_address": "192.168.1.6:9300",
      "host": "192.168.1.6",
      "ip": "192.168.1.6:9300",
      "roles": [
        "master",
        "data"
      ],
      "tasks": {
        "OmWJPhToQ0iyNfz-Qd9i8g:37432": {
          "node": "OmWJPhToQ0iyNfz-Qd9i8g",
          "id": 37432,
          "type": "direct",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1486702376488,
          "running_time_in_nanos": 2157349,
          "cancellable": false,
          "parent_task_id": "OmWJPhToQ0iyNfz-Qd9i8g:37431"
        },
        ...

You can also use this API to cancel a task
The cat API
• You have seen the _cat API throughout the labs
‒ it is a wrapper around many of the Elasticsearch JSON APIs,
including the stats APIs discussed so far in this chapter
‒ command-line tool friendly
‒ helpful for simple monitoring (e.g. Nagios)

GET _cat/nodes

192.168.1.6 32 98 11 1.83 md * node1
192.168.1.6 22 98 11 1.83 d - node2

Similar information as node stats
The default columns are displayed, but there is a lot more information available
Specifying Columns
• Add the “h” parameter to specify which columns to return
‒ use “*” to retrieve all columns
‒ add the “v” parameter to display the column names in the
response

GET _cat/nodes?v&h=name,disk.avail,search.query_total,heap.percent

name  disk.avail search.query_total heap.percent
node1 144.4gb    3800               43
node2 144.4gb    0                  26
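Since the "v" parameter adds a header row, the text output is easy to turn into records in a script. A minimal sketch, assuming whitespace-separated columns with no empty cells and no spaces inside values; the function name is made up, and the sample text is the output shown above.

```python
def parse_cat(text):
    """Parse _cat output that was requested with ?v into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


CAT_OUTPUT = """\
name  disk.avail search.query_total heap.percent
node1 144.4gb    3800               43
node2 144.4gb    0                  26
"""

rows = parse_cat(CAT_OUTPUT)
# All values are strings; convert before comparing numerically.
busiest = max(rows, key=lambda row: int(row["search.query_total"]))
print(busiest["name"], busiest["heap.percent"])
```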
Diagnosing Performance Issues
Thread Pool Queues
• Many cluster tasks (bulk, index, get, search, etc.) use
thread pools to improve performance
‒ these thread pools are fronted by queues
‒ when a queue is full, a 429 status code is returned

GET _nodes/thread_pool

...
"write": {
  "type": "fixed",
  "min": 8,
  "max": 8,
  "queue_size": 200
}
...

GET _nodes/stats/thread_pool

...
"write": {
  "threads": 8,
  "queue": 0,
  "active": 0,
  "rejected": 0,
  "largest": 8,
  "completed": 177
}
...
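A toy model of a fixed thread pool fronted by a bounded queue can make the 429 behaviour easier to picture. This is a simplified sketch, not the Elasticsearch implementation: tasks never finish here, and the class name is invented. The sizes (8 threads, queue of 200) are the "write" pool values shown above.

```python
from collections import deque


class ToyThreadPool:
    """Fixed number of threads plus a bounded queue; overflow is rejected."""

    def __init__(self, threads, queue_size):
        self.threads = threads
        self.queue_size = queue_size
        self.active = 0
        self.queue = deque()
        self.rejected = 0

    def submit(self, task):
        """Return an HTTP-like status: 200 if accepted, 429 if rejected."""
        if self.active < self.threads:
            self.active += 1           # a free thread picks the task up
            return 200
        if len(self.queue) < self.queue_size:
            self.queue.append(task)    # all threads busy: wait in the queue
            return 200
        self.rejected += 1             # queue full: reject the request
        return 429


pool = ToyThreadPool(threads=8, queue_size=200)
statuses = [pool.submit(i) for i in range(210)]
print(pool.active, len(pool.queue), pool.rejected)
```

With 210 simultaneous requests, 8 run, 200 wait, and the remaining 2 are rejected, which is what the "rejected" counter in the stats would reflect.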
Thread Pool Queues
• The _cat API provides a nice view of the thread pools:
GET _cat/thread_pool?v

node_name name       active queue rejected
node1     bulk            0     0        0
node1     get             0     0        0
node1     index           0     0        0
node1     management      1     0        0
node2     bulk            0     0        0
node2     get             0     0        0
node2     index           0     0        0
node2     management      1     0        0

• A full queue may be good or bad (“It depends!”)
‒ OK if bulk indexing is faster than ES can handle
‒ Bad if search queue is full
The hot_threads API
• The Nodes hot_threads API allows you to view the
current hot threads on each node
‒ invoke it on all nodes or a specific node

GET _nodes/hot_threads

GET _nodes/node1/hot_threads    (get hot threads just on this node)

::: {node1}{QzHFKJscS-2kE-jtsiF0JQ}{hNgYMs2MS1-9i9wFoYQpiw}{172.18.0.2}{172.18.0.2:9300}
   Hot threads at 2018-04-24T19:56:36.274Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   0.0% (69.1micros out of 500ms) cpu usage by thread 'elasticsearch[node1][[timer]]'
     10/10 snapshots sharing following 2 elements
       java.lang.Thread.sleep(Native Method)
       org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:541)
The Indexing Slow Log
• The Indexing Slow Log captures information about long-
running index operations into a log file
• Logs indexing events that take longer than configured
thresholds
‒ already configured in log4j2.properties
PUT my_index/_settings
{
  "index.indexing.slowlog" : {
    "threshold.index" : {
      "warn" : "10s",
      "info" : "5s",
      "debug" : "2s",
      "trace" : "0s"
    },
    "level" : "trace",
    "source" : 1000
  }
}

"threshold.index": Request index slow log level
"level": Current index slow log level
"source": The first 1,000 characters of the document’s source will be logged
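To see how the four thresholds interact, here is a toy classifier. It assumes an operation is logged at the most severe level whose threshold it exceeds, and it ignores the separate "level" setting; the function name is made up, and the thresholds are the ones from the example above.

```python
# Thresholds from the example, ordered from most to least severe.
THRESHOLDS_SECONDS = [
    ("warn", 10.0),
    ("info", 5.0),
    ("debug", 2.0),
    ("trace", 0.0),
]


def slowlog_level(took_seconds, thresholds=THRESHOLDS_SECONDS):
    """Return the level an operation would be logged at, or None."""
    for level, limit in thresholds:
        if took_seconds > limit:
            return level
    return None


for took in (12.0, 6.0, 3.0, 0.5):
    print(took, slowlog_level(took))
```

With "trace" set to "0s", every indexing operation that takes any measurable time ends up in the log, which is worth keeping in mind on a busy index.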
The Search Slow Log
• The Search Slow Log captures information about long-
running searches (query and fetch phases) into a log file
‒ the log file is already configured in log4j2.properties, but
disabled by default
‒ useful, but can be limited since it logs per shard
‒ Packetbeat may be a better solution

PUT my_index/_settings
{
  "index.search.slowlog": {
    "threshold": {
      "query": {
        "info": "5s"
      },
      "fetch": {
        "info": "800ms"
      }
    },
    "level": "info"
  }
}

This example sets level to info and defines thresholds
The Profile API
• Elasticsearch has a powerful Profile API which can be
used to inspect and analyze your search queries
‒ just set “profile” to true in your query:

GET crimes/_search
{
  "size": 20,
  "profile": true,
  "query": {
    "bool": {
      "filter": {"match": {"incident.description": "handgun"}}
    }
  },
  "aggs": {
    "crimes_with_an_arrest": {
      "filter": {"match": {"arrest_made": "false"}},
      "aggs": {
        "types_of_handgun_crimes": {
          "terms": {"field": "incident.description.keyword"}
        }
      }
    }
  }
}

"profile": true: Enable profiling for this search
The Profiler Response…
• …is hard to read (it is a lot of JSON):

"profile": {
  "shards": [
    {
      "id": "[MOb8bhxMQeq7_AskF5hnDA][crimes][0]",
      "searches": [
        {
          "query": [
            {
              "type": "TermQuery",
              "description": "arrest_made:F",
              "time_in_nanos": 867806,
              "breakdown": {
                "score": 0,
                "build_scorer_count": 14,
                "match_count": 0,
                "create_weight": 6393,
                "next_doc": 0,
                "match": 0,
                "create_weight_count": 1,
                "next_doc_count": 0,
                "score_count": 0,
                "build_scorer": 100244,
                "advance": 759117,
                "advance_count": 2037
              }
            },
            {
              "type": "BoostQuery",
              "description": "(ConstantScore(incident.description:handgun))^0.0",
              "time_in_nanos": 1875651,
              "breakdown": {
                "score": 451486,
                ...
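Even without a UI, a few lines of code can reduce this JSON to one number per shard. A minimal sketch that sums time_in_nanos over the top-level query entries of each shard; the function name is made up, and the sample keeps only the fields it needs from the response above.

```python
def query_nanos_per_shard(profile):
    """Sum time_in_nanos of the top-level query nodes for each shard."""
    totals = {}
    for shard in profile["shards"]:
        totals[shard["id"]] = sum(
            node["time_in_nanos"]
            for search in shard["searches"]
            for node in search["query"]
        )
    return totals


# Reduced version of the profile response shown on the slide.
sample_profile = {
    "shards": [
        {
            "id": "[MOb8bhxMQeq7_AskF5hnDA][crimes][0]",
            "searches": [
                {
                    "query": [
                        {"type": "TermQuery", "time_in_nanos": 867806},
                        {"type": "BoostQuery", "time_in_nanos": 1875651},
                    ]
                }
            ],
        }
    ]
}

totals = query_nanos_per_shard(sample_profile)
for shard_id, nanos in totals.items():
    print(shard_id, f"{nanos / 1_000_000:.3f} ms")
```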
The Search Profiler
• Search Profiler is a tool that transforms the JSON output
into a visualization that is easy to navigate:

You can also copy-and-paste the output of a profiled query into this field
The Query Profile Tab
• The query times are shown per shard, along with the
generated Lucene query

The Aggregation Profile Tab
• Shows query times per shard for each aggregation that was
executed:

The Elastic Monitoring Component
Elastic Monitoring
• Monitoring uses Elasticsearch to monitor Elasticsearch
‒ xpack.monitoring.collection.enabled defaults to false
‒ the stats of all the nodes are indexed into Elasticsearch

Configuring Monitoring
• Here are a few common Monitoring settings
‒ which can be configured in elasticsearch.yml:

xpack.monitoring.collection.indices
    The indices to collect data from. Defaults to all indices, but can be a
    comma-separated list.
xpack.monitoring.collection.interval
    How often data samples are collected. Defaults to 10s
xpack.monitoring.history.duration
    How long before indices created by Monitoring are automatically deleted.
    Defaults to 7d

• The complete list of Monitoring settings is at
‒ https://www.elastic.co/guide/en/elasticsearch/reference/current/monitoring-settings.html
Dedicated Monitoring Cluster
• We recommend using a dedicated cluster for Monitoring
‒ reduce the load and storage on your other clusters
‒ access to Monitoring even when other clusters are unhealthy
‒ separate security levels for Monitoring and production clusters

[Diagram: the Monitoring agents on the nodes of the production cluster send their stats to monitoring_cluster, a dedicated 1-node Elasticsearch cluster just for Monitoring (node1: master, data, ingest)]
Configuring Dedicated Monitoring Cluster
• If Monitoring is on a different cluster than the one being
monitored, you need to tell the monitored cluster where to
send its stats:
‒ if Elastic Security is enabled on the Monitoring cluster, then
provide credentials
‒ you can also use SSL/TLS (see docs for details:
https://www.elastic.co/guide/en/elasticsearch/reference/current/monitoring-settings.html#http-exporter-settings)

configure in elasticsearch.yml of each node:

xpack.monitoring.exporters:
  id1:
    type: http
    host: ["http://monitoring_cluster:9200"]
    auth.username: username
    auth.password: changeme
Monitoring Multiple Clusters
• Multiple clusters can be monitored in a single monitoring
cluster (Gold/Platinum)

[Diagram: prod_cluster_1, prod_cluster_2 and qa_cluster_1 send their stats every 10 seconds to monitoring_cluster, where they are converted to JSON and indexed]
The Monitoring UI
• Open up Kibana
• Click on the Monitoring shortcut in the left toolbar:

[Screenshot: the Monitoring shortcut in the Kibana toolbar (“Click here”) and the “Turn on monitoring” button]
Clusters Dashboard
• The initial Monitoring page is the “Clusters” dashboard
‒ shows all the clusters being monitored
‒ notice it is monitoring your Kibana instances as well

Note: the free version of Monitoring can only monitor a single cluster.
Nodes Dashboard
[Screenshot callouts:
‒ breadcrumbs for easy navigation
‒ polling interval (and a handy “pause” button)
‒ time interval of what you are currently viewing
‒ the master node has the “star”
‒ stats update in real time]
Individual Node Dashboard

[Screenshot: the dashboard of an individual node; the actual page has more details]
Indices Dashboard

[Screenshot: the Indices dashboard, a quick way to find unhealthy indices]
Alerting
Configure Alerts
• You should use some type of monitoring tool to fire alerts
for unexpected or extreme situations
‒ send an alert, email, page, etc.
‒ Nagios is a popular tool
‒ third-party tools available
• Or you can use Elastic Alerting
‒ https://www.elastic.co/guide/en/elastic-stack-overview/current/xpack-alerting.html
‒ part of the Gold Subscription
Enabling Elastic Alerting

Elastic Alerting
• A set of administrative features that enable you to:
‒ watch for changes or anomalies in your data and
‒ perform the necessary actions in response
• For example, you might want to:
‒ open a helpdesk ticket when any servers are running out of free space
‒ track network activity to detect malicious activity
‒ send immediate notification if nodes leave the cluster
How Watches Work
• A watch is constructed from five simple building blocks:
• Trigger
‒ determines when the watch is executed
• Input
‒ loads data into the watch payload
• Condition
‒ controls whether the watch actions are executed
• Transform
‒ processes the watch payload to prepare it for the watch actions
• Actions
‒ one or more actions to be executed if the condition is true
How Watches Work
PUT _xpack/watcher/watch/log_error_watch
{
  "trigger": { "schedule": { "interval": "5m" }},
  "input": {
    "search": {
      "request": {
        "indices": [ "logs*" ],
        "body": {
          "query": { "bool": { "filter": [
            { "range":
              { "@timestamp":
                { "gte": "{{ctx.trigger.scheduled_time}}||-5m" }
              }
            },
            { "match": { "message": "error" } }
          ] } }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 }}
  },
  "actions": {
    "log_error" : {
      "logging" : {
        "text" : "Found {{ctx.payload.hits.total}} errors."
      }
    }
  }
}

‒ trigger: runs every 5 minutes
‒ input: search for the term error in all indices that start with logs
‒ condition: any documents returned?
‒ actions: if true, log this message
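The watch above can also be assembled programmatically. The following is a minimal sketch, assuming you want to generate the same request body from Python before sending it with whatever HTTP client you use; the helper name `build_log_watch` is hypothetical and not part of any Elastic library.

```python
import json


def build_log_watch(index_pattern, term, interval="5m"):
    """Return a watch body mirroring the slide: trigger, input, condition, actions.

    Hypothetical helper for illustration only; it just builds a dict.
    """
    return {
        # Trigger: run on a fixed schedule.
        "trigger": {"schedule": {"interval": interval}},
        # Input: search recent documents that mention the term.
        "input": {"search": {"request": {
            "indices": [index_pattern],
            "body": {"query": {"bool": {"filter": [
                {"range": {"@timestamp": {
                    "gte": "{{ctx.trigger.scheduled_time}}||-" + interval}}},
                {"match": {"message": term}},
            ]}}},
        }}},
        # Condition: only act when at least one document was returned.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        # Actions: write a line to the Elasticsearch log.
        "actions": {"log_error": {"logging": {
            "text": "Found {{ctx.payload.hits.total}} errors."}}},
    }


watch = build_log_watch("logs*", "error")
print(json.dumps(watch, indent=2))
```

The printed JSON can be pasted as the body of the PUT request shown on the slide. The optional Transform block is left out here, as it is in the slide's example.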
How Watches Work
• Watches are stored in Elasticsearch
‒ Alerting creates some watches that you can view:

    GET .watches/_doc/log_error_watch

    GET .watches/_search

• Also, every watch execution is stored in Elasticsearch
‒ You can check the watch history to see execution details:

    GET .watcher-history*/_search
    {
      "sort" : [
        { "result.execution_time" : "desc" }
      ]
    }

‒ this retrieves the last ten watch executions (watch records)
Watcher UI
• Enables you to monitor, manage, create and simulate
watches
• Available when Alerting is enabled
• If Elastic Security is enabled, make sure to create users with Watcher-specific roles:
‒ watcher_admin can perform all watcher-related actions
‒ watcher_user can view all existing watches
Watcher UI

Watcher UI
[Screenshot callouts:
‒ list of existing watches
‒ watcher state: firing, error, ok, disabled
‒ watch execution
‒ action execution
‒ delete watches]
Creating a Threshold Alert

1. create new threshold alert
2. define watch name, schedule and indices
3. define condition
4. check hits (elasticsearch aggs)
Creating a Threshold Alert

5. define action
6. test action
7. save watch
8. monitor your watches

Note: this is an example watch running with high frequency (10s). Remember to delete it.
Chapter Review
Summary
• Elasticsearch has Stats APIs for retrieving statistics about
your cluster, nodes, indices and pending tasks
• The cat API provides a human-readable wrapper around
the Stats API (and other Elasticsearch APIs)
• Slow logs, thread pools, and hot threads can help you
diagnose performance issues.
• The Elastic Monitoring component uses Elasticsearch to monitor Elasticsearch
• Best practice is to use a dedicated cluster for Monitoring
• Elastic Alerting (Gold license) allows you to create alerts based on different criteria
• Watches can be configured using JSON or the Watcher UI
Quiz
1. True or False: Elastic Monitoring can monitor multiple clusters.
2. What are the benefits of using a dedicated cluster for the Monitoring component?
3. The default Monitoring collection interval is ____ seconds.
4. How would you check to see if the queue for the index thread pool was full?
5. Name three of the five watch building blocks.
6. True or False: Any user can create alerts using the Watcher UI.
7. True or False: You can use Elastic Alerting to send an email if a node is running out of disk space.
Lab 8
Monitoring and Alerting
1 Elasticsearch Internals
2 Field Modeling
3 Fixing Data
4 Advanced Search & Aggregations
5 Cluster Management
6 Capacity Planning
7 Document Modeling
8 Monitoring and Alerting
9 From Dev to Production

Chapter 9
From Dev to Production
Topics covered:
• Disabling Dynamic Indexes
• Development vs. Production Mode
• Best Practices
• JVM Settings
• Common Causes of Poor Query Performance
• Cross Cluster Search
• Overview of Upgrades
• Cluster Restart
Disabling Dynamic Indexes
Dynamic Indexes
• An index will dynamically be created during a document
index request if the index does not already exist:

    PUT logs/log/1
    {
      "level" : "ERROR",
      "message" : "Unable to reach host"
    }

‒ a new logs index will be created if it is not defined yet
Disabling Dynamic Indexes
• You can disable this dynamic behavior completely using the
dynamic action.auto_create_index setting:
    PUT _cluster/settings
    {
      "persistent": {
        "action.auto_create_index" : false
      }
    }

• Or, you can whitelist certain patterns:

    PUT _cluster/settings
    {
      "persistent": {
        "action.auto_create_index" : ".monitoring-es*,logstash-*"
      }
    }
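To get a feel for how such a whitelist behaves, the sketch below emulates the pattern matching in plain Python. It is a simplified model, assuming a comma-separated list of wildcard patterns where a leading "-" rejects and the first matching pattern wins; the function name `auto_create_allowed` is hypothetical, and the real evaluation happens inside Elasticsearch.

```python
from fnmatch import fnmatchcase


def auto_create_allowed(index_name, setting):
    """Simplified model of an action.auto_create_index value.

    `setting` is either a boolean or a comma-separated pattern list.
    Assumption for this sketch: patterns are checked in order, the first
    match decides, and names that match nothing are rejected.
    """
    if isinstance(setting, bool):
        return setting
    for pattern in setting.split(","):
        pattern = pattern.strip()
        if pattern.startswith("-"):
            if fnmatchcase(index_name, pattern[1:]):
                return False
        elif fnmatchcase(index_name, pattern.lstrip("+")):
            return True
    return False


whitelist = ".monitoring-es*,logstash-*"
for name in (".monitoring-es-6-2019.01.01", "logstash-2019.01.01", "logs"):
    print(name, auto_create_allowed(name, whitelist))
```

With the whitelist from the slide, a request such as PUT logs/log/1 would no longer create the logs index implicitly, because "logs" matches neither pattern.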
Development vs.
Production Mode
HTTP vs. Transport
• There are two important network communication
mechanisms in Elasticsearch to understand:
‒ HTTP: address and port to bind to for HTTP communication,
which is how the Elasticsearch REST APIs are exposed
‒ transport: used for internal communication between nodes
within the cluster
• The defaults are fine for downloading and playing with Elasticsearch, but not useful for production systems
‒ bind to localhost by default

[Diagram: on each node, HTTP binds to the first available port in the range 9200-9299; transport binds to the first available port in the range 9300-9399]
Development vs. Production Mode
• Every Elasticsearch instance 5.x or later is either in
development mode or production mode:
‒ development mode: if it does not bind transport to an external
interface (the default)
‒ production mode: if it does bind transport to an external
interface
my_dev_cluster: “I am just a local cluster used for development.”

    http.port: 9200
    http.host: localhost
    transport.tcp.port: 9300
    transport.bind_host: localhost

my_production_cluster: “I am running in a production environment.”

    http.port: 9200
    http.host: 192.168.1.21
    transport.tcp.port: 9300
    transport.bind_host: 192.168.1.21
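The rule above depends only on where transport is bound. As a rough sketch, assuming that "external interface" means any non-loopback address, it can be modelled as follows; `node_mode` is a hypothetical helper, not an Elasticsearch API.

```python
import ipaddress


def node_mode(transport_bind_host):
    """Classify a node from its transport bind host (illustrative only).

    Assumption: loopback addresses and the name "localhost" mean
    development mode; any other address means production mode.
    """
    if transport_bind_host == "localhost":
        return "development"
    try:
        address = ipaddress.ip_address(transport_bind_host)
    except ValueError:
        # Host names are not resolved in this sketch; treat them as external.
        return "production"
    return "development" if address.is_loopback else "production"


for host in ("localhost", "127.0.0.1", "192.168.1.21"):
    print(host, "->", node_mode(host))
```

Note that the HTTP settings play no part in the decision: a node that exposes HTTP externally but keeps transport on localhost is still in development mode under this rule.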
Bootstrap Checks
• Elasticsearch has bootstrap checks upon startup:
‒ inspect a variety of Elasticsearch and system settings
‒ compare them to values that are safe for the operation of
Elasticsearch
• Bootstrap checks behave differently depending on the
mode:

‒ development mode: any bootstrap checks that fail appear as warnings in the Elasticsearch log
‒ production mode: any bootstrap checks that fail will cause Elasticsearch to refuse to start
‒ https://www.elastic.co/guide/en/elasticsearch/reference/master/bootstrap-checks.html
Bootstrap Checks
• A node in production mode must pass all of the checks, or
the node will not start
‒ the bootstrap checks fit into two categories

    JVM Checks                        Linux Checks
    heap size                         maximum map count
    disable swapping                  maximum size virtual memory
    not use serial collector          maximum number of threads
    OnError and OnOutOfMemoryError    file descriptor
    server JVM                        system call filter
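As an illustration of what one of these checks looks at, the sketch below parses JVM option text and tests whether the initial and maximum heap sizes agree, which is the idea behind the heap size check. It is a simplified stand-in written for this course text, not the code Elasticsearch runs; both function names are hypothetical.

```python
import re

# Multipliers for the size suffixes accepted by -Xms / -Xmx.
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_settings(jvm_options):
    """Return {"Xms": bytes, "Xmx": bytes} for the flags found in the text."""
    found = {}
    for flag, amount, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgG]?)", jvm_options):
        found[flag] = int(amount) * UNITS[unit.lower()]
    return found


def passes_heap_size_check(jvm_options):
    """True when both flags are present and describe the same size.

    Assumption for this sketch: a missing flag counts as a failure.
    """
    heap = heap_settings(jvm_options)
    return "Xms" in heap and "Xmx" in heap and heap["Xms"] == heap["Xmx"]


print(passes_heap_size_check("-Xms30g -Xmx30g"))
print(passes_heap_size_check("-Xms1g -Xmx8g"))
```

In production mode a mismatch like the second example would stop the node from starting; in development mode it would only be logged as a warning.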
Best Practices
Networking Best Practices
• Avoid running over WAN links between datacenters
‒ not officially supported by Elastic
• Try to have zero (or very few) hops between nodes
• If you have multiple network cards, separate transport and
http traffic
‒ bind to different network interfaces

‒ use separate firewall rules for each kind of traffic
• Use long-lived HTTP connections
‒ client libraries support this
‒ or use a proxy/load-balancer
Storage Best Practices
• Prefer solid state disks (SSDs)
‒ segments are immutable, so the write amplification factor
approaches one and is a non-issue
• Local disk is king!
‒ in other words, avoid NFS or SMB, AWS EFS, Azure filesystem
• Elasticsearch does not need redundant storage

‒ replicas/software provide HA
‒ local disks are better than SAN
‒ RAID1/5/10 is not necessary
Storage Best Practices
• If you have multiple disks in the same server, you can use RAID 0 or path.data
• RAID 0
‒ splits ("stripes") data evenly across two or more disks
‒ perfect distribution of the data across the disks
‒ if you lose one disk, you lose the data on all disks
• path.data
‒ allows you to distribute your index across multiple SSDs
‒ potential for an unbalanced distribution (all files belonging to a shard will be stored on the same data path)
‒ if you lose one disk, the data on the other disks is preserved
‒ may generate node-level watermark issues if disks have different sizes
Storage Best Practices
• Use the noop or deadline scheduler in the OS when using SSDs
‒ For details, see:
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html#_disks

    echo noop > /sys/block/{DEVICE}/queue/scheduler

• Spinning disks are OK for warm nodes
‒ but disable concurrent merges:

    index.merge.scheduler.max_thread_count: 1

• Trim your SSDs:
‒ https://www.elastic.co/blog/is-your-elasticsearch-trimmed
Hardware Selection
• In general, choose medium machines over large machines
‒ loss of a large node has a greater impact
‒ prefer six machines, each with 4 CPUs, 64 GB of RAM and four 1 TB drives
‒ avoid two machines, each with 12 CPUs, 256 GB of RAM and twelve 1 TB drives
• Avoid running multiple nodes on one server
‒ one Elasticsearch instance can fully consume a machine
• Larger machines can be helpful as warm nodes
‒ configure shard allocation filtering as previously discussed
Cloud Strategies
• On Cloud, use the discovery plugin
‒ because IPs can change frequently, the plugin dynamically configures the unicast hosts
• Span the cluster across more than one AZ
• Prefer ephemeral storage over network storage
• Snapshot to cloud storage with the repository plugins
• Use shard awareness and forced awareness
• Avoid instances marked with low networking performance
Throttles
• Elasticsearch has relocation and recovery throttles to
ensure these tasks do not have a negative impact
• Recovery:
‒ for faster recovery, temporarily increase the number of
concurrent recoveries:
    PUT _cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.node_concurrent_recoveries": 2
      }
    }

• Relocation:
‒ for faster rebalancing of shards, increase:

    "cluster.routing.allocation.cluster_concurrent_rebalance" : 2
JVM Settings
JVM Configuration
• Since Elasticsearch 6.0, only 64-bit JVMs are supported
• You can configure the Java Virtual Machine (JVM) two
ways:
‒ the config/jvm.options file (preferred)

    -Xms30g
    -Xmx30g

‒ setting the ES_JAVA_OPTS environment variable

    ES_JAVA_OPTS="-Xms30g -Xmx30g" bin/elasticsearch

• Elasticsearch has very good JVM defaults
‒ avoid changing them, and be careful if you do
What Goes on the Heap?
• Major uses of the JVM heap by Elasticsearch include:
‒ indexing buffer: stores newly-indexed docs
‒ caches:
    node query cache (10%): remembers if a doc matches a filter
    shard query cache (1%): caches results of a query
    fielddata (unbounded)
‒ completion suggester
‒ cluster state
‒ …and more
JVM Heap Size
• By default, the JVM heap size is 1 GB
‒ likely not high enough for production
‒ you can change it using Xms (min heap) and Xmx (max heap)

    ES_JAVA_OPTS="-Xms8g -Xmx8g" ./bin/elasticsearch

• Some guidelines for configuring the heap size:
‒ set Xms and Xmx to the same size (bootstrap check)
‒ set Xmx to no more than 50% of your physical RAM
• Rule of thumb for setting the JVM heap:
‒ do not exceed 30GB of memory (to stay under the compressed ordinary object pointers limit)
‒ https://www.elastic.co/blog/a-heap-of-trouble
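The two guidelines combine into simple arithmetic: take half of the physical RAM and cap the result at 30 GB. A minimal sketch follows, assuming whole gigabytes and hypothetical helper names:

```python
def recommended_heap_gb(physical_ram_gb, cap_gb=30):
    """Half of physical RAM, capped at cap_gb and never less than 1."""
    return max(1, min(physical_ram_gb // 2, cap_gb))


def jvm_heap_flags(physical_ram_gb):
    """Render matching -Xms/-Xmx flags so the heap size check passes."""
    heap = recommended_heap_gb(physical_ram_gb)
    return "-Xms{0}g -Xmx{0}g".format(heap)


for ram in (16, 64, 128):
    print(ram, "GB RAM ->", jvm_heap_flags(ram))
```

On a 64 GB machine half the RAM would be 32 GB, so the 30 GB cap applies. On a 16 GB machine the 50% rule is the binding limit and the heap is 8 GB.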
Production JVM Settings
1. JDKs have two modes of a JVM: client and server
• server JVM is required in production mode
2. Configure the JVM to disable swapping
• by requesting the JVM to lock the heap in memory through
mlockall (Unix) or virtual lock (Windows)

Common Causes of
Poor Query Performance
Common Causes of Poor Query Performance
• Let’s take a look at some common mistakes developers
make that can have a negative effect on query performance
‒ and discuss better ways to write your search queries!

How could we improve this query?
• Suppose we want to search for blogs that mention “elk”
between 2016-2018
‒ easy to find with a couple of must queries:
    GET blogs/_search
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"content": "elk"}},
            {
              "range": {
                "publish_date": {
                  "gte": 2016,
                  "lte": 2018
                }
              }
            }
          ]
        }
      }
    }

• Any thoughts on why this might not be the most efficient query?
Issue: Not Using Filters
• We could improve the performance of this query
‒ a range query could take advantage of filter caching if it
appeared in a filter clause (remember bit sets?)
    GET blogs/_search
    {
      "query": {
        "bool": {
          "must": {
            "match": {
              "content": "elk"
            }
          },
          "filter": {
            "range": {
              "publish_date": {
                "gte": 2016,
                "lte": 2018
              }
            }
          }
        }
      }
    }

‒ If the range query gets cached, it can execute faster on subsequent searches
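The rewrite from the previous slide to this one is mechanical, so it can be expressed as a small transformation on the query body. The sketch below is one way to do it in Python, assuming that only range clauses should be moved; the function name is hypothetical.

```python
import copy


def move_range_clauses_to_filter(query):
    """Return a copy of a bool query with range clauses moved from must to filter.

    Only "range" is moved in this sketch, because a range clause in must
    adds the same constant to every hit's score and so does not change
    the ranking. Other clause types are left where they are.
    """
    result = copy.deepcopy(query)
    bool_query = result.get("bool", {})
    must = bool_query.get("must", [])
    if isinstance(must, dict):
        must = [must]
    keep = [clause for clause in must if "range" not in clause]
    moved = [clause for clause in must if "range" in clause]
    if moved:
        existing = bool_query.get("filter", [])
        if isinstance(existing, dict):
            existing = [existing]
        bool_query["must"] = keep
        bool_query["filter"] = existing + moved
    return result


original = {"bool": {"must": [
    {"match": {"content": "elk"}},
    {"range": {"publish_date": {"gte": 2016, "lte": 2018}}},
]}}
print(move_range_clauses_to_filter(original))
```

The match clause stays in must because its relevance score is what orders the results. The date range only includes or excludes documents, which is exactly the kind of yes/no decision that can be cached.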
Issue: Aggregating Too Many Docs
• Aggregations are powerful, but they can consume a great
deal of memory
• Whenever possible, limit the number of docs that an agg
is computed over by using a query or filter
‒ Let's look at a few examples of how to do this…

Solution: Limit the Scope with a Query
• Limit the scope of the aggregation by adding a query:
    GET logs*/_search
    {
      "size": 0,
      "query": {
        "bool": {
          "filter": {
            "range": {
              "runtime_ms": {
                "lt": 200
              }
            }
          }
        }
      },
      "aggs": {
        "my_aggs": {
          "range": {
            "field": "runtime_ms",
            "ranges": [
              {
                "from": 0,
                "to": 100
              },
              {
                "from": 100,
                "to": 200
              }
            ]
          }
        }
      }
    }

‒ Adding a query block limits the scope of an agg
Solution: Use a Filter Bucket
• The filter bucket aggregation is useful when the query
scope is different than the desired aggregation scope:
    GET logs*/_search
    {
      "size": 20,
      "query": {
        "match": {
          "language.url": "time-based indices"
        }
      },
      "aggs": {
        "not_found_requests": {
          "filter": {
            "match": {
              "status_code": 404
            }
          },
          "aggs": {
            "top_countries": {
              "terms": {
                "field": "geoip.country_name.keyword"
              }
            }
          }
        }
      }
    }

‒ The query is searching for requests to blogs that contain time, based, or indices
‒ The filter bucket selects requests that returned a 404
‒ The agg is over requests to blogs that contain time, based, or indices that returned 404
Solution: The Sampler Aggregation
• Another way to limit the scope of an agg is to use the
sampler bucket aggregation
‒ it filters out a sample of the top-scoring hits
‒ can improve analytics by filtering out the long tail of low-quality
matches
• The sampler aggregation is a great solution for scenarios
where you do not want to analyze the entire dataset

‒ and limiting the scope with a query or filter is not a viable option
Solution: The Sampler Aggregation
    GET logs*/_search
    {
      "size": 0,
      "query": {
        "bool": {
          "filter": {"match": {"language.url": "time-based indices"}}
        }
      },
      "aggs": {
        "my_sample": {
          "sampler": {
            "shard_size": 100
          },
          "aggs": {
            "top_countries": {
              "terms": {
                "field": "geoip.country_name.keyword"
              }
            }
          }
        }
      }
    }

‒ The query matches requests to blogs that contain time, based, or indices
‒ The sampler takes 100 of the top hits from each shard

Response (excerpt):

    "aggregations": {
      "my_sample": {
        "doc_count": 1500,
        "top_countries": {
          "doc_count_error_upper_bound": 4,
          "sum_other_doc_count": 251,
          "buckets": [
            ...

‒ 1500 out of 31063 documents were sampled
‒ The aggregation is executed on top of the best results (likely an AND instead of an OR)
Issue: Confusing Elasticsearch w/ RDBMS
• Elasticsearch is not a relational database
‒ So don’t try to make it look like one!
• Trying to design indexes that look like relational tables in a database is not a good idea
‒ and will only result in poor performance

[Diagram: table1 → index1, table2 → index2, table3 → index3]
Solution: Denormalize Your Data
• It is very important that you denormalize your data
‒ do not use nested types when it is not required
‒ avoid using parent/child relationships (denormalize instead!)

[Diagram: table1, table2 and table3 → index1]
Issue: Too Many Shards
• Remember: a query has to hit every shard
‒ More shards == slower query performance
• The default of 5 shards can actually be too high for some
scenarios
‒ Having thousands of 100MB indices with 5 shards each is not
good

• The solution?
‒ Shards can be fairly large (up to 40GB is good)…
‒ …so create as few of them as needed
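As a back-of-the-envelope illustration of "as few as needed", the sketch below derives a primary shard count from an expected index size and the 40GB figure above. It is only a rough starting point under that single assumption; real sizing also depends on query load and growth, and the function name is hypothetical.

```python
import math


def primary_shards_for(index_size_gb, target_shard_gb=40):
    """Smallest shard count that keeps each shard at or under the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


for size in (0.1, 40, 100, 400):
    print(size, "GB ->", primary_shards_for(size), "primary shard(s)")
```

A 100MB index needs a single shard under this rule, which is why thousands of small indices with 5 shards each waste resources.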
Issue: Unnecessary Scripting
• In particular, using scripts at query time when index time
might be a better option
‒ For example, the length of the title field has to be calculated on
each document each time this query is executed:
    GET blogs/_search
    {
      "query": {
        "bool": {
          "must": {
            "match": {"title": "network"}
          },
          "filter": [
            {
              "script": {
                "script": {
                  "source": "doc['title.keyword'].value.length() > 50"
                }
              }
            }
          ]
        }
      }
    }
Solution: Index Common Computations
• Instead of using a search-time script, compute the value
once and index it
‒ The length of the title field can now be easily and quickly
searched on

    PUT _ingest/pipeline/comment_length
    {
      "processors" : [
        {
          "script": {
            "lang": "painless",
            "source": "ctx.title_length = ctx.title.length();"
          }
        }
      ]
    }

‒ Compute the length once at index time (perhaps by using a pipeline)
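To make the trade-off concrete, the sketch below mimics what the pipeline does to a document and shows the cheap range filter that can then replace the script query. It runs entirely in Python with no cluster involved; both helper names are hypothetical, and the field name title_length is taken from the pipeline above.

```python
def add_title_length(doc):
    """Return a copy of the document with title_length precomputed."""
    enriched = dict(doc)
    enriched["title_length"] = len(doc["title"])
    return enriched


def long_title_query(term, min_length):
    """Search body that filters on the indexed length instead of a script."""
    return {"query": {"bool": {
        "must": {"match": {"title": term}},
        "filter": {"range": {"title_length": {"gt": min_length}}},
    }}}


doc = add_title_length(
    {"title": "Brewing in Beats: New community Beat for network devices"})
print(doc["title_length"])
print(long_title_query("network", 50))
```

The length is now computed once per document at index time, and the search only compares stored numbers, instead of running a script on every matching document for every query.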
Issue: Expensive Regular Expressions
• You have to be careful with searches that use regular
expressions
‒ These two regexp queries seem quite similar, but one is much
more expensive. (Which one?)

    GET blogs/_search
    {
      "query": {
        "regexp": {
          "title": "net.*"
        }
      }
    }

    GET blogs/_search
    {
      "query": {
        "regexp": {
          "title": ".*work"
        }
      }
    }
Solution: Use regexp with Caution
• Avoid leading wildcards
‒ Perhaps index your data in such a way as to avoid the need for
leading wildcards
• And understand that, in general, regular expressions can be
expensive

GET blogs/_search
{
  "query": {
    "regexp": {
      "title": ".*work"
    }
  }
}

A regex that has lookarounds or short prefixes can be expensive
Solution: Clever Indexing
• If you really need to search the end of your tokens, simply
index them using the reverse token filter
‒ and search them using a prefix regular expression:

GET blogs/_search
{
  "query": {
    "regexp": {
      "title.reversed": "krow.*"
    }
  }
}

Matches, for example: "Brewing in Beats: New community Beat for network devices"
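
The idea behind the reversed field can be sketched in Python. This is a simplified model that treats the term dictionary as a sorted list; it is not how Lucene is actually implemented, and prefix_matches is a hypothetical helper. A prefix can be located with a binary search, whereas a suffix would require scanning every term, so reversing the terms turns the expensive case into the cheap one.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Return the terms starting with prefix, using a binary search to find the start."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms sharing a prefix are contiguous in sorted order
        matches.append(term)
    return matches


terms = ["beat", "brewing", "devices", "framework", "network", "new"]
reversed_terms = sorted(term[::-1] for term in terms)

# A suffix search for "work" becomes a prefix search for "krow" on reversed terms.
hits = sorted(term[::-1] for term in prefix_matches(reversed_terms, "krow"))
```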


Cross Cluster Search
What is Cross Cluster Search?
• Cross Cluster Search is a feature that allows any node to
act as a client for executing queries across multiple clusters
‒ introduced in Elasticsearch 5.3, it replaces the tribe node
functionality

[Figure: two setups side by side. Before: the client goes through a tribe node, which provided search across cluster_1, cluster_2, and cluster_3. Now: the client can send the request to any node, which can search across clusters.]

Registering Remote Clusters
• Remote clusters are configured in the cluster settings
‒ using the cluster.remote property
‒ seeds is a list of nodes in the remote cluster used to retrieve the
cluster state when registering the remote cluster

PUT _cluster/settings
{
  "persistent": {
    "cluster.remote" : {
      "germany_cluster" : {
        "seeds" : ["my_server:9300","64.33.90.170:9300"]
      }
    }
  }
}

‒ germany_cluster: a name you assign to the remote cluster
‒ seeds: list of seed nodes
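
If the request body is generated by a script, a small helper can keep the nesting right. The following Python sketch only builds and prints the JSON body shown above; it does not talk to a cluster. The function name remote_cluster_settings and its host:port check are assumptions made for illustration.

```python
import json


def remote_cluster_settings(alias, seeds):
    """Build the cluster settings body that registers one remote cluster."""
    for seed in seeds:
        host, sep, port = seed.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"seed must look like host:port, got {seed!r}")
    return {"persistent": {"cluster.remote": {alias: {"seeds": list(seeds)}}}}


body = remote_cluster_settings(
    "germany_cluster", ["my_server:9300", "64.33.90.170:9300"]
)
print(json.dumps(body, indent=2))
```

Note that the seeds in the slide use port 9300, the default transport port, rather than the HTTP port.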


Searching Remotely
• To search an index on a remote cluster, prefix the index
name with the remote cluster name:

GET germany_cluster:blogs/_search
{
  "query": {
    "match": {
      "title": "network"
    }
  }
}

‒ germany_cluster is the cluster name and blogs is the index name
‒ this request looks for titles about “network” on germany_cluster
Cross Cluster Searching
• To perform a search across multiple clusters, simply list the
cluster names and indices in the _search:

GET blogs,germany_cluster:blogs/_search
{
  "query": {
    "match": {
      "title": "network"
    }
  }
}

‒ blogs is the local blogs index
‒ germany_cluster:blogs is the remote blogs index
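
The index expression in the request line is just a comma-separated list, which can be assembled programmatically. A minimal Python sketch, where cross_cluster_target is a hypothetical helper name:

```python
def cross_cluster_target(index, remote_clusters=(), include_local=True):
    """Build the comma-separated index expression for a cross cluster search."""
    parts = [index] if include_local else []
    parts.extend(f"{cluster}:{index}" for cluster in remote_clusters)
    return ",".join(parts)


path = f"/{cross_cluster_target('blogs', ['germany_cluster'])}/_search"
```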
How It Works
• From a search execution perspective, there is no difference
between local indices and remote indices
‒ as long as the coordinating node can reach some nodes
belonging to the remote clusters
‒ the coordinating node resolves the shards of remote indices by
sending one _search_shards request per cluster

1. A search request is sent to node3
2. node3 fetches information about remote indices and shards

[Figure: a client, my_cluster (node1, node2, node3), and germany_cluster (node1, node2, node3).]
How It Works
• Once the details are retrieved of where the remote shards
are located, the search is executed just like any other
search

3. The query gets executed on the relevant local shards…
…and the query gets executed on the relevant remote shards

[Figure: a client, my_cluster (node1, node2, node3), and germany_cluster (node1, node2, node3).]
How It Works
• The coordinating node gets “size” hits from each shard
(local and remote) and performs reduction
• Then the top hits are fetched from the relevant shards
‒ and returned to the client

4. node3 gets “size” hits from each shard and determines the top hits
5. The top hits are fetched by node3 and returned to the client

[Figure: a client, my_cluster (node1, node2, node3), and germany_cluster (node1, node2, node3).]
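
The reduction step can be illustrated with a short Python sketch. It is a simplified model of what the coordinating node does, assuming hits are ranked by _score only; the real implementation also handles sorting options, ties, and pagination. The function name reduce_top_hits and the sample scores are made up for illustration.

```python
import heapq


def reduce_top_hits(per_shard_hits, size):
    """Merge the candidate hits of every shard and keep the overall top `size` by score."""
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit["_score"])


# Each shard, local or remote, returns up to `size` candidates.
local_shard = [
    {"_index": "blogs", "_id": "a", "_score": 4.56},
    {"_index": "blogs", "_id": "b", "_score": 1.20},
]
remote_shard = [
    {"_index": "germany_cluster:blogs", "_id": "c", "_score": 4.83},
    {"_index": "germany_cluster:blogs", "_id": "d", "_score": 3.10},
]

top = reduce_top_hits([local_shard, remote_shard], size=2)
```

Only the documents that survive this reduction need to be fetched in full from their shards.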
The Response
• All results retrieved from a remote index will be prefixed
with their remote cluster name
GET blogs,germany_cluster:blogs/_search
{
  "query": {
    "match": {
      "title": "network"
    }
  }
}

"hits": [
  {
    "_index": "germany_cluster:blogs",
    "_type": "doc",
    "_id": "3s1CKmIBCLh5xF6i7Y2g",
    "_score": 4.8329377,
    "_source": {
      "title": "Using Nmap + Logstash to Gain Insight Into Your Network",
      ...
    }
  },
  {
    "_index": "blogs",
    "_type": "doc",
    "_id": "Mc1CKmIBCLh5xF6i7Y",
    "_score": 4.561167,
    "_source": {
      "title": "Brewing in Beats: New community Beat for network devices",
      ...
    }
  },
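
A client that needs to know where each hit came from can split the _index value on the colon. A minimal Python sketch, where split_index_name is a hypothetical helper and the sample hits are trimmed from the response above:

```python
def split_index_name(index_name):
    """Split a hit's _index into (cluster, index); cluster is None for local hits."""
    cluster, sep, index = index_name.partition(":")
    if not sep:
        return None, index_name
    return cluster, index


hits = [
    {"_index": "germany_cluster:blogs", "_id": "3s1CKmIBCLh5xF6i7Y2g"},
    {"_index": "blogs", "_id": "Mc1CKmIBCLh5xF6i7Y"},
]
origins = [split_index_name(hit["_index"]) for hit in hits]
```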

Using Wildcards
• You can use wildcards for the names of the remote clusters
‒ The following search queries blogs on the local cluster and all
registered remote clusters:

GET blogs,*:blogs/_search
{
  "query": {
    "match": {
      "title": "network"
    }
  }
}

Same hits as the query on the previous slide


Overview of Upgrades
Versions
• Elasticsearch versions are denoted as X.Y.Z
‒ X is the major version
‒ Y is the minor version
‒ Z is the patch level or maintenance release
• Elasticsearch can use indices created in the previous major
version, but older indices must be reindexed or deleted.

‒ 6.x can use indices created in 5.x
‒ 6.x cannot use indices created in 2.x or before
‒ 5.x can use indices created in 2.x
‒ 5.x cannot use indices created in 1.x or before
‒ ...
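
The compatibility rule can be written as a small lookup. This Python sketch covers only the 5.x and 6.x clusters listed above; can_read_index is a hypothetical name, and the mapping reflects the fact that version numbering jumped from 2.x to 5.x.

```python
# Previous major version for each cluster major version covered here.
# Version numbering jumped from 2.x straight to 5.x.
PREVIOUS_MAJOR = {6: 5, 5: 2}


def can_read_index(cluster_major, index_created_major):
    """True if a cluster on cluster_major can use an index created on index_created_major."""
    if cluster_major not in PREVIOUS_MAJOR:
        raise ValueError("only 5.x and 6.x clusters are covered by this sketch")
    return index_created_major in (cluster_major, PREVIOUS_MAJOR[cluster_major])
```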
Upgrading Old Versions
• Elasticsearch can read indices from the previous major
version only
‒ so an index in 5.x can be used in a 6.x cluster
‒ but an index in 2.x can not be used in a 6.x cluster
‒ Elasticsearch will fail to start if
• Upgrading from 2.x or 1.x to 6.x requires all your old indices to be reindexed

[Figure: 6.x can read indices created in 5.x; 6.x can not read indices created in 2.x.]

Overview of Upgrades
• Elasticsearch has a fairly rapid release cycle
‒ at some point, you will want to upgrade a cluster to a newer
version
• In general, there are two possible scenarios that occur
when upgrading a cluster:
‒ a rolling upgrade can occur (no downtime)

‒ a full cluster restart is required (some downtime)

[Figure: 6.0 to 6.1 is a rolling upgrade; 2.x to 5.x is a full cluster restart.]

Overview of Upgrades

Upgrade From    Upgrade To    Supported Upgrade Type
5.x             5.y           Rolling upgrade
5.6             6.x           Rolling upgrade
5.0 - 5.5       6.x           Full cluster restart
<= 2.x          6.y           Full cluster restart
6.x             6.y           Rolling upgrade

Major version upgrade without full cluster restart?
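
The table can be encoded as a small function, which is handy in upgrade runbooks or pre-flight scripts. This Python sketch covers only the rows listed in the table, takes versions as (major, minor) tuples, and does not check that the target is newer than the source; upgrade_type is a hypothetical name.

```python
def upgrade_type(from_version, to_version):
    """Return the supported upgrade type for (major, minor) version tuples."""
    from_major, from_minor = from_version
    to_major, _ = to_version
    if from_major == to_major and from_major in (5, 6):
        return "Rolling upgrade"
    if to_major == 6 and from_major == 5:
        # Only 5.6 can be rolled into 6.x; 5.0 - 5.5 need a full restart.
        return "Rolling upgrade" if from_minor == 6 else "Full cluster restart"
    if to_major == 6 and from_major <= 2:
        return "Full cluster restart"
    raise ValueError("combination not covered by the table")
```

For example, upgrade_type((5, 3), (6, 6)) falls into the "5.0 - 5.5 to 6.x" row, which is why upgrading to 5.6 first is attractive when downtime must be avoided.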


Upgrade 5.x to 6.x

[Figure: starting from 5.3, either perform a rolling upgrade to 5.6 followed by a rolling upgrade to 6.x, or perform a full cluster restart directly to 6.x.]

If upgrading to 6.3 or later, no need to reinstall X-Pack

Upgrading
• Unfortunately, there are too many options and details in the
Elastic upgrade process
• It would be impossible to cover all possible combinations in
a few minutes
• Elastic created an upgrade guide in which
‒ you answer a few questions

‒ and we build a list of the steps you should follow
‒ https://www.elastic.co/products/upgrade_guide
• A cluster restart will be part of any list, so let's talk about it...
Cluster Restart
Cluster Restart
• Rolling restart
‒ zero downtime
‒ reads and writes continue to operate normally
• Full cluster restart
‒ cluster unavailable during update
‒ updates are usually faster

Steps for a Rolling Restart
• A rolling restart allows the nodes in the cluster to be upgraded one at a time, so the cluster stays available throughout
• Step 0: back up your cluster!
• To perform a rolling restart:
1. stop non-essential indexing (if possible)
2. disable shard allocation
3. stop and update one node
4. start the node
5. re-enable shard allocation and wait
6. GOTO step 2 (until all nodes are updated)
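
The loop structure of these steps can be sketched in Python. This only produces an ordered checklist of actions for a list of node names; it does not call Elasticsearch, and rolling_restart_plan is a hypothetical helper.

```python
def rolling_restart_plan(nodes):
    """Return the ordered list of actions for a rolling restart of the given nodes."""
    plan = ["stop non-essential indexing (if possible)"]
    for node in nodes:
        # Steps 2 to 5 are repeated for every node, one node at a time.
        plan.append("disable shard allocation")
        plan.append(f"stop and update {node}")
        plan.append(f"start {node}")
        plan.append("re-enable shard allocation and wait")
    return plan


plan = rolling_restart_plan(["node1", "node2", "node3"])
```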


Rolling Restart
• Step 1: stop indexing data (if possible)
‒ if you still need to index, that is OK but shard recovery will take
longer

Rolling Restart
• Step 2: Disable shard allocation
‒ recall that the default allocation delay after a node leaves is “1m”, but it is better to just disable allocation entirely
‒ and perform a synced flush

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable" : "none"
  }
}

POST _flush/synced

• Use new_primaries instead of none to allow the creation of new indices during the rolling restart


Rolling Restart
• Step 3: Update the node:
‒ stop the node you want to update
‒ update the machine/application

Rolling Restart
• Step 4: Start the node up again
‒ and wait for it to join the cluster
‒ you can use the following command to confirm: GET _cat/nodes

Rolling Restart
• Step 5: Re-enable shard allocation,

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable" : "all"
  }
}

• then wait for the cluster to be green again:
‒ if green is not possible, check there are no initializing or relocating shards before continuing

GET _cat/health
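
The waiting rule can be automated with a polling loop. The Python sketch below is an assumption-laden illustration: get_health is a hypothetical callable that returns a dictionary shaped like the cluster health response (status, initializing_shards, relocating_shards), and the polling interval and limit are arbitrary.

```python
import time


def wait_until_safe(get_health, poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Poll until the cluster is green, or has no initializing or relocating shards."""
    for _ in range(max_polls):
        health = get_health()
        if health["status"] == "green":
            return True
        if health["initializing_shards"] == 0 and health["relocating_shards"] == 0:
            # Green is not reachable right now, but no recovery is in flight.
            return True
        sleep(poll_seconds)
    return False
```

Injecting get_health and sleep keeps the function testable without a live cluster; in practice get_health would wrap a call to the cluster health API.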
Repeat for Each Node
• Step 6 is to start the process over for the next node

[Figure: my_cluster with node1 (version 6.1), node2 (version 5.6), and node3 (version 5.6). The upgraded node says "I am upgraded. Who is next?" and a remaining node answers "I will go next!"]


Full Cluster Restart
• The cluster will be unavailable during the update
• Step 0: back up your cluster!
• The steps are very similar to a rolling upgrade:
1. stop indexing (e.g. disable writes, or make ES unreachable)
2. disable shard allocation (make sure to use “persistent”, since transient settings do not survive a full cluster restart)
3. perform a synced flush
4. shut down and update all nodes (downtime is here)
5. start all dedicated master nodes and wait for the election
6. start the other nodes
7. wait for yellow
8. re-enable shard allocation

Chapter Review
Summary
• A node in production mode must pass a series of checks,
or the node will not start
• For best performance, choose SSD over spinning disks
• Local disks are preferred - the software provides HA
• In general, choose medium machines over large
machines

• Cross cluster search allows you to search multiple clusters within the same request.
• Upgrading to a new major version of Elasticsearch usually requires a full cluster restart, but can also be done with a rolling upgrade from 5.6 to 6.x
• A rolling upgrade allows the nodes in the cluster to be restarted one at a time


Quiz
1. True or False: In production mode, if a bootstrap check
fails then the node will not start.
2. True or False: SAN storage is preferred over local disks to
provide high availability of data.
3. True or False: It is a good idea to separate the transport
and HTTP traffic over different network interfaces.
4. Why would you use cross cluster search?

5. True or False: You can search and index documents during a rolling restart.
6. What is the benefit of performing a synced flush right before a node restart?
Lab 9
From Dev to Production

Conclusions
Resources
• https://www.elastic.co/learn
‒ https://www.elastic.co/training
‒ https://www.elastic.co/community
‒ https://www.elastic.co/docs
• https://discuss.elastic.co

Elastic Training
Empowering Your People

Immersive Learning
Lab-based exercises and knowledge checks to help master new skills

Solution-based Curriculum
Real-world examples and common use cases

Experienced Instructors
Expertly trained and deeply rooted in everything Elastic

Performance-based Certification
Apply practical knowledge to real-world use cases, in real-time

FOUNDATION

SPECIALIZATIONS
LOGGING, METRICS, APM, ADVANCED SEARCH, SECURITY ANALYTICS, DATA SCIENCE
Elastic Consulting Services

ACCELERATING YOUR PROJECT SUCCESS

FLEXIBLE SCOPING
Shifts resource as your requirements change

PHASE-BASED PACKAGES
Align to project milestones at any stage in your journey

GLOBAL CAPABILITY
Provide expert, trusted services worldwide

EXPERT ADVISORS
Understand your specific use cases

PROJECT GUIDANCE
Ensures your goals and accelerate timelines
Thank you!
Please complete the online survey

Quiz Answers
Chapter 1 Quiz Answers
1. True
2. Field names, term dictionary, term frequency, term
proximity, deleted documents, stored fields, normalization
factors
3. False! Only call _forcemerge on read-only indices
4. True

5. The response is only returned after the document is searchable (in a segment)
6. True. The write operation is written in the translog, which is fsynced to disk. So, when a client gets an OK for the write operation, it means that it is fsynced in the translog.
7. Any time that you will not have more writes and would like to speed up recoveries, e.g. cluster restart.


Chapter 2 Quiz Answers
1. The field would produce better search results if the
individual developer environments were split apart and
stored in an array, instead of as a delimited string
2. It is a numeric value but it is stored as a string, so any
situation that requires its value would require parsing and
be tedious. This would be much better modeled as a float
3. Disable the field by setting “enabled” to false

4. Set “dynamic” to “strict”
5. “intersect”
Chapter 3 Quiz Answers
1. True
2. False. Only documents indexed after the mapping change
will pick up the new field
3. Documents in the source index that have changed (have a
higher _version number) will overwrite older documents in
the destination
4. All of the documents in the index! It is a match_all query, so all documents are hits
5. The document will be indexed into the index named clientip.country_iso_code
Chapter 4 Quiz Answers
1. False. They are powerful, but expensive and should NOT be
used often
2. Using the exists query inside a must_not
3. True
4. Lots of reasons, including to avoid repeating code in multiple places, to minimize mistakes, make it easier to test and execute your queries, share queries between applications, and allow users to only execute a few predefined queries
5. In buckets_path, when a pipeline agg is defined at a higher level than the sub-aggregation it needs to refer to
6. True
7. Use a percentile aggregation


Chapter 5 Quiz Answers
1. The current day’s index could be a hot node handling
indexing and queries, while all previous indices could be on
warm nodes (assuming they do not get queried as often)
2. The index will be defined, but all of its shards will be
unallocated and the cluster will go into a red status
3. Filtering is not “aware” of the physical configuration of your hardware. Awareness ensures that shards are distributed across “zones” that you define
4. Forced awareness never allows copies of the same shard to be in the same zone - shard allocation awareness does


Chapter 6 Quiz Answers
1. True
2. In terms of searching, it would essentially be equivalent
3. Overallocate! It allows for future scaling of the cluster
4. False. There are many scenarios where 1 shard might
actually be optimal, especially if the data all fits in a single
shard

5. 9 or 10, depending on whether you want some extra buffer room. Depending on other requirements, that may be a single index or multiple indices that total 9 or 10 shards
Chapter 7 Quiz Answers
1. True
2. False
3. There is an overhead to parent/child - they are separate
documents that must be joined. The join is fast, but nested
objects are all a single document which will always be
faster for searches (but more expensive for updates)
4. True

5. False
6. True
Chapter 8 Quiz Answers
1. True, but only the commercial version does. The free version
can only monitor one
2. If a cluster fails, you will be able to view its history and perhaps
diagnose the issue. There are also performance and security
benefits.
3. 10 seconds
4. Look at the thread pool queues either with _nodes/thread_pool or _cat/thread_pool
5. Trigger, Input, Condition, Transform, Action
6. False. Only users with watcher_admin roles
7. True, using Monitoring data. Create a watch on `monitoring-es*` in the `node_stats.fs.total.free_in_bytes` field


Chapter 9 Quiz Answers
1. True
2. False: SAN is not needed
3. True
4. To have a single view of the data spread across multiple
clusters.
5. True. You can, but it slows down the recovery time

6. It greatly speeds up the recovery time of indices that have not changed while the node (or nodes) was down


Elasticsearch Engineer II
Course: Elasticsearch Engineer II

Version 6.6.0

© 2015-2019 Elasticsearch BV. All rights reserved. Decompiling, copying, publishing and/or distribution without written consent of Elasticsearch BV is
strictly prohibited.
