Es Lab Final
2 Lab Instructions
2.1 Prerequisites
2.1.1 Downloads
2.2 Part 1 - Cluster Setup
2.2.1 Learning Objectives
2.2.2 Virtual Machine Creation
2.2.3 Install Ubuntu 18.04
2.2.4 Install Elasticsearch
2.2.5 Initial Elasticsearch config
2.2.6 Clone es-master-a
2.2.7 Configure Elasticsearch on the new nodes
2.2.8 Submission
2.3 Part 2 - Inserting and Deleting Elasticsearch Data
2.3.1 Learning Objectives
2.3.2 Create a New Index
2.3.3 Add Documents to the car index
2.3.4 Delete a Document
2.3.5 Unexpected Node Shutdown
2.3.6 Lab Submission
3 Conclusion
4 Further Study
1 Introduction
Elasticsearch is a distributed, open source search engine used to index various types of
unstructured data [1]. Each independent machine running an instance of Elasticsearch is referred to
in this lab as a server. Servers that coordinate with one another form a cluster. In this
lab, you will create a three-node cluster that can recover from an unexpected outage without
any user intervention. The Elasticsearch documentation is extensive and will prove useful during
the lab. You can find it here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
1.1 Indices
The essential components of an Elasticsearch cluster are indices, shards, and replicas. An index
is a logical namespace that maps to one or more primary shards and has zero or more replica
shards [2]. In some respects, an index can be thought of as similar to a relational database. See the
table below for a correlation of the terms used in each. The advantage that Elasticsearch provides
lies in the sharding of indices. Depending on your architecture, shards can be distributed across
multiple servers and replicated.
1.2 Shards and Replicas
Shards and replicas are central to how Elasticsearch stores data. A shard is a self-contained
Lucene index that holds a subset of an Elasticsearch index’s documents [3]. This is how
Elasticsearch distributes its data across multiple physical nodes. There are two types of shards:
primary and replica.
A primary shard is the original copy of a shard’s data. Each primary shard is copied to another
machine as a replica. When a server is powered off or loses its network connection, replicas of the
primary shards it held are promoted to primary so that the index data remains available.
In this lab, each index will have three primary shards (one for each node) and one replica of each
of the primary shards. See Figure 2.
The squares labelled P0, P1, and P2 all represent primary shards. These are the original copies
of the documents contained in the index. The squares labelled R0, R1, and R2 are the respective
replicas of each primary shard. You can see that if any one of the nodes were to go down
unexpectedly, the remaining nodes would still hold a copy of every shard (0 through 2), so the
lost copies could be rebuilt from them.
Figure 3: The car index distributed across three nodes.
Each shard contains its own set of documents from the index, in this case multiple models of cars
available at the dealership. It’s important to know that the shards do not overlap, and each shard
is required for the complete index. Because of the replicas, any one of the three nodes could go down
unexpectedly and the entire index (via shards 0, 1, and 2) would still be available. This is one of
the advantages of using a multi-node cluster with Elasticsearch.
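The layout described above can be pictured with the short sketch below. It only illustrates the rule that a replica never shares a node with its primary. The round-robin placement and the node numbering are assumptions made for illustration, not Elasticsearch's actual allocation algorithm, and the layout in the lab's figures may differ.

```shell
#!/bin/sh
# Illustrative layout only: NOT Elasticsearch's real shard allocator.
NODES=3    # nodes in the lab cluster
SHARDS=3   # primary shards per index (number_of_shards)

# Print one line per shard: the primary goes on node (shard mod NODES),
# and its replica goes on the next node over, so the two never share a node.
layout() {
  shard=0
  while [ "$shard" -lt "$SHARDS" ]; do
    primary=$(( shard % NODES ))
    replica=$(( (shard + 1) % NODES ))
    echo "shard $shard: P$shard on node $primary, R$shard on node $replica"
    shard=$(( shard + 1 ))
  done
}

layout
```

With this placement, removing any single node still leaves one copy (primary or replica) of shards 0, 1, and 2 on the other two nodes.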
curl -XPOST "http://localhost:9200/car/toyota" -H 'Content-Type: application/json' -d '
{
"model": "Camry",
"year": "2009",
"color": "green"
}'
2 Lab Instructions
This lab is divided into two parts: installing and configuring Elasticsearch,
and inserting and deleting data in the cluster.
The Elasticsearch cluster for this lab will consist of three nodes, each running on an independent
virtual machine with its own IP address (your IP addresses may differ):
In production clusters, the master nodes generally do not do any data processing. However, for
the purposes of this lab, the master nodes will process data using the node.data: true option
in /etc/elasticsearch/elasticsearch.yml.
2.1 Prerequisites
This lab will use Oracle VirtualBox as a hypervisor for three virtual machines. You are free to
use different software if you prefer. You will also need 60 GB of free space on your drive and at
least 8 GB of memory.
2.1.1 Downloads
You will need Oracle VirtualBox installed on your computer.
https://www.virtualbox.org/wiki/Downloads
2.2 Part 1 - Cluster Setup
See Figure 4 for the cluster architecture.
• Name: es-master-a
• Type: Linux
• Version: Ubuntu (64-bit)
• Memory Size: 2560 MB
• File Size: 20.0 GB
2. Open the Preferences window found in the File menu and select Network on the sidebar.
Click the icon to add a NAT (Network Address Translation) Network.
3. Set the Network Name to dis and the Network CIDR to 192.168.128.0/24.
4. Open the Settings window for your newly created VM.
5. Choose Network on the sidebar and select NAT Network from the Attached to: dropdown
menu. Choose the dis NAT Network.
2. Choose the Ubuntu image file you downloaded previously when prompted.
3. Choose the default options during the installer. When you get to the Profile setup dialog,
enter these details:
8. Update the repository information again with sudo apt update and then install
Elasticsearch with sudo apt install elasticsearch.
2. Note that most of the lines in this file are commented out. Find the lines listed below,
uncomment them if needed, and change their values as shown. You will need to add the lines
after the line break that begin with node. yourself.
cluster.name: discluster
node.name: es-master-a
# You need to run ifconfig to find your node’s IP address
network.host: 192.168.128.4
discovery.seed_hosts: ["192.168.128.4", "192.168.128.7", "192.168.128.8"]
cluster.initial_master_nodes: ["192.168.128.4", "192.168.128.8"]
4. Use the Generate new MAC addresses for all network adapters option when cloning.
3. Make a note of the IP addresses on the clones after the reboot. In the Elasticsearch
configuration file, go to the discovery.seed_hosts and cluster.initial_master_nodes lines and make
sure the IP addresses match your VMs.
4. The configuration file will still be present on the clones, but you must change the following
lines to be unique for each node.
# es-master-b
node.name: es-master-b
# Use ifconfig to find IP address
network.host: 192.168.128.7
# es-data-a
node.name: es-data-a
# Use ifconfig to find IP address
network.host: 192.168.128.8
node.voting_only: true
5. Start and enable the Elasticsearch service on each of the virtual machines:
6. If there is no output, the service started successfully. You can check with systemctl status
elasticsearch.
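The command for step 5 is not reproduced above. On Ubuntu 18.04, which uses systemd, a typical sequence would look like the sketch below. The function name is a hypothetical wrapper added here so the two commands can be run together on each VM.

```shell
# Hypothetical wrapper: run it once on each of the three virtual machines.
start_elasticsearch() {
  sudo systemctl start elasticsearch &&   # start the service now
  sudo systemctl enable elasticsearch     # start it automatically at boot
}

# Usage:
#   start_elasticsearch && systemctl status elasticsearch
```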
2.2.8 Submission
1. On each node, run the following commands to verify that the cluster formed successfully, and
take screenshots of their output with your name and student ID visible. You will need to use the
host’s IP address on each VM, e.g. 192.168.128.4 on es-master-a, etc. See example below. Send these
screenshots to your TA.
Figure 5: Example of lab submission screenshot - es-master-a.
Figure 7: Example of lab submission screenshot - es-data-a.
2. Demonstrate the running cluster to your TA by running the following commands with
successful return codes:
2.3 Part 2 - Inserting and Deleting Elasticsearch Data
2.3.1 Learning Objectives
One benefit of using a NoSQL database such as Elasticsearch is the freedom to store unstructured
data. In this section, we will use the curl tool to send HTTP requests such as GET and POST
to our Elasticsearch cluster. We will also observe how nodes react when a member of the cluster
is powered off unexpectedly.
2.3.2 Create a New Index
Resources
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html
To create the car index, send a file containing JSON data to Elasticsearch using curl. Create a
car.json file in the text editor of your choice with the following contents:
{
"settings" : {
"index" : {
"number_of_shards": 3,
"number_of_replicas": 1
}
}
}
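The command that submits car.json is not reproduced above. A minimal sketch follows, assuming car.json is in the current directory and that es-master-a answers at 192.168.128.4 on port 9200 as configured earlier. The function names are hypothetical helpers, not part of the lab.

```shell
ES_HOST="192.168.128.4"   # assumption: es-master-a; any node in the cluster should work

# Create the car index using the settings stored in car.json.
create_car_index() {
  curl -XPUT "http://$ES_HOST:9200/car?pretty" \
       -H 'Content-Type: application/json' \
       -d @car.json
}

# List all indices with their health and shard counts (v adds a header row).
list_indices() {
  curl -XGET "http://$ES_HOST:9200/_cat/indices?v"
}

# Usage, once the cluster is running:
#   create_car_index && list_indices
```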
You should see an output line that includes the index name (car), the health state of the index (green),
the primary shard count (3), and the replica count (1). See Figure 8. Although the request is sent
to a single node, the new index’s shards are distributed and replicated across the other members of
the cluster automatically.
{"index":{"_id":"1"}}
{"make":"Toyota","model":"Camry","year":"1990","color":"green"}
{"index":{"_id":"2"}}
{"make":"Toyota","model":"Corolla","year":"2012","color":"blue"}
{"index":{"_id":"3"}}
{"make":"Toyota","model":"Celica","year":"2003","color":"white"}
{"index":{"_id":"4"}}
{"make":"Toyota","model":"Prius","year":"2016","color":"grey"}
{"index":{"_id":"5"}}
{"make":"Toyota","model":"Corolla","year":"2016","color":"white"}
{"index":{"_id":"6"}}
{"make":"Toyota","model":"Supra","year":"1994","color":"red"}
{"index":{"_id":"7"}}
{"make":"Toyota","model":"Yaris","year":"2014","color":"blue"}
{"index":{"_id":"8"}}
{"make":"Toyota","model":"Camry","year":"2017","color":"grey"}
{"index":{"_id":"9"}}
{"make":"Toyota","model":"Prius","year":"2014","color":"black"}
You can also download this file using wget in the terminal:
wget https://gitlab.com/spencerriner/dis_lab/-/raw/master/inventory.json
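The upload command itself is not reproduced above. The sketch below is one way to send inventory.json through the _bulk endpoint of the car index, assuming es-master-a is reachable at 192.168.128.4. Both function names are hypothetical, and the layout check is an extra precaution rather than part of the lab.

```shell
ES_HOST="192.168.128.4"   # assumption: es-master-a

# A bulk file alternates action lines and document lines, so it needs an even
# number of lines and every odd-numbered line must start with {"index":
check_bulk_file() {
  awk 'NR % 2 == 1 && index($0, "{\"index\":") != 1 { bad = 1 }
       END { if (bad || NR == 0 || NR % 2 != 0) exit 1 }' "$1"
}

# Send the file to the car index. --data-binary preserves the newlines that
# the bulk API relies on; the file must also end with a newline.
bulk_load() {
  check_bulk_file "$1" || { echo "unexpected bulk file layout: $1" >&2; return 1; }
  curl -XPOST "http://$ES_HOST:9200/car/_bulk?pretty" \
       -H 'Content-Type: application/x-ndjson' \
       --data-binary @"$1"
}

# Usage:
#   bulk_load inventory.json
```

Using -d instead of --data-binary would strip the newlines from the file, and the bulk request would be rejected.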
Search for one of the documents on es-data-a using curl and the id (1-9) to verify successful
document creation. See Figure 9 for the expected output.
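A lookup by id might look like the sketch below. It assumes es-data-a is at 192.168.128.8 and that the documents were indexed through the typeless _doc endpoint. The function name is hypothetical.

```shell
ES_HOST="192.168.128.8"   # es-data-a in this lab; your address may differ

# Fetch one document from the car index by its _id (1-9 in the sample data).
get_car() {
  curl -XGET "http://$ES_HOST:9200/car/_doc/$1?pretty"
}

# Usage:
#   get_car 9
```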
Figure 9: Listing document with id 9.
As shown, documents are replicated to the other nodes in the cluster as soon as they are
added. You can also use curl to search documents based on their properties. Use this curl
command to find documents with the color:white property:
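The search command is not reproduced above. One way to express it is a URI search, where the q parameter carries a field:value query string. The sketch assumes es-master-a at 192.168.128.4, and the function name is hypothetical.

```shell
ES_HOST="192.168.128.4"   # assumption: es-master-a

# URI search on the car index: $1 is the field name, $2 the value to match.
search_cars() {
  curl -XGET "http://$ES_HOST:9200/car/_search?q=$1:$2&pretty"
}

# Usage:
#   search_cars color white
```

The URL is quoted inside the function so that the shell does not treat the & as a background operator.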
2.3.4 Delete a Document
Resources
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html
You can also delete documents in an index with curl using their id property:
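The delete command is not reproduced above. A minimal sketch follows, assuming es-master-a at 192.168.128.4 and documents indexed through the _doc endpoint. Both function names are hypothetical. The second helper is one way to confirm the deletion, since Elasticsearch answers a GET for a missing document with HTTP status 404.

```shell
ES_HOST="192.168.128.4"   # assumption: es-master-a

# Delete one document from the car index by its _id.
delete_car() {
  curl -XDELETE "http://$ES_HOST:9200/car/_doc/$1?pretty"
}

# Print only the HTTP status code of a GET for that _id.
car_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "http://$ES_HOST:9200/car/_doc/$1"
}

# Usage:
#   delete_car 9 && car_status 9
```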
2.3.5 Unexpected Node Shutdown
Issue a sudo poweroff command on the current master node (in this case, es-master-b), then
list all nodes from the other master-eligible node to identify the newly elected master node (it
will have an asterisk next to its name). Then check the cluster health to make sure that all 6 shards
have been reallocated to the remaining nodes. See Figure 13.
curl -XGET "http://192.168.128.4:9200/_cat/nodes?pretty"
curl -XGET "http://192.168.128.4:9200/_cluster/health?pretty"
Power on the old master and run the cluster health command again to verify that the number of
nodes is back to 3.
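To read the node count without scanning the whole response, the health output can be piped through a small filter. This is a sketch; the function name is hypothetical, and it relies only on the number_of_nodes field that the cluster health API returns.

```shell
# Extract the value of number_of_nodes from cluster health JSON on stdin.
node_count() {
  sed -n 's/.*"number_of_nodes" *: *\([0-9][0-9]*\).*/\1/p'
}

# Usage against es-master-a (address as used earlier in this lab):
#   curl -s "http://192.168.128.4:9200/_cluster/health?pretty" | node_count
```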
2.3.6 Lab Submission
3. Power off the master node, then show the new master and a cluster health state of green.
3 Conclusion
You have now successfully created a distributed Elasticsearch cluster from scratch and demon-
strated how it provides a high level of availability by tolerating sporadic node loss. Elasticsearch
is highly scalable, as more nodes can be added to distribute the storage and compute needs of a
growing dataset.
4 Further Study
For more experience with Elasticsearch, you may try these additional projects.
1. Install the Elastic Stack (Elasticsearch, Logstash, Kibana) on a cluster of VMs and
visualize data using the Kibana web interface.
Tutorial: https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elastic-stack-on-ubuntu-18-04
3. After installing the Elastic Stack, install Filebeat on the nodes to ingest data to the cluster.
Tutorial: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-getting-started.html
References
[1] Elastic, “What is Elasticsearch?,” January 2020. [Online]. Available: https://www.elastic.co/what-is/elasticsearch. [Accessed January 28, 2020].
[2] Zachary Tong, “What is an Elasticsearch index?,” February 2013. [Online]. Available: https://www.elastic.co/blog/what-is-an-elasticsearch-index. [Accessed January 26, 2020].
[3] Elastic, “Scalability and resilience: clusters, nodes, and shards,” December 2019. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html. [Accessed January 28, 2020].
[5] Adam Vanderbush, “Avoiding the split brain problem in Elasticsearch,” June 2017. [Online]. Available: https://qbox.io/blog/split-brain-problem-elasticsearch. [Accessed February 2, 2020].
[6] “REST API tutorial,” February 2020. [Online]. Available: https://restfulapi.net/. [Accessed February 4, 2020].
[7] Daniel Stenberg, Everything curl. February 2020. [Online]. Available: https://ec.haxx.se/. [Accessed February 4, 2020].
[8] “GnuPG,” January 2020. [Online]. Available: https://gnupg.org. [Accessed February 8, 2020].
[9] Elastic, “Adding nodes to your cluster,” December 2019. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/current/add-elasticsearch-nodes.html. [Accessed January 26, 2020].
[10] Elastic, “Index some documents,” December 2019. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index.html. [Accessed February 9, 2020].