Lab1 BigData
Hadoop : HDFS
Rania Yangui
Installing Docker
To install Docker, follow the official documentation at docs.docker.com.
Prerequisites
Throughout this lab, we will use three containers: a master node (NameNode) and two slave nodes (DataNodes).
To follow along, you must have Docker installed and correctly configured on your machine.
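The run commands in step 2 assume that a Docker network named hadoop exists and that the image has already been downloaded. If that is not yet the case, they can be created first with:
docker network create --driver=bridge hadoop
docker pull csturm/hadoop-python:h3.2-p3.9.10-j11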
2. Create the three containers from the downloaded image. To do this:
2.2 Create and launch the three containers (the -p option maps ports on the host machine to ports in the container):
Master
docker run -itd --net=hadoop -p 8031:8031 --name hadoop-master --hostname hadoop-master csturm/hadoop-python:h3.2-p3.9.10-j11
Slaves
docker run -itd -p 8040:8042 --net=hadoop --name hadoop-slave1 --hostname hadoop-slave1 csturm/hadoop-python:h3.2-p3.9.10-j11
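Only the first slave is shown above; a second one, assuming the same image and the next free host port, would be launched the same way:
docker run -itd -p 8041:8042 --net=hadoop --name hadoop-slave2 --hostname hadoop-slave2 csturm/hadoop-python:h3.2-p3.9.10-j11
To obtain the NameNode shell described next, attach to the master container (a standard docker exec, using the container name from the run command above):
docker exec -it hadoop-master bash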
We will find ourselves in the NameNode shell and can manipulate the cluster as we wish. The first thing to do, once inside the container, is to launch Hadoop and YARN. A script called start-all.sh (in the sbin folder) is provided for this. Run it:
# ls -l
cd hadoop
cd sbin
ls -l
./start-all.sh
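To check that the daemons actually started, you can list the running Java processes with jps (available if the image ships a full JDK); on the master this should include, among others, NameNode and ResourceManager:
# jps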
All commands interacting with the Hadoop file system begin with “hdfs dfs”; the options that follow are largely inspired by standard Unix commands.
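A few illustrative examples (the names used here are placeholders):
hdfs dfs -ls /             # list the HDFS root, like Unix ls
hdfs dfs -mkdir demo       # create a directory, like Unix mkdir
hdfs dfs -cat demo/f.txt   # print a file, like Unix cat
hdfs dfs -rm demo/f.txt    # delete a file, like Unix rm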
1. Create a directory in HDFS, called input. To do this, type:
hdfs dfs -mkdir -p input
2.2 Copy the file from your local machine to the Docker container. The docker cp command runs on the host, so first leave the container:
exit
docker cp c:/dblp.json hadoop-master:/purchases.txt
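After the copy, re-attach to the master container (the same docker exec as above) so that the next commands run inside it:
docker exec -it hadoop-master bash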
2.4 Check that the file exists and display its contents:
# tail purchases.txt
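Note that docker cp placed the file at the root of the container's ordinary Linux filesystem (not yet in HDFS), so plain shell commands apply; to check its existence explicitly:
# ls -l /purchases.txt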
2.5 Load the purchases file into the input directory you created
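A standard way to do this is with -put, which copies a file from the container's local filesystem into HDFS (input is the directory created in step 1); a follow-up ls verifies the upload:
hdfs dfs -put purchases.txt input
hdfs dfs -ls input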