Lab 1

Hadoop: HDFS
Rania Yangui

Objectives of the Lab


⁃ Introduction to the Hadoop Framework
⁃ Use of Docker to launch a Hadoop cluster of 3 nodes: 1 master and 2 slaves.
⁃ Learn the concepts and commands to properly manage files on HDFS.

Tools

⁃ Apache Hadoop [http://hadoop.apache.org/]
⁃ Docker [https://www.docker.com/]
⁃ Python
⁃ Unix-like or Unix-based systems (various Linux distributions and macOS)
Basic concepts
⁃ Apache Hadoop: an open-source framework for storing and processing large volumes of data on a cluster. It is used by many contributors and users.
⁃ HDFS (Hadoop Distributed File System): a distributed file system for storing very large files.
Hadoop and Docker

To deploy the Hadoop Framework, we will use Docker containers
[https://www.docker.com/]. Using containers guarantees consistency across development environments and considerably reduces both the complexity of configuring the machines and the runtime overhead, compared to using a virtual machine.

Installing Docker

Download the Windows version of Docker [Docker Desktop for Windows, docs.docker.com].

Prerequisites

⁃ Windows 10 64-bit: Pro, Enterprise, or Education (Build 16299 or later).
⁃ For Windows 10 Home, see Install Docker Desktop on Windows Home.
⁃ Hyper-V and Containers Windows features must be enabled.
⁃ The following hardware prerequisites are required to successfully run Client Hyper-V on Windows 10:
o 64-bit processor with Second Level Address Translation (SLAT)
o 4 GB of system RAM
o BIOS-level hardware virtualization support must be enabled in the BIOS settings. For more information, see Virtualization.
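
Once Docker Desktop is installed, a quick optional check (not part of the original handout) is to confirm that the Docker client and daemon respond from the command line:

docker --version
docker info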

Installing Hadoop containers

Throughout this lab, we will use three containers: one master node (NameNode) and two slave nodes (DataNodes).

To do this, you must have Docker installed on your machine and have it correctly
configured.

1. Open the command line, and type the following instructions:

docker pull csturm/hadoop-python:h3.2-p3.9.10-j11
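
To check that the image is now available locally, you can list it (an optional check):

docker images csturm/hadoop-python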

2. Create the three containers from the downloaded image. For that:

2.1 Create a network that will connect the three containers

docker network create --driver=bridge hadoop
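
You can confirm that the network exists (and, later, see which containers are attached to it) with these optional checks:

docker network ls
docker network inspect hadoop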

2.2 Create and launch the three containers (the -p option maps ports on the host machine to ports inside the container)

Master
docker run -itd --net=hadoop -p 8031:8031 --name hadoop-master --hostname hadoop-master csturm/hadoop-python:h3.2-p3.9.10-j11

Slaves
docker run -itd -p 8040:8042 --net=hadoop --name hadoop-slave1 --hostname hadoop-slave1 csturm/hadoop-python:h3.2-p3.9.10-j11

docker run -itd -p 8041:8042 --net=hadoop --name hadoop-slave2 --hostname hadoop-slave2 csturm/hadoop-python:h3.2-p3.9.10-j11
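
At this point the three containers should be up. An optional check is to list the running containers of the cluster:

docker ps --filter "name=hadoop"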

3. Go to the master container to start using it

docker exec -it hadoop-master bash

The result of this execution will be as follows: root@hadoop-master:~#

We will find ourselves in the NameNode shell, and we will be able to manipulate
the cluster as we wish. The first thing to do, once in the container, is to launch
Hadoop and YARN. A script is provided for this, called start-all.sh (in the sbin
folder). Run this script:
# ls -l
# cd hadoop
# cd sbin
# ls -l
# ./start-all.sh
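
Once the script has finished, you can verify that the daemons started. These checks are not in the original handout, but jps (which lists running Java processes) and the HDFS admin report are standard ways to do it; on the master you should typically see processes such as NameNode, SecondaryNameNode and ResourceManager:

# jps
# hdfs dfsadmin -report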

Getting Started with Hadoop: Manipulating Files on HDFS

All commands for interacting with HDFS begin with "hdfs dfs"; the options that
follow are largely inspired by standard Unix commands, as illustrated below.
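
For example (an illustrative sample, not an exhaustive list), the built-in help lists all available sub-commands, and familiar Unix-style operations work as expected:

# hdfs dfs -help
# hdfs dfs -ls /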
1. Create a directory in HDFS, called input. To do this, type:

# hdfs dfs -mkdir -p /user/root/input
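
The -p flag creates any missing parent directories. To verify that the directory now exists (an optional check):

# hdfs dfs -ls /user/root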

2. We will use the purchases.txt file as input for MapReduce processing.
[https://www.kaggle.com/datasets/dsfelix/purchasestxt?resource=download]

2.1 Leave the container and return to the local machine

exit

2.2 Copy the file from the local machine to the Docker container

docker cp c:/purchases.txt hadoop-master:/purchases.txt

2.3 Connect to container again

docker exec -it hadoop-master bash

2.4 Check that the file exists and display the end of its contents

# tail purchases.txt

2.5 Load the purchases file into the input directory you created

hdfs dfs -put purchases.txt /user/root/input

2.6 To display the contents of the input directory, the command is

hdfs dfs -ls /user/root/input
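
To inspect the file further in HDFS (optional commands, not in the original handout), you can display its size and preview its first lines:

hdfs dfs -du -h /user/root/input
hdfs dfs -cat /user/root/input/purchases.txt | head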
