
6 Hadoop

Hadoop is an open source framework for distributed storage and processing of large datasets across clusters of computers using simple programming models. It scales from single servers up to thousands of machines with very high fault tolerance. Its core consists of the Hadoop Distributed File System (HDFS) for storage and MapReduce for distributed computation.


Hadoop

By Dinesh Amatya
Hadoop

 The exponential growth of data first presented
challenges to cutting-edge businesses such as
Google, Yahoo, Amazon, and Microsoft
 Google published papers describing GFS and MapReduce
 Doug Cutting led the charge to develop an open
source version of this MapReduce system, called
Hadoop
 Yahoo supported the effort

Hadoop

 Hadoop is an open source framework for writing and running
distributed applications that process large amounts of data
– HDFS – distributed storage

– MapReduce – distributed computation

 Moves code to the data rather than moving data to the code
 Replicates data across nodes for fault tolerance


Building blocks of Hadoop

 NameNode
 DataNode
 JobTracker
 TaskTracker
 Secondary NameNode
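The slides don't show how to confirm these daemons are actually running. One common check (my addition, not from the deck) is jps, which ships with the JDK and lists the Java processes on a node; since each building block above runs in its own JVM, all five should appear on a single-node cluster.

```shell
# Quick health check, assuming the daemons have already been started
# (e.g. with bin/start-all.sh). On a single-node cluster, expect to see
# NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker.
command -v jps >/dev/null && jps || true   # skip quietly if no JDK on PATH
```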



Setting up SSH for a Hadoop
cluster
Define a common account
Verify SSH installation:
[hadoop-user@master]$ which ssh
/usr/bin/ssh
[hadoop-user@master]$ which sshd
/usr/bin/sshd
[hadoop-user@master]$ which ssh-keygen
/usr/bin/ssh-keygen
Install if missing:
sudo apt-get install openssh-server
or
sudo dpkg -i openssh.deb
Setting up SSH for a Hadoop
cluster
Generate SSH key pair
[hadoop-user@master]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop-user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Your identification has been saved in /home/hadoop-user/.ssh/id_rsa.


Your public key has been saved in /home/hadoop-user/.ssh/id_rsa.pub.
Setting up SSH for a Hadoop
cluster
Distribute public key and validate logins
[hadoop-user@master]$ scp ~/.ssh/id_rsa.pub hadoop-user@target:~/master_key
[hadoop-user@target]$mkdir ~/.ssh
[hadoop-user@target]$chmod 700 ~/.ssh
[hadoop-user@target]$mv ~/master_key ~/.ssh/authorized_keys
[hadoop-user@target]$chmod 600 ~/.ssh/authorized_keys
[locally :: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys ]
[hadoop-user@master]$ ssh target
Last login: Sun Jan 4 15:32:49 2009 from master
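Repeating the scp/mkdir/chmod steps for every slave gets tedious on a real cluster. A small loop can push the key to each host listed in a file; this is a sketch of mine, not from the slides (`distribute_keys` and the hosts-file argument are hypothetical names), and it assumes ssh-copy-id is available, which creates ~/.ssh and authorized_keys with the same 700/600 permissions set manually above.

```shell
# Hypothetical helper: push the master's public key to every host named
# in a file (one hostname per line), using ssh-copy-id to append it to
# that host's ~/.ssh/authorized_keys with the right permissions.
distribute_keys() {
  hosts_file=$1
  while read -r host; do
    [ -n "$host" ] || continue              # skip blank lines
    ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hadoop-user@$host"
  done < "$hosts_file"
}
# usage: distribute_keys slaves
```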
Running Hadoop

[hadoop-user@master]$ gedit .bashrc

export JAVA_HOME=/opt/jdk1.7.0
export PATH=$PATH:$JAVA_HOME/bin

(no spaces around = ; shell export fails otherwise)
Running Hadoop

[hadoop-user@master]$ cd $HADOOP_HOME/conf

hadoop-env.sh
export JAVA_HOME=/usr/share/jdk
Running Hadoop
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop_tmp</value>
</property>
</configuration>
Running Hadoop

mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
Running Hadoop

hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
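One caveat worth noting (my note, not from the slides): the other config files here point at localhost, i.e. a single-node setup with one DataNode, and a single DataNode cannot hold three replicas of each block. Pseudo-distributed setups therefore usually drop the replication factor to 1:

```xml
<!-- hdfs-site.xml variant for a single-node cluster: one DataNode,
     so only one replica of each block is possible -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
```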
Running Hadoop

[hadoop-user@master]$ cat masters


localhost
[hadoop-user@master]$ cat slaves
localhost

[hadoop-user@master]$ bin/hadoop namenode -format


[hadoop-user@master]$ bin/start-all.sh
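Worth stressing (my note, not from the slides): namenode -format initializes a fresh, empty HDFS by wiping the NameNode's metadata, so it is run once when the cluster is first set up, not before every start. The matching shutdown script lives in the same bin/ directory:

```shell
# Stop all five daemons started by start-all.sh (run from $HADOOP_HOME).
# The guard just keeps this snippet harmless outside a Hadoop installation.
if [ -x bin/stop-all.sh ]; then bin/stop-all.sh; fi
```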
Running Hadoop

In file .bashrc

export HADOOP_HOME=/opt/programs/hadoop-0.20.2-cdh3u6
export PATH=$PATH:$HADOOP_HOME/bin
Web-based cluster UI
(screenshots omitted: the NameNode web UI, by default on port 50070, and the JobTracker web UI on port 50030)
Working with files in HDFS

Basic file commands


hadoop fs -cmd <args>

hadoop fs -ls /
hadoop fs -mkdir /user/chuck
hadoop fs -put example.txt .
hadoop fs -put example.txt /user/chuck
hadoop fs -get example.txt .
Working with files in HDFS

hadoop fs -cat example.txt | head

hadoop fs -rm example.txt

hadoop fs -rmr /user/hdfs/dir1

hadoop fs -chmod -R 777 example.txt

hadoop fs -chown hdfs:hadoop example.txt


Working with files in HDFS

hadoop fs -copyFromLocal example.txt .


hadoop fs -copyToLocal example.txt .

hadoop fs -getmerge files/ mergedFile.txt

hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2


hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2
hadoop fs -du /user/hadoop/file1
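Tying the commands above together, here is a round-trip sketch of my own (the function name and file names are hypothetical, not from the slides): copy a local file into HDFS, confirm it arrived, then copy it back out under a new name.

```shell
# Hypothetical round-trip through HDFS using the commands shown above.
hdfs_roundtrip() {
  hadoop fs -mkdir /user/chuck                              # HDFS directory
  hadoop fs -put example.txt /user/chuck                    # local -> HDFS
  hadoop fs -ls /user/chuck                                 # confirm the copy
  hadoop fs -get /user/chuck/example.txt example.copy.txt   # HDFS -> local
}
# usage (on a node with a running cluster): hdfs_roundtrip
```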
References

 http://opensource.com/life/14/8/intro-apache-hadoop-big-data
 Hadoop in Action
 Hadoop: The Definitive Guide
