
6 Hadoop

Hadoop is an open source framework for distributed storage and processing of large datasets across clusters of computers using simple programming models. It scales from single servers up to thousands of machines with very high fault tolerance. Its core consists of the Hadoop Distributed File System (HDFS) for storage and MapReduce for distributed computation.


Hadoop

By Dinesh Amatya
Hadoop

 The exponential growth of data first presented
challenges to cutting-edge businesses such as
Google, Yahoo, Amazon, and Microsoft
 Google published papers describing GFS and MapReduce
 Doug Cutting led the charge to develop an open
source version of this MapReduce system, called
Hadoop
 Yahoo supported the effort

Hadoop

 Hadoop is an open source framework for writing and running
distributed applications that process large amounts of data
– HDFS – distributed storage

– MapReduce – distributed computation

 Moves code to the data rather than moving data to the code
 Replicates data across nodes for fault tolerance


Building blocks of Hadoop

 NameNode
 DataNode
 JobTracker
 TaskTracker
 Secondary NameNode
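The slides don't show how to confirm these daemons are actually running. One common check (my addition, not from the deck) is jps, which ships with the JDK and lists the Java processes on a node; since each building block above runs in its own JVM, all five should appear on a single-node cluster.

```shell
# Quick health check, assuming the daemons have already been started
# (e.g. with bin/start-all.sh). On a single-node cluster, expect to see
# NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker.
command -v jps >/dev/null && jps || true   # skip quietly if no JDK on PATH
```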



Setting up SSH for a Hadoop
cluster
Define a common account
Verify SSH installation:
[hadoop-user@master]$ which ssh
/usr/bin/ssh
[hadoop-user@master]$ which sshd
/usr/bin/sshd
[hadoop-user@master]$ which ssh-keygen
/usr/bin/ssh-keygen
Install if missing:
sudo apt-get install openssh-server
or
sudo dpkg -i openssh.deb
Setting up SSH for a Hadoop
cluster
Generate SSH key pair
[hadoop-user@master]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop-user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Your identification has been saved in /home/hadoop-user/.ssh/id_rsa.


Your public key has been saved in /home/hadoop-user/.ssh/id_rsa.pub.
Setting up SSH for a Hadoop
cluster
Distribute public key and validate logins
[hadoop-user@master]$ scp ~/.ssh/id_rsa.pub hadoop-user@target:~/master_key
[hadoop-user@target]$mkdir ~/.ssh
[hadoop-user@target]$chmod 700 ~/.ssh
[hadoop-user@target]$mv ~/master_key ~/.ssh/authorized_keys
[hadoop-user@target]$chmod 600 ~/.ssh/authorized_keys
[locally :: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys ]
[hadoop-user@master]$ ssh target
Last login: Sun Jan 4 15:32:49 2009 from master
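Repeating the scp/mkdir/chmod steps for every slave gets tedious on a real cluster. A small loop can push the key to each host listed in a file; this is a sketch of mine, not from the slides (`distribute_keys` and the hosts-file argument are hypothetical names), and it assumes ssh-copy-id is available, which creates ~/.ssh and authorized_keys with the same 700/600 permissions set manually above.

```shell
# Hypothetical helper: push the master's public key to every host named
# in a file (one hostname per line), using ssh-copy-id to append it to
# that host's ~/.ssh/authorized_keys with the right permissions.
distribute_keys() {
  hosts_file=$1
  while read -r host; do
    [ -n "$host" ] || continue              # skip blank lines
    ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hadoop-user@$host"
  done < "$hosts_file"
}
# usage: distribute_keys slaves
```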
Running Hadoop

[hadoop-user@master]$ gedit .bashrc

export JAVA_HOME=/opt/jdk1.7.0
export PATH=$PATH:$JAVA_HOME/bin

(no spaces around = ; shell export fails otherwise)
Running Hadoop

[hadoop-user@master]$ cd $HADOOP_HOME/conf

hadoop-env.sh
export JAVA_HOME=/usr/share/jdk
Running Hadoop
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop_tmp</value>
</property>
</configuration>
Running Hadoop

mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
Running Hadoop

hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
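One caveat worth noting (my note, not from the slides): the other config files here point at localhost, i.e. a single-node setup with one DataNode, and a single DataNode cannot hold three replicas of each block. Pseudo-distributed setups therefore usually drop the replication factor to 1:

```xml
<!-- hdfs-site.xml variant for a single-node cluster: one DataNode,
     so only one replica of each block is possible -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
```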
Running Hadoop

[hadoop-user@master]$ cat masters


localhost
[hadoop-user@master]$ cat slaves
localhost

[hadoop-user@master]$ bin/hadoop namenode -format


[hadoop-user@master]$ bin/start-all.sh
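Worth stressing (my note, not from the slides): namenode -format initializes a fresh, empty HDFS by wiping the NameNode's metadata, so it is run once when the cluster is first set up, not before every start. The matching shutdown script lives in the same bin/ directory:

```shell
# Stop all five daemons started by start-all.sh (run from $HADOOP_HOME).
# The guard just keeps this snippet harmless outside a Hadoop installation.
if [ -x bin/stop-all.sh ]; then bin/stop-all.sh; fi
```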
Running Hadoop

In file .bashrc

export HADOOP_HOME=/opt/programs/hadoop-0.20.2-cdh3u6
export PATH=$PATH:$HADOOP_HOME/bin
Web-based cluster UI
(screenshots omitted: the NameNode web UI, by default on port 50070, and the JobTracker web UI on port 50030)
Working with files in HDFS

Basic file commands


hadoop fs -cmd <args>

hadoop fs -ls /
hadoop fs -mkdir /user/chuck
hadoop fs -put example.txt .
hadoop fs -put example.txt /user/chuck
hadoop fs -get example.txt .
Working with files in HDFS

hadoop fs -cat example.txt | head

hadoop fs -rm example.txt

hadoop fs -rmr /user/hdfs/dir1

hadoop fs -chmod -R 777 example.txt

hadoop fs -chown hdfs:hadoop example.txt


Working with files in HDFS

hadoop fs -copyFromLocal example.txt .


hadoop fs -copyToLocal example.txt .

hadoop fs -getmerge files/ mergedFile.txt

hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2


hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2
hadoop fs -du /user/hadoop/file1
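Tying the commands above together, here is a round-trip sketch of my own (the function name and file names are hypothetical, not from the slides): copy a local file into HDFS, confirm it arrived, then copy it back out under a new name.

```shell
# Hypothetical round-trip through HDFS using the commands shown above.
hdfs_roundtrip() {
  hadoop fs -mkdir /user/chuck                              # HDFS directory
  hadoop fs -put example.txt /user/chuck                    # local -> HDFS
  hadoop fs -ls /user/chuck                                 # confirm the copy
  hadoop fs -get /user/chuck/example.txt example.copy.txt   # HDFS -> local
}
# usage (on a node with a running cluster): hdfs_roundtrip
```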
References

 http://opensource.com/life/14/8/intro-apache-hadoop-big-data
 Hadoop in Action
 Hadoop: The Definitive Guide
