Hadoop
Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework. (Source: Hadoop wiki)
HDFS
Hadoop's Distributed File System is designed to reliably store very large files across machines in a large cluster. It is inspired by the Google File System. HDFS stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. Blocks belonging to a file are replicated for fault tolerance. The block size and replication factor are configurable per file. Files in HDFS are "write once" and have strictly one writer at any time.
Hadoop Distributed File System goals:
Store large data sets
Cope with hardware failure
Emphasize streaming data access
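As an illustration of per-file replication and block layout, the sketch below uses the old bin/hadoop dfs shell that this tutorial's Hadoop version ships with; the path /user/hadoop/big.txt is only a hypothetical example.
--------------------------------
# change the replication factor of one file to 3 and wait for it to take effect
bin/hadoop dfs -setrep -w 3 /user/hadoop/big.txt
# list the file's blocks and their locations to see how it is split and replicated
bin/hadoop fsck /user/hadoop/big.txt -files -blocks -locations
--------------------------------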
Map Reduce
The Hadoop Map/Reduce framework harnesses a cluster of machines and executes user-defined Map/Reduce jobs across the nodes in the cluster. A Map/Reduce computation has two phases, a map phase and a reduce phase. The input to the computation is a data set of key/value pairs. Tasks in each phase are executed in a fault-tolerant manner; if nodes fail in the middle of a computation, the tasks assigned to them are re-distributed among the remaining nodes. Having many map and reduce tasks enables good load balancing and allows failed tasks to be re-run with small runtime overhead.
Hadoop Map/Reduce goals:
Process large data sets
Cope with hardware failure
High throughput
http://labs.google.com/papers/mapreduce.html
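To make the two phases concrete, here is a local, single-machine analogy of the classic word-count job using ordinary Unix pipes. This is not Hadoop itself, just an illustration (input.txt is a hypothetical file): tr plays the role of the map phase (emit one key per word), sort plays the role of the shuffle that groups identical keys, and uniq -c plays the role of the reduce phase (sum the counts per key).
--------------------------------
# "map": split the input into words, one per line (key = word)
# "shuffle": sort brings identical keys together
# "reduce": uniq -c counts the occurrences of each key
tr -s '[:space:]' '\n' < input.txt | sort | uniq -c
--------------------------------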
Architecture
Like Hadoop Map/Reduce, HDFS follows a master/slave architecture. An HDFS installation consists of a single Namenode, a master server that manages the filesystem namespace and regulates access to files by clients. In addition, there are a number of Datanodes, one per node in the cluster, which manage storage attached to the nodes that they run on. The Namenode makes filesystem namespace operations such as opening, closing, and renaming files and directories available via an RPC interface. It also determines the mapping of blocks to Datanodes. The Datanodes are responsible for serving read and write requests from filesystem clients; they also perform block creation, deletion, and replication upon instruction from the Namenode.
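For example, everyday client commands like the ones sketched below translate into namespace operations handled by the Namenode, while the actual file bytes are streamed to and from the Datanodes (the paths are hypothetical):
--------------------------------
# namespace operations (create, rename, list) go through the Namenode
bin/hadoop dfs -mkdir /user/hadoop/demo
bin/hadoop dfs -mv /user/hadoop/demo /user/hadoop/demo-renamed
bin/hadoop dfs -ls /user/hadoop
# reading a file: block locations come from the Namenode, data from the Datanodes
bin/hadoop dfs -cat /user/hadoop/demo-renamed/somefile.txt
--------------------------------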
Configurations
Files to configure:
hadoop-env.sh
Open the file <HADOOP_INSTALL>/conf/hadoop-env.sh in the editor of your choice and set the JAVA_HOME environment variable to the Sun JDK/JRE 1.5.0 directory.
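For example, the line to add or uncomment looks like the following; the JDK path shown is only an assumption for a typical Ubuntu install with Sun Java 1.5, so use whatever path your JDK actually lives at.
--------------------------------
# <HADOOP_INSTALL>/conf/hadoop-env.sh
# The java implementation to use (example path; adjust to your system)
export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun
--------------------------------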
hadoop-site.xml
Any site-specific configuration of Hadoop goes in <HADOOP_INSTALL>/conf/hadoop-site.xml. Here we configure the directory where Hadoop will store its data files, the ports it listens on, and so on. You can leave the settings below as they are, with the exception of the hadoop.tmp.dir variable, which you have to change to a directory of your choice, for example /usr/local/hadoop-datastore/hadoop-${user.name}.
----------------------------------------------------------------------
<property>
  <name>hadoop.tmp.dir</name>
  <value>/your/path/to/hadoop/tmp/dir/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
----------------------------------------------------------------------
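A complete single-node hadoop-site.xml typically also sets the HDFS and JobTracker addresses and the default replication. The snippet below is only a sketch based on the single-node tutorial this material follows: localhost, the ports 54310/54311, and the replication factor of 1 are assumptions you can change.
--------------------------------
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at.</description>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication (1 on a single-node setup).</description>
</property>
--------------------------------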
Starting cluster:
Run the command:
hadoop@ubuntu:~$ <HADOOP_INSTALL>/bin/start-all.sh
This will start up a Namenode, a Datanode, a Jobtracker and a Tasktracker on your machine.
Stopping cluster:
To stop all the daemons running on your machine, run the command:
hadoop@ubuntu:~$ <HADOOP_INSTALL>/bin/stop-all.sh
Configurations
Now we will modify the Hadoop configuration to make one Ubuntu box the master (which will also act as a slave) and the other Ubuntu box a slave. We will call the designated master machine just the master from now on and the slave-only machine the slave. Both machines must be able to reach each other over the network. Shut down each single-node cluster with <HADOOP_INSTALL>/bin/stop-all.sh before continuing if you haven't done so already.
Files to configure:
conf/masters (master only)
The conf/masters file defines the master nodes of our multi-node cluster. In our case, this is just the master machine. On master, update <HADOOP_INSTALL>/conf/masters so that it looks like this:
--------------------------------
master
--------------------------------
conf/slaves (master only)
The conf/slaves file lists the hosts, one per line, where the Hadoop slave daemons (datanodes and tasktrackers) will run. We want both the master box and the slave box to act as Hadoop slaves, because we want both of them to store and process data. On master, update <HADOOP_INSTALL>/conf/slaves so that it looks like this:
--------------------------------
master
slave
--------------------------------
If you have additional slave nodes, just add them to the conf/slaves file, one per line.
conf/hadoop-site.xml (all machines)
Assuming you configured conf/hadoop-site.xml on each machine as described in the single-node cluster tutorial, you will only have to change a few variables. Important: you have to change conf/hadoop-site.xml on ALL machines as follows.
First, we have to change the fs.default.name variable, which specifies the NameNode (the HDFS master) host and port. In our case, this is the master machine.
--------------------------------
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system. . .</description>
</property>
--------------------------------
Second, we have to change the mapred.job.tracker variable which specifies the JobTracker (MapReduce master) host and port. Again, this is the master in our case.
--------------------------------
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. . .</description>
</property>
--------------------------------
Third, we change the dfs.replication variable, which specifies the default block replication. It defines how many machines a single file should be replicated to before it becomes available. If you set this to a value higher than the number of slave nodes that you have available, you will start seeing a lot of errors in the log files.
--------------------------------
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication. . .</description>
</property>
--------------------------------
Starting the HDFS daemons:
Run the command <HADOOP_INSTALL>/bin/start-dfs.sh on the machine you want the namenode to run on. This will bring up HDFS with the namenode running on that machine, and datanodes on the machines listed in the conf/slaves file. In our case, we will run bin/start-dfs.sh on master:
--------------------------------
bin/start-dfs.sh
--------------------------------
On slave, you can examine the success or failure of this command by inspecting the log file <HADOOP_INSTALL>/logs/hadoop-hadoop-datanode-slave.log. At this point, the following Java processes should run on master:
--------------------------------
hadoop@master:/usr/local/hadoop$ jps
14799 NameNode
15314 Jps
14880 DataNode
14977 SecondaryNameNode
--------------------------------
Starting the MapReduce daemons:
Run the command <HADOOP_INSTALL>/bin/start-mapred.sh on the machine you want the jobtracker to run on. This will bring up the MapReduce cluster with the jobtracker running on the machine you ran the previous command on, and tasktrackers on the machines listed in the conf/slaves file.
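In our case, we run bin/start-mapred.sh on master. As a rough check (the list below is only what to expect, not captured output, and PIDs will differ), jps on master should now also show a JobTracker and a TaskTracker alongside the HDFS daemons:
--------------------------------
bin/start-mapred.sh
hadoop@master:/usr/local/hadoop$ jps
# expected process names now include: NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Jps
--------------------------------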
Stopping the MapReduce daemons:
Run the command <HADOOP_INSTALL>/bin/stop-mapred.sh on the jobtracker machine. This will shut down the MapReduce cluster by stopping the jobtracker daemon running on the machine you ran the previous command on, and tasktrackers on the machines listed in the conf/slaves file. In our case, we will run bin/stop-mapred.sh on master:
--------------------------------
bin/stop-mapred.sh
--------------------------------
At this point, the following Java processes should run on master:
--------------------------------
hadoop@master:/usr/local/hadoop$ jps
14799 NameNode
18386 Jps
14880 DataNode
14977 SecondaryNameNode
--------------------------------
Stopping the HDFS daemons:
Run the command <HADOOP_INSTALL>/bin/stop-dfs.sh on the namenode machine. This will shut down HDFS by stopping the namenode daemon running on the machine you ran the previous command on, and datanodes on the machines listed in the conf/slaves file. In our case, we will run bin/stop-dfs.sh on master:
--------------------------------
bin/stop-dfs.sh
--------------------------------
At this point, only the following Java processes should run on master:
--------------------------------
hadoop@master:/usr/local/hadoop$ jps
18670 Jps
--------------------------------
Running a MapReduce job:
Download an ebook from Project Gutenberg as a plain-text file in us-ascii encoding and store the uncompressed file in a temporary directory of your choice, for example /tmp/gutenberg.
Restart the Hadoop cluster if it is not running already:
hadoop@ubuntu:~$ <HADOOP_INSTALL>/bin/start-all.sh
Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop's HDFS, as sketched below.
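The exact commands depend on your Hadoop release; the sketch below assumes the old bin/hadoop dfs shell and the bundled WordCount example job, with hadoop-*-examples.jar standing in for whatever the examples jar is called in your release, and gutenberg / output as the HDFS input and output directory names used in this tutorial.
--------------------------------
# copy the downloaded ebook(s) from the local file system into HDFS
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg
# check that the files arrived
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls gutenberg
# run the WordCount example job; results go to the HDFS directory "output"
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop-*-examples.jar wordcount gutenberg output
--------------------------------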
To inspect the file, you can copy it from HDFS to the local file system:
--------------------------------
hadoop@ubuntu:/usr/local/hadoop$ mkdir /tmp/output
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -copyToLocal output/part-00000 /tmp/output
--------------------------------
Alternatively, you can read the file directly from HDFS without copying it to the local file system by using the command:
--------------------------------
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -cat output/part-00000
--------------------------------
Bibliography
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(SingleNode_Cluster)#Running_a_MapReduce_job
http://wiki.apache.org/hadoop/