Day 5
Execution
The corresponding Tez or MapReduce job is executed
on the Hadoop cluster.
INSIDE NAMESPACE
NameNode
HDFS has a master/slave architecture. An HDFS cluster
consists of a single NameNode, a master server
that manages the file system namespace and
regulates access to files by clients.
fsimage_N
Contains the entire file system namespace,
including the mapping of blocks to files and file
system properties.
edits_N
A transaction log that records every change that
occurs to file system metadata.
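The relationship between the two files can be sketched in a few lines of Python. This is a toy model, not the NameNode's on-disk format: the dictionary layout, the operation names ("create", "delete"), and the replay function are all assumptions made for illustration. It only shows the idea that the current namespace is the last fsimage snapshot plus every change recorded in the edit log since then.

```python
import copy

def replay(fsimage, edits):
    """Return a new namespace: the snapshot with each logged change applied in order."""
    namespace = copy.deepcopy(fsimage)  # never mutate the snapshot itself
    for op, path, blocks in edits:
        if op == "create":
            namespace[path] = list(blocks)
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace

# Hypothetical snapshot: file path -> list of block IDs
fsimage = {"/user/hadoop/a.txt": ["blk_1", "blk_2"]}

# Hypothetical transaction log recorded after the snapshot was taken
edits = [
    ("create", "/user/hadoop/b.txt", ["blk_3"]),
    ("delete", "/user/hadoop/a.txt", None),
]

current = replay(fsimage, edits)
print(current)
```

Keeping a snapshot plus a log means each change only needs a small append to the log rather than a rewrite of the whole image.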
DataNode
HDFS exposes a file system namespace and allows
user data to be stored in files.
Internally, a file is split into one or more blocks, and
these blocks are stored in a set of DataNodes.
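A minimal sketch of the splitting idea, assuming a tiny block size so the result is easy to read (HDFS block sizes are configurable and far larger, commonly 128 MB in recent Hadoop versions). The function names and the round-robin placement are hypothetical stand-ins; in HDFS the NameNode decides where blocks go.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut data into consecutive chunks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_round_robin(num_blocks: int, datanodes):
    """Assign block index -> DataNode name in round-robin order (illustrative only)."""
    return {i: datanodes[i % len(datanodes)] for i in range(num_blocks)}

file_bytes = b"abcdefghij"           # a 10-byte "file"
blocks = split_into_blocks(file_bytes, block_size=4)
placement = place_round_robin(len(blocks), ["dn1", "dn2"])

print(blocks)
print(placement)
```

Note that the last block is smaller than the block size; a file does not have to fill its final block.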
The DataNodes are responsible for:
1- Handling read and write requests from clients
2- Performing block creation, deletion, and
replication upon instruction from the
NameNode. The NameNode makes all decisions
regarding replication of blocks
3- Sending heartbeats to the NameNode
4- Sending block reports to the NameNode
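The heartbeat and block report duties can be illustrated with a toy model. Everything here is an assumption for illustration: the ToyNameNode class, its method names, and the 30-second timeout are not HDFS APIs or HDFS defaults. The sketch shows how a master could use heartbeats to decide which DataNodes are alive and block reports to count the live replicas of a block.

```python
class ToyNameNode:
    """Tracks heartbeats and block reports from DataNodes (illustrative only)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # hypothetical liveness timeout, in seconds
        self.last_heartbeat = {}     # DataNode name -> time of last heartbeat
        self.block_map = {}          # DataNode name -> set of block IDs it reported

    def heartbeat(self, datanode, now):
        self.last_heartbeat[datanode] = now

    def block_report(self, datanode, block_ids):
        self.block_map[datanode] = set(block_ids)

    def live_datanodes(self, now):
        """DataNodes whose last heartbeat is within the timeout."""
        return sorted(dn for dn, t in self.last_heartbeat.items()
                      if now - t <= self.timeout_s)

    def replica_count(self, block_id, now):
        """Number of live DataNodes that reported holding block_id."""
        return sum(1 for dn in self.live_datanodes(now)
                   if block_id in self.block_map.get(dn, set()))

nn = ToyNameNode(timeout_s=30)
nn.heartbeat("dn1", now=0)
nn.block_report("dn1", ["blk_1", "blk_2"])
nn.heartbeat("dn2", now=0)
nn.block_report("dn2", ["blk_1"])
nn.heartbeat("dn1", now=40)          # dn2 sends nothing further

live = nn.live_datanodes(now=50)
replicas = nn.replica_count("blk_1", now=50)
print(live, replicas)
```

In this model, a drop in the live replica count is the signal a master would use to instruct another DataNode to re-replicate the block.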
HDFS hdfs-site.xml
Benefits
Scalability: supports horizontal scaling
Multiple namespaces: big data can be divided across
them
hadoop@ip-123-45-55-245 ~
Here hadoop is the username and ip-123-45-55-245 is the hostname.
The default storage path for this user's files in HDFS is /user/hadoop
hdfs dfs -ls /user/hadoop
Hue
TASKS
Complete lab assignment 3
Use Python's sys.stdin to perform read and write
operations on a sample text file
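One possible starting point for the second task is sketched below. The function names, the default output file name output.txt, and the in-memory demonstration are assumptions, not part of the lab instructions. The copy_lines function works on any text stream, so the same code serves sys.stdin, an open file, or the io.StringIO objects used here so that the example runs without any input file.

```python
import io
import sys

def copy_lines(source, destination):
    """Read source line by line, write each line to destination, return the line count."""
    count = 0
    for line in source:
        destination.write(line)
        count += 1
    return count

def main(out_path="output.txt"):
    """Read everything from standard input and write it to out_path."""
    with open(out_path, "w", encoding="utf-8") as out_file:
        return copy_lines(sys.stdin, out_file)

# Self-contained demonstration: in-memory streams stand in for stdin and the output file.
demo_input = io.StringIO("first line\nsecond line\n")
demo_output = io.StringIO()
lines_copied = copy_lines(demo_input, demo_output)
print(lines_copied, "lines copied")
```

To run it against a real file, add a call to main() at the end of the script and redirect the file into it, for example python copy_stdin.py < sample.txt (the script and file names are hypothetical).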