
Day 5


In the image above, swap the positions of Hive and HiveServer2.

Issuing commands
Using the Hive CLI, a web interface, or a JDBC/ODBC client, a Hive query is submitted to the Hive server (HiveServer2).

Hive query compilation
The Hive query is compiled, optimized, and planned as a Tez or MapReduce job.

Execution
The corresponding Tez or MapReduce job is executed on the Hadoop cluster.

Hive is a distributed, parallel-processing data warehouse engine.
Hive is not a database, but it uses a database called the metastore to store the definitions of the tables that you create.
By default, Hive uses a Derby database for the metastore. A Hive table consists of a schema stored in the metastore and data stored in HDFS.
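
The split between schema (metastore) and data (HDFS) can be sketched in a few lines of Python. This is only a toy model: the `metastore` and `hdfs` dictionaries, the `employees` table, and the `read_table` helper are all made up for illustration and are not part of Hive. The point it shows is that the raw file carries no types; the schema is applied when the table is read.

```python
# Toy stand-ins (assumptions): a dict for the metastore and a dict for HDFS files.
metastore = {
    "employees": {
        "columns": [("id", int), ("name", str), ("salary", float)],
        "location": "/user/hadoop/employees/part-0000",
        "delimiter": ",",
    }
}

hdfs = {
    "/user/hadoop/employees/part-0000": [
        "1,Asha,52000.0",
        "2,Ravi,61000.5",
    ]
}


def read_table(table_name):
    """Look up the schema in the metastore and apply it to the raw lines."""
    meta = metastore[table_name]
    rows = []
    for line in hdfs[meta["location"]]:
        fields = line.split(meta["delimiter"])
        # Pair each raw field with its (column name, type) from the schema.
        row = {name: cast(value) for (name, cast), value in zip(meta["columns"], fields)}
        rows.append(row)
    return rows


rows = read_table("employees")
print(rows[0])
```

Dropping the table entry from `metastore` would leave the lines in `hdfs` untouched, which is the same idea as the schema and the data living in two different places.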

INSIDE NAMESPACE

Sync time by default -> 30 secs

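fsimage_N -> The fsimage file contains the entire metadata of the file system, including:
• File and directory structure (the hierarchical namespace).
• File permissions, replication factors, and ownership information.
• The list of blocks that make up each file. (The DataNodes holding each block are not stored in the fsimage; the NameNode rebuilds that information from DataNode block reports.)
edits_N -> The edit log, a changelog of the changes made since the fsimage was written.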

The following daemon processes run inside the Namespace:

NameNode
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients.

The NameNode has the following features:

1- Acts as the master of the DataNodes
2- Executes file system namespace operations such as opening, closing, and renaming files and directories
3- Determines the mapping of blocks to DataNodes
4- Maintains the file system namespace
5- Performs the above tasks by maintaining two files: 1- fsimage_N 2- edits_N

fsimage_N
It contains the entire file system namespace, including the mapping of blocks to files and the file system properties.

edits_N
A transaction log that records every change that occurs to the file system metadata.
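
How the two files work together can be sketched as a snapshot plus a replayable log. The code below is a simplified model, not the real on-disk format: the operation names (`create`, `add_block`, `rename`, `delete`), the dictionary layout, and the `checkpoint` helper are assumptions chosen for illustration.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> metadata)."""
    op = edit["op"]
    if op == "create":
        namespace[edit["path"]] = {"replication": edit["replication"], "blocks": []}
    elif op == "add_block":
        namespace[edit["path"]]["blocks"].append(edit["block_id"])
    elif op == "rename":
        namespace[edit["dst"]] = namespace.pop(edit["src"])
    elif op == "delete":
        del namespace[edit["path"]]
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(fsimage, edits):
    """Replay the edit log on top of a snapshot and return the new snapshot."""
    new_image = copy.deepcopy(fsimage)  # leave the old snapshot untouched
    for edit in edits:
        apply_edit(new_image, edit)
    return new_image


# Hypothetical snapshot and log, for illustration only.
fsimage_1 = {"/user/hadoop/a.txt": {"replication": 3, "blocks": ["blk_1"]}}
edits_2 = [
    {"op": "create", "path": "/user/hadoop/b.txt", "replication": 3},
    {"op": "add_block", "path": "/user/hadoop/b.txt", "block_id": "blk_2"},
    {"op": "rename", "src": "/user/hadoop/a.txt", "dst": "/user/hadoop/c.txt"},
]

fsimage_2 = checkpoint(fsimage_1, edits_2)
print(sorted(fsimage_2))
```

Appending to a log is cheap, so every change is recorded there first; folding the log into a new snapshot from time to time keeps the log short and makes a restart faster.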

DataNode
HDFS exposes a file system namespace and allows user data to be stored in files.
Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes.
The DataNodes are responsible for:
1- Handling read and write requests from clients
2- Performing block creation, deletion, and replication upon instruction from the NameNode. The NameNode makes all decisions regarding replication of blocks
3- Sending heartbeats to the NameNode
4- Sending block reports to the NameNode
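
The block splitting and replication described above can be worked through with a small calculation. The sketch assumes a 128 MB block size and a replication factor of 3 (the usual defaults in Hadoop 2.x and later, but configurable), and the round-robin `place_replicas` function is a deliberately naive stand-in: the real NameNode placement policy is rack-aware. The DataNode names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumption: 128 MB block size


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes


def place_replicas(num_blocks, datanodes, replication=3):
    """Toy round-robin placement of each block's replicas on distinct DataNodes."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


MB = 1024 * 1024
sizes = split_into_blocks(300 * MB)
placement = place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"])

print([s // MB for s in sizes])
print(placement)
```

A 300 MB file therefore occupies two full blocks and one partial block, and with three replicas per block the cluster stores nine block replicas in total.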

NameNode HA (High Availability)

The HDFS NameNode is a single point of failure. To address this, Hadoop introduced NameNode high availability. In a nutshell, HA provides the option of running two NameNodes in the same cluster in an active/passive configuration with a hot standby (one that is always running).
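
The active/passive idea can be illustrated with a toy failover routine. This is only a sketch under stated assumptions: the `NameNode` class, the `elect_active` function, and the `"failed"` state are invented here, and a real cluster delegates health checks and failover to separate coordination components that are not modeled.

```python
class NameNode:
    """Minimal stand-in for a NameNode with a role and a health flag."""

    def __init__(self, name, state):
        self.name = name
        self.state = state  # "active", "standby", or "failed"
        self.healthy = True


def elect_active(nodes):
    """Keep the healthy active node, otherwise promote a healthy standby."""
    active = next((n for n in nodes if n.state == "active"), None)
    if active is not None and active.healthy:
        return active
    standby = next((n for n in nodes if n.state == "standby" and n.healthy), None)
    if standby is None:
        raise RuntimeError("no healthy standby NameNode available")
    if active is not None:
        active.state = "failed"  # the old active must stop serving clients
    standby.state = "active"
    return standby


nn1 = NameNode("nn1", "active")
nn2 = NameNode("nn2", "standby")
cluster = [nn1, nn2]

first_active = elect_active(cluster).name
nn1.healthy = False  # simulate a crash of the active NameNode
second_active = elect_active(cluster).name

print(first_active, second_active)
```

Because the standby is already running and up to date ("hot"), promotion is just a role change rather than a cold start.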

HDFS configuration file -> hdfs-site.xml

HDFS Federation (McKinsey asks for the definition in interviews; it is also used in AWS EMR)

HDFS Federation refers to the ability of NameNodes to work independently of each other. Hadoop 2.x introduced a scaling mechanism for the NameNode, referred to as HDFS Federation. As opposed to a single NameNode, the new Hadoop infrastructure provides for multiple NameNodes that run independently of each other.

Benefits
Scalability -> Supports horizontal scaling of the NameNode
Multiple namespaces -> Big data can be divided across separate namespaces
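
One way to picture multiple independent namespaces is a lookup table that sends each path to the NameNode owning that part of the tree. The sketch below is an illustration only: the mount points, the NameNode names, and `resolve_namenode` are hypothetical, and it does not reproduce how a real client is configured.

```python
# Hypothetical mount table: path prefix -> NameNode that owns that namespace.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/data": "namenode-2",
    "/data/archive": "namenode-3",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode responsible for path (the longest matching prefix wins)."""
    matches = [
        prefix
        for prefix in mount_table
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        raise KeyError(f"no namespace is mounted for {path}")
    return mount_table[max(matches, key=len)]


print(resolve_namenode("/user/hadoop/notes.txt"))
print(resolve_namenode("/data/archive/2020/logs.gz"))
```

Each NameNode only keeps metadata for its own subtree, so adding a NameNode adds metadata capacity without touching the others.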
hadoop@ip-123-45-55-245 ~ -> here hadoop is the username and ip-123-45-55-245 is the hostname
The default storage path (home directory) for this user's files in HDFS is /user/hadoop
hdfs dfs -ls /user/hadoop

EMR web interfaces (AWS documentation): View web interfaces hosted on Amazon EMR clusters - Amazon EMR

Hue
TASKS
Complete lab assignment 3
Use Python's standard input (sys.stdin) to perform read and write operations on a sample text file
