HDFS Notes

Big data is characterized by the 3 V's: volume, velocity, and variety. Hadoop is a framework for distributed storage and processing of big data across clusters of commodity hardware. It uses HDFS for storage and MapReduce for distributed processing. HDFS stores files reliably across machines as blocks, with a NameNode managing file metadata and DataNodes storing the blocks. HDFS provides high-throughput access to large datasets.


What is BIGDATA?

The 3 V's of BIGDATA:
• Volume: petabyte-scale data
• Velocity: fast-arriving data from sources such as social feeds and sensors
• Variety: structured, semi-structured, and unstructured data
What is Hadoop?
A new hardware and software approach to handling BIGDATA: a new hardware approach (clusters of low-cost commodity machines) combined with a new software approach (distributed storage and processing).

HDFS
A self-healing distributed filesystem running on clusters of commodity hardware, intended for storing large files with streaming data access patterns.
Principles of HDFS
• Highly fault-tolerant
• Designed to be deployed on low-cost hardware
• Highly scalable
• Provides high-throughput access to application data
• Suitable for applications that have large data sets (typically GBs to TBs)
• Portable across heterogeneous hardware and operating system platforms
• No support for random updates, but appends are allowed
HDFS Concepts
• A file is split into blocks for storage in HDFS. Blocks of the same file are distributed across multiple machines in the cluster.
• Concept of a block:
  – The minimum amount of data that can be read or written.
  – Size on a normal filesystem: a few kilobytes (typically 4 KB).
  – Size in HDFS: 64 MB by default (128 MB from Hadoop 2.x onward); the block size is configurable, and raising it to 128 MB is common (see the config sketch below).
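
The block size is set with the dfs.blocksize property in hdfs-site.xml (dfs.block.size in Hadoop 1.x). A minimal sketch, assuming a Hadoop 2.x cluster; the 128 MB value is illustrative, not a recommendation:

<property>
  <name>dfs.blocksize</name>
  <!-- 128 MB, expressed in bytes -->
  <value>134217728</value>
</property>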
Namenode = master
• Manages the filesystem namespace (the filesystem tree and metadata for directories and files) and maintains the EditLog file. The namespace image and EditLog are stored persistently on disk.
• Stores the information about which DataNodes hold the blocks of a given file. This information is kept in RAM; it is rebuilt from DataNode block reports rather than persisted.
Datanode = slave
• Serves as storage for data blocks.
• Responsible for serving read and write requests from clients.
• Sends periodic "heartbeats" to the NameNode and also sends block reports listing the blocks it stores.
Write Path in HDFS
To write a file, the client asks the NameNode to create the file entry and receives a list of DataNodes for each block. The client streams each block to the first DataNode, which forwards it along a replication pipeline to the remaining DataNodes; acknowledgements travel back through the pipeline.

Read Path in HDFS
To read a file, the client asks the NameNode for the block locations, then reads each block directly from the closest DataNode holding a replica; file data never flows through the NameNode. A client-side sketch of both paths follows.
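
A minimal sketch of both paths from the client side, using the HDFS shell (paths and filenames are illustrative):

# Write path: the client streams the file's blocks to a
# pipeline of DataNodes chosen by the NameNode
hdfs dfs -put /tmp/sample.txt /user/cloudera/sample.txt

# Read path: the client gets block locations from the NameNode,
# then reads the blocks directly from the DataNodes
hdfs dfs -cat /user/cloudera/sample.txt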
Fault Tolerance and Self-Healing in HDFS

Detecting DataNode Failures: HeartBeat
Each DataNode sends a periodic heartbeat to the NameNode. If no heartbeat arrives within the configured timeout, the NameNode marks the DataNode as dead and re-replicates its blocks on the remaining DataNodes, restoring the replication factor without manual intervention.
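
Cluster and DataNode health can be checked from the command line; a short sketch, assuming HDFS superuser access:

# Summarizes capacity plus live/dead DataNodes, including each
# DataNode's last heartbeat ("Last contact") time
hdfs dfsadmin -report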
Filesystem Metadata
• The HDFS namespace is stored by the NameNode.
• The NameNode uses a transaction log called the EditLog to record every change that occurs to the filesystem metadata.
  – For example, creating a new file.
  – Changing the replication factor of a file.
  – The EditLog is stored in the NameNode's local filesystem.
• The entire filesystem namespace, including the mapping of blocks to files and filesystem properties, is stored in a file called FsImage, also in the NameNode's local filesystem.
• The EditLog is periodically merged into the FsImage (a checkpoint; in classic HDFS this merge is performed by the Secondary NameNode).
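
Both metadata files can be inspected offline with Hadoop's viewer tools; a sketch, assuming access to the NameNode's metadata directory (the filenames shown are illustrative):

# Dump an FsImage checkpoint to XML with the Offline Image Viewer
hdfs oiv -p XML -i fsimage_0000000000000000042 -o fsimage.xml

# Dump an EditLog segment to XML with the Offline Edits Viewer
hdfs oev -i edits_0000000000000000001-0000000000000000042 -o edits.xml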
HDFS Access
• WebHDFS
HDFS Shell Commands

-ls path
    Lists the contents of the directory specified by path, showing the names, permissions, owner, size and modification date for each entry.

-lsr path
    Behaves like -ls, but recursively displays entries in all subdirectories of path.

-du path
    Shows disk usage, in bytes, for all files which match path; filenames are reported with the full HDFS protocol prefix.

-mv src dest
    Moves the file or directory indicated by src to dest, within HDFS.

-cp src dest
    Copies the file or directory identified by src to dest, within HDFS.

-rm path
    Removes the file or empty directory identified by path.

-rmr path
    Removes the file or directory identified by path. Recursively deletes any child entries (i.e., files or subdirectories of path).

-put localSrc dest
    Copies the file or directory from the local file system identified by localSrc to dest within HDFS.

-copyFromLocal localSrc dest
    Identical to -put.

-moveFromLocal localSrc dest
    Copies the file or directory from the local file system identified by localSrc to dest within HDFS, then deletes the local copy on success.

-get [-crc] src localDest
    Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.
HDFS Shell Commands (continued)

-copyToLocal [-crc] src localDest
    Identical to -get.

-moveToLocal [-crc] src localDest
    Works like -get, but deletes the HDFS copy on success.

-cat filename
    Displays the contents of filename on stdout.

-mkdir path
    Creates a directory named path in HDFS. Creates any parent directories in path that are missing (like mkdir -p in Linux).

-test -[ezd] path
    Tests whether path exists (-e), has zero length (-z), or is a directory (-d). Exits with status 0 if the test succeeds and 1 otherwise.

-stat [format] path
    Prints information about path. format is a string which accepts the file size in blocks (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y).

-tail [-f] file
    Shows the last 1 KB of file on stdout.

-chmod [-R] mode,mode,... path...
    Changes the file permissions associated with one or more objects identified by path.... Performs changes recursively with -R. mode is a 3-digit octal mode, or {augo}+/-{rwxX}. Assumes a if no scope is specified and does not apply a umask.

-chown [-R] [owner][:[group]] path...
    Sets the owning user and/or group for files or directories identified by path.... Sets the owner recursively if -R is specified.

-help cmd
    Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.
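
A short worked session combining several of these commands (directory and file names are illustrative):

hdfs dfs -mkdir /user/cloudera/demo           # create a directory
hdfs dfs -put notes.txt /user/cloudera/demo   # upload a local file
hdfs dfs -ls /user/cloudera/demo              # list its contents
hdfs dfs -cat /user/cloudera/demo/notes.txt   # print the file to stdout
hdfs dfs -rm /user/cloudera/demo/notes.txt    # remove the file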
Enable WebHDFS in Your Cluster

Step 1: Add the following property to hdfs-site.xml to enable WebHDFS access:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>

Location of hdfs-site.xml: /etc/hadoop/conf/hdfs-site.xml

Step 2: Restart the HDFS service from Cloudera Manager.


1) Create a directory called temp under /user/cloudera (HTTP PUT):

http://localhost:50070/webhdfs/v1/user/cloudera/temp?user.name=cloudera&op=MKDIRS

2) Get the status of the directory /user/cloudera (HTTP GET):

http://localhost:50070/webhdfs/v1/user/cloudera?user.name=cloudera&op=GETFILESTATUS

3) Create and write into a file

4) Open and read a file

Operations 3) and 4) are sketched with curl below.
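
A sketch of operations 3) and 4) with curl (the file name is illustrative). Writing is a two-step protocol: the NameNode answers the first PUT with a 307 redirect, and the data is then sent to the DataNode address from the Location header:

# 3) Create and write into a file
# Step A: ask the NameNode; the response carries a 307 Location header
curl -i -X PUT "http://localhost:50070/webhdfs/v1/user/cloudera/temp/notes.txt?user.name=cloudera&op=CREATE"
# Step B: send the file contents to the DataNode URL from that header
curl -i -X PUT -T notes.txt "<Location-URL-from-step-A>"

# 4) Open and read a file (-L makes curl follow the redirect)
curl -i -L "http://localhost:50070/webhdfs/v1/user/cloudera/temp/notes.txt?user.name=cloudera&op=OPEN"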
