
Experiment 10

Aim: To demonstrate a distributed file system.

Theory: A distributed file system (DFS) is a method of storing and accessing files based on
a client/server architecture. In a distributed file system, one or more central servers store
files that can be accessed, with the proper authorization rights, by any number of remote
clients in the network. Distributed file systems are advantageous because they make it easier
to distribute documents to multiple clients, and they provide a centralized storage system so
that client machines do not use their own resources to store files.

Hadoop Distributed File System (HDFS)

Introduction:
The Hadoop Distributed File System (HDFS) is the primary data storage system used
by Hadoop applications. It employs a NameNode and DataNode architecture to implement
a distributed file system that provides high-performance access to data across highly
scalable Hadoop clusters. HDFS is a key part of many Hadoop ecosystem technologies,
as it provides a reliable means of managing pools of big data and supporting related big
data analytics applications. HDFS was developed using distributed file system design and
runs on commodity hardware. Unlike some other distributed systems, HDFS is highly
fault tolerant even though it is built from low-cost hardware. HDFS holds very large
amounts of data and provides easy access. To store such huge data, files are spread across
multiple machines and stored in a redundant fashion to protect the system from possible
data loss in case of failure. HDFS also makes applications available for parallel processing.

HDFS Architecture:
The Hadoop Distributed File System follows a master–slave architecture. Each cluster
comprises a single NameNode that acts as the master server: it manages the file system
namespace and regulates clients' access to files. The other component of an HDFS cluster
is the DataNode, usually one per node, which manages the storage attached to the node it
runs on.
HDFS exposes a file system namespace and allows user data to be stored in files. Internally,
a file is split into one or more blocks, and these blocks are stored in a set of DataNodes. The
NameNode executes file system namespace operations like opening, closing, and renaming
files and directories; it also determines the mapping of blocks to DataNodes. The DataNodes
are responsible for serving read and write requests from the file system's clients, and they
perform block creation, deletion, and replication upon instruction from the NameNode.
HDFS follows a strictly hierarchical file system layout.
Generally, user data is stored in the files of HDFS. A file is divided into one or more
segments, which are stored on individual DataNodes. These file segments are called blocks;
in other words, a block is the minimum amount of data that HDFS reads or writes in one
operation. The default block size is 64 MB (128 MB in Hadoop 2.x and later), and it can be
increased as needed by changing the HDFS configuration.
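The block splitting described above can be sketched in a few lines. This is an illustrative simulation, not real HDFS code; the tiny 64-byte block size stands in for the real 64/128 MB default so the example stays small.

```python
# Illustrative sketch (not real HDFS code): splitting a byte stream into
# fixed-size blocks, the way HDFS chops a file into block-sized units.
BLOCK_SIZE = 64  # bytes here, purely for demonstration; HDFS uses 64/128 MB

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return the list of blocks that make up a file's contents."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

file_bytes = b"x" * 150          # a 150-byte "file"
blocks = split_into_blocks(file_bytes)
print(len(blocks))               # 3 blocks
print([len(b) for b in blocks])  # [64, 64, 22]: the last block is partial
```

Note that the last block is smaller than the block size; HDFS likewise does not pad a file's final block.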

Working of HDFS:
To store and process any data, the client submits the data and program to the Hadoop
cluster. Hadoop HDFS stores the data, MapReduce processes the data stored in HDFS, and
YARN divides the tasks and assigns resources.

1. HDFS
The data in Hadoop is stored in the Hadoop Distributed File System. Two daemons run in
Hadoop HDFS: the NameNode and the DataNode.

2. MapReduce
MapReduce is the processing layer in Hadoop. It processes data in parallel across
multiple machines in the cluster by dividing a job into independent subtasks and executing
them in parallel across various DataNodes. MapReduce processes the data in two phases,
the Map phase and the Reduce phase, for which the programmer specifies two functions:
the map function and the reduce function.
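The two phases can be sketched with the classic word-count example. This is a single-machine sketch of the programming model only; in a real cluster the map calls run in parallel on different nodes and a shuffle step groups the pairs by key before the reduce.

```python
# Minimal single-machine sketch of the two MapReduce phases (word count).
from collections import defaultdict

def map_phase(line):
    # The map function: emit a (word, 1) pair for every word in the line.
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # The reduce function: sum the counts for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big cluster", "big data"]
intermediate = [pair for line in lines for pair in map_phase(line)]
result = reduce_phase(intermediate)
print(result)  # {'big': 3, 'data': 2, 'cluster': 1}
```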
3. YARN
YARN is the resource management layer in Hadoop. It schedules tasks in the Hadoop
cluster and assigns resources to the applications running in the cluster; it is responsible for
providing the computational resources needed to execute the applications.
Two YARN daemons run in the Hadoop cluster to serve the YARN core services:

They are:
a. ResourceManager
b. NodeManager

In addition, a per-application ApplicationMaster negotiates resources from the
ResourceManager and works with the NodeManagers to execute and monitor the tasks.

The NameNode is the master node, and all requests go through it. The DataNodes, on the
other hand, are the nodes where processing is done on receiving a request from the
NameNode. The NameNode is the daemon running on the master machine and is the
centrepiece of an HDFS file system. It stores the directory tree of all files in the file system
and tracks where across the cluster the file data resides; it does not store the data
contained in these files. When client applications want to add, copy, move, or delete a file,
they interact with the NameNode, which responds by returning a list of the relevant
DataNode servers where the data lives.
The DataNode daemon runs on the slave nodes and stores data in the Hadoop File System.
In a functional file system, data is replicated across many DataNodes. On start-up, a
DataNode connects to the NameNode and then waits for requests from the NameNode to
access data. Once the NameNode provides the location of the data, client applications talk
directly to a DataNode; while replicating data, DataNode instances talk to each other.
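The read path just described can be modelled with two plain dictionaries. All the names here (paths, block IDs, node names) are made up for illustration; the point is only the two-step flow: ask the NameNode for locations, then fetch each block directly from a DataNode.

```python
# Toy model of the HDFS read path (assumed names, not the real API).
namenode = {                       # file path -> list of (block_id, datanode)
    "/user/report.txt": [("blk_1", "datanode-1"), ("blk_2", "datanode-2")],
}
datanodes = {                      # datanode -> {block_id: block contents}
    "datanode-1": {"blk_1": b"first block "},
    "datanode-2": {"blk_2": b"second block"},
}

def read_file(path):
    locations = namenode[path]                 # 1. ask the NameNode
    return b"".join(datanodes[dn][blk]         # 2. read from each DataNode
                    for blk, dn in locations)

print(read_file("/user/report.txt"))  # b'first block second block'
```

Notice that the file contents never pass through the NameNode; it hands out metadata only, which is why it does not become a bandwidth bottleneck.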

Features of HDFS:
 Distributed data storage.
 Blocks reduce seek time.
 The data is highly available, as the same block is present on multiple DataNodes.
 Even if multiple DataNodes are down, we can still do our work, making the system
highly reliable.
 High fault tolerance.
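The availability claim in the list above follows directly from replication: a block stays readable as long as at least one DataNode holding a replica is alive. A minimal sketch, with made-up node names and a replication factor of 3:

```python
# Sketch of why replication makes HDFS highly available.
replicas = {"blk_1": ["dn1", "dn2", "dn3"]}        # replication factor 3
alive = {"dn1": False, "dn2": False, "dn3": True}  # two of three nodes down

def block_available(block_id):
    # The block can be served if any replica sits on a live DataNode.
    return any(alive[dn] for dn in replicas[block_id])

print(block_available("blk_1"))  # True: dn3 still serves the block
```

Only when every replica's node is down does the block become unavailable, which is why the default replication factor of 3 tolerates two simultaneous node failures per block.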

Commands:

Command                                     Description
hadoop fs -ls                               Lists the files.
hadoop version                              Shows the version of Hadoop installed.
hdfs dfs -mkdir <path>                      Takes <path> as an argument and creates
                                            the directory.
hdfs dfs -ls <path>                         Displays the contents of the directory
                                            specified by <path>: the name, permissions,
                                            owner, size, and modification date of each
                                            entry.
hdfs dfs -put <src> <dest>                  Copies a file from the local filesystem to
                                            a file in DFS.
hdfs dfs -copyFromLocal <localsrc> <dest>   Similar to the put command, but the source
                                            must refer to a local file.
hdfs dfs -get <src> <localdest>             Copies the file in HDFS identified by <src>
                                            to the file in the local file system
                                            identified by <localdest>.
hdfs dfs -cat <file-name>                   Displays the contents of a file on the
                                            console (stdout).
hdfs dfs -mv <src> <dest>                   Moves a file from the specified source to
                                            the destination within HDFS.
hdfs dfs -cp <src> <dest>                   Copies a file or directory from the given
                                            source to the destination within HDFS.

Advantages of Hadoop

1. Varied Data Sources
Hadoop accepts a variety of data. Data can come from a range of sources, such as email
conversations and social media, and can be structured or unstructured.
2. Cost-effective
Hadoop is an economical solution, as it uses a cluster of commodity hardware to store data.
Commodity hardware consists of cheap machines, so the cost of adding nodes to the
framework is not very high.
3. Performance
Hadoop, with its distributed processing and distributed storage architecture, processes huge
amounts of data at high speed. It divides the input data file into a number of blocks and
stores these blocks over several nodes.
4. High Throughput
Throughput means work done per unit time. A given job in HDFS gets divided into small
jobs that work on chunks of data in parallel, thereby giving high throughput.
5. Open Source
Hadoop is an open-source technology, i.e. its source code is freely available. We can modify
the source code to suit a specific requirement.
6. Scalable
Hadoop works on the principle of horizontal scalability: to grow the cluster, we add whole
machines to the set of nodes rather than upgrading the configuration of a single machine.
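The throughput advantage (point 4 above) comes from splitting one job into chunk-sized subtasks that run at the same time. A small sketch, with a thread pool standing in for the cluster's worker nodes:

```python
# Sketch of the throughput idea: one job split into parallel subtasks.
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 101))                                  # the "input file"
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]  # 4 "blocks"

# Each worker sums one chunk, the way each node processes one block.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum, chunks))

print(partial_sums)        # [325, 950, 1575, 2200]
print(sum(partial_sums))   # 5050, same answer as the sequential job
```

The final combination of partial results mirrors the reduce step: the answer is unchanged, but the heavy per-chunk work overlaps in time.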

Disadvantages of Hadoop

1. Issue with Small Files
Hadoop is suitable for a small number of large files, but it performs poorly for applications
that deal with a large number of small files.
2. Vulnerable By Nature
Hadoop is written in Java, a widely used programming language whose weaknesses are well
known to cyber criminals, which makes Hadoop comparatively easy to exploit and
vulnerable to security breaches.
3. Processing Overhead
In Hadoop, data is read from disk and written back to disk, which makes read/write
operations very expensive when dealing with terabytes and petabytes of data.
4. Supports Only Batch Processing
At its core, Hadoop has a batch processing engine, which is not efficient at stream
processing; it cannot produce output in real time with low latency.
5. Security
For security, Hadoop uses Kerberos authentication, which is hard to manage. It lacks
encryption at the storage and network levels, which is a major point of concern.

Conclusion: Thus, we have studied the distributed file system and the working of HDFS.
