Big Data Assignment 3
Q1.
Block abstraction in HDFS divides large files into fixed-size blocks (128 MB by default) for storage.
These blocks are distributed across the cluster to improve performance and scalability. Block size
matters because larger blocks reduce metadata overhead on the NameNode and improve sequential
throughput, while blocks that are too large limit parallelism, since fewer tasks can process the file concurrently.
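As a rough sketch (the path and the 256 MB value below are only example assumptions), the block size can be set per file when it is created through the FileSystem API:
```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

// Sketch: create an HDFS file with an explicit block size (example values only).
public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/blocksize-demo.txt");   // hypothetical path
        long blockSize = 256L * 1024 * 1024;               // 256 MB instead of the 128 MB default
        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(path, true, 4096, (short) 3, blockSize);
        out.writeBytes("hello HDFS\n");
        out.close();
        System.out.println(fs.getFileStatus(path).getBlockSize()); // block size recorded for the file
        fs.close();
    }
}
```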
Q2.
HDFS stores files by splitting them into blocks and distributing them across DataNodes. To read a file,
the client asks the NameNode for the block locations and then reads the blocks directly from the
DataNodes. To write, the client streams data through a pipeline of DataNodes: each DataNode forwards
the data to the next, so every block is replicated as it is written.
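As a small sketch of the read path (the file path is an assumed example), the block locations returned by the NameNode can be inspected through the FileSystem API before the client reads from the DataNodes:
```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

// Sketch: list which DataNodes hold each block of a file (example path).
public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/tmp/blocksize-demo.txt");   // hypothetical path
        FileStatus status = fs.getFileStatus(path);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", hosts " + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```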
Q3.
Data replication in HDFS ensures fault tolerance and high availability. Each block is typically
replicated three times across different nodes. If a node fails, the data is still accessible from replicas.
It also helps in load balancing and improves data locality during processing.
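A minimal sketch, assuming an existing file at an example path, of checking and changing a file's replication factor through the FileSystem API:
```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

// Sketch: inspect and change the replication factor of a file (example path).
public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/tmp/blocksize-demo.txt");   // hypothetical path
        System.out.println(fs.getFileStatus(path).getReplication()); // current factor
        fs.setReplication(path, (short) 2);                 // request two replicas instead
        fs.close();
    }
}
```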
Q4.
Compression reduces data size, saving storage space and network bandwidth. Serialization converts
objects into byte streams for transmission and storage. Both are central to Hadoop I/O: efficient
serialization and compression speed up data transfer between nodes and reduce the disk and network
I/O performed during jobs.
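As an illustration of Hadoop's Writable serialization (the value 42 is just an example), an IntWritable can be written to a byte stream and read back:
```
import java.io.*;
import org.apache.hadoop.io.IntWritable;

// Sketch: serialize an IntWritable to bytes and reconstruct it.
public class WritableExample {
    public static void main(String[] args) throws Exception {
        IntWritable original = new IntWritable(42);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));            // serialize

        IntWritable copy = new IntWritable();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))); // deserialize
        System.out.println(copy.get());                          // prints 42
    }
}
```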
Q5.
Q6.
Advantages: Scalability, cost-efficiency, and easy resource management. Cloud providers offer elastic
storage and compute, so clusters can grow or shrink with demand.
Challenges: Data security, latency, compliance issues, and dependence on internet connectivity.
Performance tuning and configuration management can also be complex in cloud environments.
Q7.
```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

// Reads a file from HDFS using the FileSystem API and prints its contents.
public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]);                     // HDFS path passed on the command line
        FSDataInputStream input = fs.open(path);
        byte[] buffer = new byte[(int) fs.getFileStatus(path).getLen()];
        input.readFully(0, buffer);                        // read the whole file from offset 0
        System.out.println(new String(buffer));
        input.close();
        fs.close();
    }
}
```
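Assuming the class above is compiled against the Hadoop client libraries and packaged into a jar, it could be run with something like `hadoop jar hdfsread.jar HdfsRead /path/in/hdfs/file.txt`, where the jar name and file path are placeholders.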