Hadoop Archive

Hadoop Archive (HAR) is designed to efficiently manage small files in Hadoop by packing them into a single compact HDFS block, reducing the memory burden on the namenode. The archiving process involves running a MapReduce job to create an archive file with a .har extension, which can be used as input for further MapReduce jobs. However, HAR files have limitations, including the need for additional disk space during creation, the inability to modify archives without recreation, and potential inefficiencies due to the requirement for many map tasks.

What is Hadoop Archive?

Hadoop is built to deal with large files, so a huge number of small files is problematic and has to be handled efficiently. The namenode stores the metadata for all HDFS data, and when input is split into many small files spread across the datanodes, every one of those files adds a metadata record on the namenode, making it inefficient. If, say, a 1 GB dataset is broken into 1000 small files, the namenode must keep metadata for all 1000 of them, and its memory is wasted storing and managing that metadata.

Hadoop Archive (HAR) was created to handle this problem. It packs small files together into one compact HDFS block to avoid wasting namenode memory, and the archived files can be used directly as input to MapReduce jobs. An archive always carries the *.har extension. A HAR is built from a collection of files by an archiving tool that runs a MapReduce job; its map tasks process the input files in parallel to create the archive file.

HAR syntax:
hadoop archive -archiveName NAME -p <parent path> <src>* <dest>

Example:
hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo

This archives the directories dir1 and dir2, resolved relative to the parent path /user/hadoop, into /user/zoo/foo.har. To archive everything under a single directory, the sources can be omitted:

hadoop archive -archiveName myhar.har -p /input/location /output/location
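As a fuller illustration, here is a minimal end-to-end sketch; the directory and file names (smallfiles, file1.txt, and so on) are hypothetical, chosen only for this example:

# copy a few small files into HDFS
$ hadoop fs -mkdir -p /user/hadoop/smallfiles
$ hadoop fs -put file1.txt file2.txt file3.txt /user/hadoop/smallfiles

# pack the directory into an archive; this launches a MapReduce job
$ hadoop archive -archiveName small.har -p /user/hadoop smallfiles /user/archives

After the job finishes, the archive appears as /user/archives/small.har, and its contents can be addressed as har:///user/archives/small.har/smallfiles.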



Using a HAR as MapReduce input:
If you have a Hadoop archive stored in HDFS at /user/zoo/foo.har, then to use it as MapReduce input all you need to do is specify the input directory as har:///user/zoo/foo.har.

If we list the archive file:
$ hadoop fs -ls /data/myArch.har
/data/myArch.har/_index
/data/myArch.har/_masterindex
/data/myArch.har/part-0

The part files are the original small files concatenated together into big files, and the index files are used to look up each small file inside the big part file. Files inside an archive are read through the har:// scheme, as shown in the sketch after the limitations list.

Limitations of HAR files:
1) Creating a HAR file makes a copy of the original files, so during creation we need as much extra disk space as the size of the files being archived. The originals can be deleted once the archive is created to release that space.
2) Once an archive is created, adding files to it or removing files from it requires re-creating the archive.
3) Reading a HAR file can require a large number of map tasks, which is inefficient.
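As an illustration, assuming the foo.har archive from the example above (small1.txt and wordcount.jar are hypothetical names used only here):

# list the logical contents of the archive
$ hadoop fs -ls har:///user/zoo/foo.har/dir1

# read one archived file directly
$ hadoop fs -cat har:///user/zoo/foo.har/dir1/small1.txt

# feed the archive to a MapReduce job as its input directory
$ hadoop jar wordcount.jar WordCount har:///user/zoo/foo.har/dir1 /user/zoo/wc-out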

Pawan Kumar Singh, AP, Deptt of CSE
