Unit 3

Hadoop is an open-source framework that allows for the distributed storage and processing of large datasets across clusters of computers. It uses HDFS for storage, MapReduce as its processing framework, and YARN for resource management. Hadoop provides advantages like scalability, fault tolerance, flexibility with different data types, and cost effectiveness. Common tools used with Hadoop include Flume for data ingestion, Oozie for workflow scheduling, and Zookeeper for coordination across nodes.


Introduction to Hadoop

Hadoop is an open-source framework from Apache used to store, process, and analyze data that is very large in volume. Hadoop is written in Java and is not used for OLAP (online analytical processing); it is used for batch/offline processing. It is used by Facebook, Yahoo, Google, Twitter, LinkedIn and many more. Moreover, it can be scaled up simply by adding nodes to the cluster.

Modules of Hadoop
HDFS: Hadoop Distributed File System. Google published its paper on GFS (Google File System), and HDFS was developed on the basis of it. It states that files are broken into blocks and stored on nodes over the distributed architecture.

YARN: Yet Another Resource Negotiator is used for job scheduling and for managing the cluster.

MapReduce: This is a framework which helps Java programs perform parallel computation on data using key-value pairs. The Map task takes input data and converts it into a data set which can be computed as key-value pairs. The output of the Map task is consumed by the Reduce task, and the output of the reducer gives the desired result.

Hadoop Common: These Java libraries are used to start Hadoop and are used by the other Hadoop modules.

Features of Hadoop
1. Open Source:
Hadoop is open-source, which means it is free to use. Since it is an open-source project, the source code is available online for anyone to understand it or modify it as per their industry requirements.

2. Highly Scalable Cluster:

Hadoop is a highly scalable model. A large amount of data is divided across multiple inexpensive machines in a cluster and processed in parallel. The number of these machines or nodes can be increased or decreased as per the enterprise's requirements. In a traditional RDBMS (Relational Database Management System), the system cannot be scaled to handle large amounts of data.

3. Fault Tolerance is Available:

Hadoop uses commodity hardware (inexpensive systems) which can crash at any moment. In Hadoop, data is replicated on various DataNodes in the cluster, which ensures the availability of data if any of the systems crashes. If the machine you are reading from faces a technical issue, the data can also be read from other nodes in the Hadoop cluster, because the data is copied or replicated by default.

4. High Availability is provided:

Fault tolerance provides high availability in the Hadoop cluster. High availability means the availability of data on the Hadoop cluster. Due to fault tolerance, if any DataNode goes down, the same data can be retrieved from any other node where the data is replicated. A highly available Hadoop cluster also has two or more NameNodes, i.e., an Active NameNode and a Passive NameNode, also known as a standby NameNode.

5. Cost-Effective:

Hadoop is open-source and uses cost-effective commodity hardware, which provides a cost-efficient model, unlike traditional relational databases that require expensive hardware and high-end processors to deal with Big Data. The problem with traditional relational databases is that storing massive volumes of data is not cost-effective, so companies started to discard the raw data.

6. Hadoop Provide Flexibility:

Hadoop is designed in such a way that it can deal with any kind of dataset, such as structured (MySQL data), semi-structured (XML, JSON), and unstructured (images and videos), very efficiently. This means it can easily process any kind of data independent of its structure, which makes it highly flexible.

7. Easy to Use:

Hadoop is easy to use since developers need not worry about any of the distributed processing work, as it is managed by Hadoop itself. The Hadoop ecosystem is also very large and comes with lots of tools like Hive, Pig, Spark, HBase, Mahout, etc.

8. Hadoop uses Data Locality:

The concept of data locality is used to make Hadoop processing fast. In the data locality concept, the computation logic is moved near the data rather than moving the data to the computation logic. Moving data in HDFS is the costliest operation, and with the help of the data locality concept, bandwidth utilization in the system is minimized.

Hadoop Architecture
Hadoop is a framework written in Java that utilizes a large cluster of commodity hardware to maintain and store very large data. Hadoop works on the MapReduce programming algorithm that was introduced by Google. Today, lots of big-brand companies use Hadoop in their organizations to deal with big data, e.g., Facebook, Yahoo, Netflix, eBay, etc. The Hadoop architecture mainly consists of 4 components.

1. MapReduce
2. HDFS (Hadoop Distributed File System)
3. YARN (Yet Another Resource Negotiator)
4. Common Utilities or Hadoop Common

Architecture of Hadoop (component roles):

1. MapReduce: distributed processing
2. HDFS: distributed storage
3. YARN (Yet Another Resource Negotiator): job scheduling and resource management
4. Hadoop Common: Java libraries and utilities

1. MapReduce

MapReduce is nothing but an algorithm, or programming model, that runs on top of the YARN framework. The major feature of MapReduce is that it performs distributed processing in parallel across a Hadoop cluster, which is what makes Hadoop work so fast. When you are dealing with Big Data, serial processing is no longer of any use. MapReduce has mainly 2 tasks, which are divided phase-wise:

In the first phase, Map is used, and in the next phase, Reduce is used.
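To make the two phases concrete, here is a minimal word-count sketch written against the standard Hadoop MapReduce Java API. The mapper emits (word, 1) pairs and the reducer sums them; the input and output paths taken from the command line are illustrative assumptions.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: turn each line of input into (word, 1) key-value pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combiner is optional and reuses the reducer
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, such a job is typically launched with the hadoop jar command, and YARN schedules its map and reduce tasks across the cluster.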

2. HDFS (Hadoop Distributed File System)


HDFS (Hadoop Distributed File System) is used as the storage layer. It is mainly designed for working on commodity hardware (inexpensive devices) and follows a distributed file system design. HDFS is designed in such a way that it prefers storing data in large blocks rather than storing many small blocks.

HDFS in Hadoop provides fault tolerance and high availability to the storage layer and the other devices present in that Hadoop cluster.

 Data storage Nodes in HDFS.


1. NameNode (Master)
2. DataNode (Slave)

NameNode: The NameNode works as the master in a Hadoop cluster and guides the DataNodes (slaves). The NameNode is mainly used for storing metadata, i.e., data about the data. The metadata can be the transaction logs that keep track of the user's activity in the Hadoop cluster.

Metadata can also be the name of a file, its size, and the information about the location (block number, block IDs) of the DataNodes, which the NameNode stores to find the closest DataNode for faster communication. The NameNode instructs the DataNodes with operations like delete, create, replicate, etc.

DataNode: DataNodes work as slaves. DataNodes are mainly used for storing the data in a Hadoop cluster; the number of DataNodes can range from 1 to 500 or even more. The more DataNodes there are, the more data the Hadoop cluster can store. It is therefore advised that DataNodes have a high storage capacity so they can store a large number of file blocks.
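As a rough illustration of how a client interacts with this NameNode/DataNode layer, the sketch below uses the Hadoop FileSystem Java API to write and then read back a small file. The fs.defaultFS address and the file path are placeholder assumptions; in a real deployment they come from the cluster configuration files.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode address; in practice this comes from core-site.xml.
    conf.set("fs.defaultFS", "hdfs://namenode-host:9000");

    try (FileSystem fs = FileSystem.get(conf)) {
      Path file = new Path("/user/demo/hello.txt");

      // Write: the client asks the NameNode where to place the blocks,
      // then streams the data to the chosen DataNodes.
      try (FSDataOutputStream out = fs.create(file, true)) {
        out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
      }

      // Read: the NameNode supplies the metadata (block IDs and locations),
      // and the bytes are fetched from the DataNodes holding the replicas.
      try (FSDataInputStream in = fs.open(file)) {
        IOUtils.copyBytes(in, System.out, 4096, false);
      }
    }
  }
}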
3. YARN (Yet Another Resource Negotiator)

YARN is the framework on which MapReduce works. YARN performs 2 operations: job scheduling and resource management. The purpose of the job scheduler is to divide a big task into small jobs so that each job can be assigned to various slaves in the Hadoop cluster and processing can be maximized. The job scheduler also keeps track of which job is important, which job has more priority, the dependencies between jobs, and other information like job timing, etc. The resource manager manages all the resources that are made available for running the Hadoop cluster.
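As a small illustration of the resource-management side, the following sketch uses the YarnClient Java API to connect to the ResourceManager and list the applications it is tracking. The ResourceManager address shown is a placeholder assumption that would normally be supplied by yarn-site.xml.

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListYarnApplications {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // Placeholder ResourceManager address; usually configured in yarn-site.xml.
    conf.set(YarnConfiguration.RM_ADDRESS, "resourcemanager-host:8032");

    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();
    try {
      // Ask the ResourceManager for the applications it knows about.
      List<ApplicationReport> apps = yarnClient.getApplications();
      for (ApplicationReport app : apps) {
        System.out.println(app.getApplicationId() + "  "
            + app.getName() + "  " + app.getYarnApplicationState());
      }
    } finally {
      yarnClient.stop();
    }
  }
}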

 Features of YARN
1. Multi-Tenancy
2. Scalability
3. Cluster-Utilization
4. Compatibility

4. Hadoop common or Common Utilities

Hadoop Common, or Common Utilities, is nothing but the Java libraries and Java files (Java scripts) that are needed by all the other components present in a Hadoop cluster. These utilities are used by HDFS, YARN, and MapReduce for running the cluster. Hadoop Common assumes that hardware failure in a Hadoop cluster is common, so failures need to be handled automatically in software by the Hadoop framework.

Introduction to Data Management and Data Access tools


1. Data Management using Flume
Flume is a distributed system used for efficiently collecting,
aggregating, and moving large amounts of streaming data from various
sources to a centralized data store. It is commonly used for log
aggregation in big data environments.

To perform data management using Flume, you first need to define the
sources of the data that you want to collect. Flume supports various
sources such as syslog, Avro, Thrift, and HTTP.

Next, you need to define the channel that you want to use for storing
the collected data temporarily before it is processed by the sink. Flume
supports various channels such as memory, JDBC, and file-based
channels.

Finally, you need to define the sink that you want to use for storing the
collected data permanently. Flume supports various sinks such as HDFS,
HBase, and Kafka.
A Flume agent is a JVM process which has 3 components (Flume Source, Flume Channel, and Flume Sink) through which events propagate after being initiated at an external source.
Overall, Flume is a powerful tool for data management in big data
environments. It provides a flexible and scalable architecture that can
be customized to meet the specific needs of your data management
tasks.
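For example, an application can hand events to a running Flume agent through the Flume client SDK. The sketch below is a minimal illustration that assumes a hypothetical agent with an Avro source listening on flume-host:41414; the host, port, and event body are placeholder values.

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeEventSender {
  public static void main(String[] args) {
    // Connect to a Flume agent whose Avro source listens on this host/port (assumed values).
    RpcClient client = RpcClientFactory.getDefaultInstance("flume-host", 41414);
    try {
      // Build an event and append it; the agent's channel buffers it
      // until the sink (e.g. HDFS) writes it out.
      Event event = EventBuilder.withBody("sample log line", StandardCharsets.UTF_8);
      client.append(event);
    } catch (EventDeliveryException e) {
      // The event could not be delivered; a real client would retry or rebuild the connection.
      e.printStackTrace();
    } finally {
      client.close();
    }
  }
}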

2. Oozie
Apache Oozie is a workflow scheduler for Hadoop. It is a system which
runs the workflow of dependent jobs. Here, users are permitted to
create Directed Acyclic Graphs of workflows, which can be run in
parallel and sequentially in Hadoop.
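The workflow itself is described in an XML file stored in HDFS. As a rough sketch, assuming a workflow application has already been uploaded to the hypothetical HDFS path below and that the Oozie server runs at the placeholder URL shown, the Oozie Java client can submit and start it as follows.

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class SubmitOozieWorkflow {
  public static void main(String[] args) throws Exception {
    // Placeholder Oozie server URL.
    OozieClient oozieClient = new OozieClient("http://oozie-host:11000/oozie");

    // Job properties; the application path points at a workflow.xml already in HDFS (assumed).
    Properties conf = oozieClient.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode-host:9000/user/demo/my-workflow");
    conf.setProperty("nameNode", "hdfs://namenode-host:9000");
    conf.setProperty("jobTracker", "resourcemanager-host:8032");

    // Submit and start the workflow, then check its status.
    String jobId = oozieClient.run(conf);
    WorkflowJob job = oozieClient.getJobInfo(jobId);
    System.out.println("Workflow " + jobId + " is " + job.getStatus());
  }
}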

3. Zookeeper
Apache Zookeeper is an open-source distributed coordination service that helps to manage a large set of hosts. Management and coordination in a distributed environment are tricky. Zookeeper automates this process and allows developers to focus on building software features rather than worrying about the distributed nature of the application.

Zookeeper is thus an important part of Hadoop that takes care of these small but important matters so that the developer can focus more on the application's functionality. Zookeeper helps you maintain configuration information, naming, and group services for distributed applications. It implements different protocols on the cluster so that applications do not have to implement them on their own. It provides a single coherent view of multiple machines.
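As a minimal sketch of how an application might use Zookeeper for shared configuration, the code below connects to a hypothetical ensemble, creates a znode holding one setting, and reads it back; the connection string, znode path, and value are illustrative assumptions.

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZookeeperConfigExample {
  public static void main(String[] args) throws Exception {
    // Wait until the session with the ensemble is actually established.
    CountDownLatch connected = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper("zk-host:2181", 5000, event -> {
      if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connected.countDown();
      }
    });
    connected.await();

    // Store a piece of configuration as a persistent znode (path and value are assumptions).
    String path = "/batch-size";
    if (zk.exists(path, false) == null) {
      zk.create(path, "100".getBytes(StandardCharsets.UTF_8),
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }

    // Any node in the cluster can now read the same value.
    byte[] data = zk.getData(path, false, null);
    System.out.println("batch-size = " + new String(data, StandardCharsets.UTF_8));

    zk.close();
  }
}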

4. Hive
Hive is a data warehouse system which is used to analyze structured data. It is built on top of Hadoop. It was developed by Facebook.

Hive provides the functionality of reading, writing, and managing large datasets residing in distributed storage. It runs SQL-like queries, called HQL (Hive Query Language), which are internally converted into MapReduce jobs.

Apache Hive architecture consists mainly of three components:

1. Hive Client
2. Hive Services
3. Hive Storage and Computing

Using Hive, we can skip the traditional approach of writing complex MapReduce programs. Hive supports Data Definition Language (DDL), Data Manipulation Language (DML), and User Defined Functions (UDF).
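As a small sketch of a typical access path from Java, the example below runs an HQL query through the standard Hive JDBC driver. The HiveServer2 address, credentials, and the employees table are placeholder assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
  public static void main(String[] args) throws Exception {
    // Hive JDBC driver (provided by the hive-jdbc artifact).
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // Placeholder HiveServer2 URL, database, and credentials.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:hive2://hiveserver2-host:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {

      // HQL query; Hive translates it into MapReduce (or Tez/Spark) jobs behind the scenes.
      try (ResultSet rs = stmt.executeQuery(
              "SELECT department, COUNT(*) FROM employees GROUP BY department")) {
        while (rs.next()) {
          System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
      }
    }
  }
}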

5. Pig
Pig represents Big Data as data flows. Pig is a high-level platform or tool which is used to process large datasets. It provides a high level of abstraction for processing over MapReduce. It provides a high-level scripting language, known as Pig Latin, which is used to develop data analysis code. To process data stored in HDFS, programmers write scripts using the Pig Latin language. Internally, the Pig Engine (a component of Apache Pig) converts all these scripts into specific map and reduce tasks, but these are not visible to the programmers, in order to provide a high level of abstraction. Pig Latin and the Pig Engine are the two main components of the Apache Pig tool. The result of Pig is always stored in HDFS.
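As a rough sketch, Pig Latin statements can also be driven from Java through the PigServer API. The example below runs in local mode and assumes a hypothetical input file input.txt with one line of text per record; the aliases and output name are illustrative.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigLatinExample {
  public static void main(String[] args) throws Exception {
    // Local mode for illustration; ExecType.MAPREDUCE would run on the cluster.
    PigServer pig = new PigServer(ExecType.LOCAL);

    // Register Pig Latin statements; the Pig Engine turns them into map and reduce tasks.
    pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
    pig.registerQuery("filtered = FILTER lines BY line MATCHES '.*error.*';");

    // Store the result; on a cluster this would land in HDFS.
    pig.store("filtered", "error_lines_out");

    pig.shutdown();
  }
}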
6. Avro
To transfer data over a network or for its persistent storage, you need to serialize the data. In addition to the serialization APIs provided by Java and Hadoop, there is a special utility called Avro, a schema-based serialization technique.

Apache Avro is a language-neutral data serialization system. It was developed by Doug Cutting, the father of Hadoop. Since Hadoop's Writable classes lack language portability, Avro becomes quite helpful, as it deals with data formats that can be processed by multiple languages. Avro is a preferred tool to serialize data in Hadoop.

Avro is a schema-based system. A language-independent schema is associated with its read and write operations. Avro serializes data together with its schema into a compact binary format, which can be deserialized by any application.

Avro uses JSON format to declare the data structures. Presently, it supports languages such as Java, C, C++, C#, Python, and Ruby. The key feature of Avro is that it can efficiently handle changes in the data schema over time, i.e., schema evolution. It handles schema changes like missing fields, added fields, and changed fields.
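A minimal sketch of schema-based serialization with the Avro generic API follows; the JSON schema, field values, and file name are illustrative assumptions.

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroRoundTrip {
  public static void main(String[] args) throws Exception {
    // A language-independent schema declared in JSON.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

    // Build a record that conforms to the schema.
    GenericRecord user = new GenericData.Record(schema);
    user.put("name", "Asha");
    user.put("age", 30);

    // Serialize to a compact binary Avro container file (the schema is embedded in the file).
    File file = new File("users.avro");
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
      writer.create(schema, file);
      writer.append(user);
    }

    // Deserialize: any application (in any supported language) can read the file back.
    try (DataFileReader<GenericRecord> reader =
             new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
      while (reader.hasNext()) {
        System.out.println(reader.next());
      }
    }
  }
}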

7. SQOOP for data access


Apache Sqoop, a command-line interface tool, moves data between relational databases and Hadoop. It is used to export data from the Hadoop file system to relational databases and to import data from relational databases such as MySQL and Oracle into the Hadoop file system.

Apache Sqoop is a component of the Hadoop ecosystem. A specialized tool was needed to perform this process quickly because a lot of data had to be moved from relational database systems into Hadoop. This is when Apache Sqoop entered the scene, and it is now widely used for moving data from RDBMS tables into the Hadoop ecosystem for MapReduce processing and other uses.

HBase
In HBase, tables are split into regions and are served by region servers. Regions are vertically divided by column families into “Stores”. Stores are saved as files in HDFS. The term ‘store’ is used for regions to explain the storage structure.

HBase has three major components: the client library, a master server, and region servers. Region servers can be added or removed as per requirement.
MasterServer

1. Assigns regions to the region servers and takes the help of Apache
ZooKeeper for this task.
2. Handles load balancing of the regions across region servers. It
unloads the busy servers and shifts the regions to less occupied
servers.
3. Maintains the state of the cluster by negotiating the load
balancing.
4. Is responsible for schema changes and other metadata operations
such as creation of tables and column families.

Regions

Regions are nothing but tables that are split up and spread across the
region servers.

Region server

The region servers have regions that:

1. Communicate with the client and handle data-related operations.
2. Handle read and write requests for all the regions under them.
3. Decide the size of the region by following the region size thresholds.
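As a minimal sketch of how a client talks to these components, the code below uses the HBase Java client to write and read one cell. The ZooKeeper quorum, table name, and column family are placeholder assumptions, and the table is assumed to already exist.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutGetExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // The client locates regions via ZooKeeper (placeholder quorum address).
    conf.set("hbase.zookeeper.quorum", "zk-host");

    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("users"))) {

      // Write one cell: row key "u1", column family "info", qualifier "name".
      Put put = new Put(Bytes.toBytes("u1"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
      table.put(put);

      // Read it back; the client library routes the request to the region server
      // serving the region that contains row "u1".
      Result result = table.get(new Get(Bytes.toBytes("u1")));
      byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
      System.out.println("name = " + Bytes.toString(value));
    }
  }
}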
