Big Data

This document is about big data: its basic usage and functionality, and how big data systems can help deal with the data volume problems of today.

Q 4)

a) “Big data”, as the name suggests, refers to voluminous amounts of data that have to be analysed
or processed. Big data became a focal concern for some social networking platforms when they
faced tremendous difficulty handling their huge volumes of data and their systems reached the
verge of collapse. They needed a data system capable of dynamically expanding the set of
servers used for data storage, with good scaling characteristics. Primarily, the required big data
systems must have the features to deal with the following:
i) Variety: the database system must be able to handle different types of data.
ii) Volume: the database system must cope with huge volumes of data.
iii) Velocity: the database system must handle varied data arriving at great velocity for
processing.
b) Relational databases are not suitable for storing data generated by social networking applications
for the following reasons:
i) Relational database systems cannot dynamically expand the number of servers used
for data storage, and dynamic expansion of storage servers is the prime requirement
of any organization dealing with huge volumes of data.
ii) Relational database systems lack scalability: they cannot support distributed data
storage across large clusters.
c) Parts of the Hadoop framework
i) Hadoop Distributed File System (HDFS):
HDFS is the main part of the Hadoop framework; it is Hadoop's primary storage
system and is primarily Java-based. It provides reliable, fault-tolerant, and accessible
data storage as a distributed file system.
ii) Name Node:
Also termed the Master Node, the Name Node does not store the actual data; it stores
metadata. The main roles of the Name Node are as given under:
a) Managing the namespaces of the file system
b) Controlling clients' access to files
c) Executing file system operations such as opening and closing files
iii) Data Node:
Also called a Slave Node, the Data Node stores the actual data in HDFS. It performs
read and write operations on the basis of client requests.

Data Storage and Management - Hadoop Distributed File System (HDFS)


This is the most important component of the Hadoop ecosystem. HDFS is Hadoop's
primary storage system. The Hadoop Distributed File System (HDFS) is a Java-based file
system that provides reliable, fault-tolerant, and accessible data storage for big data.
HDFS is a distributed file system that runs on conventional (commodity) hardware. HDFS
ships with default settings that suit many installations, and typically only large clusters
require further configuration. Clients interact directly with HDFS using commands.
Within HDFS, two components can be identified, known as the Name Node and the Data
Node. (DataFlair, 2019)
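
As a concrete illustration of working with HDFS programmatically, the following is a
minimal Java sketch using Hadoop's standard org.apache.hadoop.fs.FileSystem API. The
file paths are invented for the example, and the cluster address is assumed to come from
a core-site.xml on the classpath; this is a sketch of typical usage, not part of the
original text.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            // Cluster settings (fs.defaultFS, etc.) are read from core-site.xml.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Copy a local file into HDFS; both paths are illustrative.
            fs.copyFromLocalFile(new Path("/tmp/sample.txt"),
                                 new Path("/user/demo/sample.txt"));

            // List the directory, much like the shell command `hdfs dfs -ls /user/demo`.
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
                System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
            }
            fs.close();
        }
    }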

Name Node

It is also known as the Master Node. It does not store the actual data or datasets; the
Name Node stores the metadata, for example the number of blocks a file is split into,
their locations, which Data Nodes hold them, and other details. Essentially, it keeps
track of the files and directories of the file system. The tasks of the Name Node can be
recognized as follows (a toy model of this metadata is sketched after the list).
(DataFlair, 2019)

• Managing the file system namespace
• Controlling the access of clients to files
• Executing file system operations such as naming, opening, and closing files and
directories
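
To make the Name Node's role more concrete, here is a deliberately simplified, purely
illustrative Java model of the kind of metadata it tracks: which blocks make up each file,
and which Data Nodes hold replicas of each block. All class and field names are invented;
the real Name Node keeps an in-memory namespace image backed by an edit log, which this
sketch does not attempt to reproduce.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model: the Name Node maps file paths to block IDs, and block IDs
    // to the Data Nodes holding replicas. Clients ask it "where is this
    // file?" and then read the blocks directly from the Data Nodes.
    public class NameNodeMetadataSketch {
        private final Map<String, List<Long>> fileToBlocks = new HashMap<>();
        private final Map<Long, List<String>> blockToDataNodes = new HashMap<>();

        public void addFile(String path, List<Long> blockIds) {
            fileToBlocks.put(path, blockIds);
        }

        public void recordReplica(long blockId, String dataNodeId) {
            blockToDataNodes.computeIfAbsent(blockId, k -> new ArrayList<>())
                            .add(dataNodeId);
        }

        // Returns the Data Nodes holding a given block; the data itself
        // never passes through the Name Node.
        public List<String> locateBlock(long blockId) {
            return blockToDataNodes.getOrDefault(blockId, new ArrayList<>());
        }
    }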

Data Node

The Data Node is also called the Slave Node. It is responsible for the actual storage of
data in HDFS, and it completes read and write operations on client request. Each block
replica on a Data Node consists of two files in the local file system: the first file
holds the data itself, and the second holds the block's metadata, including checksums for
the data. At startup, each Data Node connects to its Name Node and performs a handshake,
during which the Data Node's namespace ID and software version are verified. If a
discrepancy is detected, the Data Node automatically shuts down (this check is sketched
in code after the list). The tasks of the Data Node can be detailed as follows.
(DataFlair, 2019)

• Creating, deleting, and replicating block replicas according to the instructions
of the Name Node
• Managing the data storage of the system
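
The startup handshake described above can be summarised in a few lines of illustrative
Java. The constants, field names, and return convention here are all invented for the
sketch; the real check lives inside HDFS's internal Data Node registration protocol.

    // Illustrative sketch of the Data Node startup handshake: the node's
    // stored namespace ID and software version are compared against the
    // Name Node's values, and the Data Node refuses to join on a mismatch.
    public class HandshakeSketch {
        static final long NAMENODE_NAMESPACE_ID = 42L;            // assumed value
        static final String NAMENODE_SOFTWARE_VERSION = "3.3.6";  // assumed value

        static boolean handshake(long dataNodeNamespaceId, String dataNodeVersion) {
            if (dataNodeNamespaceId != NAMENODE_NAMESPACE_ID) {
                System.err.println("Namespace ID mismatch: Data Node shutting down");
                return false;
            }
            if (!NAMENODE_SOFTWARE_VERSION.equals(dataNodeVersion)) {
                System.err.println("Version mismatch: Data Node shutting down");
                return false;
            }
            return true; // the Data Node may register and begin serving blocks
        }
    }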

Processing and Computation – Hadoop MapReduce

Hadoop MapReduce is the main component of Hadoop that provides data processing.
MapReduce can be identified as an easy-to-program application framework that processes
large amounts of structured and unstructured data stored in the Hadoop Distributed File
System. (DataFlair, 2019)

MapReduce programs run in parallel, so they are very useful for large-scale data analysis
across a cluster of machines; this parallelism increases the speed and reliability of the
cluster. Two functions can be identified, the Map function and the Reduce function.
(DataFlair, 2019)


• The Map function takes a set of data and converts it into another set of data, in
which individual elements are broken down into key/value pairs.
• The Reduce function accepts the Map output as its input, combines those tuples
based on the key, and derives the final value for each key (both functions are
illustrated in the word-count sketch below).
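
To make the two functions concrete, here is the classic word-count job written against
Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). It is the standard textbook
example rather than anything specific to this document: the Mapper emits a (word, 1) pair
for every token, and the Reducer sums the counts per word. Input and output paths are
taken from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map: turn each line of input into (word, 1) pairs.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce: sum the counts emitted for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }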

Using Hadoop for Processing Large Datasets such as Call Data Records (CDR) or Customer
Transaction Data

Hadoop is used for processing large data sets such as Call Data Records (CDR) or
Customer Transaction Data.
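
As a sketch of how such a workload maps onto MapReduce, the Mapper below parses
hypothetical CDR lines of the form callerId,calleeId,durationSeconds and emits the call
duration keyed by the caller; a reducer analogous to the word-count one above (summing
LongWritable values) would then total the airtime per subscriber. The record layout is
assumed purely for illustration, since real CDR formats vary by operator.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper for CDR lines: "callerId,calleeId,durationSeconds".
    public class CdrDurationMapper extends Mapper<Object, Text, Text, LongWritable> {
        private final Text caller = new Text();
        private final LongWritable duration = new LongWritable();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                return; // skip malformed records
            }
            caller.set(fields[0].trim());
            duration.set(Long.parseLong(fields[2].trim()));
            context.write(caller, duration); // reducer sums total airtime per caller
        }
    }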
