Big Data
a) “Big data”, as the name suggests, refers to voluminous amounts of data that have to be analysed or
processed. “Big data” became a focal concern of some social networking platforms when they faced
tremendous difficulty handling the huge volume of data and their systems reached the verge of
collapse. They needed a data system capable of dynamically expanding the servers used for data
storage, with good scaling features as well. Primarily, Big Data systems must be able to deal with
the following:
i) Variety: the database system must be able to handle different types of data.
ii) Volume: the database system must cope with huge volumes of data.
iii) Velocity: the database system must handle a variety of data arriving at great velocity
to be processed.
b) Relational databases are not suitable for storing data generated by social networking applications
for the following reasons:
i) Relational database systems are unable to dynamically expand the number of servers used
for data storage; dynamic expansion of data storage servers is the prime and foremost
requirement of organizations dealing with huge volumes of data.
ii) Relational database systems lack scalability features: they cannot support distributed
data storage across large clusters.
c) Parts of the Hadoop framework
i) Hadoop Distributed File System (HDFS):
HDFS is the main storage component of Hadoop. It is Hadoop's primary storage layer and is
primarily Java based. It provides reliable, fault-tolerant and accessible data storage for
a distributed file system.
ii) Name Node:
It is also termed the Master Node. The Name Node does not actually store data; rather, it
stores metadata. The main roles of the Name Node are as given under:
a) Managing the namespace of the file system
b) Regulating clients' access to files
c) Performing file operations such as opening, closing and renaming files
iii) Data Node:
This is also called the slave node and is used to store the actual data in HDFS. It performs
read and write operations on the basis of client requests.
Name Node
It is also known as the Master Node. It does not store the actual data or datasets. The Name
Node stores the metadata: for example, in a telecom setting, the number of calls made from a
tower, their positions, where the end users receive the calls, the Data Node details, and so
on. Basically, it holds the files and directories of the file system. Its main tasks are those
listed above: managing the namespace and regulating client access to files. (DataFlair, 2019)
Data Node
The Data Node is also called the slave. It is responsible for the actual storage of data in
HDFS, and it performs read and write operations on client request. Each replica block on a
Data Node consists of two files in the local file system: the first holds the data itself and
the second holds the block's metadata, including checksums for the data. At startup, each
Data Node connects to its Name Node and performs a handshake, during which the Data Node's
namespace ID and software version are verified. If a discrepancy is detected, the Data Node
shuts down automatically. Its main tasks are the block storage and the read/write operations
described above. (DataFlair, 2019)
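As a rough illustration (plain Python, not real HDFS code), the division of labour between the Name Node and the Data Nodes can be sketched as follows: the Name Node keeps only metadata (which blocks make up a file, and which Data Nodes hold each block), while the Data Nodes hold the actual block contents. The class names, block size and replication factor below are simplifications for illustration only.

```python
# Toy model of the Name Node / Data Node split (illustration only,
# not real HDFS code). The Name Node keeps metadata; Data Nodes keep data.

class DataNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}           # block_id -> raw data (the actual content)

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]

class NameNode:
    def __init__(self, datanodes, replication=2):
        self.datanodes = datanodes
        self.replication = replication
        self.file_blocks = {}      # filename -> [block_id, ...]   (metadata)
        self.block_locations = {}  # block_id -> [DataNode, ...]   (metadata)
        self._next_block = 0

    def write_file(self, filename, data, block_size=4):
        # Split the data into blocks and replicate each block on
        # several Data Nodes; the Name Node itself stores no data.
        self.file_blocks[filename] = []
        for i in range(0, len(data), block_size):
            block_id = self._next_block
            self._next_block += 1
            targets = [self.datanodes[(block_id + r) % len(self.datanodes)]
                       for r in range(self.replication)]
            for dn in targets:
                dn.write_block(block_id, data[i:i + block_size])
            self.file_blocks[filename].append(block_id)
            self.block_locations[block_id] = targets

    def read_file(self, filename):
        # Reassemble the file by reading each block from the first replica.
        return "".join(self.block_locations[b][0].read_block(b)
                       for b in self.file_blocks[filename])

nodes = [DataNode(i) for i in range(3)]
nn = NameNode(nodes)
nn.write_file("cdr.txt", "0771234567,0779876543,42s")
print(nn.read_file("cdr.txt"))  # prints the original record string
```

Note that if one Data Node fails, every block it held still exists on another replica, which is the source of the fault tolerance described above.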
Hadoop MapReduce is the main data-processing component of Hadoop. MapReduce can be
described as an easy-to-program framework for processing the large amounts of structured
and unstructured data stored in the Hadoop Distributed File System. (DataFlair, 2019)
MapReduce programs run in parallel, so they are very useful for large-scale data analysis
across multiple cluster nodes; this parallelism increases the speed and reliability of
the cluster. MapReduce has two functions: the Map function and the Reduce
function. (DataFlair, 2019)
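To make the Map and Reduce roles concrete, here is a minimal word-count sketch of the MapReduce model in plain Python. This is the classic introductory example, but the code below is a single-process simulation, not a real Hadoop job (real jobs would typically implement Mapper and Reducer classes in Java).

```python
from collections import defaultdict

# Minimal word-count sketch of the MapReduce model (plain Python, not a
# real Hadoop job). Map emits (key, value) pairs; a shuffle step groups
# them by key; Reduce aggregates each group.

def map_phase(line):
    # Map: emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word, 1)

def reduce_phase(word, counts):
    # Reduce: sum all the counts emitted for one word.
    return (word, sum(counts))

def mapreduce(lines):
    grouped = defaultdict(list)          # the "shuffle and sort" step
    for line in lines:
        for key, value in map_phase(line):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(mapreduce(["big data big", "data velocity"]))
# {'big': 2, 'data': 2, 'velocity': 1}
```

In a real cluster, the lines would be split across many machines, each running the Map function on its local HDFS blocks, which is where the parallelism described above comes from.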
Using Hadoop for Processing Large Datasets such as Call Data Records (CDR) or Customer
Transaction Data
Hadoop is used to process large data sets such as Call Data Records (CDR) or customer
transaction data.