Big Data

Big data refers to large, diverse, and complex datasets that are difficult to process using traditional data processing applications. It is characterized by 3 V's - volume, velocity, and variety. Hadoop is an open-source framework that allows distributed processing of big data across clusters of computers. It uses HDFS for storage, MapReduce for processing, and YARN for resource management. Hadoop has a master-slave architecture with a name node, data nodes, and job tracker to allow parallel processing of large datasets.

Uploaded by Thakur Gautam

Big Data -

Big data is a combination of structured, semi-structured, and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling, and other advanced analytics applications.

Three V’s of Big Data

• Volume

Volume is the V most often associated with big data because, well, volume can be big: we are talking about quantities of data that reach almost incomprehensible proportions.

• Velocity

Velocity is the measure of how fast data is coming in. Facebook, for example, has to handle a tsunami of photographs every day: it has to ingest it all, process it, file it, and somehow, later, be able to retrieve it.

• Variety

Data was once collected from one place and delivered in one format. Where it once took the shape of database files such as Excel, CSV, and Access, it is now presented in non-traditional forms such as video, text, PDF, and graphics on social media, as well as via technology such as wearable devices.

Challenges of Big Data –

• Insufficient understanding and acceptance of big data

Companies often fail to know even the basics: what big data actually is, what its benefits are, what infrastructure is needed, and so on. Without a clear understanding, a big data adoption project risks being doomed to failure. Companies may waste time and resources on tools they don't know how to use. And if employees don't understand big data's value, or don't want to change existing processes for the sake of its adoption, they can resist it and impede the company's progress.

• Unreliable data

It is no secret that big data isn't 100% accurate, and on the whole that is not critical. But that doesn't mean we shouldn't control how reliable our data is at all. Not only can it contain wrong information, it can also duplicate itself and contain contradictions, and data of extremely inferior quality is unlikely to bring useful insights to precision-demanding business tasks.

• Security

Data brings in its wake the issues of governance and security. Big data, by its very nature, means that data flows in from many different sources. The more nodes there are, the more vulnerable the system is to exploits that could lead to losses. Managing such sources and ensuring integrity as well as security calls for expert governance measures.

• Organizational resistance

Organizational resistance, even in other areas of business, has been around forever. It is a problem that companies can anticipate and, as such, plan the best way to deal with in advance. If it is already happening in your organization, know that it is not out of the ordinary; what matters most is determining the best way to handle the situation to ensure big data success.

• Huge cost requirements

The management of big data, right from the adoption stage, demands significant expense. For instance, a company that chooses an on-premises solution must be ready to spend money on new hardware, electricity, new hires such as developers and administrators, and so on. Additionally, it must meet the costs of developing, setting up, configuring, and maintaining new software, even though the frameworks needed are open source.

HADOOP -

Hadoop is a framework that permits the storage of large volumes of data across clusters of machines. The Hadoop architecture allows parallel processing of data using several components:

• Hadoop HDFS to store data across slave machines
• Hadoop YARN for resource management in the Hadoop cluster
• Hadoop MapReduce to process data in a distributed fashion
• ZooKeeper to ensure synchronization across the cluster
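As a rough sketch of the MapReduce model these components implement, here is a word count written in plain Python (no Hadoop involved); the mapper, shuffle/sort, and reducer stages mirror what Hadoop MapReduce runs in parallel across a cluster:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum all the counts emitted for a single word.
    return (word, sum(counts))

def map_reduce(lines):
    # Shuffle/sort: gather and sort intermediate pairs by key, as the
    # framework does between the map and reduce phases.
    intermediate = sorted(pair for line in lines for pair in mapper(line))
    return [reducer(word, (c for _, c in group))
            for word, group in groupby(intermediate, key=itemgetter(0))]

print(map_reduce(["big data is big", "data is everywhere"]))
```

In real Hadoop the mapper and reducer run as separate tasks on slave nodes and the shuffle moves data over the network; the logic per stage is the same.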

Hadoop Architecture -

Hadoop has a master-slave architecture for data storage and distributed data processing using MapReduce and HDFS.

• Name Node - The NameNode keeps track of every file and directory in the namespace.
• Data Node - A DataNode manages the state of an HDFS storage node and lets you interact with its blocks.
• Master Node - The master node lets you conduct parallel processing of data using Hadoop MapReduce.
• Slave Node - The slave nodes are the additional machines in the Hadoop cluster that store data and carry out complex calculations. Every slave node runs a TaskTracker and a DataNode, which synchronize its processing with the Job Tracker and its storage with the NameNode, respectively.
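A toy illustration of the NameNode's bookkeeping, assuming a made-up 4-byte block size and simple round-robin placement (real HDFS defaults to 128 MB blocks, a replication factor of 3, and rack-aware placement):

```python
BLOCK_SIZE = 4          # bytes per block (illustrative; HDFS default is 128 MB)
REPLICATION = 3         # copies of each block (the HDFS default)
DATA_NODES = ["dn1", "dn2", "dn3", "dn4"]

def namenode_place(data):
    """Split a file into fixed-size blocks and assign each block to
    REPLICATION distinct DataNodes, round-robin style. The returned
    mapping plays the role of the NameNode's namespace metadata."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placement = {}
    for idx, block in enumerate(blocks):
        nodes = [DATA_NODES[(idx + r) % len(DATA_NODES)]
                 for r in range(REPLICATION)]
        placement[idx] = {"block": block, "nodes": nodes}
    return placement

for idx, info in namenode_place(b"hello hadoop!").items():
    print(idx, info["nodes"])
```

The point is the division of labor: the NameNode holds only the block-to-node map, while the DataNodes hold the actual bytes.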

Four Applications of Hadoop –

1. Strengthen security and compliance

Hadoop can efficiently analyze server-log data and respond to a security breach in real time. Server logs are computer-generated records that capture network data operations, particularly security and regulatory compliance data. They give companies and organizations important insights into network usage, security threats, and compliance, and Hadoop is a good fit for staging and analyzing this data.
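As a small sketch of this kind of server-log analysis, the snippet below counts failed logins per source IP in plain Python; the log format, field layout, and threshold are illustrative assumptions, and at scale the same aggregation would run as a MapReduce job over the full logs:

```python
import re
from collections import Counter

# Hypothetical server-log lines; the layout is an assumption for
# illustration, not a fixed Hadoop or syslog format.
LOGS = [
    "2024-05-01 10:02:11 10.0.0.5 LOGIN_FAILED",
    "2024-05-01 10:02:14 10.0.0.5 LOGIN_FAILED",
    "2024-05-01 10:03:01 10.0.0.9 LOGIN_OK",
    "2024-05-01 10:03:20 10.0.0.5 LOGIN_FAILED",
]

def failed_logins(lines, threshold=3):
    """Count LOGIN_FAILED events per source IP and flag any IP at or
    above the threshold -- the kind of per-key aggregation a Hadoop
    job would run over server logs at scale."""
    counts = Counter()
    for line in lines:
        m = re.search(r"(\d+\.\d+\.\d+\.\d+) LOGIN_FAILED", line)
        if m:
            counts[m.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}

print(failed_logins(LOGS))
```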

2. Hadoop for understanding customers' requirements

One of the most important applications of Hadoop is understanding customers' requirements. Companies in industries such as finance and telecom use Hadoop to find out what customers need by examining large amounts of data and discovering useful information within it. By understanding customer behavior, organizations can improve their sales.

3. Geo-location Data

We are part of a fast-growing technological world in which smartphones play a major role. Retail, manufacturing, the auto industry, and other enterprises can now track their customers' movements and predict purchases using geo-location data from smartphones and tablets. Hadoop clusters help these organizations process enormous amounts of geo-location data to figure out trouble areas in the business.

4. Hadoop applications in the retail industry

Retailers, both online and offline, use Hadoop to improve their sales. Many e-commerce companies use Hadoop to keep track of products that customers buy together. On that basis, when a customer is buying one product from such a group, the site can suggest the other products in it.
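A minimal sketch of the aggregation behind such suggestions, counting how often pairs of products appear together in a set of hypothetical baskets (the product names are illustrative only):

```python
from collections import Counter
from itertools import combinations

# Hypothetical order baskets.
ORDERS = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]

def frequent_pairs(orders):
    """Count how often each pair of products is bought together --
    the co-occurrence table behind 'customers also bought' hints."""
    pairs = Counter()
    for basket in orders:
        # sorted(set(...)) makes (a, b) and (b, a) count as one pair.
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs

print(frequent_pairs(ORDERS).most_common(2))
```

At e-commerce scale the pair counting is exactly the shape of job Hadoop distributes: each mapper emits pairs per basket, and reducers sum the counts.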
