Big Data

Big data refers to large, diverse, and complex datasets that are difficult to process using traditional data processing applications. It is characterized by 3 V's - volume, velocity, and variety. Hadoop is an open-source framework that allows distributed processing of big data across clusters of computers. It uses HDFS for storage, MapReduce for processing, and YARN for resource management. Hadoop has a master-slave architecture with a name node, data nodes, and job tracker to allow parallel processing of large datasets.

Uploaded by Thakur Gautam

Big Data -

Big data is a combination of structured, semi-structured, and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling, and other advanced analytics applications.

Three V’s of Big Data

• Volume

Volume is the V most often associated with big data because, well, volume can be big: we are talking about quantities of data that reach almost incomprehensible proportions.

• Velocity

Velocity is the measure of how fast data is coming in. Facebook, for example, has to handle a tsunami of photographs every day: it has to ingest it all, process it, file it, and somehow, later, be able to retrieve it.

• Variety

Data was once collected from one place and delivered in one format. Where it once took the shape of database files such as Excel, CSV, and Access, it is now presented in non-traditional forms such as video, text, PDF, and graphics on social media, as well as via technology such as wearable devices.

Challenges of Big Data –

• Insufficient understanding and acceptance of big data

Companies often fail to know even the basics: what big data actually is, what its benefits are, what infrastructure is needed, and so on. Without a clear understanding, a big data adoption project risks being doomed to failure. Companies may waste time and resources on tools they don't know how to use. And if employees don't understand big data's value, or don't want to change existing processes for the sake of its adoption, they can resist it and impede the company's progress.

• Unreliable data

It is no secret that big data isn't 100% accurate, and on the whole that is not critical. But that doesn't mean we shouldn't control how reliable our data is at all. Not only can it contain wrong information, it can also duplicate itself and contain contradictions, and data of extremely inferior quality is unlikely to bring useful insights to precision-demanding business tasks.

• Security

Data brings in its wake the issues of governance and security. Big data, by its very nature, means that data flows in from many different sources. The more nodes there are, the more vulnerable the system is to exploits that could lead to losses. Managing such sources and ensuring integrity as well as security calls for expert governance measures.

• Organizational resistance

Organizational resistance, even in other areas of business, has been around forever. It is a problem that companies can anticipate and, as such, plan the best way to deal with in advance. If it is already happening in your organization, know that it is not out of the ordinary; what matters most is determining the best way to handle the situation to ensure big data success.

• Huge cost requirements

The management of big data, right from the adoption stage, demands significant expense. For instance, a company that chooses an on-premises solution must be ready to spend money on new hardware, electricity, new hires such as developers and administrators, and so on. Additionally, it must meet the costs of developing, setting up, configuring, and maintaining new software, even though the frameworks needed are open source.

HADOOP -

Hadoop is a framework that permits the storage of large volumes of data across clusters of machines. The Hadoop architecture allows parallel processing of data using several components:

• Hadoop HDFS to store data across slave machines
• Hadoop YARN for resource management in the Hadoop cluster
• Hadoop MapReduce to process data in a distributed fashion
• ZooKeeper to ensure synchronization across the cluster
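As a rough sketch of the MapReduce model these components implement, here is a word count written in plain Python (no Hadoop involved); the mapper, shuffle/sort, and reducer stages mirror what Hadoop MapReduce runs in parallel across a cluster:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum all the counts emitted for a single word.
    return (word, sum(counts))

def map_reduce(lines):
    # Shuffle/sort: gather and sort intermediate pairs by key, as the
    # framework does between the map and reduce phases.
    intermediate = sorted(pair for line in lines for pair in mapper(line))
    return [reducer(word, (c for _, c in group))
            for word, group in groupby(intermediate, key=itemgetter(0))]

print(map_reduce(["big data is big", "data is everywhere"]))
```

In real Hadoop the mapper and reducer run as separate tasks on slave nodes and the shuffle moves data over the network; the logic per stage is the same.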

Hadoop Architecture -

Hadoop has a master-slave architecture for data storage and distributed data processing using MapReduce and HDFS.

• Name Node - The NameNode keeps track of every file and directory in the namespace.
• Data Node - A DataNode manages the state of an HDFS storage node and lets you interact with its blocks.
• Master Node - The master node lets you conduct parallel processing of data using Hadoop MapReduce.
• Slave Node - The slave nodes are the additional machines in the Hadoop cluster that store data and carry out complex calculations. Every slave node runs a TaskTracker and a DataNode, which synchronize its processing with the Job Tracker and its storage with the NameNode, respectively.
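A toy illustration of the NameNode's bookkeeping, assuming a made-up 4-byte block size and simple round-robin placement (real HDFS defaults to 128 MB blocks, a replication factor of 3, and rack-aware placement):

```python
BLOCK_SIZE = 4          # bytes per block (illustrative; HDFS default is 128 MB)
REPLICATION = 3         # copies of each block (the HDFS default)
DATA_NODES = ["dn1", "dn2", "dn3", "dn4"]

def namenode_place(data):
    """Split a file into fixed-size blocks and assign each block to
    REPLICATION distinct DataNodes, round-robin style. The returned
    mapping plays the role of the NameNode's namespace metadata."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placement = {}
    for idx, block in enumerate(blocks):
        nodes = [DATA_NODES[(idx + r) % len(DATA_NODES)]
                 for r in range(REPLICATION)]
        placement[idx] = {"block": block, "nodes": nodes}
    return placement

for idx, info in namenode_place(b"hello hadoop!").items():
    print(idx, info["nodes"])
```

The point is the division of labor: the NameNode holds only the block-to-node map, while the DataNodes hold the actual bytes.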

Four Applications of Hadoop –

1. Strengthen security and compliance

Hadoop can efficiently analyze server-log data and respond to a security breach in real time. Server logs are computer-generated records that capture network data operations, particularly security and regulatory compliance data. They give companies and organizations important insights into network usage, security threats, and compliance, and Hadoop is a good fit for staging and analyzing this data.
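As a small sketch of this kind of server-log analysis, the snippet below counts failed logins per source IP in plain Python; the log format, field layout, and threshold are illustrative assumptions, and at scale the same aggregation would run as a MapReduce job over the full logs:

```python
import re
from collections import Counter

# Hypothetical server-log lines; the layout is an assumption for
# illustration, not a fixed Hadoop or syslog format.
LOGS = [
    "2024-05-01 10:02:11 10.0.0.5 LOGIN_FAILED",
    "2024-05-01 10:02:14 10.0.0.5 LOGIN_FAILED",
    "2024-05-01 10:03:01 10.0.0.9 LOGIN_OK",
    "2024-05-01 10:03:20 10.0.0.5 LOGIN_FAILED",
]

def failed_logins(lines, threshold=3):
    """Count LOGIN_FAILED events per source IP and flag any IP at or
    above the threshold -- the kind of per-key aggregation a Hadoop
    job would run over server logs at scale."""
    counts = Counter()
    for line in lines:
        m = re.search(r"(\d+\.\d+\.\d+\.\d+) LOGIN_FAILED", line)
        if m:
            counts[m.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}

print(failed_logins(LOGS))
```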

2. Hadoop for understanding customers' requirements

One of the most important applications of Hadoop is understanding customers' requirements. Companies in industries such as finance and telecom use Hadoop to find out what customers need by examining large amounts of data and discovering useful information within it. By understanding customer behavior, organizations can improve their sales.

3. Geo-location Data

We are part of a fast-growing technological world in which smartphones play a major role. Retail, manufacturing, the auto industry, and other enterprises can now track their customers' movements and predict purchases using geo-location data from smartphones and tablets. Hadoop clusters help these organizations process enormous amounts of geo-location data to figure out trouble areas in the business.

4. Hadoop applications in the retail industry

Retailers, both online and offline, use Hadoop to improve their sales. Many e-commerce companies use Hadoop to keep track of products that customers buy together. On that basis, when a customer is buying one product from such a group, the site can suggest the other products in it.
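A minimal sketch of the aggregation behind such suggestions, counting how often pairs of products appear together in a set of hypothetical baskets (the product names are illustrative only):

```python
from collections import Counter
from itertools import combinations

# Hypothetical order baskets.
ORDERS = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]

def frequent_pairs(orders):
    """Count how often each pair of products is bought together --
    the co-occurrence table behind 'customers also bought' hints."""
    pairs = Counter()
    for basket in orders:
        # sorted(set(...)) makes (a, b) and (b, a) count as one pair.
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs

print(frequent_pairs(ORDERS).most_common(2))
```

At e-commerce scale the pair counting is exactly the shape of job Hadoop distributes: each mapper emits pairs per basket, and reducers sum the counts.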
