Lecture 2
The exponential growth of digital data is primarily driven by social media, IoT (Internet
of Things) devices, sensors, e-commerce, and business transactions.
Example: Every day, social media platforms like Facebook, Twitter, and Instagram
generate petabytes of user-generated content, interactions, and multimedia files.
IoT devices such as smart home sensors, industrial equipment, and wearable devices
continuously collect and transmit real-time data, further fueling Big Data growth.
AI and ML models require vast amounts of structured and unstructured data for
training.
Big Data helps train deep learning models, such as those used in self-driving cars,
facial recognition, fraud detection, and recommendation systems.
Example: AI-powered voice assistants like Siri and Alexa continuously analyze and
learn from user interactions, improving accuracy over time.
Companies use Big Data analytics to gain insights into customer behavior, market
trends, and operational efficiency.
Predictive analytics helps businesses forecast future sales, inventory needs, and risks.
Example: Banks use Big Data to detect fraudulent transactions in real time.
Organizations use Big Data to optimize supply chains, reduce operational costs, and
improve efficiency.
Cloud-based Big Data solutions reduce the need for expensive physical infrastructure.
Example: Retailers use Big Data to optimize inventory levels and reduce wastage,
leading to cost savings.
Financial institutions and cybersecurity firms use Big Data analytics for real-time
fraud detection and anomaly detection.
Example: Credit card companies analyze millions of transactions daily to identify
suspicious activities and prevent fraud.
Governments and organizations use Big Data to optimize urban planning, traffic
management, and energy consumption.
Example: Smart traffic lights adjust signals based on real-time vehicle flow data,
reducing congestion in major cities.
Medical research and personalized medicine rely on Big Data for disease prediction,
drug discovery, and diagnostics.
Example: Genomic sequencing generates vast amounts of data, which is used to identify
genetic disorders and develop precision medicine.
Industries must analyze and manage large volumes of compliance-related data due to
regulations such as GDPR (General Data Protection Regulation) and HIPAA (Health
Insurance Portability and Accountability Act).
Example: GDPR ensures that companies protect users' personal data and provide
transparency in how data is used.
3.5 Environmental Monitoring & Sustainability
Big Data is used in climate modeling, disaster prediction, and efficient resource
management.
Example: Meteorological departments use Big Data analytics to predict hurricanes,
earthquakes, and climate change patterns.
5. Big Data Architecture
Big data architecture is specifically designed to manage the ingestion,
processing, and analysis of data that is too large or complex for conventional
relational databases to store, process, and manage. The solution is to organize
these technologies into a structured big data architecture that can manage and
process such data.
a) Data Sources
All big data solutions start with one or more data sources. A big data
architecture accommodates various data sources and efficiently manages a wide
range of data types. Some common data sources in big data architecture include
transactional databases, logs, machine-generated data, social media and web data,
streaming data, external data sources, cloud-based data, NoSQL databases, data
warehouses, file systems, APIs, and web services.
b) Data Storage
Big Data storage consists of distributed file stores that can hold large, multi-format
files efficiently. A Data Lake is used to store diverse file formats, including
structured, semi-structured, and unstructured data. This storage is primarily used
for batch operations and supports blob storage solutions such as the following
(a short write sketch appears after the list):
HDFS (Hadoop Distributed File System)
Microsoft Azure Blob Storage
AWS S3 (Simple Storage Service)
Google Cloud Storage (GCP Storage)
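As a minimal illustration, the PySpark sketch below lands raw files in a data lake as Parquet. The bucket names, paths, and the event_date column are hypothetical; with the right connector, the same code targets HDFS, Azure Blob Storage, S3, or GCS URIs.

```python
# A minimal sketch: writing raw event data into a data lake as Parquet
# using PySpark. Bucket names, paths, and columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-ingest").getOrCreate()

# Read raw JSON events; the path is a placeholder for any supported
# store (hdfs://, s3a://, wasbs://, or gs:// URIs all work with the
# appropriate connector installed)
events = spark.read.json("s3a://example-data-lake/raw/events/2024-01-01/")

# Persist in a columnar format, partitioned for efficient batch reads
# later; this assumes the events carry an event_date field
events.write.mode("append") \
    .partitionBy("event_date") \
    .parquet("s3a://example-data-lake/curated/events/")
```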
c) Batch Processing
Batch processing is a long-running operation that processes data in chunks by
filtering, aggregating, and preparing it for analysis. These jobs read input
data, process it, and generate output files (see the sketch after this list).
Common batch processing tools include:
Hive Jobs (SQL-like querying for batch data)
U-SQL Jobs (Microsoft’s big data processing language)
Apache Sqoop (Data transfer between RDBMS and Hadoop)
Apache Pig (High-level scripting for Hadoop)
Custom MapReduce Jobs (Written in Java, Scala, Python)
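Below is a minimal sketch of such a batch job, written as a custom PySpark application (one of the options above). The paths and column names (store_id, txn_date, amount) are hypothetical.

```python
# A minimal batch job sketch in PySpark: filter, aggregate, and write
# results for analysis. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-sales-batch").getOrCreate()

# Input: raw transaction records previously landed in the data lake
txns = spark.read.parquet("s3a://example-data-lake/curated/transactions/")

# Filter out invalid rows, then aggregate revenue per store and day
daily_totals = (
    txns.filter(F.col("amount") > 0)
        .groupBy("store_id", "txn_date")
        .agg(F.sum("amount").alias("total_revenue"),
             F.count("*").alias("txn_count"))
)

# Generate output files for downstream analysis, as the text describes
daily_totals.write.mode("overwrite").parquet(
    "s3a://example-data-lake/output/daily_store_totals/")
```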
e) Stream Processing
Unlike batch processing, stream processing handles real-time data flows by
consuming, processing, and delivering insights within milliseconds to seconds.
This is achieved using publish-subscribe messaging systems and window-based
data processing techniques.
Apache Spark Streaming (Micro-batch stream processing)
Apache Flink (Low-latency, distributed stream processing)
Apache Storm (Real-time distributed computation)
Processed data is then stored in a sink for further use, as illustrated in the
sketch below.
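The following is a hedged sketch of window-based stream processing using Spark Structured Streaming with a Kafka publish-subscribe source. The broker address, topic name, and the assumption that each message value is a page name are hypothetical.

```python
# A sketch of windowed stream processing with Spark Structured
# Streaming; broker, topic, and message layout are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

# Subscribe to a Kafka topic (publish-subscribe messaging, as above)
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "page-views")
       .load())

# Treat each message value as a page name (a simplifying assumption)
views = raw.selectExpr("CAST(value AS STRING) AS page", "timestamp")

# Count events in 1-minute tumbling windows, tolerating late data
counts = (views
          .withWatermark("timestamp", "2 minutes")
          .groupBy(F.window("timestamp", "1 minute"), "page")
          .count())

# Deliver results to a sink; the console sink is used for illustration
query = (counts.writeStream.outputMode("update")
         .format("console").start())
query.awaitTermination()
```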
f) Analytics-Based Datastore
Once processed, data is stored in a data warehouse or NoSQL database for
querying and analysis. These analytical stores allow faster lookups and advanced
analytics.
HBase (NoSQL database for real-time read/write)
Apache Hive (SQL-based querying on Hadoop)
Spark SQL (Query engine for structured big data processing)
Hive enables metadata abstraction, making it easier to manage and analyze large
datasets.
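As a sketch, processed batch output can be registered as a table and queried with Spark SQL. The table and column names below are hypothetical and carry over from the batch example above.

```python
# A minimal Spark SQL sketch: register processed data as a table and
# query it. Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("analytics-query").getOrCreate()

# Load the batch output produced earlier and expose it to SQL
totals = spark.read.parquet(
    "s3a://example-data-lake/output/daily_store_totals/")
totals.createOrReplaceTempView("daily_store_totals")

# Analysts can now run familiar SQL over big data for fast lookups
top_stores = spark.sql("""
    SELECT store_id, SUM(total_revenue) AS revenue
    FROM daily_store_totals
    GROUP BY store_id
    ORDER BY revenue DESC
    LIMIT 10
""")
top_stores.show()
```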
h) Orchestration
Orchestration tools automate and manage Big Data workflows, ensuring data
pipelines run efficiently. They enable data transformation, movement, and
scheduling across different sources and destinations. Some common orchestration
tools include (a brief scheduling sketch follows the list):
Apache Oozie (Workflow scheduler for Hadoop)
Apache Airflow (Task orchestration and workflow automation)
Azure Data Factory (Cloud-based ETL and data movement service)
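Below is a brief Apache Airflow sketch that schedules the pipeline described above. The DAG id, task ids, and spark-submit commands are hypothetical; the syntax assumes Airflow 2.4+, where older versions use schedule_interval instead of schedule.

```python
# A sketch of an Apache Airflow DAG orchestrating the batch pipeline;
# DAG id, task ids, and commands are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_sales_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # run the pipeline once per day
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_raw_data",
        bash_command="spark-submit lake_ingest.py",
    )
    aggregate = BashOperator(
        task_id="aggregate_daily_totals",
        bash_command="spark-submit daily_sales_batch.py",
    )
    # Data movement and transformation run in order: ingest, then aggregate
    ingest >> aggregate
```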
6. 5 V's of Big Data