Lecture 2

The document outlines the key drivers of Big Data, categorized into technological, business, and social/environmental factors, highlighting advancements in data generation, storage, and processing technologies. It discusses how businesses leverage Big Data for decision-making, cost reduction, and personalized customer experiences, while also addressing societal needs and regulatory compliance. Additionally, it describes the architecture of Big Data systems, emphasizing the importance of managing large datasets through various processing and storage techniques.

Uploaded by

kharstikim

4. Drivers of Big Data

Big Data is growing at an unprecedented rate due to multiple factors that influence
its adoption across industries. These drivers of Big Data can be categorized into
Technological Drivers, Business Drivers, and Social & Environmental
Drivers.

1️⃣ Technological Drivers


Technological advancements play a significant role in the rapid expansion of Big Data. These
innovations enable the efficient storage, processing, and analysis of massive datasets.

1.1 Increase in Data Generation

• The exponential growth of digital data is primarily driven by social media, IoT (Internet
of Things) devices, sensors, e-commerce, and business transactions.
• Example: Every day, social media platforms like Facebook, Twitter, and Instagram
generate petabytes of user-generated content, interactions, and multimedia files.
• IoT devices such as smart home sensors, industrial equipment, and wearable devices
continuously collect and transmit real-time data, further fueling Big Data growth.

1.2 Advancements in Storage Technologies

• Cloud computing has revolutionized data storage, allowing organizations to store
massive datasets securely and affordably.
• Distributed file systems like Hadoop Distributed File System (HDFS) enable
organizations to store and manage vast amounts of unstructured data efficiently.
• Solid-State Drives (SSDs) and high-performance storage technologies improve data
access speed and reliability.

1.3 Computing Power & Scalability

• High-performance computing (HPC) has enabled faster processing of vast datasets.
• Cloud computing platforms (AWS, Google Cloud, Azure) offer scalable solutions for
data processing and analysis.
• Example: Companies can scale their computing power up or down based on real-time
requirements, reducing infrastructure costs.

1.4 Big Data Frameworks & Tools

• Technologies like Apache Hadoop, Apache Spark, and NoSQL databases
(MongoDB, Cassandra) allow organizations to efficiently process and analyze
large-scale datasets.
• Parallel processing and distributed computing reduce the time required to derive
meaningful insights from data.
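
The idea behind parallel and distributed processing can be illustrated with a small map-and-merge word count. This is only a sketch under stated assumptions: the function names and sample data are hypothetical, and local threads stand in for the cluster nodes that a framework such as Hadoop or Spark would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(partition):
    """Map step: count the words inside one partition of the data."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(partitions, workers=4):
    """Run the map step on every partition concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:  # reduce step: merge the partial counts
        total.update(partial)
    return total


# Hypothetical sample data, already split into two partitions.
partitions = [
    ["big data needs big storage", "data grows fast"],
    ["spark and hadoop process data"],
]
word_totals = parallel_word_count(partitions)
```

Each partition is counted independently, so adding workers (or machines) shortens the map step; only the small partial results have to be merged at the end.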

1.5 Artificial Intelligence (AI) & Machine Learning (ML)

• AI and ML models require vast amounts of structured and unstructured data for
training.
• Big Data helps train deep learning models, such as those used in self-driving cars,
facial recognition, fraud detection, and recommendation systems.
• Example: AI-powered voice assistants like Siri and Alexa continuously analyze and
learn from user interactions, improving accuracy over time.

2️⃣ Business Drivers


Businesses across industries leverage Big Data to enhance decision-making, improve customer
experiences, reduce costs, and stay ahead of competitors.

2.1 Data-Driven Decision Making

• Companies use Big Data analytics to gain insights into customer behavior, market
trends, and operational efficiency.
• Predictive analytics helps businesses forecast future sales, inventory needs, and risks.
• Example: Banks use Big Data to detect fraudulent transactions in real time.
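
As a rough illustration of predictive analytics, the sketch below fits a straight-line trend to past sales and extrapolates one period ahead. The sales figures and function names are hypothetical, and a linear trend is an assumption chosen for simplicity; production forecasting usually relies on richer models.

```python
def fit_linear_trend(values):
    """Least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    variance = sum((t - mean_t) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = mean_y - slope * mean_t
    return intercept, slope


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend `steps_ahead` periods past the last value."""
    intercept, slope = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


monthly_sales = [100, 110, 120, 130]  # hypothetical units sold per month
next_month = forecast(monthly_sales)
```

With this perfectly linear sample the forecast for the next month is 140 units.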

2.2 Cost Reduction

• Organizations use Big Data to optimize supply chains, reduce operational costs, and
improve efficiency.
• Cloud-based Big Data solutions reduce the need for expensive physical infrastructure.
• Example: Retailers use Big Data to optimize inventory levels and reduce wastage,
leading to cost savings.

2.3 Personalization & Customer Experience

• Big Data helps businesses offer personalized recommendations, targeted marketing,
and tailored customer experiences.
• Example: Netflix and Amazon use Big Data to analyze user preferences and provide
customized content recommendations.
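
One simple way to turn user preferences into recommendations is to suggest items that appear in the histories of users with overlapping tastes. The sketch below is illustrative only: the viewing histories and the `recommend` function are hypothetical, and it is not a description of how Netflix or Amazon actually build their systems.

```python
from collections import Counter


def recommend(histories, user, top_n=2):
    """Suggest unseen items, weighted by how much each other user overlaps with `user`."""
    seen = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(seen & items)
        if overlap == 0:
            continue  # no shared taste, ignore this user
        for item in items - seen:
            scores[item] += overlap
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical viewing histories.
histories = {
    "ana": {"drama_a", "thriller_b"},
    "ben": {"drama_a", "thriller_b", "thriller_c"},
    "cho": {"drama_a", "comedy_d"},
    "dev": {"comedy_e"},
}
suggestions = recommend(histories, "ana")
```

Here "ben" shares two titles with "ana", so his extra title ranks first; "dev" shares nothing and contributes no suggestions.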

2.4 Fraud Detection & Risk Management

• Financial institutions and cybersecurity firms use Big Data analytics for real-time
fraud detection and anomaly detection.
• Example: Credit card companies analyze millions of transactions daily to identify
suspicious activities and prevent fraud.
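
A minimal sketch of anomaly detection is a z-score check: flag any transaction whose amount lies far from the mean of recent amounts. The threshold of 3 standard deviations and the sample amounts are assumptions made for illustration; real fraud systems combine many more signals.

```python
from statistics import mean, stdev


def flag_anomalies(amounts, threshold=3.0):
    """Return indices of amounts more than `threshold` standard deviations from the mean."""
    if len(amounts) < 2:
        return []
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [
        index
        for index, amount in enumerate(amounts)
        if abs(amount - mu) / sigma > threshold
    ]


# Hypothetical card history: 19 ordinary purchases followed by one very large one.
amounts = [20.0, 22.0] * 9 + [21.0, 900.0]
suspicious = flag_anomalies(amounts)
```

Only the final purchase (index 19) is flagged, because it sits roughly four standard deviations above the mean while every other amount stays well under one.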

2.5 Real-Time Processing & Automation

• Industries like finance, healthcare, and manufacturing rely on real-time data
analytics for automation and fast decision-making.
• Example: Smart factories use IoT sensors and Big Data to predict machine failures and
schedule proactive maintenance.

3️⃣ Social & Environmental Drivers


The adoption of Big Data is also influenced by societal needs, regulatory policies, and
environmental concerns.

3.1 Growth of Social Media & Digital Platforms

• Social media platforms generate massive amounts of user-generated data daily,
contributing significantly to Big Data growth.
• Example: Twitter processes over 500 million tweets per day, which can be mined
for sentiment analysis and trend forecasting.

3.2 Smart Cities & IoT Integration

• Governments and organizations use Big Data to optimize urban planning, traffic
management, and energy consumption.
• Example: Smart traffic lights adjust signals based on real-time vehicle flow data,
reducing congestion in major cities.

3.3 Healthcare & Genomics

• Medical research and personalized medicine rely on Big Data for disease prediction,
drug discovery, and diagnostics.
• Example: Genomic sequencing generates vast amounts of data, which is used to identify
genetic disorders and develop precision medicine.

3.4 Regulatory Compliance & Governance

• Industries must analyze and manage large volumes of compliance-related data due to
regulations such as GDPR (General Data Protection Regulation) and HIPAA (Health
Insurance Portability and Accountability Act).
• Example: GDPR requires companies to protect users' personal data and to be
transparent about how that data is used.

3.5 Environmental Monitoring & Sustainability

• Big Data is used in climate modeling, disaster prediction, and efficient resource
management.
• Example: Meteorological and geological agencies use Big Data analytics to forecast
hurricanes, assess earthquake risk, and model climate change patterns.

5. Big Data Architecture

Big data architecture is designed to manage the ingestion, processing, and analysis
of data that is too large or complex for conventional systems. Data at this scale
cannot be stored, processed, and managed by conventional relational databases alone.
The solution is to organize the technologies involved into a layered structure, the
big data architecture, in which each component handles one part of the work.

Key Aspects of Big Data Architecture

• Store and process large volumes of data, for example datasets of 100 GB or more.
• Aggregate and transform a wide variety of unstructured data for analysis
and reporting.
• Access, process, and analyze streamed data in real time.

a) Data Sources
All big data solutions start with one or more data sources. The Big Data
Architecture accommodates various data sources and efficiently manages a wide
range of data types. Some common data sources in big data architecture include
transactional databases, logs, machine-generated data, social media and web data,
streaming data, external data sources, cloud-based data, NoSQL databases, data
warehouses, file systems, APIs, and web services.

b) Data Storage
Big Data storage consists of distributed file stores that can hold large, multi-format
files efficiently. A Data Lake is used to store diverse file formats, including
structured, semi-structured, and unstructured data. This storage is primarily used
for batch operations and is typically built on a distributed file system or an object
(blob) store such as:
• HDFS (Hadoop Distributed File System)
• Microsoft Azure Blob Storage
• AWS S3 (Simple Storage Service)
• Google Cloud Storage (GCS)

c) Batch Processing
Batch processing is a long-running operation that processes data in chunks by
filtering, aggregating, and preparing it for analysis. These jobs read input
data, process it, and write output files. Common batch processing tools include:
• Hive Jobs (SQL-like querying for batch data)
• U-SQL Jobs (Microsoft’s big data processing language)
• Apache Sqoop (Data transfer between RDBMS and Hadoop)
• Apache Pig (High-level scripting for Hadoop)
• Custom MapReduce Jobs (Written in Java, Scala, or Python)
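
The filter-then-aggregate pattern of a batch job can be sketched in plain Python. The record layout, the chunk size, and the function names below are hypothetical; a real job would read its input from files in the data store and write its output back there.

```python
from collections import defaultdict
from itertools import islice


def read_in_chunks(records, chunk_size):
    """Yield successive lists of at most `chunk_size` records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def run_batch_job(records, chunk_size=2):
    """Drop invalid records, then total the amount per region."""
    totals = defaultdict(float)
    for chunk in read_in_chunks(records, chunk_size):
        valid = [record for record in chunk if record["amount"] > 0]  # filter
        for record in valid:  # aggregate
            totals[record["region"]] += record["amount"]
    return dict(totals)


# Hypothetical input data; one record is invalid and gets filtered out.
sales = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": -5.0},
    {"region": "south", "amount": 20.0},
    {"region": "north", "amount": 30.0},
]
report = run_batch_job(sales)
```

Processing chunk by chunk keeps memory use bounded regardless of how large the input is, which is the main reason batch jobs work this way.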

d) Real-Time Message Ingestion


A real-time streaming system handles incoming data as it arrives, unlike
batch processing, which processes data at scheduled intervals. Data is continuously
collected and buffered for processing. Some common message-based ingestion tools
include:
• Apache Kafka (Highly scalable, distributed event streaming)
• Apache Flume (Data collection, aggregation, and movement)
• Azure Event Hubs (Streaming platform for event-driven applications)
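
The role of an ingestion layer, a buffer between producers and consumers of events, can be sketched with an in-memory queue. This is a toy stand-in, not the API of Kafka, Flume, or Event Hubs; the sensor events and function names are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this toy example


def produce(events, buffer):
    """Push events into the buffer as they 'arrive'."""
    for event in events:
        buffer.put(event)  # blocks while the buffer is full (back-pressure)
    buffer.put(SENTINEL)


def consume(buffer):
    """Pull events from the buffer in arrival order until the stream ends."""
    received = []
    while True:
        event = buffer.get()
        if event is SENTINEL:
            break
        received.append(event)
    return received


# Hypothetical sensor events.
events = [
    {"sensor": "s1", "temp": 21.5},
    {"sensor": "s2", "temp": 19.0},
    {"sensor": "s1", "temp": 22.1},
]
buffer = queue.Queue(maxsize=2)
producer = threading.Thread(target=produce, args=(events, buffer))
producer.start()
received = consume(buffer)
producer.join()
```

The bounded queue decouples the two sides: the producer never needs to know how fast the consumer is, and it simply waits when the buffer is full.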

e) Stream Processing
Unlike batch processing, stream processing handles real-time data flows by
consuming, processing, and delivering insights within milliseconds to seconds.
This is achieved using publish-subscribe messaging systems and window-based
data processing techniques. Common stream processing tools include:
• Apache Spark Streaming (Micro-batch stream processing)
• Apache Flink (Low-latency, distributed stream processing)
• Apache Storm (Real-time distributed computation)
Processed data is then written to an output sink for further use.
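
Window-based processing groups events into time intervals and computes a result per interval. The sketch below shows a tumbling (fixed, non-overlapping) window over a finite list of hypothetical click events; a real stream processor would emit each window as it closes instead of waiting for all input.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    Each event is a (timestamp_seconds, key) pair.
    Returns {window_start: {key: count}} ordered by window start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical click events: (seconds since start, page).
clicks = [(1, "home"), (4, "cart"), (9, "home"), (12, "home"), (19, "cart"), (21, "home")]
per_window = tumbling_window_counts(clicks, window_seconds=10)
```

With 10-second windows, the events fall into three windows starting at 0, 10, and 20 seconds, each with its own per-page counts.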

f) Analytics-Based Datastore
Once processed, data is stored in a data warehouse or NoSQL database for
querying and analysis. These analytical stores allow faster lookups and advanced
analytics.
• HBase (NoSQL database for real-time read/write)
• Apache Hive (SQL-based querying on Hadoop)
• Spark SQL (Query engine for structured big data processing)
Hive enables metadata abstraction, making it easier to manage and analyze large
datasets.

g) Reporting & Analysis


The insights generated from Big Data processing need to be visualized using
reporting and analysis tools. These tools create dashboards, graphs, and reports
to support business intelligence (BI) and decision-making.
• IBM Cognos
• Oracle Hyperion
• Tableau, Power BI, Looker
These tools help organizations understand trends, make predictions, and gain
actionable insights.

h) Orchestration
Orchestration tools automate and manage Big Data workflows, ensuring data
pipelines run efficiently. They enable data transformation, movement, and
scheduling across different sources and destinations. Some common orchestration
tools include:
• Apache Oozie (Workflow scheduler for Hadoop)
• Apache Airflow (Task orchestration and workflow automation)
• Azure Data Factory (Cloud-based ETL and data movement service)
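
At its core, an orchestrator runs tasks in an order that respects their dependencies. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) on a hypothetical five-step pipeline; it is not the API of Oozie, Airflow, or Data Factory, and it leaves out the scheduling, retries, and monitoring those tools provide.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "load_warehouse": {"aggregate"},
    "refresh_dashboard": {"load_warehouse"},
}


def run_pipeline(dependencies, tasks):
    """Run every task after its dependencies and return the execution order."""
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


log = []
# Placeholder tasks that only record that they ran.
tasks = {name: (lambda n=name: log.append(f"ran {n}")) for name in pipeline}
execution_order = run_pipeline(pipeline, tasks)
```

Because this pipeline is a simple chain, the only valid order is ingest, clean, aggregate, load_warehouse, refresh_dashboard.
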

6. The 5 V's of Big Data

1. Volume (Size of Data)
• Refers to the massive amount of data generated daily from sources like
social media, IoT devices, sensors, transactions, and logs.
• Examples:
• Facebook generates over 4 petabytes of data per day.
• The Large Hadron Collider produces about 1 petabyte of data per second during
experiments.
• Challenges: Requires scalable storage solutions like HDFS, AWS
S3, and Google BigQuery.
2. Velocity (Speed of Data Generation & Processing)
• Refers to the speed at which data is generated, collected, and
processed, often in real time.
• With the growing use of IoT devices and real-time data streams,
the velocity of data has increased tremendously, demanding systems that can
process data with minimal delay to derive meaningful insights.
• Examples:
• Stock market transactions require millisecond-level processing.
• IoT sensors stream continuous real-time data for predictive maintenance.
• Challenges: Needs low-latency data pipelines using Kafka, Apache Flink,
and Spark Streaming.
3. Variety (Different Data Formats & Sources)
• Big Data includes different types of data like structured data (found in
databases), unstructured data (like text, images, videos), and semi-structured
data (like JSON and XML) from various sources. This diversity requires
advanced tools for data integration, storage, and analysis.
• Examples:
• Structured: SQL databases, Excel files.
• Semi-Structured: JSON, XML, NoSQL databases.
• Unstructured: Images, videos, audio, social media posts.
• Challenges: Requires multi-format storage (HDFS, MongoDB) and
flexible processing frameworks (Spark, Hadoop).
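
Handling variety often comes down to mapping differently shaped inputs onto one common record layout. The sketch below does this for a structured CSV source and a semi-structured JSON-lines source; the field names and sample data are hypothetical.

```python
import csv
import io
import json


def from_csv(text):
    """Structured input: fixed columns, one row per record."""
    return [
        {"user": row["user"], "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json_lines(text):
    """Semi-structured input: fields may be nested or missing."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        doc = json.loads(line)
        records.append({
            "user": doc.get("user", {}).get("name", "unknown"),
            "amount": float(doc.get("amount", 0.0)),
        })
    return records


# Hypothetical sample inputs.
csv_text = "user,amount\nana,10.5\nben,4\n"
json_text = '{"user": {"name": "cho"}, "amount": 7}\n{"amount": 2.5}\n'
unified = from_csv(csv_text) + from_json_lines(json_text)
```

The structured source can be read column by column, while the semi-structured source needs defaults for fields that may be absent, as in the last JSON record.
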
4. Veracity (Data Quality & Accuracy)
• Veracity refers to the accuracy and trustworthiness of the data. Ensuring data
quality, resolving data discrepancies, and dealing with data ambiguity are
all major challenges in Big Data analytics.
• Examples:
• Fake news and misinformation on social media.
• Sensor data errors due to hardware malfunctions.
• Challenges: Requires data cleansing, filtering, and validation using
AI/ML techniques.
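
Cleansing and validation can start with simple rules before any AI/ML technique is applied. The sketch below uses rule-based checks only (missing values, out-of-range values, duplicates); the sensor readings, the temperature limits, and the function name are assumptions made for illustration.

```python
def clean_readings(readings, min_temp=-50.0, max_temp=60.0):
    """Split readings into (clean, rejected) using simple validation rules."""
    clean, rejected, seen = [], [], set()
    for reading in readings:
        key = (reading.get("sensor"), reading.get("time"))
        temp = reading.get("temp")
        if temp is None:
            rejected.append((reading, "missing value"))
        elif not min_temp <= temp <= max_temp:
            rejected.append((reading, "out of range"))
        elif key in seen:
            rejected.append((reading, "duplicate"))
        else:
            seen.add(key)
            clean.append(reading)
    return clean, rejected


# Hypothetical sensor readings with three quality problems.
readings = [
    {"sensor": "s1", "time": 1, "temp": 21.0},
    {"sensor": "s1", "time": 1, "temp": 21.0},   # duplicate of the first reading
    {"sensor": "s2", "time": 1, "temp": 999.0},  # faulty sensor value
    {"sensor": "s3", "time": 1, "temp": None},   # missing value
    {"sensor": "s2", "time": 2, "temp": 19.5},
]
clean, rejected = clean_readings(readings)
```

Keeping the rejected records together with the reason makes it possible to audit data quality later instead of silently discarding bad input.
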
5. Value (Business & Analytical Insights)
• Value refers to the ability to convert large volumes of data into useful insights. Big Data's
ultimate goal is to extract meaningful and actionable insights that can lead to
better decision-making, new products, enhanced consumer experiences, and
competitive advantages.
• Examples:
• E-commerce: Personalized recommendations (Amazon, Netflix).
• Healthcare: Predicting disease outbreaks with Big Data analytics.
• Challenges: Requires AI-driven analytics, data monetization, and
predictive modeling.
