BDA Session 2
LAKSHMIKANTHA G C
BITS Pilani
Pilani Campus WILP, BITS PILANI
Previous Session
Big Data: An Overview
Types, Characteristics, Challenges, Applications, and Lifecycle
• Data collection,
• Processing,
• Modeling, and
• Decision-making.
Data Collection: Gathering raw data from various sources such as databases, sensors, or logs.
Processing: Cleaning and transforming raw data into a usable format for analysis.
Modeling: Applying statistical or machine learning models to derive insights or predictions from
processed data.
Decision-making: Utilizing the results from modeling to make informed decisions that can impact
business strategies or operations.
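The four stages above can be sketched as a chain of small functions. This is a minimal illustration, not a production pipeline: the readings, the function names, and the 23.0 threshold are all hypothetical, and the "model" is just an average.

```python
from statistics import mean

# Hypothetical raw sensor readings; None and "n/a" stand in for bad records.
RAW_READINGS = [21.5, None, 22.0, "n/a", 23.5, 24.0]

def collect():
    """Data collection: return raw records as they arrive from a source."""
    return list(RAW_READINGS)

def process(records):
    """Processing: keep only numeric records and normalise them to float."""
    return [float(r) for r in records if isinstance(r, (int, float))]

def model(values):
    """Modeling: a deliberately simple model, the mean of the clean values."""
    return mean(values)

def decide(average, threshold=23.0):
    """Decision-making: turn the model output into an action (assumed rule)."""
    return "increase cooling" if average > threshold else "no action"

clean = process(collect())
average = model(clean)
decision = decide(average)
print(clean, average, decision)
```

Each stage consumes only the output of the previous one, which is why a stage can later be swapped for a real data source or a real model without touching the others.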
Techniques such as statistical analysis and machine learning are used to uncover patterns and
trends in large datasets.
• Descriptive Analytics
• Diagnostic Analytics
• Predictive Analytics
• Prescriptive Analytics
• Each type of analytics builds on the previous one, offering increasing levels
of insight, from what happened to what should be done.
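To make the progression concrete, the sketch below runs all four types of analytics on one tiny dataset. The sales figures, the naive forecast (last value plus mean monthly change), and the recommendation rule are assumptions chosen for illustration, not methods prescribed by the slides.

```python
from statistics import mean

# Hypothetical monthly unit sales per region (illustrative numbers only).
sales = {
    "north": [100, 110, 120, 130],
    "south": [100, 95, 80, 70],
}

# Descriptive: what happened? Total units per month across regions.
totals = [sum(month) for month in zip(*sales.values())]

# Diagnostic: why did it happen? Change per region from first to last month.
change = {region: units[-1] - units[0] for region, units in sales.items()}

# Predictive: what is likely to happen? Naive forecast: last total plus the
# mean month-to-month change.
steps = [later - earlier for earlier, later in zip(totals, totals[1:])]
forecast = totals[-1] + mean(steps)

# Prescriptive: what should be done? A simple rule on the diagnostic result.
worst = min(change, key=change.get)
action = f"investigate {worst}" if change[worst] < 0 else "keep current plan"

print(totals, change, forecast, action)
```

Note how each step reuses the result of the one before it: the prescriptive rule depends on the diagnostic breakdown, which in turn explains the descriptive totals.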
• The big data technology landscape includes tools for data management,
processing, and visualization, each playing a critical role in handling large
datasets.
Hadoop Ecosystem
• Hadoop, with its core components HDFS, MapReduce, and YARN, is foundational for big
data processing. It allows scalable storage and processing of large datasets.
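The MapReduce programming model can be illustrated in a few lines of plain Python. This sketch runs in a single process and does not use Hadoop itself; the function names and the two sample documents are hypothetical. In a real cluster the map and reduce steps run in parallel on many nodes and the framework handles the shuffle.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)

documents = ["big data needs big storage", "data drives decisions"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```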
NoSQL Databases
• NoSQL databases like MongoDB handle large volumes of unstructured data.
• These are essential for applications needing flexible and scalable data storage solutions.
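The "flexible" part of document stores can be shown without a database server. The sketch below uses plain Python dictionaries as stand-ins for documents; it does not call MongoDB or any driver, and the `find` helper and the product records are hypothetical.

```python
# Documents in one collection need not share the same fields.
products = [
    {"_id": 1, "name": "laptop", "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "t-shirt", "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "phone", "specs": {"ram_gb": 8}, "tags": ["sale"]},
]

def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

# Fields that exist only on some documents can still be queried safely.
with_specs = [doc["name"] for doc in products if "specs" in doc]
phone = find(products, name="phone")
print(with_specs, phone)
```

A relational table would need a schema change, or many empty columns, to hold these three records together; a document model stores each one as it is.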
• A comprehensive pipeline includes data ingestion, storage, processing, analysis, and
visualization. For example, creating a customer 360 view integrates multiple data sources.
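A customer 360 view can be sketched as a join of several sources on a shared key. The three source systems, their field names, and the sample records below are assumptions made for illustration; a real pipeline would ingest them from separate stores.

```python
# Hypothetical source systems, each keyed by customer_id (illustrative data).
crm = [{"customer_id": 1, "name": "Asha"}, {"customer_id": 2, "name": "Ravi"}]
orders = [
    {"customer_id": 1, "amount": 250.0},
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 2, "amount": 75.0},
]
tickets = [{"customer_id": 2, "issue": "late delivery"}]

def build_customer_360(crm, orders, tickets):
    """Combine the three sources into one profile per customer."""
    view = {
        c["customer_id"]: {
            "name": c["name"], "total_spend": 0.0, "order_count": 0, "issues": [],
        }
        for c in crm
    }
    for order in orders:
        profile = view.get(order["customer_id"])
        if profile is not None:  # skip orders with no matching CRM record
            profile["total_spend"] += order["amount"]
            profile["order_count"] += 1
    for ticket in tickets:
        profile = view.get(ticket["customer_id"])
        if profile is not None:
            profile["issues"].append(ticket["issue"])
    return view

customer_360 = build_customer_360(crm, orders, tickets)
print(customer_360)
```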
• Techniques like text analytics and sentiment analysis are essential for
extracting insights from unstructured data, such as social media feeds.
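As a rough illustration of sentiment analysis, the sketch below scores text against two small word lists. The word lists and the scoring rule are hypothetical and far simpler than what practical systems use (no negation handling, no context), but they show the basic idea of turning free text into a label.

```python
import re

# Tiny hand-made lexicons, assumed for this example only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentiment(text):
    """Label text by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "I love this phone, great battery!",
    "Terrible service and poor support",
    "Delivered on Monday",
]
labels = [sentiment(post) for post in posts]
print(labels)
```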
• Ensuring data security and privacy is crucial in big data analytics. Techniques
like encryption and access control are essential to protect sensitive data.
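The sketch below illustrates two protective measures with the Python standard library only. Note the substitution: instead of encryption, it uses keyed hashing (HMAC-SHA256) to pseudonymize an identifier, because the standard library has no symmetric cipher; unlike encryption, this is one-way. The key, role names, and field names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical demo key; a real key would come from a secrets manager.
SECRET_KEY = b"demo-key-do-not-use"

def pseudonymize(value):
    """Replace an identifier with a keyed, one-way token (not encryption)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Access control: each role may read only the listed fields (assumed policy).
ROLE_FIELDS = {
    "analyst": {"customer_token", "city"},
    "admin": {"customer_token", "city", "email"},
}

def visible_fields(record, role):
    """Return only the fields the given role is allowed to see."""
    allowed = ROLE_FIELDS.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}

record = {
    "customer_token": pseudonymize("customer-42"),
    "city": "Pilani",
    "email": "user@example.com",
}
analyst_view = visible_fields(record, "analyst")
print(sorted(analyst_view))
```

An unknown role gets an empty view, so access is denied by default rather than granted by default.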
• In retail, big data analytics is used for customer segmentation and targeted
marketing, improving customer experiences and sales.
• The Internet of Things (IoT) generates vast amounts of data that big data
analytics processes for applications like smart cities and real-time monitoring.
• Edge computing processes data closer to the source, reducing latency and
enabling real-time analytics, essential for applications like autonomous
vehicles.
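One way to picture edge processing: the device summarizes its readings locally and forwards only the summary, so less data crosses the network and an alert can be raised without a round trip. The function name, the readings, and the 80.0 alert threshold below are assumptions for illustration.

```python
from statistics import mean

def summarize_at_edge(readings, alert_above=80.0):
    """Reduce a batch of raw readings to a small summary on the device."""
    peak = max(readings)
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": peak,
        "alert": peak > alert_above,  # decided locally, no cloud round trip
    }

# Hypothetical temperature readings collected by one edge device.
readings = [70.0, 72.0, 71.0, 95.0]
summary = summarize_at_edge(readings)
print(summary)
```

Here four raw values shrink to one summary record; with thousands of readings per second the saving in bandwidth and latency is what makes real-time use cases practical.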
• Big data must be aligned with business strategy to drive innovation and
maintain a competitive edge, as illustrated by companies leveraging data for
strategic advantage.