BDA UNIT-1 (Lecture-1)

Big data refers to large, fast, and complex data sets that traditional methods struggle to process, characterized by volume, velocity, variety, veracity, and value. The evolution of data management has transitioned from early file management to modern data governance and cloud computing, with big data management focusing on data collection, storage, and analytics. Applications of big data span various industries including healthcare, finance, and retail, while challenges include data security, scalability, and integration.


Lecture-1

Introduction to Big Data


- Big data refers to data that is so large, fast, and complex that it is difficult or impossible to
process using traditional methods.
- Big data is also about sheer data volume: large datasets are typically measured in terabytes (TB) or petabytes (PB).

Characteristics of Big Data:


1. Volume: The vast amount of data generated every second. The defining characteristic of big
data is its high volume: the huge amount of data available for collection, produced
continuously from a variety of sources and devices.
2. Velocity: The speed at which data is generated. Today, data is often produced in real time or
near real time, and it must therefore be processed, accessed, and analysed at the same rate
to have any meaningful impact.
3. Variety: Data is heterogeneous, meaning it can come from many different sources and can
be structured, semi-structured, or unstructured. Traditional structured data (such as data in
spreadsheets or relational databases) is now supplemented by unstructured text, images,
audio, and video files, as well as semi-structured formats like sensor data that cannot be
organized in a fixed schema.
4. Veracity: The accuracy and reliability of data. Because big data arrives in such great
quantities and from so many sources, it can contain noise or errors, which can lead to poor
decision-making.
5. Value: The real-world benefits organizations can derive from big data. These benefits range
from optimizing business operations to identifying new marketing opportunities.
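The veracity point above can be made concrete with a small sketch (not from the lecture; the field names and plausibility bounds are illustrative): records with missing or implausible values are flagged before analysis, since they would otherwise distort decisions.

```python
# Illustrative veracity check: keep only records whose readings exist
# and fall in a physically plausible range.
records = [
    {"sensor_id": "s1", "temp_c": 21.5},
    {"sensor_id": "s2", "temp_c": None},    # missing value (noise)
    {"sensor_id": "s3", "temp_c": 999.0},   # implausible reading (error)
    {"sensor_id": "s4", "temp_c": 19.8},
]

def is_reliable(rec):
    """A record is reliable only if its reading exists and is plausible."""
    t = rec.get("temp_c")
    return t is not None and -50.0 <= t <= 60.0

clean = [r for r in records if is_reliable(r)]
noisy = [r for r in records if not is_reliable(r)]
print(len(clean), len(noisy))  # 2 2
```

At big-data scale the same filtering idea runs as a distributed job rather than a list comprehension, but the logic is identical.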

Sources of Big Data


This data comes from many sources, such as:

o Social networking sites: Facebook, Google, and LinkedIn generate huge amounts of data on a
day-to-day basis, as they have billions of users worldwide.

o E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge volumes of logs,
from which users' buying trends can be traced.

o Weather stations: Weather stations and satellites produce very large volumes of data, which
are stored and processed to forecast the weather.

o Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish
their plans accordingly; to do this, they store the data of millions of users.

o Share markets: Stock exchanges across the world generate huge amounts of data through
their daily transactions.
The evolution of data management

1. Early Days: File Management (1950s-60s)

i. Punch Cards and Magnetic Tapes: Data was primarily stored on punch
cards and magnetic tapes. Manual processes were labour-intensive, and
data retrieval was slow.
ii. Flat Files: Organizations began using flat file databases, where data was
stored in simple text files with limited structure.
2. Database Management Systems (1970s)
i. Hierarchical and Network Databases: Early database models included
hierarchical (e.g., IBM's IMS) and network databases, which allowed for
more complex relations among data.
ii. Relational Databases: The introduction of relational database
management systems (RDBMS) by Edgar F. Codd revolutionized data
management. The Structured Query Language (SQL) became the
standard for managing and querying data.
3. Emergence of Data Warehousing (1980s-90s)
i. Data Warehousing: The concept of data warehousing emerged, allowing
organizations to consolidate data from various sources for reporting and
analysis. Tools for ETL (Extract, Transform, Load) processes became
popular.
ii. Decision Support Systems (DSS): Analytical tools and systems began to
interact with data warehouses, enabling better business decision-
making.
4. The Rise of Big Data and NoSQL (2000s)
i. Big Data Technologies: With the explosion of data generated by social
media, mobile devices, and the internet, traditional RDBMS struggled to
handle the volume, velocity, and variety. Technologies like Hadoop and
Spark emerged to manage big data.
ii. NoSQL Databases: Non-relational databases (e.g., MongoDB, Cassandra)
provided flexible data models for unstructured or semi-structured data,
allowing organizations to scale out data storage efficiently.
5. Cloud Computing and Data Management (2010s)
i. Move to the Cloud: Cloud services revolutionized data storage and
management, allowing businesses to scale resources dynamically. Major
providers like AWS, Microsoft Azure, and Google Cloud offered database
services that emphasized elasticity and cost-effectiveness.
ii. Data Lakes: The concept of data lakes emerged, enabling organizations
to store vast amounts of raw data in its native format for later analytics
and processing.
6. Modern Data Management and Governance (2020s)
i. DataOps and Agile Data Management: DataOps methodologies began
to emerge, focusing on the agile and collaborative aspects of data
management, similar to DevOps.
ii. Data Governance and Compliance: As data privacy regulations like GDPR
and CCPA arose, organizations prioritized data governance, data quality,
and compliance management.
iii. AI and Machine Learning: The integration of AI and machine learning
into data management processes improved data insights, automation,
and predictive analytics.
7. Future Trends
i. Decentralized and Federated Data Systems: With the rise of blockchain
technologies and decentralized data architectures, future data
management may involve more distributed and secure ways to handle
data.
ii. Data Fabric and Integration: Concepts like data fabric aim to provide
seamless integration, accessibility, and management across various data
sources and formats.
iii. Enhanced Automation: Continued advancements in AI and automation
are likely to drive more intelligent data management tools that reduce
manual intervention and enhance efficiency.

Big data management


Big data management is the systematic process of data collection, data processing and data analysis
that organizations use to transform raw data into actionable insights.
1. Big data collection
i. This stage involves capturing the large volumes of information, from various sources,
that constitute big data.
ii. To handle the speed and diversity of incoming data, organizations often rely on
specialized big data technologies and processes, such as Apache Kafka for real-time
data streaming and Apache NiFi for data flow automation.
iii. This stage also involves capturing metadata: information about the data's origin,
format and other characteristics. Metadata provides essential context for organizing
and processing the data later on.
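The metadata-capture step described in (iii) can be sketched in plain Python. This is a hypothetical illustration, not the API of Kafka or NiFi; the field names (`source`, `format`, `ingested_at`) are chosen here for the example.

```python
# Hypothetical sketch: wrapping each incoming payload with metadata about
# its origin, format, and capture time, for later organizing and lineage.
import json
import time

def ingest(raw_payload, source, fmt):
    """Attach collection-time metadata to an incoming record."""
    return {
        "metadata": {
            "source": source,            # e.g. "web-clickstream", "weather-api"
            "format": fmt,               # e.g. "json", "csv"
            "ingested_at": time.time(),  # capture timestamp, for lineage
        },
        "payload": raw_payload,
    }

event = ingest(json.dumps({"user": "u42", "action": "click"}),
               "web-clickstream", "json")
print(event["metadata"]["source"])
```

Real collection pipelines do the same thing at scale: every record that enters the system carries enough context to be routed, validated, and audited downstream.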
2. Big data storage
1. Data lakes
- Data lakes are low-cost storage environments designed to handle massive amounts of raw
structured and unstructured data. Data lakes generally don't clean, validate or normalize
data. Instead, they store data in its native format, which means they can accommodate many
different types of data and scale easily.
- Data lakes are ideal for applications where the volume, variety and velocity of big data are
high and real-time performance is less important. They're commonly used to support AI
training, machine learning and big data analytics. Data lakes can also serve as general-
purpose storage spaces for all big data, which can be moved from the lake to different
applications as needed.

2. Data warehouses
- Data warehouses aggregate data from multiple sources into a single, central and consistent
data store. They also clean and prepare the data so that it is ready for use, often by
transforming it into a relational format. Data warehouses are built to support data
analytics, business intelligence and data science efforts.
- Warehouses are mainly used to make some subset of big data readily available to business
users for BI and analysis.

3. Data lakehouses
- Data lakehouses combine the flexibility of data lakes with the structure and querying
capabilities of data warehouses, enabling organizations to harness the best of both solution
types in a unified platform. Lakehouses are a relatively recent development, but they are
becoming increasingly popular because they eliminate the need to maintain two disparate
data systems.

Tools & Technologies used for Storage: Hadoop Distributed File System (HDFS), Amazon S3,
Google Cloud Storage
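The lake/warehouse distinction can be illustrated with a toy in-memory sketch (assumed schema and example records, not any real storage product): the "lake" accepts every record in its native form, while only records that pass a cleaning step reach the "warehouse" in a fixed relational shape.

```python
# Toy illustration of data lake vs. data warehouse storage.
data_lake = []        # raw, heterogeneous records, stored as-is
data_warehouse = []   # cleaned rows with a fixed (order_id, amount) schema

def store(record):
    data_lake.append(record)  # lakes accept everything, no validation
    # ETL step: only well-formed records are normalized into the warehouse
    if isinstance(record, dict) and "order_id" in record and "amount" in record:
        data_warehouse.append((record["order_id"], float(record["amount"])))

store({"order_id": 1, "amount": "19.99", "note": "gift wrap"})
store("free-text log line: user u42 visited /home")   # unstructured: lake only
store({"order_id": 2, "amount": 5.00})

print(len(data_lake), len(data_warehouse))  # 3 2
```

The unstructured log line survives only in the lake; the warehouse holds a smaller, consistent subset ready for BI queries, which mirrors how the two systems divide work in practice.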

3. Big data analytics

Big data analytics comprises the processes organizations use to derive value from their big
data. It involves using machine learning, data mining and statistical analysis tools to
identify patterns, correlations and trends within large datasets.

Tools & Technologies used: Tableau, Power BI, Python (Pandas, NumPy), R.
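As a minimal sketch of the "identify correlations and trends" idea, the snippet below computes a Pearson correlation with only the standard library. The numbers are made up for illustration; in practice the same computation would run over large datasets via Pandas, R, or a BI tool.

```python
# Sketch: do daily ad spend and daily sales move together?
from statistics import mean

ad_spend = [100, 150, 200, 250, 300]
sales    = [11,  15,  22,  24,  31]

def pearson(xs, ys):
    """Pearson correlation coefficient: covariance over the product of spreads."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

r = pearson(ad_spend, sales)
print(round(r, 2))  # close to 1.0: a strong positive trend
```

A value near +1 suggests the two series rise together; a value near 0 suggests no linear relationship. Correlation alone does not establish causation, which is why analytics results still need human interpretation.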

4. Big data processing tools

i. Organizations can use a variety of big data processing tools to transform raw
data into valuable insights.
ii. The three primary big data technologies used for data processing include:

1. Hadoop
Hadoop is an open-source framework that enables the distributed storage and
processing of large datasets across clusters of computers. Its storage layer, the
Hadoop Distributed File System (HDFS), allows large amounts of data to be managed
efficiently across the cluster.
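The processing model Hadoop popularized, MapReduce, can be sketched in plain Python: a map step emits (key, value) pairs, a shuffle groups them by key, and a reduce step aggregates each group. Hadoop runs these same three phases, but distributed across a cluster; this single-machine version only shows the shape of the computation.

```python
# Word count in the MapReduce style, on one machine.
from collections import defaultdict

documents = ["big data big insights", "data drives decisions"]

# Map: each document yields (word, 1) pairs
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the pairs by key (the word)
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group to a total
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts["big"], word_counts["data"])  # 2 2
```

Because map and reduce operate on independent keys, each phase parallelizes naturally, which is exactly what lets Hadoop scale the same logic to petabytes.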

2. Apache Spark

Apache Spark is known for its speed and simplicity, particularly when it comes to real-
time data analytics. Because of its in-memory processing capabilities, it excels in data
mining, predictive analytics and data science tasks. Organizations generally turn to it for
applications that require rapid data processing, such as live-stream analytics.

For example, a streaming platform might use Spark to process user activity in real time
to track viewer habits and make instant recommendations.
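Running real Spark requires a cluster or local installation, so the streaming-platform example above can be mimicked in plain Python: events arrive one at a time, running per-user counts are updated immediately, and a recommendation is available after every event. The event data and recommendation rule here are invented for illustration, not Spark's API.

```python
# Pure-Python mimic of stream processing: update state per event, not per batch.
from collections import Counter

watch_events = [
    ("alice", "sci-fi"), ("bob", "drama"), ("alice", "sci-fi"),
    ("alice", "comedy"), ("bob", "drama"),
]

views_by_genre = {}  # per-user running counts, updated as each event arrives

def on_event(user, genre):
    """Fold one event into the running state and return an instant recommendation."""
    views_by_genre.setdefault(user, Counter())[genre] += 1
    return views_by_genre[user].most_common(1)[0][0]  # the user's top genre so far

latest_rec = None
for user, genre in watch_events:   # stands in for an unbounded live stream
    latest_rec = on_event(user, genre)

print(views_by_genre["alice"]["sci-fi"], latest_rec)  # 2 drama
```

Spark's value is doing this same incremental aggregation in memory, fault-tolerantly, across many machines at once.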

3. NoSQL databases

NoSQL databases are designed to handle unstructured data, making them a flexible
choice for big data applications. Unlike relational databases, NoSQL solutions—such as
document, key-value and graph databases—can scale horizontally. This flexibility makes
them critical for storing data that doesn’t fit neatly into tables.

For example, an e-commerce company might use a NoSQL document database to
manage and store product descriptions, images and customer reviews.
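A toy in-memory "document store" makes the schema flexibility concrete. This is an illustration of the document model only; real stores such as MongoDB add query languages, indexing, replication, and horizontal scaling on top of it.

```python
# Toy document store: documents in one collection need not share fields,
# unlike rows in a relational table.
products = {}  # collection: document id -> document

def insert(doc_id, doc):
    products[doc_id] = doc

# Two differently-shaped documents coexist in the same collection
insert("p1", {"name": "Laptop", "price": 999, "specs": {"ram_gb": 16}})
insert("p2", {"name": "Mug", "price": 9, "reviews": ["Great!", "Chipped on arrival"]})

def find(predicate):
    """Scan the collection for documents matching a caller-supplied test."""
    return [d for d in products.values() if predicate(d)]

cheap = find(lambda d: d["price"] < 100)
print(len(cheap), cheap[0]["name"])  # 1 Mug
```

Because each document is self-contained, such stores can partition a collection across many machines by document id, which is the horizontal scaling the text refers to.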
Tools & Technology used for processing: Apache Hadoop, Apache Spark, Flink, Storm.

Applications of Big Data


1. Healthcare:

o Disease prediction, personalized medicine, patient monitoring.

2. Finance:

o Fraud detection, algorithmic trading, credit scoring.

3. Retail and E-commerce:

o Personalized recommendations, inventory optimization.

4. Social Media and Marketing:

o Sentiment analysis, targeted advertising, influencer analysis.

5. Smart Cities:

o Traffic management, energy efficiency, waste management.

6. Manufacturing:

o Predictive maintenance, quality control, supply chain optimization.

Challenges of Big Data

 Data Security and Privacy: Protecting sensitive data from breaches and ensuring compliance
with regulations (GDPR, CCPA).

 Scalability: Managing and processing exponentially growing data efficiently.

 Data Integration: Combining data from multiple sources with varying formats.

 Real-time Processing: Handling and analyzing data streams in real time.

 Data Governance: Ensuring data quality, lineage, and compliance.
