Bda (Chapter 1)

It is basic knowledge about big data analysis

Uploaded by

patelnikhil0804

BDA

CHAPTER 1

Introduction to Big Data (Simplified)

What is Data?

Data refers to the quantities, characters, or symbols on which a computer performs operations.
It can be stored and transmitted as electrical signals and recorded on magnetic, optical, or
mechanical storage media.

Where Does Data Come From?

Data comes from various sources, such as documents, images, audio, software programs, and more.

Computer Data as Information

Computer data is any information processed or stored by a computer. It includes text files,
images, audio, or software. The computer's CPU processes this data, and it is saved in files on
a storage device such as a hard disk.

Definition of Big Data

Big Data refers to an extremely large and growing collection of data that is too large or
complex for conventional data management systems to handle. Ordinary data sets are usually
measured in megabytes (MB) or gigabytes (GB); Big Data can reach petabytes (PB), where one
petabyte is 1,000,000,000,000,000 (10^15) bytes.
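To make the size difference concrete, the short Python sketch below converts byte counts into decimal (SI) units. The function name `human_size` and the sample values are illustrative assumptions and are not part of the original notes.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_size(num_bytes: int) -> str:
    """Return a byte count as a string in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,000,
        # or at the last unit in the list.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

PETABYTE = 10 ** 15  # 1,000,000,000,000,000 bytes

print(human_size(5 * 10 ** 9))  # a 5 GB file
print(human_size(PETABYTE))     # one petabyte
print(PETABYTE // 10 ** 9)      # number of gigabytes in one petabyte
```

One petabyte is a million gigabytes, which is why data at this scale is normally spread over many machines rather than kept on one disk.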

Interesting Fact

It is said that 90% of the world's data has been created in just the past three years.

Sources of Big Data

• Weather stations and satellites: Produce massive amounts of data for forecasting.

• Emails, blogs, and news websites: Continuously generate large data volumes.

• Social media: Posts, photos, videos, likes, and comments contribute to Big Data.

• Traffic data and GPS signals: Data from vehicles and maps.

• Digital pictures and videos: Cameras and smartphones produce a huge amount of data.

Characteristics of Big Data (Simplified)

Big Data has several key characteristics, often referred to as the 5 Vs:

1. Volume

o Definition: The amount of data is huge.

o Example: Social media generates tons of posts, videos, and photos every second.

2. Velocity

o Definition: Data is created and processed very quickly.

o Example: Online shopping sites process thousands of transactions every minute.

3. Variety

o Definition: Data comes in different formats like text, images, videos, and numbers.

o Example: A single app might store user messages, photos, and videos all in different
formats.

4. Veracity

o Definition: The data can sometimes be uncertain or incorrect.

o Example: Social media posts may contain false information, which needs to be
filtered.

5. Value

o Definition: The importance of the data in making decisions.

o Example: Companies analyze customer feedback to improve their products and services.

These characteristics make Big Data challenging but also valuable for gaining insights.

Explanation of Big Data Characteristics:

1. Volume (Data at Rest)

o What it means: Big Data is huge. We're talking about data in terabytes or even
petabytes, not just megabytes or gigabytes.

o Example: The Internet of Things (IoT) generates enormous amounts of data, which
keeps growing.

2. Variety (Data in Many Forms)

o What it means: Data comes in many formats: structured (like databases) and
unstructured (like text, videos, or images).

o Example: Emails, social media posts, and videos all create different types of data that
need to be stored and analyzed.

3. Veracity (Data in Doubt)

o What it means: This refers to the accuracy and trustworthiness of data. Large
volumes of data can sometimes be incomplete or inaccurate.

o Example: Social media posts may contain incorrect information, which makes it
difficult to ensure data quality.

4. Velocity (Data in Motion)

o What it means: The speed at which data is generated, processed, and made
accessible. It’s important for real-time data analysis.

o Example: Data from social media, sensors, and mobile devices is generated and
shared continuously at high speeds.

5. Value (Data into Money)

o What it means: The goal is to turn raw data into something valuable, like insights or
revenue for businesses.

o Example: Analyzing customer data to understand behavior and make personalized offers.

6. Visualization (Data Readable)

o What it means: Presenting data in an easy-to-understand way using graphs, charts,
and other visual tools.

o Example: Companies use charts to spot trends or patterns in their sales data.

7. Virality (Data Spread)

o What it means: How fast data or information spreads from one person to another,
often through social media.

o Example: A viral video that quickly spreads across the internet through social media
platforms.

These characteristics (the five core Vs, plus Visualization and Virality, which some sources
add) show what makes Big Data unique and challenging to manage but also very powerful.

Challenges of Conventional Systems with Big Data

1. Volume of Data

o What it means: Data is growing rapidly from various sources like machines,
telecommunication, and sensors.

o Example: IBM estimated that the world's data volume would reach about 35
zettabytes by 2020. Managing such vast amounts of data is challenging.

2. Processing and Analyzing

o What it means: Handling and analyzing large amounts of data is difficult and
time-consuming.

o Example: Extracting meaningful insights from huge data sets requires significant time
and effort, and it can be expensive due to the complexity and different formats of
data.

3. Management of Data

o What it means: Data comes in various forms: structured (like databases),
semi-structured (like XML files), and unstructured (like emails or social media posts).

o Example: Managing and integrating these different types of data is complex and
requires sophisticated systems.

In essence, conventional systems struggle to keep up with the growing volume of data, the
complexity of processing and analyzing it, and the challenge of managing diverse data formats.

Types of Big Data

1. Unstructured Data

• What it is: Data that doesn’t have a predefined format or structure.

• Characteristics: Often large and complex, making it difficult to process and analyze.

• Examples: Search results from Google, social media posts, emails, images, and videos.

• Challenges: Hard to derive value from this raw, unstructured data without advanced tools
and techniques.

2. Structured Data

• What it is: Data that is organized in a fixed format and can be easily stored, accessed, and
processed.

• Characteristics: Data is well-defined and fits neatly into tables or spreadsheets.

• Examples: Employee records in a database (like a table with Employee_ID, Name, Gender,
etc.).

• Advantages: Easy to manage and analyze using traditional database systems and tools.

3. Semi-structured Data

• What it is: Data that combines elements of both structured and unstructured data.

• Characteristics: Contains tags or markers to separate data elements, but doesn’t fit into a
rigid structure.

• Examples: XML files with tags (like <name>, <age>, etc.), web logs, and transaction histories.

• Advantages: More flexible than structured data, but still organized enough to be useful.
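The three types above can be sketched in a few lines of Python. This is only an illustration: the sample records, the names "Asha" and "Ravi", and the variable names are made up for the example. It shows that structured data can be read by column name, semi-structured data by tag (with fields allowed to be missing), and unstructured text only by searching it.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields.
structured = "Employee_ID,Name,Gender\n101,Asha,F\n102,Ravi,M\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: tags mark the fields, but records may differ in shape.
semi_structured = """
<employees>
  <employee><name>Asha</name><age>31</age></employee>
  <employee><name>Ravi</name></employee>
</employees>
""".strip()
root = ET.fromstring(semi_structured)
people = [
    {child.tag: child.text for child in employee}
    for employee in root.findall("employee")
]

# Unstructured: free text with no schema; any "fields" must be inferred.
unstructured = "Asha emailed Ravi about the Q3 report. Ravi replied the next day."
mentions = re.findall(r"\b(?:Asha|Ravi)\b", unstructured)

print(rows[0]["Name"])  # field lookup by column name
print(people)           # the second record has no <age> tag
print(len(mentions))    # crude name counting over raw text
```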

Differences Between Data Types

| Factor | Structured Data | Semi-structured Data | Unstructured Data |
|---|---|---|---|
| Flexibility | Less flexible; fixed schema | More flexible; some structure and tags | Highly flexible; no predefined schema |
| Transaction Management | Matured techniques for handling transactions | Less mature; adapted from DBMS | No transaction management; no concurrency |
| Query Performance | Complex queries and joins are possible | Queries possible but less complex | Mainly text-based queries; less efficient |
| Technology | Based on relational databases | Based on XML, RDF | Based on text and character data |

In summary, structured data is organized and easy to manage, semi-structured data offers some
flexibility with a bit of structure, and unstructured data is highly variable and challenging to process.

Intelligent Data Analysis (IDA) - Simple Explanation

What is IDA?

• IDA helps us find hidden patterns and useful information from large amounts of data. It uses
smart techniques to uncover insights that are not obvious at first glance.

Steps in IDA:

1. Data Preparation:

o What it means: Collect and clean the data you need from different sources.

o Example: If you're studying customer reviews, you collect all reviews and remove
any errors or irrelevant information.

2. Rules Finding or Data Mining:

o What it means: Look for patterns or rules in the cleaned data.

o Example: Discover that customers who buy running shoes often buy sports socks
too.

3. Result Validation and Explanation:

o What it means: Check if the patterns you found are accurate and explain them
clearly.

o Example: Confirm that your discovery about shoe and sock purchases is correct and
explain it in simple terms.
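The three steps can be sketched with a small co-occurrence count. This is a minimal illustration, not a full data mining algorithm: the baskets are invented, and the measures used (support and confidence, the standard association-rule measures) are one common way to check a rule such as "running shoes -> sports socks".

```python
from collections import Counter
from itertools import combinations

# Step 1 (output): hypothetical cleaned baskets, made up for illustration.
baskets = [
    {"running shoes", "sports socks"},
    {"running shoes", "sports socks", "water bottle"},
    {"running shoes"},
    {"sports socks"},
    {"water bottle"},
]

# Step 2: count how often each item and each pair of items appears.
item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

def confidence(antecedent: str, consequent: str) -> float:
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    both = pair_counts[frozenset((antecedent, consequent))]
    return both / item_counts[antecedent]

# Step 3: validate the rule "running shoes -> sports socks" against the data.
support = pair_counts[frozenset(("running shoes", "sports socks"))] / len(baskets)
print(f"support={support:.2f}")
print(f"confidence={confidence('running shoes', 'sports socks'):.2f}")
```

In this toy data the pair appears in 2 of 5 baskets (support 0.40), and 2 of the 3 baskets with running shoes also contain sports socks (confidence about 0.67).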

IDA Process:

• Collect Data: Gather information from different places.

• Analyze Data: Use methods to find patterns or trends.

• Explain Results: Make sure the findings are accurate and easy to understand.

Where is IDA Used?

• Banking: To find fraud or manage risks.

• Media: To understand what content people like and improve advertisements.

• Healthcare: To predict illnesses and improve patient care.

How Does It Work?

• Machine Learning: Teaches computers to learn from data and make predictions.

• Deep Learning: Handles complex data and recognizes intricate patterns.

In short, Intelligent Data Analysis helps us turn lots of data into useful information, making it easier
to make decisions and understand trends.

Traditional Data vs. Big Data - Simple Explanation

1. Confidentiality & Data Accuracy:

• Traditional Data: Easier to manage confidentiality with access control rules.

• Big Data: More complex, needs special mechanisms to ensure data confidentiality and
accuracy.

2. Data Relationship:

• Traditional Data: Relationships between data are clear and stable.

• Big Data: Relationships are often unknown or constantly changing.

3. Data Storage Size:

• Traditional Data: Stored in gigabytes to terabytes.

• Big Data: Stored in petabytes to zettabytes (very large amounts of data).

4. Types of Data:

• Traditional Data: Mostly structured (stored in databases like tables).

• Big Data: Includes structured, semi-structured, and unstructured data (like text, images,
videos).

5. Flexibility:

• Traditional Data: Based on fixed schemas (data models don’t change easily).

• Big Data: Dynamic, adaptable to different types of data without fixed structures.

6. Real-Time Analytics:

• Traditional Data: Data is typically processed periodically in batches (hourly, daily).

• Big Data: Data is often processed in real time or near real time, as it arrives.

7. Distributed Architecture:

• Traditional Data: Managed centrally.

• Big Data: Managed in a distributed system (spread across multiple machines or locations).
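As a rough illustration of point 7, the sketch below spreads records across several storage nodes by hashing a key. It is only one simple way a distributed system might decide where data lives; the function names `node_for` and `partition`, the record layout, and the node count are assumptions made for this example.

```python
import hashlib
from collections import defaultdict

def node_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def partition(records: list[dict], key_field: str, num_nodes: int) -> dict[int, list[dict]]:
    """Group records by the node that would store them."""
    nodes = defaultdict(list)
    for record in records:
        nodes[node_for(record[key_field], num_nodes)].append(record)
    return dict(nodes)

# Hypothetical customer records, made up for illustration.
records = [{"customer": f"c{i}", "amount": i * 10} for i in range(8)]
placement = partition(records, "customer", num_nodes=3)
for node, stored in sorted(placement.items()):
    print(node, [r["customer"] for r in stored])
```

Because the hash is stable, the same customer always lands on the same node, so any machine can work out where a record lives without a central lookup.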

Key Differences Between Traditional Data and Big Data

| Traditional Data | Big Data |
|---|---|
| Generated in enterprise systems (like ERP, CRM) | Generated from social media, sensors, etc. |
| Smaller volume (Gigabytes-Terabytes) | Larger volume (Petabytes-Zettabytes) |
| Deals with structured data | Deals with all types of data (structured, unstructured) |
| Centralized storage and management | Distributed storage and management |
| Easier to process | Requires special tools and processing methods |
| Schema is fixed and static | Schema is flexible and dynamic |

Importance of Big Data:

• Big data helps organizations process and analyze massive amounts of information that
traditional systems can’t handle.

• By using big data, businesses can gain insights to improve decision-making and create value.

Case Study: Big Data Solutions (Easy Explanation)

Big Data helps companies handle huge amounts of data to improve their services and make smarter
decisions. Here's a simple case study to explain Big Data solutions.

E-Commerce Site XYZ

Situation: An online shopping site with 100 million users wants to:

• Give $100 vouchers to its top 10 customers who spent the most in the last year.

• Understand what these customers like to buy, so they can recommend similar products.

Problems:

• There’s a huge amount of customer data, and it’s difficult to store and analyze it all.

Solution:

1. Storage:

o Use Hadoop's distributed file system (HDFS) to store all the data across multiple
computers. It runs on commodity hardware, so it can store a lot of data cheaply.

2. Processing:

o Use MapReduce to scan all the data in parallel, total each customer's spending, and
find the top 10 customers.

3. Analysis:

o Use tools like Pig and Hive to figure out the buying trends of these customers.

4. Cost:

o Hadoop is open source and free to license, so the main costs are hardware and
operations rather than software.
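The processing step can be imitated on a single machine to show the idea. The sketch below is not Hadoop code and does not use any Hadoop API; it only mimics the map, shuffle, and reduce phases in plain Python. The order records and function names are hypothetical, and the example picks the top 2 customers instead of the top 10 to keep the data small.

```python
import heapq
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical order records: (customer_id, amount_spent).
orders = [("c1", 120.0), ("c2", 80.0), ("c1", 30.0), ("c3", 200.0), ("c2", 50.0)]

def map_phase(records: Iterable[tuple[str, float]]) -> Iterator[tuple[str, float]]:
    """Emit one (key, value) pair per order, as a mapper would."""
    for customer_id, amount in records:
        yield customer_id, amount

def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped: dict[str, list[float]]) -> dict[str, float]:
    """Sum each customer's spending."""
    return {customer_id: sum(amounts) for customer_id, amounts in grouped.items()}

def top_customers(totals: dict[str, float], n: int) -> list[tuple[str, float]]:
    """Pick the n highest spenders."""
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])

totals = reduce_phase(shuffle(map_phase(orders)))
print(top_customers(totals, 2))  # [('c3', 200.0), ('c1', 150.0)]
```

On a real cluster the map and reduce functions run on many machines at once, each working on the part of the data stored locally, which is what makes the approach scale to 100 million users.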

Real-World Examples of Big Data Solutions

1. Walmart:

o Walmart uses Big Data to understand what products customers usually buy together.
With this information, they suggest related products to increase sales.

o They use tools like Hadoop to handle real-time data from their many stores around
the world.

2. Uber:

o Uber uses Big Data to track where their services are in high demand, adjusting prices
accordingly (surge pricing).

o This helps them make sure drivers are available where people need them most.

3. Netflix:

o Netflix uses Big Data to recommend shows and movies based on what users watch
and like. They even use this data to decide what new content to create.

o They use tools like Hadoop and Hive to analyze user data and improve
recommendations.

In simple terms, Big Data helps companies like Walmart, Uber, and Netflix understand customer
behavior, improve services, and make better business decisions.
