Introduction To Big Data
BS (CS) 6th
Lecture # 3
Dr. Syed Attique Shah (PhD)
Common Problems in Big Data
• Big data is a blanket term used to refer to any collection of data so large and complex that it exceeds the processing capability of conventional data management systems and techniques.
• Data plays a very important role in every aspect of life.
• In most situations, some common problems recur:
• One common problem is that the size of the data is extremely large.
• Traditional systems like RDBMS do not scale up to this extent.
• Even where it is possible, doing it with an RDBMS is extremely costly.
• Another common problem is that data is split across multiple systems.
• It is not centralized in most cases.
Common Problems in Big Data
• The intention is to build a single system that can handle this diversity and volume of data, as well as perform different operations on the data.
• We can conclude that big data is a challenge consisting of three core problems, also called the 3 V's of Big Data.
• Big data is commonly characterized using a number of V's (mainly 6).
3 V's of Big Data
Characteristics of Big Data - Volume
• Volume – Refers to the amount of data being generated or present.
• The volume of data we receive these days is in petabytes or even exabytes, which is beyond the storage capacity of a single machine.
Characteristics of Big Data - Volume
• Volume is the big data dimension that relates to the sheer size of big data.
• This volume can come from large datasets being shared, or from many small data pieces and events being collected over time.
• Every minute, 204 million emails are sent, 200,000 photos are uploaded, and 1.8 million likes are generated on Facebook.
• The size and scale of storage for big data can be massive. Prefixes such as peta-, exa-, and yotta- are used to describe these sizes (a small unit-conversion sketch follows).
• CERN's Large Hadron Collider generates 15 petabytes a year.
• According to EMC, a big data company, digital data will grow by a factor of 44 until the year 2020, i.e. to 35.2 zettabytes.
• A zettabyte is 1 trillion GB, that is 10^21 bytes.
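A quick sanity check on these prefixes, as a minimal Python sketch; the 15 PB and 35.2 ZB figures are from the slide above, while the helper function and unit table are ours:

# Decimal (SI) storage prefixes used on this slide.
UNITS = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12,
         "PB": 1e15, "EB": 1e18, "ZB": 1e21, "YB": 1e24}

def to_bytes(value, unit):
    """Convert a value in the given SI unit to bytes."""
    return value * UNITS[unit]

# CERN's LHC output, 15 petabytes per year, expressed in gigabytes:
print(to_bytes(15, "PB") / UNITS["GB"])           # 15,000,000.0 GB

# EMC's 2020 estimate, 35.2 zettabytes, expressed in trillions of GB:
print(to_bytes(35.2, "ZB") / UNITS["GB"] / 1e12)  # 35.2, since 1 ZB = 1 trillion GB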
Characteristics of Big Data - Volume
• What is the relevance of this much data in our world?
• The idea is to understand that businesses and organizations are collecting and leveraging large volumes of data to improve their end products, whether in safety, reliability, healthcare, or governance.
• Challenges involve the cost, scalability, and performance of storing, accessing, and processing this data.
Characteristics of Big Data - Variety
• Variety – Refers to the types of data being generated or present.
• The data we receive is not usually in the same format.
• It can be divided into three categories:
• Structured
• Semi-Structured
• Unstructured, e.g. a flat file
Characteristics of Big Data - Variety
• It refers to the increased diversity of data.
• Image data, text data, network data, geographic maps, and computer-generated simulations are only a few of the types of data we encounter every day.
• The heterogeneity of data can be characterized
along several dimensions.
• A satellite image of wildfires from NASA is very different
from tweets sent out by people who are seeing the fire
spread.
Characteristics of Big Data - Variety
• Sometimes we also use qualitative versus quantitative
measures. For example, age can be a number or we
represent it by terms like infant, juvenile, or adult.
• Think of an email collection (a minimal record sketch follows this slide):
• Sender, receiver, date - Structured
• Body of the email - Text
• Attachments - Multimedia
• Who-sends-to-whom - Network
• What the email refers to - Semantics
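A minimal, hypothetical Python record for one email; the field names are ours, not from any real email API, but each part maps to one of the data types above:

# Hypothetical email record illustrating variety within one collection.
email = {
    # Structured: fixed fields with well-defined types
    "sender": "alice@example.com",
    "receiver": "bob@example.com",
    "date": "2024-01-15",
    # Text: free-form, unstructured body
    "body": "Hi Bob, attached is the satellite image of the fire.",
    # Multimedia: binary attachments
    "attachments": ["fire_map.png"],
}

# Network: who-sends-to-whom is an edge in a communication graph.
edge = (email["sender"], email["receiver"])
print(edge)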
Characteristics of Big Data - Variety
• Impact of data variety:
• Harder to ingest
• Difficult to create common storage
• Difficult to compare and match data across varieties
• Difficult to integrate
• Management and policy challenges
Characteristics of Big Data - Velocity
• Velocity – Refers to the speed at which data is generated
and processing is required.
• The amount of data being generated is increasing at a
rapid pace
• Velocity refers to the increasing speed at which big data is created and the increasing speed at which the data needs to be stored and analyzed.
• Velocity relates to the speed of creating data, the speed of storing data, and the speed of analyzing data.
• Processing can be done as: Batch Processing, Real-Time Processing, or Streaming Analysis.
Characteristics of Big Data - Velocity
• Real-Time Processing
• Instantly capture streaming data
• Feed it to machines in real time
• Process it in real time
• Act
• Sensors and smart devices monitoring the human body can detect abnormalities in real time and trigger immediate action, potentially saving lives (a minimal sketch of this loop follows).
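A minimal sketch of that capture-process-act loop, assuming a made-up heart-rate feed and a simple threshold rule (both are illustrative, not a real medical protocol):

import random

# Hypothetical stream of heart-rate readings; a real system would read
# from a device or a message queue instead of a random generator.
def sensor_stream(n=10):
    for _ in range(n):
        yield random.gauss(75, 25)  # beats per minute

# Real-time loop: handle each reading as it arrives and act immediately.
for bpm in sensor_stream():
    if bpm < 40 or bpm > 140:   # illustrative alert thresholds
        print(f"ALERT: abnormal reading {bpm:.0f} bpm -> trigger action")
    else:
        print(f"ok: {bpm:.0f} bpm")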
Characteristics of Big Data - Velocity
• Batch Processing
• Collect Data
• Clean Data
• Feed in Chunks
• Wait
• Act
• This type of processing is still very common today, but it can be catastrophic for some businesses.
• Organizations that make decisions on the latest data are more likely to hit the target.
• For this reason, it is important to match the speed of processing with the speed of information generation and gain real-time decision-making power.
Characteristics of Big Data - Velocity
• Streaming Analysis
• Decisions based on processing of already-acquired data, as in batch processing, may give an incomplete picture; applications therefore need the real-time status of the context at hand. That is streaming analysis.
• Streaming data gives information on what is going on right now.
• Streaming data has velocity, meaning it gets generated at various rates.
• Analysis of such data in real time gives the agility and adaptability needed to maximize the benefits you want to extract (a windowed-analysis sketch follows).
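One common streaming pattern is to summarize events over small windows as they arrive, instead of storing the full history first; a toy Python sketch with made-up event rates:

# Toy tumbling-window average: emit one summary per `window` events,
# without ever holding the full stream in memory.
def windowed_averages(events, window=3):
    buf = []
    for e in events:
        buf.append(e)
        if len(buf) == window:
            yield sum(buf) / window
            buf = []  # start the next window

clicks_per_second = [5, 7, 6, 40, 42, 39, 6, 5, 7]  # made-up stream
for avg in windowed_averages(clicks_per_second):
    print(f"window average: {avg:.1f}")  # the middle window stands out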
More V’s of Big Data
• We have huge amounts of data, in different formats and of varying quality, which must be processed quickly.
• More Vs have been introduced to the big data community as we discover new
challenges and ways to define big data.
• They include:
• Veracity
• Value
• Valence
More V’s of Big Data
• Veracity – Refers to the biases, noise, and abnormality in data. Or, better yet, it refers to the often unmeasurable uncertainty about the truthfulness and trustworthiness of data.
• Data can become invalid if proper preprocessing has not been performed.
• Filtering the invalid data out increases its trustworthiness (a minimal filtering sketch follows).
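A minimal preprocessing sketch in the spirit of that last bullet; the records and validity rules are made up for illustration:

# Illustrative veracity filter: drop records that fail basic checks.
records = [
    {"user": "u1", "age": 34},
    {"user": "u2", "age": -5},    # invalid: negative age
    {"user": "u3", "age": None},  # invalid: missing value
    {"user": "u4", "age": 29},
]

def is_valid(rec):
    return rec["age"] is not None and 0 <= rec["age"] <= 120

clean = [r for r in records if is_valid(r)]
print(clean)  # only u1 and u4 survive; the rest would bias any analysis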
Characteristics of Big Data - Veracity
• Veracity of Big Data refers to the quality of the data.
• It is sometimes discussed together with validity, or with volatility, which refers to the lifetime of the data.
• Veracity is very important for making big data operational.
• Big data can be noisy and uncertain.
• It can be full of biases and abnormalities, and it can be imprecise.
• Data is of no value if it is not accurate; the results of big data analysis are only as good as the data being analyzed.
• Uncertainty of the data increases as we go from enterprise data to sensor data.
• Traditional enterprise data in warehouses has standardized quality solutions, such as master processes for ETL.
Characteristics of Big Data - Veracity
• As enterprises started incorporating less structured and unstructured people and machine data into their big data solutions, the data became messier and more uncertain.
• In January 2013, Google Flu Trends estimated almost twice as many flu cases as were reported by the CDC, the Centers for Disease Control and Prevention.
• The primary reason behind this was that Google Flu Trends used big data from the internet and did not properly account for uncertainties in the data.
• This resulted in what we call an overestimation.
More V’s of Big Data
• Valence – Refers to the connectedness of big data and the ways in which data can be used and formatted.
• Data that can be used for only one specific task is of lesser value.
• Data should be acquired and formatted in such a way as to increase its usability, since the velocity of data is very high.
Characteristics of Big Data - Valence
• Valence refers to connectedness. The more connected the data is, the higher its valence.
• Data items are often directly connected to one another:
• A city is connected to the country it belongs to.
• Two Facebook users are connected because they are friends.
• An employee is connected to his workplace.
• Data can also be indirectly connected:
• Two scientists are connected because they are both physicists.
• For a data collection, valence measures the ratio of actually connected data items to the possible number of connections within the collection (a small density sketch follows).
• The most important aspect of valence is that data connectivity increases over time.
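That ratio is simply the density of the connection graph; a small Python sketch with a made-up edge list:

# Valence as graph density: actual connections / possible connections.
# For n items, the number of possible undirected pairs is n*(n-1)/2.
def valence(n_items, edges):
    possible = n_items * (n_items - 1) / 2
    return len(edges) / possible

# Made-up collection: 5 data items, 4 direct connections.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]
print(valence(5, edges))  # 4 / 10 = 0.4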
Characteristics of Big Data - Valence
• A well-known series of network graphs comes from a social experiment in which scientists attending a conference were asked to meet other scientists they did not know before. After several rounds of meetings, the new connections they formed are shown as red edges.
• A high-valence data set is denser. This makes many regular analytic techniques very inefficient.
• More complex analytical methods must be adopted to account for the increasing density.
Characteristics of Big Data - Valence
• Valence: Challenges
• More complex data exploration algorithms
• Modeling and prediction of valence changes
• Group event detection
• Emergent behavior analysis
Characteristics of Big Data - Value
• We have described five V's that are considered to be dimensions of big data.
• At the heart of the big data challenge is turning all of the other dimensions into truly useful business value.
• The idea behind processing all this big data in the first place is to bring value to the problem at hand.
Characteristics of Big Data - Value
• Value – Refers to the quality of, and enhanced decision-making for, an individual or organization.
• Data is of no use if it does not bring value.
• We should be able to derive insightful decisions from the data.
Characteristics of Big Data – Example
(Assignment)
• Let's imagine now that you're part of a company called Eglence Inc.
• One of the products of Eglence Inc is a highly popular mobile game called
Catch the Pink Flamingo.
• It's a multi-user game where the users have to catch special types of pink
flamingos that randomly pop up on the world map on their screens based on
the mission that gets updated randomly.
• The game is played by millions of people online throughout the world.
• One of the goals of the game is to form a network of players to collectively cover the world map with pink flamingo sightings and compete with other groups.
• Users can pick their groups based on player stats.
• The game's website sends free cool stuff to registered users.
Characteristics of Big Data - Example
• Registration requires users to enter demographic information such as gender, year of birth, city, highest education, and so on.
• However, most of the users enter inaccurate information about themselves, just like most of us do.
• To help improve the game, the game collects real-time usage activity data from each player and feeds it to its data servers.
• The players of this game are enthusiastically active on social media and have strong associations with the game.
• A popular Twitter hashtag for this game is #CatchThePinkFlamingo, which gets more than 200,000 mentions worldwide per day.
• There are strong communities of users who meet via social media and get together to play the game.
Characteristics of Big Data - Example
• Now, imagine yourself as the big data solutions architect for Eglence Inc.
• There are examples of all three types of data sources in this scenario:
• The mobile app generates data for the analysis of user activity.
• Twitter conversations of players form a rich source of unstructured data from people.
• The customer and game records are examples of data that the organization collects.
• This is a challenging big data example where all characteristics of big data are represented:
• There are high volumes of player, game, and Twitter data, which also speak to the variety of data.
• The data streams in from the mobile app, website, and social media in real time, which can be defined as high-velocity data.
• The quality of the demographic data users enter is unclear (veracity), and there are networks of players, which relates to the valence of big data.