Descriptive Data Analytics

The document discusses descriptive data analytics including measures of central tendency, measures of dispersion, population and sample means and standard deviations, and measures of co-movement between variables. Descriptive analytics is used to identify trends, patterns and relationships in data.

Uploaded by Lia Ann Vargas


DESCRIPTIVE DATA ANALYTICS (PART I)
23 September 2023
Content
• Descriptive Data Analytics
• Sample Use Cases
• Measures of Central Tendency
• Measures of Dispersion
• Population Mean and Standard Deviation
• Sample Mean and Standard Deviation
• Measures of Co-movement between Variables
• Presentation of Data Analysis
Descriptive Data Analytics
• Process of using current and historical data to identify trends, patterns and
relationships
• Simplest form of data analysis
Sample Use Cases
Traffic and Engagement Reports
• Analyze user traffic on social media
or a webpage
• Evaluate whether advertisements
increase traffic
• Understand the dynamics of user
traffic
Sample Use Cases
Financial Applications
• Look at underlying patterns to assess
a company’s financial health
• Understand cost drivers of financial
metrics
• Assess performance of funds and
other investments
Sample Use Cases

Demand Trends
• Determine which products or services
are trending
• Identify which products are favored at a given
point in time
• Understand patterns in consumer
behavior
Sample Use Cases

Aggregated Survey Results

• Generate key insights from surveys
• Detect correlations between variables and
understand relationships among factors
based on survey responses
Sample Use Cases

Progress to Goals
• Analyze results of Key Performance
Indicators (KPI) to check whether efforts
are on track or adjustments need to be
made
• Generate dashboards for updating project
milestones
Measures of Central Tendency
• Goal: Describe the center of a data set
• Three Common Ways:
(1) Mean: Sum all the numbers and divide by how many numbers you have
(2) Median: Arrange the numbers in ascending order and find the middle number (for an even count, average the two middle numbers)
(3) Mode: Find the most frequently occurring number
Measures of Central Tendency: Illustrations
Suppose you are given the net worth of 10 people:

$22,000 $13,000 $1,200,000 $150 $45,000

$30,000 $45,000 $40,000 $14,000 $45,000

Find the mean, median and mode.

Measures of Central Tendency: Illustrations
Mean
= [$22,000 + $13,000 + $1,200,000 + $150 + $45,000 + $30,000 + $45,000 + $40,000 + $14,000 + $45,000]/10
= $1,454,150/10
= $145,415
Measures of Central Tendency: Illustrations
Median

Original order:
$22,000 $13,000 $1,200,000 $150 $45,000
$30,000 $45,000 $40,000 $14,000 $45,000

Ascending order:
$150 $13,000 $14,000 $22,000 $30,000
$40,000 $45,000 $45,000 $45,000 $1,200,000

Median = [$30,000 + $40,000]/2 = $35,000 (with 10 values, the median is the average of the 5th and 6th values)

Measures of Central Tendency: Illustrations
Mode

Original order:
$22,000 $13,000 $1,200,000 $150 $45,000
$30,000 $45,000 $40,000 $14,000 $45,000
Value         Frequency
$150          1
$13,000       1
$14,000       1
$22,000       1
$30,000       1
$40,000       1
$45,000       3
$1,200,000    1

Mode = $45,000 (it occurs 3 times, more often than any other value)
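The three results above can be cross-checked with Python's standard `statistics` module. This minimal sketch enters the ten net-worth values as plain integers (dollar signs and thousands separators dropped).

```python
import statistics

# Net worth of the 10 people from the illustration (in dollars)
net_worth = [22_000, 13_000, 1_200_000, 150, 45_000,
             30_000, 45_000, 40_000, 14_000, 45_000]

mean = statistics.mean(net_worth)      # sum of all values / number of values
median = statistics.median(net_worth)  # average of the 5th and 6th sorted values
mode = statistics.mode(net_worth)      # most frequently occurring value

print(f"Mean:   ${mean:,.0f}")
print(f"Median: ${median:,.0f}")
print(f"Mode:   ${mode:,.0f}")
```

It prints $145,415, $35,000 and $45,000, matching the hand calculations.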
Measures of Central Tendency

Mean
• Advantage: Considers the impact of all values in the data set; fairly stable for a sufficiently large sample
• When to use: No obvious extreme values

Median
• Advantage: Avoids distortion of the data by extreme outlier values
• When to use: Extreme outliers are observed

Mode
• Advantage: Provides results when analyzing the frequency of qualitative data
• When to use: Analyzing qualitative data
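The advantage claimed for the median can be checked on the net-worth data. The sketch below compares the mean and median with and without the $1,200,000 value; removing that one value is an illustrative experiment, not part of the original example.

```python
import statistics

net_worth = [22_000, 13_000, 1_200_000, 150, 45_000,
             30_000, 45_000, 40_000, 14_000, 45_000]
# Same data with the single extreme value removed (illustrative only)
without_outlier = [x for x in net_worth if x != 1_200_000]

results = {}
for label, data in [("with outlier", net_worth), ("without outlier", without_outlier)]:
    results[label] = (statistics.mean(data), statistics.median(data))
    mean, median = results[label]
    print(f"{label:>16}: mean = ${mean:,.2f}, median = ${median:,.2f}")
```

Dropping one extreme value moves the mean from $145,415.00 to about $28,238.89, while the median moves only from $35,000.00 to $30,000.00.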
Measures of Dispersion
• Goal: Describe how spread out the values in a dataset are
• Common tools:
(1) Range: Maximum value – Minimum Value
(2) Standard Deviation: “Average” Deviation from the Mean
(3) Interquartile Range: Range in which the middle 50% of the data distribution lies
(or Third Quartile – First Quartile)
Measures of Dispersion: Illustrations
Determine the range of marks on tests A and B

Range of Marks on Test A = 70 – 45 = 25
Range of Marks on Test B = 65 – 45 = 20
Measures of Dispersion: Illustrations
Determine the standard deviation of the populations of the following 6 cities in the Bay Area in California:

City             Population
San Jose          1,000,000
San Ramon            85,000
San Francisco       870,000
Daly City           100,000
Palo Alto            69,000
Oakland             420,000

Measures of Dispersion: Illustrations
Mean
= [1,000,000 + 85,000 + 870,000 + 100,000 + 69,000 + 420,000]/6
= 2,544,000/6
= 424,000

City             Population   Deviation from Mean   Squared Deviation
                              (Actual – Mean)
San Jose          1,000,000        576,000           (576,000)²
San Ramon            85,000       -339,000           (-339,000)²
San Francisco       870,000        446,000           (446,000)²
Daly City           100,000       -324,000           (-324,000)²
Palo Alto            69,000       -355,000           (-355,000)²
Oakland             420,000         -4,000           (-4,000)²
Variance = “Average” Squared Deviation
= [(576,000)² + (-339,000)² + (446,000)² + (-324,000)² + (-355,000)² + (-4,000)²]/6
= 876,630,000,000/6
= 146,105,000,000

Standard Deviation = √Variance = √146,105,000,000 ≈ 382,236.84
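The same steps (mean, deviations, squared deviations, average, square root) can be written out in Python. This sketch divides by N because the six cities are treated as the whole population, and it cross-checks the result against `statistics.pstdev`.

```python
import math
import statistics

# Populations of the six Bay Area cities from the illustration
populations = {
    "San Jose": 1_000_000,
    "San Ramon": 85_000,
    "San Francisco": 870_000,
    "Daly City": 100_000,
    "Palo Alto": 69_000,
    "Oakland": 420_000,
}

values = list(populations.values())
n = len(values)

mean = sum(values) / n                                  # step 1: the mean
squared_deviations = [(x - mean) ** 2 for x in values]  # steps 2-3: (actual - mean)^2
variance = sum(squared_deviations) / n                  # step 4: divide by N (population)
std_dev = math.sqrt(variance)                           # step 5: square root

print(f"Mean: {mean:,.0f}")
print(f"Variance: {variance:,.0f}")
print(f"Standard deviation: {std_dev:,.2f}")

# Cross-check against the standard library's population standard deviation
assert math.isclose(std_dev, statistics.pstdev(values))
```

It prints a mean of 424,000, a variance of 146,105,000,000 and a standard deviation of 382,236.84.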
Measures of Dispersion: Illustrations
Interquartile Distribution

• First quartile (Q1): the point where 25% of the observations are less than or equal to Q1 and 75% are greater than or equal to Q1
• Third quartile (Q3): the point where 75% of the observations are less than or equal to Q3 and 25% are greater than or equal to Q3
Measures of Dispersion: Illustrations
Interquartile Distribution

The interquartile range covers the observations from Q1 to Q3, which make up the middle 50% of the observations
Measures of Dispersion: Illustrations
Interquartile Distribution

Suppose we have the following observations:

15 18 19 20 20 20 21 23 23 24 24 25

Determine the interquartile range.

Measures of Dispersion: Illustrations
Step 1: Determine the median of the observations

15 18 19 20 20 20 21 23 23 24 24 25

Median = [20 + 21]/2 = 20.5

Measures of Dispersion: Illustrations
Step 2: Determine the first quartile of the observations (the median of the lower half)

15 18 19 20 20 20

First Quartile = [19 + 20]/2 = 19.5

Measures of Dispersion: Illustrations
Step 3: Determine the third quartile of the observations (the median of the upper half)

21 23 23 24 24 25

Third Quartile = [23 + 24]/2 = 23.5

Measures of Dispersion: Illustrations
Step 4: Find the interquartile range

Interquartile Range
= Third Quartile – First Quartile
= 23.5 – 19.5
= 4
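The four steps can be expressed as a short Python sketch. It follows the median-of-halves method used above; for an odd number of observations it assumes the overall median is left out of both halves, which is one of several conventions in use.

```python
def median(sorted_values):
    """Median of an already-sorted list: middle value, or average of the two middle values."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: for an odd number of observations, the overall median is
    excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # Step 2: lower half
    upper = data[(n + 1) // 2 :]    # Step 3: upper half
    return median(lower), median(data), median(upper)


observations = [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]
q1, q2, q3 = quartiles(observations)
iqr = q3 - q1                       # Step 4: Q3 - Q1
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```

Library routines such as NumPy's `percentile` or Python's `statistics.quantiles` interpolate between observations by default, so they may report slightly different quartiles for the same data.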
Population Mean and Standard Deviation
Recall: Population vs. Sample

• Population: The complete set of elements in a data set
• Sample: A representative subset drawn from the population
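For reference, the standard formulas behind the next two examples are below. A population of N values x1, ..., xN has mean μ and standard deviation σ; a sample of n values has mean x̄ and standard deviation s. The sample version divides by n – 1, the same divisor this document uses for the sample covariance.

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$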
Population Mean and Standard Deviation

The weights of five children in a family are:

x1 = 3.5kg x2 = 12.3kg x3 = 17.7kg x4 = 20.9kg x5 = 23.1kg

Determine the population mean and standard deviation of their weights.
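Working through the population formulas gives mean = 77.5/5 = 15.5 kg, variance = 246/5 = 49.2 kg², and standard deviation = √49.2 ≈ 7.01 kg. The sketch below reproduces this with Python's standard `statistics` module, treating the five weights as a population because they cover all the children in the family.

```python
import statistics

# Weights (kg) of all five children in the family, treated as a population
weights = [3.5, 12.3, 17.7, 20.9, 23.1]

mu = statistics.fmean(weights)      # population mean: divide by N
sigma = statistics.pstdev(weights)  # population standard deviation: divide by N

print(f"Population mean: {mu:.2f} kg")
print(f"Population standard deviation: {sigma:.2f} kg")
```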

Sample Mean and Standard Deviation
Exercise: Sample Mean and Standard Deviation
Suppose you went to Japan and visited a dojo of
veteran sumo wrestlers. You asked a few of them
and noted down the weights of 5 sumo wrestlers:

x1 = 205kg x2 = 192kg x3 = 223kg

x4 = 240kg x5 = 188kg

Determine the sample mean and standard deviation of the weights.
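Because the five wrestlers are only some of those at the dojo, the sample formulas apply: x̄ = 1,048/5 = 209.6 kg, the squared deviations sum to 1,901.2, so s² = 1,901.2/4 = 475.3 and s ≈ 21.80 kg. A minimal sketch with the standard `statistics` module:

```python
import statistics

# Weights (kg) of 5 wrestlers: only part of the dojo, so this is a sample
weights = [205, 192, 223, 240, 188]

x_bar = statistics.mean(weights)  # sample mean
s = statistics.stdev(weights)     # sample standard deviation: divides by n - 1

print(f"Sample mean: {x_bar:.1f} kg")
print(f"Sample standard deviation: {s:.2f} kg")
```

Using `statistics.pstdev` instead (dividing by n rather than n – 1) would give a smaller value, about 19.50 kg, which would understate the spread of the wider group the sample is drawn from.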
Measures of Co-movement between Variables

• Suppose we want to measure the co-movement between two variables X and Y

• We can get the sample covariance and sample correlation between X and Y
Measures of Co-movement between Variables

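• If we have two data series X1, ..., XN and Y1, ..., YN, we can estimate their covariance using the sample covariance

$$s(X,Y) = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})$$

and their correlation using the sample correlation

$$\rho(X,Y) = \frac{s(X,Y)}{s(X)\,s(Y)}$$

where X̄ and Ȳ are the averages of X and Y, and s(X) and s(Y) are their sample standard deviations.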
Measures of Co-movement between Variables
Steps in computing for s(X,Y):
(1) Compute the averages of X and Y
(2) For each data point in X, determine its distance from the average of X. For each
data point in Y, determine its distance from the average of Y.
(3) Multiply each distance of X from its average by the corresponding distance of Y
from its average. We call this new variable Z.
(4) Sum up all the entries in variable Z.
(5) Divide the sum in Step 4 by N – 1, where N is the number of data points. The result
is the sample covariance. With a sufficiently large sample, the sample covariance can be
used to approximate the population covariance.
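The five steps translate directly into a short Python function. This is an illustrative sketch; the name `sample_covariance` is ours, and the usage lines apply it to the data used in the worked example.

```python
def sample_covariance(xs, ys):
    """Sample covariance s(X, Y), following Steps 1-5."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    # Step 1: averages of X and Y
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    # Step 2: distance of each data point from its average
    x_dist = [x - x_avg for x in xs]
    y_dist = [y - y_avg for y in ys]
    # Step 3: multiply corresponding distances to form Z
    z = [dx * dy for dx, dy in zip(x_dist, y_dist)]
    # Step 4: sum the entries of Z
    total = sum(z)
    # Step 5: divide by N - 1
    return total / (n - 1)


x = [4, 3, 6, 3]
y = [8, 10, 5, 9]
print(sample_covariance(x, y))  # -3.0
```

Python 3.10 and later also provide `statistics.covariance(x, y)`, which uses the same N – 1 divisor and can serve as a cross-check.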
Measures of Co-movement between Variables
Determine the sample covariance s(X,Y) of two variables X and Y with the following
dataset:

Variable X   Variable Y
4            8
3            10
6            5
3            9
Measures of Co-movement between Variables
Step 1. Get the average of X and Y
Variable X Variable Y
4 8
3 10
6 5
3 9

Average of X = [4 + 3 + 6 + 3]/4 = 4
Average of Y = [8 + 10 + 5 + 9]/4 = 8
Measures of Co-movement between Variables
Step 2. For each data point in X, determine its distance from the average of X. For
each data point in Y, determine its distance from the average of Y.

X    X – Average of X    Y     Y – Average of Y
4    (4 – 4) = 0         8     (8 – 8) = 0
3    (3 – 4) = -1        10    (10 – 8) = 2
6    (6 – 4) = 2         5     (5 – 8) = -3
3    (3 – 4) = -1        9     (9 – 8) = 1
Measures of Co-movement between Variables
Step 3. Multiply the calculated distance of X from its respective average with the
corresponding distance of Y from its respective average. We call this new variable Z.

X    X – Average of X    Y     Y – Average of Y    Z
4    (4 – 4) = 0         8     (8 – 8) = 0         0 × 0 = 0
3    (3 – 4) = -1        10    (10 – 8) = 2        -1 × 2 = -2
6    (6 – 4) = 2         5     (5 – 8) = -3        2 × -3 = -6
3    (3 – 4) = -1        9     (9 – 8) = 1         -1 × 1 = -1
Measures of Co-movement between Variables
Step 4. Sum up all the entries in variable Z.

Sum of entries in Z = 0 + (-2) + (-6) + (-1) = -9

Measures of Co-movement between Variables
Step 5. Divide the sum in Step 4 by N – 1, where N is the number of data points. The
result is the sample covariance. With a sufficiently large sample, the sample covariance
can be used to approximate the population covariance.

Sum of entries in Z = 0 + (-2) + (-6) + (-1) = -9

N = 4 since we have 4 data points

Thus, s(X,Y) = Sum of entries in Z/[N – 1]

= -9/[4 – 1]
= -3
Measures of Co-movement between Variables
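Additionally, we can determine the sample correlation ρ(X,Y) of variables X and Y through
the formula:

ρ(X,Y) = s(X,Y)/[s(X) * s(Y)]

where s(X) and s(Y) are the sample standard deviations of X and Y.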
Measures of Co-movement between Variables
To get sample standard deviation of X (i.e., s(X)):

X    X – Average of X    (X – Average of X)²
4    (4 – 4) = 0         0
3    (3 – 4) = -1        1
6    (6 – 4) = 2         4
3    (3 – 4) = -1        1

s(X) = √[(0 + 1 + 4 + 1)/3]
= √2
≈ 1.414
Measures of Co-movement between Variables
To get sample standard deviation of Y (i.e., s(Y)):

Y     Y – Average of Y    (Y – Average of Y)²
8     (8 – 8) = 0         0
10    (10 – 8) = 2        4
5     (5 – 8) = -3        9
9     (9 – 8) = 1         1

s(Y) = √[(0 + 4 + 9 + 1)/3]
= √(14/3)
≈ 2.160
Measures of Co-movement between Variables
Finally, using the formula ρ(X,Y) = s(X,Y)/[s(X) * s(Y)], we determine the sample
correlation ρ(X,Y) as follows:

ρ(X,Y) = s(X,Y)/[s(X) * s(Y)]

= -3/[√2 * √(14/3)]
= -3/√(28/3)
≈ -3/3.055
≈ -0.982
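The correlation calculation can be sketched in Python as well. The helper names `sample_std` and `sample_correlation` are illustrative, not from the slides. Each sample standard deviation is the square root of the summed squared distances divided by N – 1, and the resulting correlation always lies between -1 and +1.

```python
import math


def sample_std(values):
    """Sample standard deviation: sqrt(sum of squared distances from the average / (N - 1))."""
    n = len(values)
    avg = sum(values) / n
    return math.sqrt(sum((v - avg) ** 2 for v in values) / (n - 1))


def sample_correlation(xs, ys):
    """rho(X, Y) = s(X, Y) / [s(X) * s(Y)]."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    cov = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sample_std(xs) * sample_std(ys))


x = [4, 3, 6, 3]
y = [8, 10, 5, 9]
print(f"s(X) = {sample_std(x):.3f}")
print(f"s(Y) = {sample_std(y):.3f}")
print(f"rho(X, Y) = {sample_correlation(x, y):.3f}")
```

For the example data this prints s(X) = 1.414, s(Y) = 2.160 and rho(X, Y) = -0.982. In Python 3.10 and later, `statistics.correlation(x, y)` computes the same Pearson correlation and can be used as a cross-check.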
Measures of Co-movement between Variables
[Figure: scatter plot of Stock B (vertical axis) against Stock A (horizontal axis)]

Example of two stocks A and B with perfectly positive correlation (and hence positive covariance)
Measures of Co-movement between Variables
[Figure: scatter plot of Stock Y (vertical axis) against Stock X (horizontal axis)]

Example of two stocks X and Y with perfectly negative correlation (and hence negative covariance)
Measures of Co-movement between Variables
[Figure: scatter plot of Stock D (vertical axis) against Stock C (horizontal axis)]

Example of two uncorrelated stocks C and D

Presentation of Data Analysis
• Histogram
• Box-plot
• Scatter Plot
• Bar Graph
• Line Graph
• Pie Chart
Presentation of Data Analysis: Histogram

• Divide the data into a number of classes

• The frequency (count) of each class is represented by a vertical rectangle
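A histogram first counts how many observations fall into each class. The sketch below does that counting with the standard library and draws each rectangle as a row of `#` characters. The class width of 5 and the reuse of the interquartile-range observations are illustrative choices; a plotting library would normally draw the actual bars.

```python
from collections import Counter

observations = [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]
class_width = 5  # illustrative choice: classes [15, 20), [20, 25), [25, 30)

# Map each observation to the index of its class, then count per class
lower_bound = min(observations) - min(observations) % class_width
counts = Counter((x - lower_bound) // class_width for x in observations)

for k in range(max(counts) + 1):
    start = lower_bound + k * class_width
    freq = counts.get(k, 0)
    print(f"[{start}, {start + class_width}): {'#' * freq} ({freq})")
```

This prints three classes with frequencies 3, 8 and 1.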
Presentation of Data Analysis: Box-plot

• Constructed using quartiles

• Gives a good indication of the spread of a data set and its symmetry (or lack of symmetry)
• Consists of a scale, a box drawn between the first and third quartiles, the median marked
within the box, whiskers on both sides, and outliers (if any)
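The quantities a box plot is drawn from can be computed directly. This sketch uses median-of-halves quartiles and, as an assumption not stated in the slides, flags outliers with the common 1.5 × IQR rule, letting the whiskers extend to the most extreme observations inside those fences.

```python
def median(sorted_values):
    """Middle value of a sorted list, or the average of the two middle values."""
    n = len(sorted_values)
    mid = n // 2
    return sorted_values[mid] if n % 2 else (sorted_values[mid - 1] + sorted_values[mid]) / 2


def box_plot_summary(values):
    """Return the quantities used to draw a box plot (1.5 x IQR fences assumed)."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])        # median of the lower half
    q3 = median(data[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "box": (q1, q3),
        "median": median(data),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


print(box_plot_summary([15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]))
```

For these observations there are no outliers. Applied to the net-worth data from the central-tendency illustration, the same rule flags the $1,200,000 value as an outlier.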
Presentation of Data Analysis: Scatter Plot
[Figure: scatter plot of Stock Y (vertical axis) against Stock X (horizontal axis)]

Plots two variables X and Y on a two-dimensional graph with an X-axis and a Y-axis

Useful for visualizing the correlation between two variables
Presentation of Data Analysis: Bar Graph and Pie Chart

Useful for visualizing the mode of a dataset
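Bar graphs and pie charts are built from category frequencies, so the tallest bar (or largest slice) marks the mode. The sketch below tallies a small set of hypothetical survey answers with `collections.Counter` and prints text bars alongside each category's share of the total, the quantity a pie chart shows as slice size.

```python
from collections import Counter

# Hypothetical qualitative survey responses (illustration only)
responses = ["Tea", "Coffee", "Tea", "Juice", "Tea", "Coffee", "Water", "Tea"]

counts = Counter(responses)
total = sum(counts.values())

for category, freq in counts.most_common():
    share = 100 * freq / total  # pie-chart slice size, in percent
    print(f"{category:<7}{'#' * freq:<5} {freq} ({share:.1f}%)")

mode, mode_freq = counts.most_common(1)[0]
print(f"Mode: {mode} ({mode_freq} responses)")
```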

Presentation of Data Analysis: Line Graph

Useful for visualizing time-series data
