Descriptive Data Analytics
DATA ANALYTICS
(PART I)
23 September 2023
Content
• Descriptive Data Analytics
• Sample Use Cases
• Measures of Central Tendency
• Measures of Dispersion
• Population Mean and Standard Deviation
• Sample Mean and Standard Deviation
• Measures of Co-movement between Variables
• Presentation of Data Analysis
Descriptive Data Analytics
• Process of using current and historical data to identify trends, patterns and
relationships
• Simplest form of data analysis
Sample Use Cases
Traffic and Engagement Reports
• Analyze user traffic in social media
or webpage
• Evaluate whether advertisements
increase traffic
• Understand the dynamics of user
traffic
Sample Use Cases
Financial Applications
• Look at underlying patterns to assess
company’s financial health
• Understand cost drivers of financial
metrics
• Assess performance of funds and
other investments
Sample Use Cases
Demand Trends
• Determine which products or services
are trending
• Identify which products are favored at a
given point in time
• Understand patterns in consumer
behavior
Sample Use Cases
Progress to Goals
• Analyze results of Key Performance
Indicators (KPI) to check whether efforts
are on track or adjustments need to be
made
• Generate dashboards for updating project
milestones
Measures of Central Tendency
• Goal: Describe the center of a data set
• Three Common Ways:
(1) Mean: Sum all the numbers and divide by how many numbers you have
(2) Median: Arrange the numbers in ascending order and find the middle number; with an even count, take the average of the two middle numbers
(3) Mode: Find the most commonly occurring number
Measures of Central Tendency: Illustrations
Suppose you are given the net worth of 10 people:
Original order:
$22,000 $13,000 $1,200,000 $150 $45,000
$30,000 $45,000 $40,000 $14,000 $45,000
Ascending order:
$150 $13,000 $14,000 $22,000 $30,000
$40,000 $45,000 $45,000 $45,000 $1,200,000
Value         Frequency
$150          1
$13,000       1
$14,000       1
$22,000       1
$30,000       1
$40,000       1
$45,000       3
$1,200,000    1
Mode = $45,000
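The three measures can be checked with a short sketch. It uses Python's standard `statistics` module on the ten net-worth figures above; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Net worth of the 10 people from the illustration, in dollars.
net_worth = [22_000, 13_000, 1_200_000, 150, 45_000,
             30_000, 45_000, 40_000, 14_000, 45_000]

center_mean = mean(net_worth)      # sum of the values divided by the count
center_median = median(net_worth)  # even count: average of the two middle values
center_mode = mode(net_worth)      # most frequently occurring value

print(f"Mean:   ${center_mean:,.0f}")
print(f"Median: ${center_median:,.0f}")
print(f"Mode:   ${center_mode:,.0f}")
```

The mean ($145,415) is far above the median ($35,000) because the single $1,200,000 value pulls the average upward, while the median and mode are unaffected by it.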
Measures of Central Tendency
[Table: one column per city: San Jose, San Ramon, San Francisco, Daly, Palo Alto, Oakland]
• First quartile: the point where 25% of the observations are less than or equal to it and 75% are greater than or equal to it
• Third quartile: the point where 75% of the observations are less than or equal to it and 25% are greater than or equal to it
Measures of Dispersion: Illustrations
Interquartile Distribution
Data in ascending order:
15 18 19 20 20 20 21 23 23 24 24 25
Lower half: 15 18 19 20 20 20 (median = First Quartile = 19.5)
Upper half: 21 23 23 24 24 25 (median = Third Quartile = 23.5)
Interquartile Range
= Third Quartile – First Quartile
= 23.5 – 19.5
= 4
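A minimal sketch of the same calculation, assuming the "median of each half" convention used in the illustration. Library quantile functions often use other interpolation rules and may return slightly different quartiles.

```python
from statistics import median

data = [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]

ordered = sorted(data)
half = len(ordered) // 2
lower_half = ordered[:half]   # smallest half of the observations
upper_half = ordered[-half:]  # largest half (an odd count would skip the middle value)

q1 = median(lower_half)       # first quartile
q3 = median(upper_half)       # third quartile
iqr = q3 - q1                 # interquartile range

print(q1, q3, iqr)
```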
Population Mean and Standard Deviation
Recall: Population vs. Sample
• If we have two data series X1, ..., XN and Y1, ..., YN, we can estimate the
covariance between them using the sample covariance
Variable X    Variable Y
4             8
3             10
6             5
3             9
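Written as a formula, the sample covariance can be sketched as follows. The N – 1 divisor is an assumption, chosen because the worked example divides by 3 for its four observations; the bar notation for the averages is introduced here for illustration.

```latex
s(X, Y) = \frac{1}{N - 1} \sum_{i=1}^{N} \left( X_i - \bar{X} \right) \left( Y_i - \bar{Y} \right),
\qquad
\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i,
\quad
\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i
```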
Measures of Co-movement between Variables
Step 1. Get the average of X and Y
Variable X Variable Y
4 8
3 10
6 5
3 9
Average of X = [4 + 3 + 6 + 3]/4 = 4
Average of Y = [8 + 10 + 5 + 9]/4 = 8
Measures of Co-movement between Variables
Step 2. For each sample data in X, determine its distance from the average of X. For
each sample data in Y, determine its distance from the average of Y.
X    X – Average of X    Y     Y – Average of Y
4    (4 – 4) = 0         8     (8 – 8) = 0
3    (3 – 4) = -1        10    (10 – 8) = 2
6    (6 – 4) = 2         5     (5 – 8) = -3
3    (3 – 4) = -1        9     (9 – 8) = 1
Measures of Co-movement between Variables
Step 3. Multiply the calculated distance of X from its respective average with the
corresponding distance of Y from its respective average. We call this new variable Z.
X    X – Average of X    Y     Y – Average of Y    Z
4    (4 – 4) = 0         8     (8 – 8) = 0         0 x 0 = 0
3    (3 – 4) = -1        10    (10 – 8) = 2        -1 x 2 = -2
6    (6 – 4) = 2         5     (5 – 8) = -3        2 x -3 = -6
3    (3 – 4) = -1        9     (9 – 8) = 1         -1 x 1 = -1
Measures of Co-movement between Variables
Step 4. Sum up all the entries in variable Z.
X    X – Average of X    Y     Y – Average of Y    Z
4    (4 – 4) = 0         8     (8 – 8) = 0         0 x 0 = 0
3    (3 – 4) = -1        10    (10 – 8) = 2        -1 x 2 = -2
6    (6 – 4) = 2         5     (5 – 8) = -3        2 x -3 = -6
3    (3 – 4) = -1        9     (9 – 8) = 1         -1 x 1 = -1
Sum of Z = 0 + (-2) + (-6) + (-1) = -9
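Steps 1 to 4 can be sketched in a few lines of Python. The final division by N – 1 is an assumption, matching the divisor of 3 that the worked example uses for its sample statistics; variable names are illustrative.

```python
x = [4, 3, 6, 3]
y = [8, 10, 5, 9]
n = len(x)

# Step 1: averages of X and Y
mean_x = sum(x) / n
mean_y = sum(y) / n

# Step 2: distance of each observation from its average
dev_x = [xi - mean_x for xi in x]
dev_y = [yi - mean_y for yi in y]

# Step 3: product of the paired distances (the variable Z)
z = [dx * dy for dx, dy in zip(dev_x, dev_y)]

# Step 4: sum of Z
sum_z = sum(z)

# Assumed final step: divide by N - 1 to get the sample covariance
sample_cov = sum_z / (n - 1)

print(z, sum_z, sample_cov)
```

A negative covariance indicates that X and Y tend to move in opposite directions in this sample.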
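To get sample standard deviation of X (i.e., s(X)):
X    X – Average of X    (X – Average of X)²
4    (4 – 4) = 0         0
3    (3 – 4) = -1        1
6    (6 – 4) = 2         4
3    (3 – 4) = -1        1
s(X) = √[(0 + 1 + 4 + 1)/3]
= √2
= 1.41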
Measures of Co-movement between Variables
To get sample standard deviation of Y (i.e., s(Y)):
Y     Y – Average of Y    (Y – Average of Y)²
8     (8 – 8) = 0         0
10    (10 – 8) = 2        4
5     (5 – 8) = -3        9
9     (9 – 8) = 1         1
s(Y) = √[(0 + 4 + 9 + 1)/3]
= √(14/3)
= 2.16
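A short sketch of the sample standard deviation for both variables. The standard deviation is the square root of the sum of squared distances divided by N – 1. The helper name `sample_std` is hypothetical; the standard library's `statistics.stdev` is used only as a cross-check.

```python
from math import isclose, sqrt
from statistics import stdev

def sample_std(values):
    """Sample standard deviation: sqrt of squared distances summed over N - 1."""
    n = len(values)
    avg = sum(values) / n
    squared_distances = [(v - avg) ** 2 for v in values]
    return sqrt(sum(squared_distances) / (n - 1))

x = [4, 3, 6, 3]
y = [8, 10, 5, 9]

s_x = sample_std(x)  # sqrt(6/3)
s_y = sample_std(y)  # sqrt(14/3)

# Cross-check against the standard library implementation.
assert isclose(s_x, stdev(x)) and isclose(s_y, stdev(y))

print(round(s_x, 2), round(s_y, 2))
```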
Measures of Co-movement between Variables
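Finally, using the formula ρ(X,Y) = s(X, Y)/[s(X) * s(Y)], where the sample covariance
s(X, Y) is the sum of Z divided by N – 1, we determine sample correlation ρ(X,Y) as follows:
s(X, Y) = (0 – 2 – 6 – 1)/3 = -3
ρ(X,Y) = -3/[√2 * √(14/3)]
≈ -0.98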
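The whole calculation can be sketched as one function. The name `sample_correlation` is hypothetical, and the N – 1 divisor is assumed throughout, as in the worked example. The two extra series at the end are made-up illustrations of perfect positive and perfect negative correlation.

```python
from math import sqrt

def sample_correlation(x, y):
    """Sample correlation: sample covariance over the product of sample std devs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)

rho = sample_correlation([4, 3, 6, 3], [8, 10, 5, 9])
print(round(rho, 2))

# Illustrative series only: Y is an exact linear function of X.
rho_up = sample_correlation([1, 2, 3], [2, 4, 6])    # Y rises with X
rho_down = sample_correlation([1, 2, 3], [6, 4, 2])  # Y falls as X rises
print(round(rho_up, 2), round(rho_down, 2))
```

Correlation always lies between -1 and 1, which makes it easier to compare across pairs of variables than covariance, whose size depends on the units of X and Y.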
[Figure: scatter plot of Stock B (vertical axis) against Stock A (horizontal axis)]
Example of two stocks A and B with perfect positive correlation (and therefore positive covariance)
Measures of Co-movement between Variables
[Figure: scatter plot of Stock Y (vertical axis) against Stock X (horizontal axis)]
Example of two stocks X and Y with perfect negative correlation (and therefore negative covariance)
Measures of Co-movement between Variables
[Figure: scatter plot of Stock D (vertical axis) against Stock C (horizontal axis)]
[Figure: scatter plot of Stock Y (vertical axis) against Stock X (horizontal axis)]