Statistics For Data Science
Objectives
At the end of this chapter, students will be able to
Note: There are many other types of frequency tables, depending on the information you
want to record.
Displaying Data
• Ways to display data:
• Frequency histogram
• Relative frequency histogram
• Multiple bar graph
• Stacked bar graph
• Line graph
• Pie chart
Descriptive Statistics - Measures of
Central Tendency
Central Tendency – the propensity of data to be located or
clustered about some point.
Example: A second sample of statistics exam scores for 15 students is as follows (in order
from smallest to largest): 52, 60, 65, 67, 70, 71, 74, 76, 78, 78, 78, 80, 86, 89, 95
Notice that 15 is an odd number, so the median is the middle value, at position
(15 + 1) / 2 = 8 in the ordered list. The 8th value is 76, so Median = 76.
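The median rule used in this example can be sketched in Python. This is a minimal illustration, not part of the original slides: the helper name median_by_position is hypothetical, and the even-length branch (averaging the two middle values) is the usual convention, which this example does not exercise.

```python
from statistics import median

# Exam scores from the example above, already in ascending order.
scores = [52, 60, 65, 67, 70, 71, 74, 76, 78, 78, 78, 80, 86, 89, 95]


def median_by_position(values):
    """Return the median by locating the middle of the sorted values.

    Hypothetical helper written for illustration only.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # zero-based index of the middle value when n is odd
    if n % 2 == 1:
        return ordered[mid]
    # Even count: assume the usual convention of averaging the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


manual = median_by_position(scores)   # 8th value of 15 (index 7)
library = median(scores)              # standard-library cross-check
print(manual, library)
```

Both approaches should agree on 76 for this data set. The standard library's statistics.median applies the same positional rule, so in practice the helper is only needed to show the reasoning.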
Descriptive Statistics - Measures of
Variability
Measures of variability describe how the data is spread out. The
most commonly used measures of variability are:
• Range: This is the difference between the largest and smallest
values in a dataset.
• Variance: This is the average of the squared differences from
the mean. It measures how much the data deviates from the
mean.
• Standard Deviation: This is the square root of the variance. It
is a measure of how spread out the data is from the mean.
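As a rough sketch, the three measures above can be computed with Python's standard library. The exam scores from the median example are reused here purely as sample data. Because the definition above divides by the number of values, the population functions pvariance and pstdev are used; statistics.variance and statistics.stdev divide by n - 1 instead and would give slightly larger results.

```python
import statistics

# Sample data reused from the median example (an assumption for illustration).
scores = [52, 60, 65, 67, 70, 71, 74, 76, 78, 78, 78, 80, 86, 89, 95]

# Range: largest value minus smallest value.
data_range = max(scores) - min(scores)

# Variance: average of the squared differences from the mean (divides by N).
variance = statistics.pvariance(scores)

# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(scores)

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
```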
Variance
A deviation is the difference between a value and the mean and is written as: x - µ
The variance is the average of the squares of the deviations.
Example: {2, 3, 5, 6} is a set of data. The mean is µ = 4. The deviations are:
• 2 - 4 = -2
• 3 - 4 = -1
• 5 - 4 = 1
• 6 - 4 = 2
The deviations squared are:
• (-2)² = 4
• (-1)² = 1
• (1)² = 1
• (2)² = 4
• The average of the squared deviations is the variance:
  variance = (4 + 1 + 1 + 4) / 4 = 10 / 4 = 2.5
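The steps of this worked example can be reproduced line by line in Python. This is a small sketch that follows the slide's definition (dividing by the number of values, 4); the variable names are illustrative only.

```python
# Data set from the worked example.
data = [2, 3, 5, 6]

# Step 1: the mean.
mean = sum(data) / len(data)               # 4.0

# Step 2: deviation of each value from the mean (x - mean).
deviations = [x - mean for x in data]      # [-2.0, -1.0, 1.0, 2.0]

# Step 3: square each deviation.
squared = [d ** 2 for d in deviations]     # [4.0, 1.0, 1.0, 4.0]

# Step 4: average the squared deviations to get the variance.
variance = sum(squared) / len(data)        # 10 / 4 = 2.5

print(mean, deviations, squared, variance)
```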
Standard Deviation
The standard deviation is a special average of the deviations. It measures how the data is
spread out from its mean.
It is calculated as the square root of the variance of the data. The formula for calculating
standard deviation is: