
Session 12

The document discusses different statistical concepts, including descriptive statistics, measures of central tendency, and measures of variability. Descriptive statistics describe and summarize data numerically or visually. Measures of central tendency include the mean, median, and mode, which show typical values in a dataset. Measures of variability such as variance and standard deviation quantify how spread out the data is.

Uploaded by darayir140
Copyright © All Rights Reserved

Statistics

Statistics refers to the mathematics and techniques with which we understand data.

Descriptive Statistics
It is about describing and summarizing data. It uses two main approaches:

1. The quantitative approach describes and summarizes data numerically.


2. The visual approach illustrates data with charts, plots, histograms, and other graphs.

Types of Measures
Central tendency tells you about the centers of the data. Useful measures include the mean, median, and mode.
Variability tells you about the spread of the data. Useful measures include variance and standard deviation.
Correlation or joint variability tells you about the relation between a pair of variables in a dataset. Useful measures include covariance and
the correlation coefficient.
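Covariance and the correlation coefficient are named here but not computed later. The following is a minimal sketch, assuming the sample (n − 1) definitions that the variance section uses; the helper names `sample_covariance` and `pearson_r` and the paired lists `x_demo` and `y_demo` are hypothetical illustration data. Python 3.10+ also ships `statistics.covariance()` and `statistics.correlation()`, which the code calls only if they exist.

```python
import math
import statistics as st

def sample_covariance(xs, ys):
    """Sample covariance: sum of paired deviation products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    x_bar, y_bar = st.fmean(xs), st.fmean(ys)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both sample standard deviations."""
    return sample_covariance(xs, ys) / (st.stdev(xs) * st.stdev(ys))

# Hypothetical paired data, made up for illustration.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x_demo, y_demo)
r_xy = pearson_r(x_demo, y_demo)
print(cov_xy)          # 1.5
print(round(r_xy, 4))  # 0.7746

# Cross-check against the standard library when it provides these functions (3.10+).
if hasattr(st, "covariance") and hasattr(st, "correlation"):
    print(math.isclose(cov_xy, st.covariance(x_demo, y_demo)))
    print(math.isclose(r_xy, st.correlation(x_demo, y_demo)))
```

A positive covariance means the two variables tend to move together; dividing by both standard deviations rescales it to the range from −1 to 1.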

Getting Started With Python Statistics Libraries


In [1]: import math
        import statistics as st

In [2]: x = [8.0, 1, 2.5, 4, 28.0]
        x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
        x

Out[2]: [8.0, 1, 2.5, 4, 28.0]


In [3]: x_with_nan

Out[3]: [8.0, 1, 2.5, nan, 4, 28.0]

Measures of Central Tendency


The measures of central tendency show the central or middle values of datasets. There are several definitions of what’s considered to be the
center of a dataset. In this tutorial, you’ll learn how to identify and calculate these measures of central tendency:

Mean
Weighted mean
Geometric mean
Harmonic mean
Median
Mode

Mean
The sample mean, also called the sample arithmetic mean or simply the average, is the arithmetic average of all the items in a dataset. The mean of a dataset 𝑥 is mathematically expressed as
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where 𝑖 = 1, 2, …, 𝑛. In other words, it’s the sum of all the elements 𝑥ᵢ divided by the number of items in the dataset 𝑥.

In [4]: mean_ = sum(x) / len(x)
        mean_

Out[4]: 8.7

In [5]: mean_ = st.mean(x)
        mean_

Out[5]: 8.7

In [6]: mean_ = st.fmean(x)
        mean_

Out[6]: 8.7

However, if there are nan values among your data, then statistics.mean() and statistics.fmean() will return nan as the output:

In [7]: mean_ = st.fmean(x_with_nan)
        mean_

Out[7]: nan
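If you would rather skip missing values than propagate them, one option is to filter them out before averaging. A minimal sketch, assuming `nan` is the only missing-value marker in the list; `x_clean` is a name introduced here for illustration.

```python
import math
import statistics as st

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

# math.isnan() is True only for nan, so every real number is kept.
x_clean = [value for value in x_with_nan if not math.isnan(value)]

print(x_clean)            # [8.0, 1, 2.5, 4, 28.0]
print(st.fmean(x_clean))  # 8.7
```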

Weighted Mean
The weighted mean, also called the weighted arithmetic mean or weighted average, is a generalization of the arithmetic mean that enables you to
define the relative contribution of each data point to the result.

You define one weight 𝑤ᵢ for each data point 𝑥ᵢ of the dataset 𝑥, where 𝑖 = 1, 2, …, 𝑛 and 𝑛 is the number of items in 𝑥. Then, you multiply each data point by the corresponding weight, sum all the products, and divide the obtained sum by the sum of weights:
$$\frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

In [8]: x = [8.0, 1, 2.5, 4, 28.0]
        w = [0.1, 0.2, 0.3, 0.25, 0.15]
        wmean = sum(w[i] * x[i] for i in range(len(x))) / sum(w)
        wmean

Out[8]: 6.95
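The same formula can be wrapped in a small reusable function. This is a sketch: `weighted_mean` is a hypothetical helper name, and the length and zero-weight checks are added safeguards, not part of the example above. On Python 3.11 and later, `statistics.fmean()` also accepts a `weights` argument, which the code uses only when available.

```python
import math
import statistics as st
import sys

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of w_i (hypothetical helper)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    # zip pairs each data point with its weight, replacing the index loop.
    return sum(w * v for v, w in zip(values, weights)) / total_weight

x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]
wmean = weighted_mean(x, w)
print(round(wmean, 2))  # 6.95

if sys.version_info >= (3, 11):
    print(math.isclose(wmean, st.fmean(x, weights=w)))
```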

Geometric Mean
The geometric mean is a special type of average where we multiply the 𝑛 numbers together and then take the 𝑛-th root: a square root for two numbers, a cube root for three numbers, and so on:
$$\sqrt[n]{\prod_{i=1}^{n} x_i}$$
where 𝑖 = 1, 2, …, 𝑛.
In [9]: gmean = st.geometric_mean(x)
        print(round(gmean, 2))

4.68
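To see what `statistics.geometric_mean()` computes, here is a sketch of the 𝑛-th-root formula written two ways: directly with `math.prod()` (Python 3.8+), and through logarithms, which avoids overflow when the product of many values is huge. Both assume strictly positive data; the function names are hypothetical.

```python
import math
import statistics as st

def geometric_mean_direct(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_logs(values):
    """exp of the arithmetic mean of the logarithms: same value, no huge product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

x = [8.0, 1, 2.5, 4, 28.0]
print(round(geometric_mean_direct(x), 2))  # 4.68
print(round(geometric_mean_logs(x), 2))    # 4.68
print(math.isclose(geometric_mean_logs(x), st.geometric_mean(x)))  # True
```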

Harmonic Mean
The harmonic mean is the reciprocal of the mean of the reciprocals of all items in the dataset.
For example, the harmonic mean of three values a, b and c will be equivalent to
$$\frac{3}{1/a + 1/b + 1/c}$$
If one of the values is zero, the result will be zero.
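Generalizing the three-value formula, the harmonic mean of 𝑛 values is 𝑛 divided by the sum of their reciprocals. A minimal sketch with a hypothetical helper name; it mirrors the zero rule stated above but, unlike `statistics.harmonic_mean()`, it does not reject negative values.

```python
import math
import statistics as st

def harmonic_mean(values):
    """n divided by the sum of the reciprocals 1/x_i (hypothetical helper)."""
    if any(v == 0 for v in values):
        return 0.0  # same rule as statistics.harmonic_mean(): any zero gives 0
    return len(values) / sum(1 / v for v in values)

x = [8.0, 1, 2.5, 4, 28.0]
print(round(harmonic_mean(x), 2))                          # 2.76
print(math.isclose(harmonic_mean(x), st.harmonic_mean(x)))  # True
```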

The harmonic mean is a type of average, a measure of the central location of the data. It is often appropriate when averaging rates or ratios, for
example speeds.

Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr. What is the average speed?

In [10]: st.harmonic_mean([40, 60])

Out[10]: 48.0
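The result can be checked from first principles: average speed is total distance divided by total time. This sketch uses only the numbers from the example and also shows why the plain arithmetic mean (50 km/h) overstates the answer: the car spends more time on the slower leg.

```python
distance_per_leg_km = 10
speeds_kmh = [40, 60]

# Time for each leg is distance / speed, in hours.
total_time_h = sum(distance_per_leg_km / v for v in speeds_kmh)
total_distance_km = distance_per_leg_km * len(speeds_kmh)

average_speed = total_distance_km / total_time_h
print(round(average_speed, 6))            # 48.0
print(sum(speeds_kmh) / len(speeds_kmh))  # 50.0 (arithmetic mean, not the true average speed)
```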

In [11]: st.harmonic_mean([10, 30, 50, 70, 90])

Out[11]: 27.97513321492007

In [12]: x = [8.0, 1, 2.5, 4, 28.0]
         hmean = st.harmonic_mean(x)
         print(round(hmean, 2))

2.76

Median
The sample median is the middle element of a sorted dataset. The dataset can be sorted in increasing or decreasing order. If the number of
elements 𝑛 of the dataset is odd, then the median is the value at the middle position: 0.5(𝑛 + 1). If 𝑛 is even, then the median is the arithmetic
mean of the two values in the middle, that is, the items at the positions 0.5𝑛 and 0.5𝑛 + 1.

For example, if you have the data points 2, 4, 1, 8, and 9, then the median value is 4, which is in the middle of the sorted dataset (1, 2, 4, 8, 9). If
the data points are 2, 4, 1, and 8, then the median is 3, which is the average of the two middle elements of the sorted sequence (2 and 4).
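The odd/even rule above translates directly into code. This sketch converts the 1-based positions 0.5(𝑛 + 1), 0.5𝑛 and 0.5𝑛 + 1 into Python's 0-based indices; `median_by_rule` is a hypothetical name.

```python
def median_by_rule(values):
    """Middle element of the sorted data, or the mean of the two middle elements."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # position 0.5(n + 1) -> index n // 2
    return (data[mid - 1] + data[mid]) / 2  # positions 0.5n, 0.5n + 1 -> indices mid - 1, mid

print(median_by_rule([2, 4, 1, 8, 9]))  # 4
print(median_by_rule([2, 4, 1, 8]))     # 3.0
```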

In [13]: st.median([1, 3, 5])

Out[13]: 3

In [14]: st.median([1, 3, 5, 7])

Out[14]: 4.0

In [15]: st.median([5, 3, 7, 1])

Out[15]: 4.0

In [16]: x = [8.0, 1, 2.5, 4, 28.0]
         med = st.median(x)
         med

Out[16]: 4

In [17]: med = st.median(x[:-1])
         med

Out[17]: 3.25

median_low() and median_high() are two more functions related to the median in the Python statistics library. They always return an element
from the dataset:

If the number of elements is odd, then there’s a single middle value, so these functions behave just like median().
If the number of elements is even, then there are two middle values. In this case, median_low() returns the lower and median_high() the
higher middle value.

In [18]: st.median_low([1, 3, 5])

Out[18]: 3

In [19]: st.median_low([1, 3, 5, 7])

Out[19]: 3

In [20]: st.median_high([1, 3, 5])

Out[20]: 3

In [21]: st.median_high([1, 3, 5, 7])

Out[21]: 5
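Because `median_low()` and `median_high()` only ever pick an existing element, they also work on data that can be sorted but not averaged, such as strings. A sketch of the selection rule with hypothetical function names; for brevity it does not check for empty input.

```python
import statistics as st

def median_low_by_rule(values):
    data = sorted(values)
    return data[(len(data) - 1) // 2]  # lower middle for even n, the middle for odd n

def median_high_by_rule(values):
    data = sorted(values)
    return data[len(data) // 2]        # higher middle for even n, the middle for odd n

print(median_low_by_rule([1, 3, 5, 7]), median_high_by_rule([1, 3, 5, 7]))  # 3 5
print(st.median_low(["d", "a", "c", "b"]), st.median_high(["d", "a", "c", "b"]))  # b c
```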

Mode
The sample mode is the value in the dataset that occurs most frequently. If there isn’t a single such value, then the set is multimodal since it has
multiple modal values. For example, in the set that contains the points 2, 3, 2, 8, and 12, the number 2 is the mode because it occurs twice, unlike
the other items that occur only once.
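Finding the mode amounts to counting occurrences. This sketch uses `collections.Counter` and returns every value tied for the highest count, in first-seen order, which is the ordering `statistics.multimode()` documents; `modes` is a hypothetical name.

```python
from collections import Counter

def modes(values):
    """All values tied for the highest frequency, in first-seen order."""
    counts = Counter(values)  # value -> number of occurrences
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print(modes([2, 3, 2, 8, 12]))                # [2]
print(modes([1, 1, 1, 1, 2, 3, 3, 3, 3, 4]))  # [1, 3]
```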

In [22]: st.mode([1, 1, 2, 3, 3, 3, 3, 4])

Out[22]: 3

In [23]: st.multimode([1, 1, 1, 1, 2, 3, 3, 3, 3, 4])

Out[23]: [1, 3]

In [24]: st.multimode('aabbbbccddddeeffffgg')

Out[24]: ['b', 'd', 'f']


Measures of Variability
The measures of central tendency aren’t sufficient to describe data. You’ll also need the measures of variability that quantify the spread of data
points.

Variance
Standard deviation

Variance
The sample variance quantifies the spread of the data. It shows numerically how far the data points are from the mean. You can express the sample variance of the dataset 𝑥 with 𝑛 elements mathematically as
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where
𝑆² = sample variance
𝑥ᵢ = the value of the 𝑖-th observation
𝑥̄ = the mean value of all observations
𝑛 = the number of observations

In [25]: x = [8.0, 1, 2.5, 4, 28.0]
         st.variance(x)

Out[25]: 123.2
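Applying the formula step by step reproduces this result. The sketch below also prints the population variance, which divides by 𝑛 instead of 𝑛 − 1 and is what `statistics.pvariance()` returns; `sample_variance` is a hypothetical name.

```python
import math
import statistics as st

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    x_bar = sum(values) / n
    return sum((v - x_bar) ** 2 for v in values) / (n - 1)

x = [8.0, 1, 2.5, 4, 28.0]
s2 = sample_variance(x)
print(round(s2, 6))               # 123.2
print(round(st.pvariance(x), 6))  # 98.56 (population variance: divides by n = 5)
```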

Standard Deviation

⎯∑(
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
𝑥 𝑖 − 𝜇 ) 2⎯
In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values.

𝜎=√ 𝑁
where
𝜎= population standard deviation
𝑁 = the size of the population
𝑥𝑖 = each value from the population
𝜇 = the population mean
In [26]: st.stdev(x)

Out[26]: 11.099549540409287
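Since the standard deviation is the square root of the variance, both versions can be checked directly. A sketch comparing `st.stdev()` (sample, 𝑛 − 1) with `st.pstdev()` (population, 𝑁), the latter matching the 𝜎 formula:

```python
import math
import statistics as st

x = [8.0, 1, 2.5, 4, 28.0]

sample_sd = math.sqrt(st.variance(x))       # same quantity st.stdev() reports
population_sd = math.sqrt(st.pvariance(x))  # same quantity st.pstdev() reports

print(math.isclose(sample_sd, st.stdev(x)))       # True
print(math.isclose(population_sd, st.pstdev(x)))  # True
print(round(population_sd, 4))                    # 9.9277
```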
