Classx - DS - UNIT 1
Grade X
Chapter 1: Use of Statistics in Data Science
LEARNING OBJECTIVES:
What are subsets and relative frequency?
Meaning of mean
What is median and its usage in data science?
What is mean absolute deviation?
What is Standard Deviation?
What is a Subset?
A two-way relative frequency table is very similar to a two-way frequency table.
The only difference is that it shows percentages instead of counts.
A two-way relative frequency table shows the percentage of data points that fall
into each category.
We can use either row relative frequencies or column relative frequencies,
depending on the context of the problem.
Two-way relative frequency table (% given)
Two-way relative frequency tables are helpful when a dataset contains groups of
different sample sizes. Percentages make it easier to compare preferences across groups.
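The idea can be sketched in a few lines of Python. The counts below are made up for illustration (they do not come from the tables in this chapter), and the function name row_relative_frequencies is also just an assumed helper. The sketch turns each row of counts into percentages of that row's total, which is one way of building row relative frequencies.

```python
# Hypothetical counts (not from this chapter): preferred activity by grade.
counts = {
    "Grade 9": {"Sports": 30, "Music": 20},
    "Grade 10": {"Sports": 45, "Music": 55},
}


def row_relative_frequencies(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {
            column: round(100 * count / row_total, 1)
            for column, count in row.items()
        }
    return result


row_percentages = row_relative_frequencies(counts)
for row_label, row in row_percentages.items():
    print(row_label, row)
```

Because the two grades have different totals (50 and 100 students in this made-up data), the raw counts are hard to compare directly, while the percentages in each row can be compared side by side.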
Two-way relative frequency table
What is Mean?
The value at the 3rd position is the middle value of the sorted list. So, 34 is the
median of the array.
Example of Median
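A minimal sketch of finding a median in Python is given here. The original array is not shown in this text, so the list sample is a hypothetical one, chosen only so that its middle value is 34. The function sorts the values and picks the middle one; for an even number of values it takes the average of the two middle values.

```python
def median(values):
    """Return the middle value of the sorted list of values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical array (not the one from the original example).
sample = [45, 12, 34, 27, 51]
result = median(sample)
print(sorted(sample), result)
```

Python's built-in statistics module also provides a median function that could be used instead of writing one by hand.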
Mean Absolute Deviation (MAD) is the average of how far away all values in a data
set are from the mean.
The mean absolute deviation gives a good understanding of the variability of the
data set, in other words, how scattered the data set is.
One of the applications of Mean Absolute Deviation in real life is when teachers give
tests to students and then average the results to see if the average score was high,
in between, or too low.
Each average tells a story.
Absolute deviation can further help to see how far each of the scores is from
that average score.
Example of Mean absolute deviation
Consider the data set below:
12, 16, 10, 18, 11, 19
Step 1: Calculate the mean
Mean = (12 + 16 + 10 + 18 + 11 + 19) / 6 = 86 / 6 = 14.33, which is 14 when rounded off
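Step 2: Calculate the distance of each data point from the mean. We need the
absolute value. For example, if the distance is -2, then we ignore the negative
sign.
|-2| = 2
Using the rounded mean of 14, the distances are:
|12 - 14| = 2, |16 - 14| = 2, |10 - 14| = 4, |18 - 14| = 4, |11 - 14| = 3, |19 - 14| = 5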
Step 3: Calculate the mean of the distances.
Mean of distances = (2 + 2 + 4 + 4 + 3 + 5) / 6 = 3.33
So, 3.33 is our mean absolute deviation, and the mean is 14.
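The three steps above can be sketched in Python as follows. The function name mean_absolute_deviation is an assumed helper, not part of any library. Note that this sketch keeps the exact mean (about 14.33) instead of rounding it to 14; for this particular data set both choices give 3.33.

```python
data = [12, 16, 10, 18, 11, 19]


def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    # Step 1: calculate the mean (kept unrounded here).
    mean = sum(values) / len(values)
    # Step 2: absolute distance of each data point from the mean.
    distances = [abs(value - mean) for value in values]
    # Step 3: mean of the distances.
    return sum(distances) / len(distances)


mad = mean_absolute_deviation(data)
print(round(mad, 2))
```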
What is Standard Deviation?
1. Calculate the mean by adding up all the data values and dividing the sum by
the number of values.
2. Subtract the mean from every value.
3. Square each of the differences.
4. Find the average of the squared differences from step 3. This is the
variance.
5. Lastly, find the square root of the variance. That is the standard deviation.
Example
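Take the values 1, 2, 3, 5 and 8
Step 1: Calculate the mean
Mean = (1 + 2 + 3 + 5 + 8) / 5 = 3.8
Step 2: Subtract the mean from every value
-2.8, -1.8, -0.8, 1.2, 4.2
Step 3: Square each of the differences
7.84, 3.24, 0.64, 1.44, 17.64
Step 4: Find the average of the squared differences
7.84 + 3.24 + 0.64 + 1.44 + 17.64 = 30.8
30.8 / 5 = 6.16 (Variance)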
Step 5: Find the square root of the variance
The square root of 6.16 = 2.48 (rounded to two decimal places), so the standard deviation is about 2.48.
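The five steps can be checked with a short Python sketch. The function name variance_and_std is an assumed helper. It divides by the number of values, as the steps in this chapter do, so it computes what is usually called the population standard deviation.

```python
import math

values = [1, 2, 3, 5, 8]


def variance_and_std(data):
    """Follow the five steps and return (variance, standard deviation)."""
    # Step 1: calculate the mean.
    mean = sum(data) / len(data)
    # Step 2: subtract the mean from every value.
    differences = [value - mean for value in data]
    # Step 3: square each of the differences.
    squares = [difference ** 2 for difference in differences]
    # Step 4: average the squared differences to get the variance.
    variance = sum(squares) / len(squares)
    # Step 5: the square root of the variance is the standard deviation.
    return variance, math.sqrt(variance)


variance, std = variance_and_std(values)
print(round(variance, 2), round(std, 2))
```

Python's statistics module offers pvariance and pstdev, which use the same divide-by-count definition and could be used to cross-check the result.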