INTRODUCTION TO STATISTICS Notes
Statistics is a field which is omnipresent.
1. Descriptive statistics
Methods of organizing, summarizing and presenting numerical data fall into this
area of statistics. It is the statistics carried out on the whole data set at hand (for
example, an entire population), without generalizing beyond it.
2. Inferential statistics
Problems involving statistical inference arise when a statistician takes a sample
from a population and wishes to make statements about the population's
characteristics from the information in the sample.
DIFFERENCE BETWEEN DESCRIPTIVE AND INFERENTIAL STATISTICS
2. MEDIAN: It is the middle number in a data set once the data is arranged in either
ascending or descending order. When the data set has an even number of values,
there are two middle numbers and the median is their arithmetic mean.
Eg: 1,2,3,4,5,6,7,8,9,10. The two middle numbers are 5 and 6, so the median is
(5 + 6)/2 = 5.5.
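The median rule, including the even-count case, can be sketched in Python as follows. The function name `median_of` is only illustrative; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median_of(values):
    """Return the median: the middle value, or the mean of the two middle values."""
    data = sorted(values)          # arrange the data in ascending order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:                 # odd count: a single middle value
        return data[mid]
    # even count: arithmetic mean of the two middle values
    return (data[mid - 1] + data[mid]) / 2


print(median_of([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 5.5
```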
3. MODE: The mode is the most common number in a set, i.e. the value with the
highest frequency (the most repeated number). For example, in 21,21,21,23,24,26
the value 21 appears three times, more than any other value, so the mode is 21.
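A short sketch of finding the mode by counting frequencies. It returns a list because a set can have more than one most-repeated value; the name `modes` is hypothetical, and the standard library's `statistics.multimode` (Python 3.8+) behaves the same way.

```python
from collections import Counter


def modes(values):
    """Return every value that has the highest frequency (there may be ties)."""
    counts = Counter(values)               # value -> how many times it appears
    highest = max(counts.values())         # the highest frequency
    return [value for value, count in counts.items() if count == highest]


print(modes([21, 21, 21, 23, 24, 26]))  # [21]
```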
Measures of variability
RANGE: It is the difference between the maximum value and the minimum value of a
data set.
Eg: 21,21,21,23,24,25,26,28,29,31,33.
Range = 33 - 21 = 12.
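The range formula translates directly into code; `data_range` is a hypothetical helper name.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


print(data_range([21, 21, 21, 23, 24, 25, 26, 28, 29, 31, 33]))  # 12
```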
STEPS TO CALCULATE VARIANCE AND HENCE STANDARD DEVIATION:
STEP 3: Take the square root of the variance to find the standard deviation.
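Steps 1 and 2 are not listed here, so the sketch below assumes the usual procedure: find the mean (step 1), then average the squared deviations from the mean to get the variance (step 2), and finally take the square root (step 3). Dividing by n gives the population variance; dividing by n - 1 gives the sample variance. The function name and the data are hypothetical.

```python
import math


def variance_and_sd(values, sample=False):
    """Return (variance, standard deviation) of a list of numbers.

    Assumed steps: 1) mean, 2) average squared deviation, 3) square root.
    Set sample=True to divide by n - 1 (sample variance) instead of n.
    """
    n = len(values)
    mean = sum(values) / n                              # step 1: the mean
    squared_devs = [(x - mean) ** 2 for x in values]    # squared deviations
    divisor = n - 1 if sample else n
    variance = sum(squared_devs) / divisor              # step 2: the variance
    sd = math.sqrt(variance)                            # step 3: square root
    return variance, sd


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(variance_and_sd(data))     # (4.0, 2.0)
```

The standard library's `statistics.pvariance`/`statistics.pstdev` (population) and `statistics.variance`/`statistics.stdev` (sample) compute the same quantities.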
Standard deviations are usually easier to picture and apply than variances. The
standard deviation is expressed in the same unit of measurement as the data, which
is not the case with the variance (its unit is the square of the data's unit). Using the
SD, it can be judged whether the data follow a normal curve or some other
mathematical pattern. A larger variance means the data points are spread further
from the average; a smaller variance means more of the data lie close to the average.
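To see how the standard deviation reflects spread, the sketch below compares two hypothetical data sets that share the same mean. It uses `statistics.mean` and `statistics.pstdev` (population standard deviation) from Python's standard library.

```python
from statistics import mean, pstdev

tight = [48, 49, 50, 51, 52]   # hypothetical values clustered near the mean
wide = [30, 40, 50, 60, 70]    # hypothetical values spread far from the mean

for name, data in [("tight", tight), ("wide", wide)]:
    # Both sets have mean 50; only the spread differs.
    print(name, mean(data), round(pstdev(data), 2))
```

Both standard deviations are in the same unit as the data, and the wide set's SD comes out exactly ten times the tight set's, matching the visibly larger spread.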
Two types of mean absolute deviation:
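The two types are not spelled out in these notes. Purely as an illustration, the sketch below computes the mean absolute deviation of ungrouped data about a chosen centre, either the mean or the median; the function name and the data are hypothetical.

```python
from statistics import mean, median


def mean_absolute_deviation(values, centre="mean"):
    """Average distance of each value from a centre (the mean or the median)."""
    if centre == "mean":
        c = mean(values)
    elif centre == "median":
        c = median(values)
    else:
        raise ValueError("centre must be 'mean' or 'median'")
    return sum(abs(x - c) for x in values) / len(values)


data = [1, 2, 3, 4, 10]  # hypothetical data
print(mean_absolute_deviation(data, "mean"))    # 2.4
print(mean_absolute_deviation(data, "median"))  # 2.2
```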
DATA SET
A data set is a collection of data of any kind.
Broadly, there are 2 types of data: categorical data (nominal and ordinal) and
numerical data (continuous and discrete).
Nominal: These are values that have no natural ordering. Eg: colour, gender of
persons.
Ordinal: These values have a natural ordering while still belonging to distinct
classes. Eg: If we consider the sizes of a clothing brand, we can easily sort them by
their name tags in the order small, medium and large. The grades given to
candidates in a test can also be considered ordinal data, where an A grade is
definitely better than a B grade.
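Because ordinal labels carry an order that their spelling does not, sorting them needs an explicit ranking. A minimal sketch, assuming a hypothetical `SIZE_ORDER` list and helper `sort_ordinal`:

```python
SIZE_ORDER = ["small", "medium", "large"]  # hypothetical ranking, lowest first


def sort_ordinal(values, order):
    """Sort category labels by their position in a given order, not alphabetically."""
    return sorted(values, key=order.index)


sizes = ["large", "small", "medium", "small"]
print(sorted(sizes))                    # alphabetical: ['large', 'medium', 'small', 'small']
print(sort_ordinal(sizes, SIZE_ORDER))  # ordinal: ['small', 'small', 'medium', 'large']
```

Alphabetical sorting treats the sizes like nominal labels; only the explicit order captures the ordinal meaning.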
💡 Continuous data type: The key thing is that a feature can take an infinite number
of values. Eg: the price of a smartphone can vary from "X" amount to any
value.
Discrete data type: Numerical values that can only be integers or whole numbers
(counts) fall into this category. For example: the number of speakers in a cell phone
or the number of SIM cards.
Numbering the categories of ordinal data does not make it numerical: it is still
ordinal data. The reason is that, even after numbering, the numbers do not convey
the actual distances between the classes.
Eg: Consider the grading system of a test. The grades can be A, B, C, D, E, and if we
number them in order they become 1, 2, 3, 4, 5. By the numerical difference, the
distance between grades D and E is the same as the distance between C and D,
which is not necessarily accurate: a C grade may still be acceptable while an E is
not, so the real gaps between grades need not be equal.
GRAPHICAL REPRESENTATION
Graphics can be an effective method of visual communication. Statistical
graphics are useful for both presenting and analyzing data. The statistical
graphic forms we usually encounter are line charts, bar or column charts,
grouped bar charts, combination charts, pie charts and pictorial charts.
1. LINE CHARTS: These use lines between data points to depict the magnitudes of
data for 2 variables or for one variable over time. A line chart for a time series
is known as a time series plot or sequence plot.
💡 Data values for a variable over time are known as a time series.
2. BAR CHART OR COLUMN CHART: Bar charts are used to depict the magnitude of
data for different qualitative categories or over time. The length/height of the bars
allows the user to compare magnitudes easily.
3. GROUPED BAR CHARTS: These can be used to depict the magnitudes of 2 or
more grouped data values for different qualitative categories or over time.
4. COMBINATION CHARTS: These charts use both lines and bars to depict the
magnitudes of 2 or more data values for different categories or for different times.
5. PIE CHARTS: Pie charts can be used effectively to depict the proportions or
percentages of the total quantity that correspond to several qualitative
categories. Each category is depicted as a wedge of a circle, or a piece of a pie.
The angle in degrees of each wedge equals the category's proportion
multiplied by 360 degrees.
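The wedge-angle rule can be sketched as a small function. The name `pie_angles` and the category counts are hypothetical; the code multiplies by 360 before dividing by the total, which is algebraically the same as proportion × 360.

```python
def pie_angles(counts):
    """Map each category to its wedge angle: proportion of the total x 360 degrees."""
    total = sum(counts.values())
    return {category: count * 360 / total for category, count in counts.items()}


# Hypothetical counts for three qualitative categories
print(pie_angles({"A": 50, "B": 30, "C": 20}))  # {'A': 180.0, 'B': 108.0, 'C': 72.0}
```

Whatever the counts, the angles always add up to 360 degrees, one full circle.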
SEGMENTING DATA
We often talk about the top 25%, top 20%, top 10% or top 1% of something.
When we segment data into these percentages, we are commonly talking about
quartiles, quintiles, deciles and percentiles respectively.
KEY FEATURES OF QUARTILES
The quartiles measure the spread of values above and below the median by
dividing the distribution into 4 groups.
The quartiles are 3 points that divide the data: the lower quartile, the median and
the upper quartile, which together form 4 groups of the data set.
Quartiles are used to calculate the interquartile range (IQR = Q3 - Q1), which is a
measure of variability around the median.
The quartiles of a data set divide the data into 4 equal parts, with one quarter of the
data values in each part.
The first quartile Q1 is the median of the first half of the data set and marks the
point at which 25% of the data values are lower and 75% are higher.
The second quartile Q2 is the median of the data set, which divides the data set
in half.
The third quartile Q3 is the median of the second half of the data set and marks
the point at which 25% of the data values are higher and 75% are lower.
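For example: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 (ascending order).
Q2 = 8
Q1 = (4 + 5)/2 = 4.5 (median of the first half 1 to 8, counting the median 8 in both halves)
Q3 = (11 + 12)/2 = 11.5 (median of the second half 8 to 15)
If the median is left out of both halves instead, Q1 = 4 (median of 1 to 7) and
Q3 = 12 (median of 9 to 15); textbooks and software differ on this convention.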
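A sketch of the "median of each half" definition of Q1 and Q3, plus the interquartile range. The function `quartiles_by_halves` and its `include_median` switch are hypothetical; the switch only matters when the number of values is odd.

```python
from statistics import median


def quartiles_by_halves(values, include_median=True):
    """Return (Q1, Q2, Q3) where Q1 and Q3 are medians of the lower and upper halves.

    For an odd number of values, include_median controls whether the middle
    value is counted in both halves (True) or left out of both (False).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = data[: half + 1], data[half:]
    else:
        lower, upper = data[:half], data[n - half:]
    return median(lower), median(data), median(upper)


data = list(range(1, 16))  # 1, 2, ..., 15
q1, q2, q3 = quartiles_by_halves(data)
print(q1, q2, q3, "IQR =", q3 - q1)                     # 4.5 8 11.5 IQR = 7.0
print(quartiles_by_halves(data, include_median=False))  # (4, 8, 12)
```

For comparison, Python's `statistics.quantiles(data, n=4, method="inclusive")` gives `[4.5, 8.0, 11.5]` for this data, while the default `method="exclusive"` gives `[4.0, 8.0, 12.0]`.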
Deciles divide the data set into groupings of 10%. One example of the use of
deciles is in school awards or rankings. For example, students in the top 10% may be
given an award: if there are 578 students in a graduating class, the top 10%, or about
58 students (10% of 578 = 57.8), may be given the award.
Similarly, at the opposite end of the scale, students who score in the bottom 10%
or 20% may be given extra assistance to boost their scores.
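The award example is a proportion calculation, and the score cut-off for the top 10% is the ninth decile. A minimal sketch, assuming a hypothetical helper `top_share_count` and hypothetical scores; `statistics.quantiles(..., n=10)` returns the 9 decile cut points, and because quantile conventions vary between tools, cut-offs from other software may differ slightly.

```python
from statistics import quantiles


def top_share_count(class_size, percent):
    """Students in the top `percent` of a class, rounded to the nearest whole student.

    Note: Python's round() sends exact halves to the nearest even number.
    """
    return round(class_size * percent / 100)


print(top_share_count(578, 10))  # 58

scores = list(range(1, 101))       # hypothetical scores 1..100
deciles = quantiles(scores, n=10)  # 9 cut points: D1 .. D9
bottom_10_cutoff = deciles[0]      # D1: scores below it are in the bottom 10%
top_10_cutoff = deciles[-1]        # D9: scores above it are in the top 10%
print(bottom_10_cutoff, top_10_cutoff)
```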
Percentiles divide the data set into groupings of 1%. Standardized tests often
report percentile scores; these scores help compare a student's performance with
that of their peers (often across a state or country). A percentile score gives the
percentage of test takers whose scores are at or below that student's score.
For example : Students who receive a percentile ranking of 87 on a particular test
received scores that were equal to or higher than 87% of students who took the
test .
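The "equal to or higher than" wording above suggests a simple percentile-rank calculation. This is a sketch with a hypothetical function name and hypothetical scores; other definitions exist (for example, counting only strictly lower scores, or counting ties as half).

```python
def percentile_rank(scores, score):
    """Percentage of scores that are equal to or lower than `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


scores = list(range(1, 101))        # hypothetical scores of 100 test takers
print(percentile_rank(scores, 87))  # 87.0
```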