Statistics Notes
Statistics is a branch of mathematics that deals with collecting, organizing, analyzing,
interpreting, and presenting data. It covers the whole process of collecting data, classifying it,
representing it for easy interpretation, and analyzing it further. Statistics also refers to
drawing conclusions about a population from sample data collected through surveys or experiments.
Fields such as psychology, sociology, geology, and probability all rely on statistics.
Mathematical Statistics
Statistics is used mainly to gain an understanding of data and to apply that understanding in
various applications. It is the process of collecting data, evaluating it, and summarizing it in a
mathematical form. Initially, statistics was tied to the science of the state, where it was used
to collect and analyze facts and data about a country, such as its economy and population.
Mathematical statistics applies mathematical techniques such as linear algebra, differential
equations, mathematical analysis, and probability theory.
There are two methods of analyzing data in mathematical statistics that are used on a large scale:
Descriptive Statistics
Inferential Statistics
Descriptive Statistics
The descriptive method of statistics is used to describe the data collected and to summarize its
properties using the measures of central tendency and the measures of dispersion.
Inferential Statistics
This method of statistics is used to draw conclusions from the data. Inferential statistics relies
on statistical tests performed on samples and draws conclusions by, for example, identifying
differences between two groups. A test calculates a p-value, which is compared with a chosen
significance level α, commonly 0.05. If the p-value is less than α, the result is concluded to be
statistically significant.
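As an illustration of comparing a p-value with α, the sketch below runs an exact two-sided permutation test on the difference between two group means. The permutation test is only one of many possible tests and is chosen here because it needs nothing beyond the standard library; the function name `permutation_p_value` and the two groups of scores are made up for the example.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    # Enumerate every way of choosing which pooled values form "group A".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point noise does not hide ties.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


ALPHA = 0.05  # significance level used in the text

group_a = [12, 14, 15, 16, 18]  # hypothetical scores, group 1
group_b = [22, 24, 25, 27, 29]  # hypothetical scores, group 2

p_value = permutation_p_value(group_a, group_b)
significant = p_value < ALPHA
print(f"p = {p_value:.4f}, significant at alpha = {ALPHA}: {significant}")
```

Enumerating every split is only practical for small samples; for larger ones a random subset of splits or a standard parametric test would normally be used instead.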
Data Representation
Bar Graph - A group of data represented with rectangular bars with lengths proportional to the
values is a bar graph. The bars can be plotted either vertically or horizontally.
Pie Chart - The pie chart is a type of graph in which a circle is divided into sectors, where each
sector represents a proportion of the whole.
Line Graph - The line graph represents the data in the form of a series that is connected with a
straight line. These series are called markers.
Pictograph - Data shown in the form of pictures is a pictograph. Pictorial symbols for words,
objects, or phrases can be represented with different numbers.
Histogram - The histogram is a type of graph where the diagram consists of rectangles, the area is
proportional to the frequency of a variable, and the width is equal to the class interval.
Frequency Distribution - The frequency distribution table in statistics showcases the data in
ascending order along with their corresponding frequencies. The frequency of the data is often
represented by f.
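A frequency distribution table of the kind described above can be built in a few lines. This is a minimal sketch, assuming raw ungrouped data; the helper name `frequency_table` and the list of marks are invented for illustration. The row of `#` characters is a rough text stand-in for a bar graph.

```python
from collections import Counter


def frequency_table(data):
    """Return (value, frequency) pairs sorted by value in ascending order."""
    return sorted(Counter(data).items())


marks = [7, 5, 8, 7, 6, 5, 7, 9, 8, 7]  # hypothetical marks of ten students

table = frequency_table(marks)
print("x   f")
for value, f in table:
    # One '#' per occurrence gives a crude horizontal bar for each value.
    print(f"{value:<3} {f}  {'#' * f}")
```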
Skewness - In statistics, skewness is a measure of the asymmetry of a probability distribution,
that is, of how far the distribution curve for the data deviates from the symmetric normal curve.
The skewness can be positive, negative, or zero, and a symmetric distribution has zero skewness.
If the longer tail of the curve extends to the right, the distribution is positively skewed
(right-skewed), and if the longer tail extends to the left, it is negatively skewed (left-skewed).
ANOVA Statistics - ANOVA stands for Analysis of Variance. It is used to test whether the means of
two or more groups of data differ, by comparing the variation between the groups with the
variation within them. This model of statistics can be used, for example, to compare the
performance of stocks over a period of time.
Degrees of freedom - This concept applies when values are allowed to vary. The degrees of freedom
are the number of values in the data that are free to vary when a parameter is estimated.
Regression Analysis - In this model, a statistical process estimates the relationship between
variables. It indicates how a dependent variable changes when an independent variable changes.
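To make the idea of regression analysis concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares. The notes do not name a specific regression method, so least squares is an assumption, and the function `fit_line` and the hours/scores data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4, 5]        # independent variable (hypothetical)
scores = [52, 55, 61, 64, 68]  # dependent variable (hypothetical)

slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

The slope estimates how much the dependent variable changes for a one-unit change in the independent variable.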
The commonly used measures of central tendency are:
Arithmetic Mean
Median
Mode
Geometric Mean
Harmonic Mean
Mean, Median and Mode in Statistics
The mean is the arithmetic average of a data set, found by adding the numbers in the set and
dividing by the number of observations. The median is the middle number when the data set is
listed in ascending or descending order. Lastly, the mode is the value that occurs most often in
the data set. For n observations, we have
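Mean = $\bar{x} = \frac{\sum x}{n}$
Median = $\left(\frac{n+1}{2}\right)^{\text{th}}$ term, if n is odd.
Median = $\frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ term} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ term}}{2}$, if n is even.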
Mode = The value which occurs most frequently
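The formulas above can be checked with a short sketch. The helper `median_by_position` follows the odd/even position rule directly and is compared against Python's standard `statistics` module; the two data sets are made up for illustration.

```python
from statistics import mean, median, multimode


def median_by_position(data):
    """Median via the (n+1)/2-th term (odd n) or the average of the
    n/2-th and (n/2 + 1)-th terms (even n), using 1-based positions."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_data = [3, 7, 7, 2, 9]       # n = 5 (hypothetical)
even_data = [3, 7, 7, 2, 9, 4]   # n = 6 (hypothetical)

print("mean:", mean(odd_data))
print("median (odd n):", median_by_position(odd_data))
print("median (even n):", median_by_position(even_data))
print("mode(s):", multimode(odd_data))
```

`multimode` returns a list because a data set can have more than one most frequent value.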
Measures of Dispersion in Statistics
The measures of central tendency alone do not fully describe a given data set. We also need to
describe its variability with a value called a measure of dispersion.
The different measures of dispersion are:
The range in statistics is calculated as the difference between the maximum value and the
minimum value of the data points.
The quartile deviation is an absolute measure of dispersion. The ordered data points are
divided into 4 quarters. Find the median of the data points. The median of the data points to
the left of this median is the lower quartile, and the median of the data points to the
right of this median is the upper quartile. Upper quartile - lower quartile is
the interquartile range. Half of this is the quartile deviation.
The mean deviation is the average of the absolute differences between the items in a
distribution and the mean or median of that series.
The standard deviation is the square root of the variance and measures the amount of variation
of a set of values around the mean.
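The four measures listed above can be sketched as follows. The quartiles use the median-of-halves convention, taking the lower quartile as the median of the lower half of the ordered data and the upper quartile as the median of the upper half; other quartile conventions exist and can give slightly different values. The standard deviation is the population form (dividing by N), and the data values are invented.

```python
from statistics import mean, median, pstdev


def quartiles(data):
    """Lower and upper quartile by the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For odd n the overall median itself is left out of both halves.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(upper_half)


def mean_deviation(data):
    """Average absolute difference from the mean."""
    m = mean(data)
    return sum(abs(x - m) for x in data) / len(data)


data = [4, 7, 9, 10, 12, 15, 18, 20]  # hypothetical observations

data_range = max(data) - min(data)
q1, q3 = quartiles(data)
quartile_deviation = (q3 - q1) / 2
md = mean_deviation(data)
sd = pstdev(data)  # population standard deviation

print("range:", data_range)
print("quartile deviation:", quartile_deviation)
print("mean deviation:", md)
print("standard deviation:", round(sd, 3))
```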
For a continuous frequency distribution, the frequency of each class is assumed to be centered at
the mid-point of the class. The same procedure is then followed as in the case of a discrete
frequency distribution.
Median = $l + \frac{\frac{N}{2} - C}{f} \times h$, where the median class is the class interval
whose cumulative frequency (cf) is the first to be ≥ N/2, N is the sum of the frequencies, l, f,
and h are the lower limit, the frequency, and the width of the median class, and C is the
cumulative frequency of the class just preceding the median class.
After finding the median M, $|x_i - M|$ is obtained for each class mid-point $x_i$.
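The grouped-median formula can be sketched in code as below. The function name `grouped_median`, the tuple layout `(lower_limit, upper_limit, frequency)`, and the class intervals are assumptions made for this illustration; the variable names mirror the symbols l, N, C, f, and h.

```python
def grouped_median(classes):
    """Median of a continuous frequency distribution.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order.
    """
    total = sum(f for _, _, f in classes)  # N, the sum of frequencies
    if total == 0:
        raise ValueError("frequency table has no observations")

    cumulative = 0  # C, cumulative frequency before the current class
    for lower, upper, f in classes:
        # The median class is the first whose cumulative frequency reaches N/2.
        if cumulative + f >= total / 2:
            h = upper - lower  # class width
            return lower + (total / 2 - cumulative) / f * h
        cumulative += f


# Hypothetical grouped data: class intervals with their frequencies.
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]

m = grouped_median(classes)
print("median:", round(m, 2))
```

Here N = 40, so the median class is 20 to 30 (cumulative frequency 25), with l = 20, C = 13, f = 12, and h = 10.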
Coefficient of Variation
The coefficient of variation is used to compare the variability of two or more frequency
distributions. It is the ratio of the standard deviation to the mean, expressed as a
percentage.
CV = $\frac{\sigma}{\bar{x}} \times 100$
The distribution that has a greater coefficient of variation has more variability around the central
value than the distribution having a smaller value of the coefficient of variation.
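A short sketch of this comparison, assuming the population standard deviation for σ and two made-up series with the same mean:

```python
from statistics import mean, pstdev


def coefficient_of_variation(data):
    """CV = (standard deviation / mean) * 100, as a percentage."""
    return pstdev(data) / mean(data) * 100


series_a = [48, 50, 52, 50, 50]  # hypothetical, tightly clustered
series_b = [30, 70, 50, 20, 80]  # hypothetical, widely spread

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(f"CV of A: {cv_a:.2f}%")
print(f"CV of B: {cv_b:.2f}%")
print("More variable series:", "A" if cv_a > cv_b else "B")
```

Both series have a mean of 50, so the larger CV of the second series comes entirely from its larger standard deviation.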
Important Notes
The discipline of data collection and organization is called statistics. We interpret results
based on the analysis done using the measures of central tendencies and the measures of
dispersion.
The frequency distribution of data is represented using bar graphs, histograms, pie charts,
stem and leaf plots, line graphs, or ogives.
The data collected can be either quantitative (numerical: discrete and continuous) or
qualitative (categorical).
Examples of Statistics
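Example 1: Compute the mean deviation about the mean from the following data.
Size (x)       2   4   6   8   10
Frequency (f)  2   4   5   3   1
Solution: In statistics, the mean deviation about the mean is calculated using the
formula Σ f|x − x̄| / N, where the mean is x̄ = (Σ f x)/N and N = Σ f = 15.
Mean = (2×2 + 4×4 + 6×5 + 8×3 + 10×1)/15
= (4 + 16 + 30 + 24 + 10)/15
= 84/15 = 5.6
The absolute deviations |x − 5.6| are 3.6, 1.6, 0.4, 2.4, and 4.4, so
Σ f|x − x̄| = 2×3.6 + 4×1.6 + 5×0.4 + 3×2.4 + 1×4.4
= 7.2 + 6.4 + 2.0 + 7.2 + 4.4 = 27.2
Mean deviation about the mean = 27.2/15 ≈ 1.81
Answer: The mean deviation about the mean ≈ 1.81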
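The data of Example 1 can be checked with a short sketch that first computes the frequency-weighted mean and then the frequency-weighted average of |x − mean|. The function name `mean_deviation_about_mean` is made up for this illustration; the sizes and frequencies are those given in the example.

```python
def mean_deviation_about_mean(values, frequencies):
    """Return (mean, mean deviation about the mean) for a frequency table."""
    n = sum(frequencies)  # N, the total number of observations
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    deviation = sum(f * abs(x - mean) for x, f in zip(values, frequencies)) / n
    return mean, deviation


sizes = [2, 4, 6, 8, 10]   # x, from Example 1
freqs = [2, 4, 5, 3, 1]    # f, from Example 1

mean_value, md_value = mean_deviation_about_mean(sizes, freqs)
print("mean:", round(mean_value, 2))
print("mean deviation about the mean:", round(md_value, 2))
```

Note that the mean (5.6) is only the intermediate step; the mean deviation is the second value.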
Example 2: The mean of 5 observations is 4.4 and their variance is 8.24. If 3 of the
observations are 1, 2, and 6, find the other two observations.
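Solution: Given N = 5 and mean = 4.4, so Σx = 5 × 4.4 = 22.
Let the other two observations be a and b. Then 1 + 2 + 6 + a + b = 22, so a + b = 13.
Variance = $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N}$, which is equivalent to $\frac{\sum x_i^2}{N} - \bar{x}^2$.
So Σx² = 5 × (8.24 + 4.4²) = 5 × (8.24 + 19.36) = 5 × 27.6 = 138
1² + 2² + 6² + a² + b² = 138, so a² + b² = 97
2ab = (a + b)² − (a² + b²) = 169 − 97 = 72, so ab = 36
a and b are the roots of t² − 13t + 36 = 0, that is, t = 4 or t = 9.
Answer: The other two observations are 4 and 9.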
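Example 2 as stated (mean 4.4, variance 8.24, known observations 1, 2, and 6) can also be solved numerically. The sketch below assumes the population variance, so that Σx² = N(σ² + x̄²), and recovers the two unknowns from their sum and product; the function name `missing_two` is hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def missing_two(n, mean_value, variance, known):
    """Find the two unknown observations given n, mean, population variance."""
    total = n * mean_value                          # sum of all observations
    total_sq = n * (variance + mean_value ** 2)     # sum of all squares
    s = total - sum(known)                          # a + b
    q = total_sq - sum(v * v for v in known)        # a^2 + b^2
    p = (s * s - q) / 2                             # a * b
    disc = s * s - 4 * p                            # discriminant of t^2 - s t + p
    if disc < 0:
        raise ValueError("no real pair of observations fits these values")
    root = sqrt(disc)
    return (s - root) / 2, (s + root) / 2


a, b = missing_two(5, 4.4, 8.24, [1, 2, 6])
print("other two observations:", round(a, 6), round(b, 6))

# Check: the completed data set should reproduce the given mean and variance.
completed = [1, 2, 6, round(a), round(b)]
print("mean:", mean(completed), "variance:", round(pvariance(completed), 2))
```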
FAQs on Statistics
What is Statistics?
Statistics is a branch of mathematics that deals with collecting, organizing, analyzing,
interpreting, and presenting data. It is also described as arriving at conclusions about a
population with the use of sample data.
Descriptive Statistics: It is used to summarize the data and its properties using mean and standard
deviation.
Inferential Statistics: It is used to get a conclusion from the data collected.
Descriptive statistics describe the features of the data and provide summaries of the entire
population or a sample of it. In this type of statistics, we calculate measures of central
tendency and measures of dispersion to summarize the data.
Inferential statistics is used to make predictions and draw inferences from the data.
Many statistical tests are performed to arrive at conclusions. Inferential statistics is closely
connected with probability and probability distributions.
Statistics is a part of applied mathematics that uses probability theory to draw general
conclusions from the sample data we collect. Probability is used within statistics to quantify how
likely it is that a conclusion drawn from a sample also holds for the whole population.
Statistics helps in better understanding and accurate description of data. It also helps in proper
planning of a statistical study. Finally, statistics uses tables, diagrams, and graphs to
represent information in an organized manner.
We consider a class of students as a sample of the population of all the students in the school.
We can calculate their average score in tests, their average height, weight, etc. based on the
data collected. The required parameters, determined using the statistical measures, are analyzed
and interpreted further, as desired. For example, the scores of the students in the previous semester
and this semester