Statistics Overview
Definition:
“A discipline that includes procedures and techniques used to
o collect,
o process,
o analyze and
o interpret data,
to draw inferences and to reach a decision in situations of uncertainty.”
Statistic | Statistics
Singular word. | Plural word.
A single piece of information. | Study of the collection, analysis, interpretation, presentation, and organization of data.
A numerical value calculated from the dataset (e.g. mean, median, mode). | Branch of mathematics dealing with data collection and analysis.
Describes the dataset or summarizes data. | Used to understand data and make inferences using methods and techniques.
Q: Difference between Descriptive Statistics and Inferential Statistics.
Descriptive | Inferential
Summarize and describe the basic features of data | Make conclusions and predictions about a population based on sample data
Organize and simplify the data | Make inferences about the population
Provide an overview of data characteristics | Estimate population parameters
Identify parameters and trends |
[Figure: Branches of Statistics. Descriptive statistics: measures of central tendency (arithmetic mean, mode) and data visualization (histograms, bar charts). Inferential statistics.]
Population | Sample
Entire group of individuals or items of interest | Subset of individuals or items selected from the population
Entire scope of data | Representative subset of data
All possible observations | Used to make inferences about the population
Size is often large or infinite | Size is small and manageable
Data collection is often impossible or impractical | Data collection is feasible
Time-consuming | Faster
Denoted by “N” | Denoted by “n”
Example: All students in a country | Example: 1000 students randomly selected from the country
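The relationship between a population (size N) and a sample (size n) can be illustrated with a small sketch. The population below is made up for illustration (random marks between 0 and 100, not real data), and the seed is fixed only so the run is repeatable. A value computed from the whole population is a parameter, while the same value computed from the sample is a statistic.

```python
import random
import statistics

# Hypothetical population: marks of N = 10,000 students (illustrative values only).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(0, 100) for _ in range(10_000)]

N = len(population)                 # population size
n = 1000                            # sample size
sample = rng.sample(population, n)  # simple random sample without replacement

mu = statistics.mean(population)    # population mean (a parameter)
x_bar = statistics.mean(sample)     # sample mean (a statistic)

print("N =", N, "n =", n)
print("population mean:", round(mu, 2))
print("sample mean:", round(x_bar, 2))
```

The sample mean will usually be close to, but not exactly equal to, the population mean.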
Q: Difference between Parameter and Statistic.
Data | Variable
Raw facts and figures | Characteristic or attribute being measured
Quantitative or qualitative information | Represents a data point or value
Collected from various sources | Can change or vary
Examples: numbers, text, images, audio files | Examples: age, gender, income, height
Q: Types of variables.
Variable
d. The amount of money a person spends per year for online purchases
Solution
d. Discrete, since the smallest value that money can assume is in cents
Q: Types of data.
Q: Measurement Scales.
Measurement To measure qualitative (categorical data)
Presentation of data
1- Classification:
Grouping of data
Dividing in classes and sub-classes
Mutually exclusive classes
Sorting of data into similar/homogeneous classes.
Consider all data; nothing should be excluded from the classification.
One-way criterion (e.g. education)
Two-way criteria (e.g. age + marks)
Aims of classification:
To reduce the large sets of data to an easily understood summary
To display the points of similarity and dissimilarity
To reflect the important aspects of the data
To prepare the ground for comparison and inference
Classes must not overlap: if class 1 is (1 – 5), the next class should start from 6, i.e. (6 – 10), not (5 – 10).
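The non-overlap rule can be checked with a short sketch. The helper names `make_classes` and `classes_containing` are hypothetical, chosen only for this illustration, and the classes are assumed to have inclusive integer limits.

```python
def make_classes(start, size, k):
    """Return k inclusive classes (lower, upper), each covering `size` integer values."""
    return [(start + i * size, start + (i + 1) * size - 1) for i in range(k)]

def classes_containing(value, classes):
    """Return every class whose limits include the value."""
    return [c for c in classes if c[0] <= value <= c[1]]

good = make_classes(1, 5, 2)        # [(1, 5), (6, 10)]
bad = [(1, 5), (5, 10)]             # overlapping: 5 belongs to both classes

print(classes_containing(5, good))  # [(1, 5)]
print(classes_containing(5, bad))   # [(1, 5), (5, 10)]
```

With mutually exclusive classes every value falls into exactly one class; with overlapping classes the value 5 would be counted twice.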
2- Tabulation
Presentation of numeric data into rows and columns.
Organizing data into tabular format for better understanding
Major objectives:
o Simplify data
o Bring out essential features of data
o To facilitate comparison
o To facilitate Statistical Analysis
o Saving of space
General table (all the data presented in the table)
Example 2.2: Make a grouped frequency distribution from the following data, relating to the
weights, recorded to the nearest gram, of 60 apples picked at random from a consignment.
98 110 78 185 162 178 140 152 173 146 158 194
Solution:
Step 1: Find the number of classes that we need to create
Formula: k = 1 + 3.3 log(N)
According to the data, N = 60
k = 1 + 3.3(1.78)
k = 6.87 ≈ 7
Step 2: Find the range (largest value − smallest value)
R = 204 − 68 = 136
Step 3: Find the class interval
h = R / k = 136 / 7 = 19.4 ≈ 19
Conclusion: we have to make 7 classes, with an interval of 19 between the class limits.
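The arithmetic in these steps can be reproduced with a short sketch. It assumes the logarithm in the formula is base 10 (which matches log 60 ≈ 1.78) and that the class interval is the range divided by the number of classes; the function name `number_of_classes` is made up for this illustration.

```python
import math

def number_of_classes(N):
    """k = 1 + 3.3 * log10(N), before rounding."""
    return 1 + 3.3 * math.log10(N)

N = 60
k_exact = number_of_classes(N)  # about 6.87
k = round(k_exact)              # 7 classes

R = 204 - 68                    # range = largest value - smallest value = 136
h_exact = R / k                 # about 19.43
h = int(h_exact)                # 19, dropping the fraction as in the notes

print(k, R, h)
```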
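Class boundary | Frequency of the class starting here | Cumulative frequency (less than) | Cumulative frequency (more than)
64.5 | 9 | 0 | 60
84.5 | 10 | 0 + 9 = 9 | 60 – 9 = 51
104.5 | 17 | 9 + 10 = 19 | 51 – 10 = 41
124.5 | 10 | 19 + 17 = 36 | 41 – 17 = 24
144.5 | 5 | 36 + 10 = 46 | 24 – 10 = 14
164.5 | 4 | 46 + 5 = 51 | 14 – 5 = 9
184.5 | 5 | 51 + 4 = 55 | 9 – 4 = 5
204.5 | | 55 + 5 = 60 | 5 – 5 = 0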
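The two running totals in this table can be computed as follows. This is a minimal sketch that takes the class boundaries and frequencies from the table and reads its last two columns as "less than" and "more than" cumulative frequencies, which is an interpretation of the layout.

```python
from itertools import accumulate

boundaries = [64.5, 84.5, 104.5, 124.5, 144.5, 164.5, 184.5, 204.5]
freqs = [9, 10, 17, 10, 5, 4, 5]   # one frequency per class between consecutive boundaries

total = sum(freqs)                              # 60 apples
less_than = [0] + list(accumulate(freqs))       # observations below each boundary
more_than = [total - c for c in less_than]      # observations at or above each boundary

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(b, lt, mt)
```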
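Relative frequency = class frequency / total frequency; percentage = relative frequency × 100
Frequency 9: 9/60 = 0.15, and 0.15 × 100 = 15
Frequency 10: 10/60 ≈ 0.167, and 0.167 × 100 ≈ 16.7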
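The same relative-frequency calculation can be applied to every class at once. This sketch reuses the seven class frequencies of the apple example and rounds percentages to one decimal place, which is a choice made here rather than something the notes specify.

```python
freqs = [9, 10, 17, 10, 5, 4, 5]
total = sum(freqs)  # 60

relative = [f / total for f in freqs]            # relative frequencies, summing to 1
percent = [round(100 * r, 1) for r in relative]  # percentage frequencies

for f, r, p in zip(freqs, relative, percent):
    print(f, round(r, 3), p)
```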
4- Diagrams
Q: Difference between Diagram and Graph.
Diagram Graph
Histograms
o A histogram is used to summarize discrete or continuous data.
o Used to represent a grouped frequency distribution
o No gaps between bars
Besides this, the end of chapter 2 shows the frequency curve and the positively and negatively skewed
graphs; have a look at those once as well … to see what each graph looks like.
Central Tendency:
The tendency of the observations to cluster in the central part of the data. Measures of
central tendency or location are generally known as Averages.
Types of averages
Main points:
A measure of central tendency should lie somewhere within the range of the data.
It should remain unchanged when the observations are rearranged in a different
order, e.g. 2, 3, 5, 7, 9.
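1- Arithmetic mean
Ungrouped:
Population:
$$\mu = \frac{\sum x_i}{N}$$
Example: µ = (3 + 2 + 5 + 7 + 9)/5 = 26/5 = 5.2
Sample:
$$\bar{x} = \frac{\sum x_i}{n}$$
Grouped:
Population:
$$\mu = \frac{\sum f_i x_i}{\sum f_i}, \qquad \sum f_i = N$$
Sample:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}, \qquad \sum f_i = n$$
Combined mean of k groups:
$$\bar{x}_c = \frac{\sum n_i \bar{x}_i}{n}, \qquad (i = 1, 2, \ldots, k)$$
where n = Σ n_i is the total number of observations.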
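The arithmetic mean formulas can be sketched in a few lines. The function names are made up for this illustration. The grouped example uses the class midpoints of the apple table (74.5, 94.5, and so on, taken as the centres of the class boundaries), and the combined example splits the values 2, 3, 5, 7, 9 into two groups as an assumed illustration.

```python
def mean(values):
    """Ungrouped mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Grouped mean: sum of f_i * x_i divided by the sum of f_i."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

def combined_mean(sizes, means):
    """Combined mean of k groups: sum of n_i * mean_i divided by the total size."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

print(mean([3, 2, 5, 7, 9]))  # 26 / 5

midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]
freqs = [9, 10, 17, 10, 5, 4, 5]
print(grouped_mean(midpoints, freqs))

# Groups {2, 3} and {5, 7, 9} have means 2.5 and 7.0.
print(combined_mean([2, 3], [2.5, 7.0]))
```

The combined mean of the two groups equals the mean of all five values taken together, which is the point of the formula.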
2- Geometric mean
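Ungrouped:
$$G = \sqrt[n]{x_1 x_2 \cdots x_n} = \operatorname{antilog}\left[\frac{\log x_1 + \cdots + \log x_n}{n}\right]$$
Grouped:
$$G = \operatorname{antilog}\left[\frac{1}{n}\sum f_i \log x_i\right], \qquad n = \sum f_i$$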
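The geometric mean can be computed through logarithms, as a minimal sketch. It assumes base-10 logarithms, so the antilog is a power of 10, and it uses small made-up values (2, 4, 8) whose geometric mean is easy to verify by hand; the function names are hypothetical.

```python
import math

def geometric_mean(values):
    """Antilog of the average of the base-10 logs of the values."""
    n = len(values)
    return 10 ** (sum(math.log10(x) for x in values) / n)

def grouped_geometric_mean(midpoints, freqs):
    """Antilog of (1/n) * sum of f_i * log10(x_i), with n the sum of the f_i."""
    n = sum(freqs)
    return 10 ** (sum(f * math.log10(x) for f, x in zip(freqs, midpoints)) / n)

# Illustrative values: 2 * 4 * 8 = 64, and the cube root of 64 is 4.
print(geometric_mean([2, 4, 8]))
print(grouped_geometric_mean([2, 8], [2, 2]))
```

All values must be positive, since the logarithm of zero or a negative number is undefined.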
3- Median
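A value which divides an ordered data set into two equal parts.
Drawbacks
Changes of origin and scale affect the value
It focuses only on the midpoint and ignores the rest of the data
Example: 2, 4, 8, 2, 10, 5, 6, 9
n = 8
Ordered: 2, 2, 4, 5, 6, 8, 9, 10
n/2 = 8/2 = 4, so the median is the average of the 4th and 5th ordered values: (5 + 6)/2 = 5.5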
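Grouped data formula for MCQs:
$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right)$$
where l is the lower boundary of the median class, h its width, f its frequency, n the total frequency, and C the cumulative frequency of the class before the median class.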
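Both median calculations can be sketched as follows. The ungrouped function sorts the data and averages the two middle values when n is even. The grouped function applies l + (h/f)(n/2 − C), taking the median class to be the first class whose cumulative frequency reaches n/2; it is tried here on the class boundaries and frequencies of the apple example, and the function names are made up for this illustration.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def grouped_median(boundaries, freqs):
    """Median = l + (h / f) * (n / 2 - C) for a grouped frequency distribution."""
    n = sum(freqs)
    C = 0  # cumulative frequency of the classes before the current one
    for i, f in enumerate(freqs):
        if C + f >= n / 2:                   # this is the median class
            l = boundaries[i]                # lower class boundary
            h = boundaries[i + 1] - l        # class width
            return l + (h / f) * (n / 2 - C)
        C += f
    raise ValueError("frequencies must not be empty")

print(median([2, 4, 8, 2, 10, 5, 6, 9]))

boundaries = [64.5, 84.5, 104.5, 124.5, 144.5, 164.5, 184.5, 204.5]
freqs = [9, 10, 17, 10, 5, 4, 5]
print(round(grouped_median(boundaries, freqs), 2))
```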
4- Mode:
The most frequent value
Example: 3, 4, 6, 2, 4, 3, 6, 2, 6, 5, 2, 2, 3
2 is the mode (it occurs 4 times)
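Finding the mode amounts to counting how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the function name `modes` is made up for this illustration, and it returns a list because a data set can have more than one mode.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [3, 4, 6, 2, 4, 3, 6, 2, 6, 5, 2, 2, 3]
print(Counter(data)[2])  # how many times the value 2 occurs
print(modes(data))
```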