Statistics Overview

The document provides an overview of statistics, detailing its history, definitions, and key concepts such as the difference between a statistic and statistics, descriptive and inferential statistics, and various types of data and variables. It also covers methods for presenting data, including classification, tabulation, and frequency distribution, along with measures of central tendency and dispersion. Additionally, it discusses measurement scales and the importance of data visualization through diagrams and graphs.

Uploaded by

maown7742
Statistics- Mid Syllabus Overview

Chapter: 1 ----- S/Q’s & MCQ’s

History of “Statistics”: (MCQ’s)


• First used by the German scholar Gottfried Achenwall (18th century) to mean
o the collection of data used for the state.
• In the 19th century, used to mean
o numerical data and the interpretation of data through appropriate analysis.
• In the 1920s, used for analyzing the results of experiments and surveys.
• Today, statistics has become
o the mathematical science of making decisions and drawing conclusions from data in situations of uncertainty.

Definition:
“A discipline that includes

• procedures and
• techniques used to
o collect,
o process,
o analyze and
o interpret

numerical data for making

• inferences and
• decisions

in situations of uncertainty.”

Uncertainty means incompleteness or instability of the data.

Q: Difference between Statistic and Statistics.

Statistic | Statistics
Singular word. | Plural word.
A single piece of information. | The study of the collection, analysis, interpretation, presentation, and organization of data.
A numerical value calculated from a dataset (e.g. mean, median, mode). | The branch of mathematics dealing with data collection and analysis.
Describes or summarizes a dataset. | Used to understand data and make inferences using methods and techniques.

Q: Difference between Descriptive Statistics and Inferential Statistics.

Descriptive | Inferential
Summarizes and describes the basic features of data. | Draws conclusions and makes predictions about a population based on sample data.
Organizes and simplifies the data. | Makes inferences about populations.
Provides an overview of data characteristics. | Estimates population parameters.
Identifies parameters and trends. |

Branches of Statistics
• Descriptive
• Inferential

Common descriptive statistics
• Measures of central tendency
o Mean (arithmetic mean, geometric mean, harmonic mean)
o Median
o Mode
• Data visualization
o Histograms
o Bar charts

Basic terms in Statistics

Q: Difference between Population and Sample.

Population | Sample
Entire group of individuals or items of interest | Subset of individuals or items selected from the population
Entire scope of data | Representative subset of data
All possible observations | Used to make inferences about the population
Size is often large or infinite | Size is small and manageable
Data collection is often impossible or impractical | Data collection is feasible
Time consuming | Faster
Size denoted by “N” | Size denoted by “n”
Example: all students in a country | Example: 1000 students randomly selected from the country

Q: Difference between Parameter and Statistic.

Parameter | Statistic (singular)
Numerical characteristic of a population | Numerical characteristic of a sample
Describes the entire population | Describes a sample from the population
Typically unknown | Used to estimate population parameters
Example: population mean µ | Example: sample mean x̄
Example: population proportion p | Example: sample proportion p̂
Example: population standard deviation σ | Example: sample standard deviation s

Q: Difference between Data and Variable.

Data | Variable
Raw facts and figures | Characteristic or attribute being measured
Quantitative or qualitative information | Represents a data point or value
Collected from various sources | Can change or vary
Examples: numbers, text, images, audio files | Examples: age, gender, income, height

Q: Types of variables.

Variable
• Quantitative (numeric): variables that can be counted or measured.
o Continuous (height, weight): takes any value within a range or interval.
o Discrete (number of pages): takes specific, countable, distinct values.
• Qualitative (categorical): variables that have distinct categories according to an attribute.
o Nominal (gender, color)
o Ordinal (education level)

Exercise: Classify each variable as discrete or continuous.
a. The highest wind speed of a hurricane

b. The weight of baggage on an airplane

c. The number of pages in statistics book

d. The amount of money a person spends per year for online purchases

Solution

a. Continuous, since wind speed must be measured.

b. Continuous, since weight is measured.

c. Discrete, since the number of pages can be counted.

d. Discrete, since the smallest value that money can assume is in cents.

Q: Types of data.

Data
• Univariate (one variable is used for data analysis)
• Bivariate (2 variables are used for data analysis)
• Multivariate (more than 2 variables are used for data analysis)
• Primary data (raw facts and figures)
• Secondary data (data after statistical treatment)
• Quantitative data (numeric data)
• Qualitative data (categorical data)
• Time series data (time changes: the same variable is observed over successive time periods)
• Cross-sectional data (time is fixed: variables are observed at a single point in time)
• Pooled data (a combination of both: observations across variables and across time)

Q: Measurement Scales.
To measure qualitative (categorical) data:
• Nominal scale (gender: M, F)
o No sequence, no order
o No numeric value, just labeling
• Ordinal scale (level of education)
o No numeric values
o Order-wise ranking

To measure quantitative (numeric) data:
• Interval scale
o Numeric values; zero is arbitrary (no true zero)
o Fixed (equal) intervals between values
o Must follow order (1-3, 4-6)
• Ratio scale
o Ordered
o Numeric values; zero is meaningful

Chapter 2----- 1 L/Q + S/Qs and MCQ’s

Presentation of data
• Classification (creating classes or sub-classes)
• Tabulation (presenting in tabular format)
• Frequency distribution (graphs)

Drawbacks of Data Collection

• Immediately after collection, raw data is confusing
• Difficult to read and understand
• Massive data
• Unorganized data
Presentation of data
Converting raw, unstructured data in structured format using:

1- Classification:
• Grouping of data
• Dividing into classes and sub-classes
• Mutually exclusive classes
• Sorting of data into similar/homogeneous classes
• Consider all data; nothing should be excluded from the classification
• One-way criteria (education)
• Two-way criteria (age + marks)

Distribution: Arrangement according to values of a variable.

Aims of classification:
• To reduce large sets of data to an easily understood summary
• To display the points of similarity and dissimilarity
• To reflect the important aspects of the data
• To prepare the ground for comparison and inference

Basic principles of classification:

• The classes or categories into which data are divided should be
o mutually exclusive, and
o non-overlapping between successive classes.

For example, if class 1 is (1 – 5), the next class should start from 6, giving (6 – 10). Using (5 – 10) instead would make the classes overlap.

• All data should be included in the classes.
• The conventional classification procedure should be adopted.

2- Tabulation
• Presentation of numeric data in rows and columns.
• Organizing data into a tabular format for better understanding.

Major objectives:
o Simplify data
o Bring out essential features of data
o Facilitate comparison
o Facilitate statistical analysis
o Save space

Types of tables
• General table (all the data presented in the table)
• Specific table (derived from general tables; stores specific data)
• Complex table (summarizes complicated information presented in interrelated categories)

Main parts of tables


a. Title
b. Column caption and box head
c. Row caption and stub
d. The body
e. Prefatory Notes (additional specifications of data *)
f. Foot Notes
g. Source Notes
3- Frequency Distribution (important for long question)
Definition:

“Representation of Number of occurrences of each value or category in a dataset.”

Example 2.2: Make a grouped frequency distribution from the following data, relating to the weights, recorded to the nearest gram, of 60 apples picked out at random from a consignment.

106 107 76 82 109 107 115 93 187 95 123 125

111 92 86 70 126 68 130 129 139 119 115 128

100 186 84 99 113 204 111 141 136 123 90 115

98 110 78 185 162 178 140 152 173 146 158 194

148 90 107 181 131 75 184 104 110 80 118 82

Solution:
Step 1: Find number of classes that we need to create

Formula: k = 1 + 3.3 log(N), with log taken to base 10

According to the data

N = 60

k = 1 + 3.3 log(60)

k = 1 + 3.3(1.78)

k = 6.87 ≈ 7

k = 7

We need to create 7 classes.

Step 2: Find range to find Interval

Formula: R = X max – X min

Maximum value from data = X max = 204

Minimum value from data = X min = 68

R = 204 – 68

R = 136

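Step 3: Find interval between class limits

Formula: $h = \frac{R}{k}$

$h = \frac{136}{7} = 19.4 \approx 20$

The width is rounded up to 20 so that 7 classes cover the whole range from 68 to 204.

Conclusion: we have to make 7 classes with an interval of 20 between class limits.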
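
Steps 1 to 3 can be checked with a short Python sketch. The function name `class_plan` is hypothetical, the logarithm is assumed to be base 10, and the class width is rounded up (an assumption made here so that 7 classes cover the full range, which matches the 65 – 84, 85 – 104, … classes used in the table).

```python
import math

def class_plan(n, x_max, x_min):
    """Return (number of classes, range, class width) for n observations."""
    k = round(1 + 3.3 * math.log10(n))  # k = 1 + 3.3 log(N), rounded to a whole number
    r = x_max - x_min                   # R = X max - X min
    h = math.ceil(r / k)                # h = R / k, rounded up (assumption)
    return k, r, h

k, r, h = class_plan(60, 204, 68)
print(k, r, h)
```

For the apple data this gives 7 classes, a range of 136, and a class width of 20.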

Class limits | Class boundaries | Entries | Frequency
65 – 84 | 64.5 – 84.5 | 68, 75, 76, 70, 78, 82, 82, 80, 84 | 9
85 – 104 | 84.5 – 104.5 | 86, 92, 99, 93, 90, 95, 98, 90, 104, 100 | 10
105 – 124 | 104.5 – 124.5 | 106, 107, 109, 107, 107, 113, 111, 110, 115, 118, 119, 115, 115, 110, 111, 123, 123 | 17
125 – 144 | 124.5 – 144.5 | 126, 129, 128, 125, 131, 130, 136, 139, 140, 141 | 10
145 – 164 | 144.5 – 164.5 | 146, 152, 158, 162, 148 | 5
165 – 184 | 164.5 – 184.5 | 178, 173, 184, 181 | 4
185 – 204 | 184.5 – 204.5 | 185, 186, 194, 204, 187 | 5
Total | | | Σf = 60
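
The tally in the table can be reproduced with a minimal Python sketch. The constants `LOWER_LIMIT`, `WIDTH` and `NUM_CLASSES` are names chosen here for illustration; their values (65, 20 and 7) are read off the class limits in the table.

```python
# Weights of the 60 apples from Example 2.2
weights = [
    106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125,
    111, 92, 86, 70, 126, 68, 130, 129, 139, 119, 115, 128,
    100, 186, 84, 99, 113, 204, 111, 141, 136, 123, 90, 115,
    98, 110, 78, 185, 162, 178, 140, 152, 173, 146, 158, 194,
    148, 90, 107, 181, 131, 75, 184, 104, 110, 80, 118, 82,
]

LOWER_LIMIT = 65   # lower limit of the first class
WIDTH = 20         # class width (65 - 84 holds 20 whole-gram values)
NUM_CLASSES = 7

# Count how many weights fall into each class
frequencies = [0] * NUM_CLASSES
for w in weights:
    index = (w - LOWER_LIMIT) // WIDTH
    frequencies[index] += 1

# Print class limits, class boundaries and frequency for each class
for i, f in enumerate(frequencies):
    low = LOWER_LIMIT + i * WIDTH
    high = low + WIDTH - 1
    print(f"{low} - {high} | {low - 0.5} - {high + 0.5} | {f}")
```

Because the weights are recorded to the nearest gram, integer division by the class width is enough to find the class of each value.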

• Cumulative Frequency Distribution

o Less than type
o More than type

Class boundary | Frequency (class starting at this boundary) | Less-than cumulative frequency | More-than cumulative frequency
64.5 | 9 | 0 | 60
84.5 | 10 | 0 + 9 = 9 | 60 – 9 = 51
104.5 | 17 | 9 + 10 = 19 | 51 – 10 = 41
124.5 | 10 | 19 + 17 = 36 | 41 – 17 = 24
144.5 | 5 | 36 + 10 = 46 | 24 – 10 = 14
164.5 | 4 | 46 + 5 = 51 | 14 – 5 = 9
184.5 | 5 | 51 + 4 = 55 | 9 – 4 = 5
204.5 | | 55 + 5 = 60 | 5 – 5 = 0
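
Both cumulative columns are running totals of the class frequencies, which can be sketched in Python as follows. The variable names are illustrative, and the class boundaries are assumed to start at 64.5 and step by 20, as in the table.

```python
from itertools import accumulate

frequencies = [9, 10, 17, 10, 5, 4, 5]
boundaries = [64.5 + 20 * i for i in range(len(frequencies) + 1)]
total = sum(frequencies)

# Less-than type: number of observations below each boundary
less_than = [0] + list(accumulate(frequencies))
# More-than type: number of observations at or above each boundary
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b} | {lt} | {mt}")
```

At every boundary the two cumulative frequencies add up to the total of 60.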

• Relative Frequency Distribution

Values are rounded to two decimal places.

Frequency | Relative frequency | Percentage frequency
9 | 9/60 = 0.15 | 0.15 * 100 = 15
10 | 10/60 = 0.17 | 0.17 * 100 = 17
17 | 17/60 = 0.28 | 0.28 * 100 = 28
10 | 10/60 = 0.17 | 0.17 * 100 = 17
5 | 5/60 = 0.08 | 0.08 * 100 = 8
4 | 4/60 = 0.07 | 0.07 * 100 = 7
5 | 5/60 = 0.08 | 0.08 * 100 = 8
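
As a rough check, the relative and percentage frequencies can be computed in Python. This is a sketch with illustrative variable names; values are rounded only when printed, so 10/60 = 0.1666… shows as 0.17 (truncating instead would give 0.16).

```python
frequencies = [9, 10, 17, 10, 5, 4, 5]
total = sum(frequencies)

# Relative frequency = class frequency / total frequency
relative = [f / total for f in frequencies]
# Percentage frequency = relative frequency * 100
percentage = [100 * r for r in relative]

for f, r, p in zip(frequencies, relative, percentage):
    print(f"{f} | {r:.2f} | {p:.0f}")
```

The unrounded relative frequencies always sum to 1, which is a quick way to catch arithmetic slips.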

4- Diagrams
Q: Difference between Diagram and Graph.

Diagram | Graph
Uses plain paper | Uses graph paper
Helpful for comparisons | Helpful for representing the relationship between 2 variables
Visual representation of information | Visual representation of data
Simplifies complex concepts | Illustrates trends, patterns or correlations
Types: flow charts, mind maps, organizational charts, Venn diagrams, swimlane diagrams | Types: line graphs, bar graphs, pie graphs, histograms, scatter plots

Histograms
o A histogram is used to summarize discrete or continuous data.
o It represents a grouped frequency distribution.
o There are no gaps between the bars.

Frequency Polygon: a graphical device for understanding the shapes of distributions. It is a good choice for displaying cumulative frequency distributions.

Note: Look over the shape of the graph from the assignment we made. I think ma'am will give an MCQ asking which graph is a histogram, with different graphs as the options.

Besides this, the end of Chapter 2 has the frequency curve and the positively and negatively skewed graphs; look over those once as well, to see what each graph looks like.

Chapter 3 – 2nd long + Formulas for MCQs and S/Q’s

How to summarize data:

1- Measure of central tendency (Mean, Median, Mode)


2- Measure of Dispersion

Central Tendency:
The tendency of the observations to cluster in the central part of the data. Measures of
central tendency or location are generally known as Averages.

Types of averages
• Arithmetic mean / mean
• Geometric mean
• Harmonic mean
• Median
• Mode

Main points:
• A measure of central tendency should lie somewhere within the range of the data.
• It should remain unchanged when the observations are rearranged in a different order.

1- The Arithmetic mean

$\text{Mean} = \frac{\text{sum of all observations}}{\text{number of observations}}$

For ungrouped data

Like 2, 3, 5, 7, 9

Population:
$\mu = \frac{\sum x_i}{N} = \frac{2 + 3 + 5 + 7 + 9}{5} = \frac{26}{5} = 5.2$

Sample:
$\bar{x} = \frac{\sum x_i}{n}$
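
A minimal Python sketch of the ungrouped formula, using the observations 2, 3, 5, 7, 9. The function name `arithmetic_mean` is chosen here for illustration.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

observations = [2, 3, 5, 7, 9]
mean = arithmetic_mean(observations)
print(mean)  # 26 / 5
```

The same calculation applies to a population and to a sample; only the symbol (µ or x̄) changes.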

For grouped data: (for MCQ’s)

Like in a table of classes and frequencies

Population:
$\mu = \frac{\sum f_i x_i}{\sum f_i}$, where $\sum f_i = N$

Sample:
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where $\sum f_i = n$
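
The grouped formula can be tried on the apple frequency table from Chapter 2. This sketch assumes that each x_i is the class midpoint (the usual convention, though not stated in these notes); the variable names are illustrative.

```python
# Class limits and frequencies from the apple example
class_limits = [(65, 84), (85, 104), (105, 124), (125, 144),
                (145, 164), (165, 184), (185, 204)]
frequencies = [9, 10, 17, 10, 5, 4, 5]

# Assumption: x_i is the midpoint of each class
midpoints = [(low + high) / 2 for low, high in class_limits]

sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_f = sum(frequencies)
grouped_mean = sum_fx / sum_f
print(sum_fx, sum_f, grouped_mean)
```

Here Σfx = 7350 and Σf = 60, so the grouped mean is 122.5 grams.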

Properties of Arithmetic mean:

1) For a set of data, the sum of the deviations of the observations $x_i$ from their mean $\bar{x}$, taken with their proper signs, is equal to zero.
For unclassified data, $\sum (x_i - \bar{x}) = 0$.
For a grouped frequency distribution, $\sum f_i (x_i - \bar{x}) = 0$.
2) The sum of squared deviations of the $x_i$ from the mean $\bar{x}$ is a minimum. In other words, $\sum (x_i - \bar{x})^2 \le \sum (x_i - a)^2$, where $a$ is an arbitrary value other than the mean.
3) If $k$ subgroups of the data, consisting of $n_1, n_2, \dots, n_k$ observations ($\sum n_i = n$), have respective means $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$, then $\bar{x}$, the mean for all the data, is given by

$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum n_i \bar{x}_i}{n}, \quad (i = 1, 2, \dots, k)$

4) The arithmetic mean is affected by a change of origin and/or scale. If the original variable x is changed to another variable y by a change of origin, say a, and a change of scale, say b, that is y = a + bx, then the arithmetic mean of y is $\bar{y} = a + b\bar{x}$.
A change of scale means multiplying or dividing the values by an arbitrary constant.
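
The four properties can be checked numerically on the small dataset 2, 3, 5, 7, 9. This is only an illustration: the split into two subgroups and the values a = 10, b = 2 are arbitrary choices made here, and the helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

x = [2, 3, 5, 7, 9]
x_bar = mean(x)

# Property 1: deviations from the mean sum to zero
deviation_sum = sum(xi - x_bar for xi in x)

# Property 2: squared deviations are smallest when taken about the mean
def squared_deviations(a):
    return sum((xi - a) ** 2 for xi in x)

# Property 3: combined mean of subgroups (arbitrary split into two groups)
group1, group2 = [2, 3], [5, 7, 9]
combined = (len(group1) * mean(group1) + len(group2) * mean(group2)) / (
    len(group1) + len(group2)
)

# Property 4: change of origin a and scale b, y = a + b*x
a, b = 10, 2
y_bar = mean([a + b * xi for xi in x])

print(deviation_sum, squared_deviations(x_bar), combined, y_bar)
```

Up to floating-point rounding, the deviation sum is 0, the combined mean equals 5.2, and the mean of y equals 10 + 2(5.2) = 20.4.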

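2- Geometric mean
Ungrouped:
$G = \sqrt[n]{x_1 x_2 \cdots x_n} = \operatorname{antilog}\left[\frac{\log x_1 + \dots + \log x_n}{n}\right]$

Grouped:
$G = \operatorname{antilog}\left[\frac{1}{n} \sum f_i \log x_i\right]$, where $n = \sum f_i$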
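
A small Python sketch of the geometric mean computed through logarithms. It assumes base-10 logs, so the antilog is a power of 10; the function name and the sample values 2, 4, 8 are made up for illustration.

```python
import math

def geometric_mean(values, frequencies=None):
    """Antilog of the (frequency-weighted) mean of the logs of the values."""
    if frequencies is None:
        frequencies = [1] * len(values)   # ungrouped data: each value counts once
    n = sum(frequencies)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, frequencies))
    return 10 ** (log_sum / n)            # antilog for base-10 logs

print(geometric_mean([2, 4, 8]))             # cube root of 2 * 4 * 8 = 64
print(geometric_mean([2, 4, 8], [1, 2, 1]))  # fourth root of 2 * 4 * 4 * 8 = 256
```

Both calls give approximately 4. The geometric mean is only defined for positive values, since the logarithm of zero or a negative number does not exist.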

3- Median
A value which divides an ordered data set into two equal parts.

When to calculate median?

• When the distribution is skewed
• In case of open-end classes
• When data is quantitative
• When we have outliers (extreme values, at either the lower or the upper end)
• When the exact midpoint of the distribution is needed

Drawbacks
• A change of origin or scale affects its value
• It focuses only on the midpoint and ignores the rest of the data

Find Median of ungrouped dataset (2nd expected long)

Example:

2, 4, 8, 2, 10, 5, 6, 9

n = 8

Step 1: Arrange dataset

2, 2, 4, 5, 6, 8, 9, 10

Step 2: Find n/2

= 8/2 = 4

Step 3, condition 1: if n/2 is an integer, then

$\text{Median} = \frac{(n/2)\text{th obs} + (n/2 + 1)\text{th obs}}{2} = \frac{4\text{th obs} + 5\text{th obs}}{2} = \frac{5 + 6}{2} = 5.5$

Condition 2: if n/2 is not an integer, then

$\text{Median} = \left(\frac{n + 1}{2}\right)\text{th obs}$
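
The two conditions can be sketched in Python as follows (the function name is illustrative). Note that the formula uses the values of the observations at positions n/2 and n/2 + 1 in the ordered data, not the position numbers themselves; for the example, the 4th and 5th ordered observations are 5 and 6.

```python
def median_ungrouped(values):
    ordered = sorted(values)              # Step 1: arrange the dataset
    n = len(ordered)
    if n % 2 == 0:                        # n/2 is an integer
        lower = ordered[n // 2 - 1]       # (n/2)th observation (lists are 0-based)
        upper = ordered[n // 2]           # (n/2 + 1)th observation
        return (lower + upper) / 2
    return ordered[n // 2]                # ((n + 1)/2)th observation

print(median_ungrouped([2, 4, 8, 2, 10, 5, 6, 9]))  # (5 + 6) / 2
```

For an odd number of observations, such as 3, 1, 2, the function returns the single middle value of the ordered data.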

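Grouped data formula for MCQ’s: $\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right)$

where $l$ is the lower class boundary of the median class, $h$ is its class width, $f$ is its frequency, and $C$ is the cumulative frequency of the class before it.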
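
As an illustration, the grouped formula can be applied to the apple frequency table from Chapter 2. The symbol meanings used here are the usual textbook ones and are an assumption, since the notes do not spell them out: l is the lower boundary of the median class, h the class width, f the frequency of the median class, and C the cumulative frequency before it. The function name is hypothetical.

```python
def median_grouped(lower_boundaries, frequencies, width):
    """Median = l + (h / f) * (n / 2 - C) for a grouped frequency table."""
    n = sum(frequencies)
    cumulative = 0                        # C: cumulative frequency before the class
    for l, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= n / 2:       # first class reaching n/2 is the median class
            return l + (width / f) * (n / 2 - cumulative)
        cumulative += f
    raise ValueError("frequency table is empty")

lower_boundaries = [64.5, 84.5, 104.5, 124.5, 144.5, 164.5, 184.5]
frequencies = [9, 10, 17, 10, 5, 4, 5]
print(median_grouped(lower_boundaries, frequencies, 20))
```

With n/2 = 30, the median class is 104.5 – 124.5 (l = 104.5, f = 17, C = 19), so the median is 104.5 + (20/17)(30 − 19), roughly 117.44 grams.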

4- Mode:
The most frequent value

3, 4, 6, 2, 4, 3, 6, 2, 6, 5, 2, 2, 3

2 is the mode (it occurs 4 times, more than any other value)
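
Counting occurrences makes the mode easy to verify. A minimal sketch using Python's `collections.Counter`, with illustrative variable names:

```python
from collections import Counter

values = [3, 4, 6, 2, 4, 3, 6, 2, 6, 5, 2, 2, 3]

counts = Counter(values)                    # value -> number of occurrences
mode, frequency = counts.most_common(1)[0]  # most frequent value and its count
print(mode, frequency)
```

Here 2 occurs 4 times, while 3 and 6 occur 3 times each. If several values tied for the highest count, the data would have more than one mode, and `most_common(1)` would report only one of them.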
