Statistics Lec 1
Introduction
Statistics is the science of collecting, organizing,
analyzing, interpreting and presenting data to
make informed decisions.
Statistics started as a tool for state administration.
The word “statistics” comes from the Latin word
status, meaning “state”.
Ancient rulers used it to record population
counts, tax collection and land ownership (e.g.,
in ancient Egypt, China and Rome).
Development in the 17th and 18th Centuries:
Probability theory: Blaise Pascal and Pierre de Fermat
studied gambling, laying the foundation for
understanding chance. Governments used statistics to
analyze birth rates, deaths, and diseases for public
planning.
Modern Era (19th Century Onwards):
Mathematicians like Karl Pearson, Francis Galton and R.A.
Fisher formalized statistics as a discipline. The focus
shifted to understanding variability and making
predictions in science, economics, and industry.
Statistics began as a way to count people and resources.
Today, it’s used to solve problems, predict outcomes, and
make better decisions in every field.
Descriptive and Inferential Statistics
Descriptive statistics is that branch of statistics which
deals with the concepts and methods concerned with
summarization and description of the important aspects
of numerical data. This gives you a clear picture of the
data you are dealing with.
Inferential statistics deals with procedures for making
inferences about the characteristics of a larger group of
data, the whole called the population, on the basis of a
sample drawn from it.
In short, descriptive statistics helps you understand data,
while inferential statistics lets you use that understanding
to make informed decisions and solve problems.
Population and Sample
Population: The collection of all possible observations,
whether finite or infinite, relevant to some characteristic of
interest is called the population or a statistical population. A
statistical population may be real, such as the heights of
college students or the average tensile strength of steel for
a complete production run. The number of observations in a
finite population is called the size of the population and is
denoted by N.
Sample: A sample is a small part of the population that is used
for study. The primary objective is to obtain a subset of the
population whose center, spread and shape are as close as
possible to those of the population. The number of observations
included in a sample is called the size of the sample and is
denoted by n.
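The relation between N, n and the two means can be illustrated with a small sketch. The population below is hypothetical (artificially generated heights in cm), and simple random sampling without replacement is assumed as the way of drawing the sample; the lecture does not prescribe a sampling method.

```python
import random
import statistics

# Hypothetical finite population: heights (cm) of N = 1000 students,
# generated artificially for illustration only.
rng = random.Random(42)
population = [rng.gauss(170, 8) for _ in range(1000)]
N = len(population)  # size of the population

# Simple random sample of size n, drawn without replacement (an assumption).
n = 50
sample = rng.sample(population, n)

print("N =", N, "population mean =", round(statistics.mean(population), 2))
print("n =", n, "sample mean     =", round(statistics.mean(sample), 2))
```

With a reasonably large n the sample mean usually lands close to the population mean, which is the sense in which a sample should resemble the population.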
Collection of data
Data are facts collected together for analysis and can be
classified as follows:
1. Qualitative (categorical): Qualities, traits, or categories
that are classifiable but cannot be quantified are
described by this kind of data. It can be nominal (gender,
hair color, ethnicity, etc.) or ordinal (letter grades,
economic status, etc.).
2. Quantitative (numerical): Quantities that can be measured
or counted are included in this data type. It can be
discrete (a variable that can take only certain separate
values, e.g. the number of students in a class) or continuous
(a variable that can take any numerical value within a range,
e.g. the weight of a student, 105 lb).
Classification of data
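In statistics, grouped and ungrouped data refer to the way data are
organized for analysis. Ungrouped data are the raw observations listed
individually; grouped data are arranged into classes together with
their frequencies.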
Frequency
Class Boundaries
Example:
Suppose we have the following test scores of
students:
Scores: 10, 15, 20, 22, 25, 30, 32, 35, 37, 40, 45,
47, 50, 55, 60.
Create the histogram for the above data.
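One way to prepare this histogram is to group the scores into classes first. The sketch below assumes a class width of 10 starting at 10 (classes 10-19, 20-29, and so on); the example itself does not fix the classes, so other choices are equally valid. The bars are printed as text rather than drawn.

```python
scores = [10, 15, 20, 22, 25, 30, 32, 35, 37, 40, 45, 47, 50, 55, 60]

width = 10  # assumed class width
start = 10  # assumed lower limit of the first class

# Count how many scores fall in each class (lower and upper limits inclusive).
counts = {}
for lower in range(start, max(scores) + 1, width):
    upper = lower + width - 1
    counts[(lower, upper)] = sum(lower <= s <= upper for s in scores)

# Text histogram: one '#' per observation in the class.
for (lower, upper), f in counts.items():
    print(f"{lower}-{upper}  {'#' * f}  ({f})")
```

Each bar's height is the class frequency; because all classes have the same width here, frequencies can be plotted directly.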
Example
• Construct the histogram for the following
frequency distribution relating to the ages (to
nearest birthday) of telephone operators.
Age (years)    No. of operators
18-19            9
20-24          188
25-29          160
30-34          123
35-44           84
45-59           15
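The classes in this table have unequal widths, so bar heights are usually taken as frequency divided by class width rather than the raw frequency. The sketch below assumes that "to nearest birthday" puts the class boundaries half a year beyond the stated limits (17.5-19.5, 19.5-24.5, and so on); that convention is an assumption, not something the table states.

```python
# (lower limit, upper limit, frequency) taken from the table above
classes = [
    (18, 19, 9),
    (20, 24, 188),
    (25, 29, 160),
    (30, 34, 123),
    (35, 44, 84),
    (45, 59, 15),
]

rows = []
for lower, upper, f in classes:
    lb = lower - 0.5      # assumed lower class boundary
    ub = upper + 0.5      # assumed upper class boundary
    width = ub - lb       # class width
    height = f / width    # bar height adjusted for unequal widths
    rows.append((lb, ub, width, height))

for lb, ub, width, height in rows:
    print(f"{lb}-{ub}  width={width}  height={height:.1f}")
```

Under these assumptions, the area of each bar is proportional to its class frequency, which keeps the wide classes (35-44 and 45-59) from looking over-represented.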
Frequency Polygon
A frequency distribution can be represented
graphically by a frequency polygon. It shows the
shape of the distribution by plotting each class
frequency against the midpoint of its class interval
and joining the points with straight lines. It is
frequently used in place of a histogram or to
compare several datasets.
Example
Construct the frequency polygon for the
following data:
Test Scores    Frequency
49.5-59.5       5
59.5-69.5      10
69.5-79.5      30
79.5-89.5      40
89.5-99.5      15
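The points to plot for this polygon are the class midpoints paired with their frequencies. The sketch below also closes the polygon on the horizontal axis by adding a zero-frequency point one class width before the first midpoint and one after the last; that closing step is a common convention and is assumed here, since the example does not mention it.

```python
# (lower boundary, upper boundary, frequency) taken from the table above
classes = [
    (49.5, 59.5, 5),
    (59.5, 69.5, 10),
    (69.5, 79.5, 30),
    (79.5, 89.5, 40),
    (89.5, 99.5, 15),
]

# Midpoint of each class paired with its frequency.
points = [((lo + hi) / 2, f) for lo, hi, f in classes]

# Assumed convention: close the polygon with zero-frequency points
# one class width outside the first and last midpoints.
width = classes[0][1] - classes[0][0]
polygon = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

for x, f in polygon:
    print(x, f)
```

Joining these points in order with straight lines gives the frequency polygon.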
Frequency Curve
A frequency curve is a smooth, continuous curve
drawn to represent the distribution of data in a
frequency distribution. It is obtained by
connecting points corresponding to the
frequencies at the midpoints of class intervals,
creating a smooth flow instead of straight lines.
This curve is commonly used to visualize
continuous data.
Numerical Methods
Central Tendency
A measure of central tendency is a single value that
attempts to describe a set of data by identifying the
central position within that set. The mean, median,
mode, geometric mean (GM) and harmonic mean (HM) are
all valid measures of central tendency, but under
different conditions some measures become more
appropriate to use than others.
Mean is the sum of the observations divided by their number.
It is commonly used as a typical value and for prediction.
Median is the middle value which divides the set of data
into two halves, one half comprising observations
greater than it and the other half smaller than it. Or more
precisely, the median is the value at or below which 50%
of the data lie.
Mode is the value that occurs most frequently in a set of
data. A set of data may have more than one mode or no
mode.
Note: Sample mean is denoted by x̄ and population mean
is denoted by μ.
Example
Given the following ungrouped data:
8, 9, 10, 10, 10, 11, 11, 11, 12, 13
Find the mean, median and mode of the
above data.
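The answers can be checked with a short sketch using Python's standard statistics module. The choice of Python is an assumption for illustration; the same values follow by hand from the definitions above.

```python
import statistics

data = [8, 9, 10, 10, 10, 11, 11, 11, 12, 13]

mean = statistics.mean(data)        # sum of the values (105) divided by n (10)
median = statistics.median(data)    # n is even: average of the 5th and 6th ordered values
modes = statistics.multimode(data)  # every value tied for the highest frequency

print(mean, median, modes)  # 10.5 10.5 [10, 11]
```

Both 10 and 11 occur three times, so this data set has two modes (it is bimodal).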