
Statistics Lec 1

Statistics is the science of data collection, analysis, and interpretation, originating from state administration needs. It has evolved through history, with significant contributions from mathematicians leading to its formalization as a discipline focused on variability and predictions. Key concepts include descriptive and inferential statistics, population vs. sample, data collection methods, and graphical representations like histograms and frequency polygons.
Copyright © All Rights Reserved

Statistics

Introduction
Statistics is the science of collecting, organizing,
analyzing, interpreting and presenting data to
make informed decisions.
Statistics started as a tool for state administration.
The word “statistics” comes from the Latin word
status, meaning “state”.
Ancient rulers used it to record population
counts, tax collection and land ownership (e.g.,
in ancient Egypt, China and Rome).
Developments in the 17th and 18th Centuries:
Probability theory: Blaise Pascal and Pierre de Fermat
studied gambling, laying the foundation for
understanding chance. Governments used statistics to
analyze birth rates, deaths, and diseases for public
planning.
Modern Era (19th Century Onwards):
Mathematicians such as Karl Pearson, Francis Galton and R.A.
Fisher formalized statistics as a discipline. The focus
shifted to understanding variability and making
predictions in science, economics, and industry.
Statistics began as a way to count people and resources.
Today, it is used to solve problems, predict outcomes, and
make better decisions in every field.
Descriptive and Inferential Statistics
Descriptive statistics is the branch of statistics that
deals with the concepts and methods used to summarize
and describe the important aspects of numerical data.
It gives you a clear picture of the data you are dealing with.
Inferential statistics deals with the procedures for drawing
inferences, on the basis of a sample, about the characteristics
of a larger group of data, or the whole, called the population.
In short, descriptive statistics helps you understand data,
while inferential statistics lets you use that understanding
to make informed decisions and solve problems.
Population and Sample
Population: The collection of all possible observations,
whether finite or infinite, relevant to some characteristic of
interest is called the population or a statistical population. A
statistical population may be real, such as the heights of
college students or the tensile strength of steel for a
complete production run. The number of observations in a
finite population is called the size of the population and is
denoted by N.
Sample: A sample is a small part of the population that is used
for study. The primary objective is to select a subset of the
population whose center, spread and shape are as close as
possible to those of the population. The number of observations
included in a sample is called the size of the sample and is
denoted by n.
Collection of data
Data are facts collected together for analysis. They can be
classified as follows:
1. Qualitative (categorical): This kind of data describes qualities,
traits, or categories that can be classified but not quantified.
It can be nominal (gender, hair color, ethnicity, etc.) or ordinal
(letter grades, economic status, etc.).
2. Quantitative (numerical): This data type includes quantities
that can be measured or counted. It can be discrete (a variable
that can take only certain values, e.g. the number of students
in a class) or continuous (a variable that can take any value
within a range, e.g. a student's weight of 105 lb).
Classification of data
In statistics, grouped and ungrouped data refer to the way data is
organized for analysis.

Grouped Data                                 | Ungrouped Data
Data organized into groups after collection  | Raw data that has not been grouped or arranged after collection
Preferred when analyzing data                | Preferred when collecting data
High accuracy                                | Low accuracy
Frequency tables are mostly used             | Lists are used for this type of data

Note: Ungrouped data is easily interpreted by non-specialists. When the
sample is small, ungrouped data is preferred.
Descriptive Statistics
Descriptive statistics can be divided into two
subject areas:
1. Graphical representation/Methods
2. Numerical Methods
Graphical Representation/Methods
Frequency Distribution
A frequency distribution is a table that summarizes data
by dividing it into groups or intervals and showing the
number of observations (frequency) in each group.
Width of class = Range / Number of classes (usually rounded up to a convenient value)
Range = Highest observation − Lowest observation
Example:
Suppose we have the ages of 20 students:
Ages: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19.
The frequency distribution table might look like this:
Age Group Frequency
10-12 6
13-15 6
16-18 6
19-21 2
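As a minimal sketch (Python, standard library only), the construction above can be reproduced for the 20 ages: compute the range, divide by an assumed 4 classes, round the width up, and count the observations between each pair of class limits. The function name `frequency_table` and the round-up rule are illustrative assumptions, not part of the original notes.

```python
import math

# Ages of the 20 students from the example above
ages = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
        10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

num_classes = 4                               # assumption: chosen to match the table above
data_range = max(ages) - min(ages)            # 19 - 10 = 9
width = math.ceil(data_range / num_classes)   # 9 / 4 = 2.25, rounded up to 3

def frequency_table(data, start, width, num_classes):
    """Hypothetical helper: (lower limit, upper limit, frequency) per class of integer data."""
    table = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width - 1             # integer data: class 10-12 holds 10, 11, 12
        freq = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, freq))
    return table

table = frequency_table(ages, min(ages), width, num_classes)
for lower, upper, freq in table:
    print(f"{lower}-{upper}\t{freq}")
```

This prints the same four classes as the table, with frequencies 6, 6, 6 and 2.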
In constructing a frequency distribution,
each class or group has lower and upper limits,
lower and upper boundaries, and an interval.
1. Class Limits
Class limits define the smallest and largest values
that belong to each class or interval.
Types:
Lower Class Limit: The smallest value in a class.
Upper Class Limit: The largest value in a class.
2. Class Boundaries
Class boundaries are the actual limits of a class, taking into
account the gaps between consecutive classes. They ensure
no overlap between classes and provide a continuous range
for data representation.
How to Calculate Class Boundaries (for data recorded to whole
numbers, so that consecutive classes are one unit apart):
Subtract 0.5 from the lower class limit.
Add 0.5 to the upper class limit.
Example:
For the class 10-12:
Lower Class Boundary: 10 - 0.5 = 9.5
Upper Class Boundary: 12 + 0.5 = 12.5
The class boundary is 9.5 to 12.5.
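A small sketch of the boundary rule, assuming whole-number data. The hypothetical `gap` parameter (the distance between one class's upper limit and the next class's lower limit) is an illustrative generalization of the ±0.5 rule, not something stated in the notes; with the default `gap=1` it reduces to subtracting and adding 0.5.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Hypothetical helper: move each class limit outward by half the gap between classes."""
    half = gap / 2
    return lower_limit - half, upper_limit + half

print(class_boundaries(10, 12))   # (9.5, 12.5)
```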
3. Mid points
Midpoints are used in statistical calculations such as
finding the mean of grouped data. They represent the
typical or central value for a class.
Example:
Consider the class interval 10-20:
Lower Class Limit = 10
Upper Class Limit = 20
Midpoint = (Lower Class Limit + Upper Class Limit) / 2 = (10 + 20) / 2 = 15
So, the midpoint is 15.
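Because the notes mention that midpoints are used to find the mean of grouped data, here is a minimal sketch of both steps. It assumes the usual convention that every observation in a class is treated as equal to the class midpoint; the age table from the earlier example is reused as input.

```python
def midpoint(lower_limit, upper_limit):
    """Class midpoint: average of the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2

print(midpoint(10, 20))   # 15.0

# Approximate mean of the grouped age table, treating every observation
# in a class as if it were equal to the class midpoint.
age_classes = [(10, 12, 6), (13, 15, 6), (16, 18, 6), (19, 21, 2)]
n = sum(f for _, _, f in age_classes)
grouped_mean = sum(midpoint(lo, hi) * f for lo, hi, f in age_classes) / n
print(round(grouped_mean, 2))   # 14.6
```

The grouped mean, 292 / 20 = 14.6, is close to but not exactly the mean of the 20 raw ages (290 / 20 = 14.5), which illustrates that grouped calculations are approximations.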
4. Class Intervals
A class interval is the range of values within each class. Its width
is the difference between the upper and lower class boundaries of a
class (equivalently, the difference between the lower limits of two
consecutive classes). Class intervals must be chosen so that each
data item belongs to exactly one class.
Formula:

Class Interval (Width) = Upper Class Boundary − Lower Class Boundary


Example:
For the class 10-12:
Class Interval = 12.5 − 9.5 = 3
Class Limits | Class Boundaries | Frequency | Midpoint | Class Interval
10-12        | 9.5 - 12.5       | 6         | 11       | 3
13-15        | 12.5 - 15.5      | 6         | 14       | 3
16-18        | 15.5 - 18.5      | 6         | 17       | 3
19-21        | 18.5 - 21.5      | 2         | 20       | 3
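The complete table above can be rebuilt from the class limits and frequencies alone. This is a sketch assuming whole-number data (so boundaries sit 0.5 outside the limits) and taking the class interval from the boundaries.

```python
# (lower limit, upper limit, frequency) from the age example
classes = [(10, 12, 6), (13, 15, 6), (16, 18, 6), (19, 21, 2)]

rows = []
for lower, upper, freq in classes:
    lcb, ucb = lower - 0.5, upper + 0.5   # whole-number data: gap of 1 between classes
    mid = (lower + upper) / 2             # midpoint from the class limits
    width = ucb - lcb                     # class interval from the boundaries
    rows.append((f"{lower}-{upper}", lcb, ucb, freq, mid, width))

print("Limits  Boundaries    f  Midpoint  Interval")
for limits, lcb, ucb, freq, mid, width in rows:
    print(f"{limits:<7} {lcb}-{ucb:<8} {freq:<2} {mid:<9} {width}")
```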
Example
Make a frequency distribution from the following
data, relating to the weights, recorded to the nearest
gram, of 60 apples picked at random from a
consignment.
106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125,
111, 92, 86, 70, 126, 68, 130, 129, 139, 119, 115, 128,
100, 186, 84, 99, 113, 204, 111, 141, 136, 123, 90, 115,
98, 110, 78, 185, 162, 178, 140, 152, 173, 146, 158, 194,
148, 90, 107, 181, 131, 75, 184, 104, 110, 80, 118, 82
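One possible solution, sketched under stated assumptions: 7 classes and a first lower limit of 65 (a convenient value just below the smallest weight, 68). Both choices are assumptions, not given in the exercise; other choices give a different but equally valid table.

```python
import math

# Weights (grams) of the 60 apples listed above
weights = [106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125,
           111, 92, 86, 70, 126, 68, 130, 129, 139, 119, 115, 128,
           100, 186, 84, 99, 113, 204, 111, 141, 136, 123, 90, 115,
           98, 110, 78, 185, 162, 178, 140, 152, 173, 146, 158, 194,
           148, 90, 107, 181, 131, 75, 184, 104, 110, 80, 118, 82]

num_classes = 7                               # assumption: 7 classes
data_range = max(weights) - min(weights)      # 204 - 68 = 136
width = math.ceil(data_range / num_classes)   # 136 / 7 is about 19.4, rounded up to 20
start = 65                                    # assumption: convenient limit just below 68

frequencies = []
print("Class limits  Class boundaries  Frequency")
for i in range(num_classes):
    lower = start + i * width
    upper = lower + width - 1                 # weights are whole grams
    freq = sum(1 for w in weights if lower <= w <= upper)
    frequencies.append(freq)
    print(f"{lower}-{upper:<9} {lower - 0.5}-{upper + 0.5:<11} {freq}")

print("Total:", sum(frequencies))
```

With these assumptions the classes 65-84, 85-104, ..., 185-204 receive 9, 10, 17, 10, 5, 4 and 5 apples, totalling 60.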
Histogram
A histogram is a data visualization that shows the frequency of data within
predetermined intervals (also known as bins or class intervals) using
rectangular bars. In contrast to a bar graph, a histogram highlights the
continuous character of the data by having bars that touch one another.
When the class intervals are equal, the rectangles all have the same width
and their heights directly represent the class frequencies; that is, they are
numerically proportional to the frequencies in the respective classes.
If the class intervals are not all equal, the height of the rectangle over an
unequal class interval must be adjusted, because it is area and not height that
measures frequency. This means that the height of a rectangle must be
proportionally decreased if the length of the corresponding class interval
increases. For example, if the length of the class interval becomes double,
then the height of the rectangle is to be halved so that the area, being the
fundamental property of the rectangle of the histogram, remains unchanged.
[Figure: histogram with class boundaries on the horizontal axis and frequency on the vertical axis]
Example:
Suppose we have the following test scores of
students:
Scores: 10, 15, 20, 22, 25, 30, 32, 35, 37, 40, 45,
47, 50, 55, 60.
Create the histogram for the above data.
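A sketch of the binning step for this exercise, assuming classes of width 10 starting at 10 (10-19, 20-29, ...); the exercise does not specify the classes, so this choice is an assumption. The bars are printed sideways as text so the block needs only the standard library.

```python
scores = [10, 15, 20, 22, 25, 30, 32, 35, 37, 40, 45, 47, 50, 55, 60]

width = 10    # assumption: classes 10-19, 20-29, ...
start = 10    # assumption: first lower class limit

bins = []     # (lower boundary, upper boundary, frequency)
lower = start
while lower <= max(scores):
    upper = lower + width - 1
    freq = sum(1 for s in scores if lower <= s <= upper)
    bins.append((lower - 0.5, upper + 0.5, freq))
    lower += width

# Text sketch: one row per class, bar length = frequency
for lcb, ucb, freq in bins:
    print(f"{lcb:>5}-{ucb:<5} | {'#' * freq} ({freq})")
```

The frequencies come out as 2, 3, 4, 3, 2 and 1. On paper (or with matplotlib's `plt.hist(scores, bins=edges)`, where `edges` is the list of boundaries 9.5, 19.5, ..., 69.5), the bars are drawn vertically over the class boundaries so that adjacent bars touch.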
Example
• Construct the histogram for the following
frequency distribution relating to the ages (to the
nearest birthday) of telephone operators.
Age (years) No. of operators
18-19 9
20-24 188
25-29 160
30-34 123
35-44 84
45-59 15
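Because these classes have unequal widths, the heights must be adjusted so that area, not height, represents frequency. A minimal sketch, assuming class boundaries 0.5 outside the stated limits and using 5 years (the most common width here) as the reference width; the reference width is an assumption.

```python
# (lower limit, upper limit, number of operators)
classes = [(18, 19, 9), (20, 24, 188), (25, 29, 160),
           (30, 34, 123), (35, 44, 84), (45, 59, 15)]

standard_width = 5   # assumption: reference width of 5 years

heights = []
for lower, upper, freq in classes:
    width = (upper + 0.5) - (lower - 0.5)     # width from class boundaries
    height = freq * standard_width / width    # keeps area proportional to frequency
    heights.append(height)
    print(f"{lower - 0.5}-{upper + 0.5}: width {width}, height {height}")
```

The 35-44 class is twice the reference width, so its height is halved (84 becomes 42); the 45-59 class is three times as wide, so 15 becomes 5.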
Frequency Polygon
A frequency distribution can be represented
graphically by a frequency polygon. It depicts the
shape of the distribution by plotting each class
frequency above the midpoint of its class interval
and joining these points with straight lines. It is
frequently used in place of a histogram or to compare
several datasets.
Example
Construct the frequency polygon for the
following data
Test Scores Frequency
49.5-59.5 5
59.5-69.5 10
69.5-79.5 30
79.5-89.5 40
89.5-99.5 15
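A sketch of the points to plot for this polygon. It also adds an empty class at each end so the polygon closes on the horizontal axis; that is a common convention, assumed here rather than taken from the notes.

```python
boundaries = [(49.5, 59.5), (59.5, 69.5), (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
freqs = [5, 10, 30, 40, 15]

width = boundaries[0][1] - boundaries[0][0]          # 10.0
midpoints = [(lo + hi) / 2 for lo, hi in boundaries]

# Close the polygon by adding a zero-frequency class at each end
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, freqs))
          + [(midpoints[-1] + width, 0)])

for x, y in points:
    print(x, y)
```

Joining the printed points in order with straight lines gives the frequency polygon, running from (44.5, 0) to (104.5, 0).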
Frequency Curve
A frequency curve is a smooth, continuous curve
drawn to represent the distribution of data in a
frequency distribution. It is obtained by drawing a
smooth freehand curve through the points plotted at
the midpoints of the class intervals with heights equal
to the class frequencies, rather than joining them with
straight lines. This curve is commonly used to visualize
continuous data.
Numerical Methods
Central Tendency
A measure of central tendency is a single value that attempts
to describe a set of data by identifying the central position
within the set of data. The mean, median, mode, geometric
mean (GM) and harmonic mean (HM) are all valid measures of
central tendency, but under different conditions some
measures become more appropriate to use than others.
The mean (the sum of the observations divided by their number)
is the most commonly used measure; it gives a typical value of
the data and is used for prediction.
The median is the middle value of the ordered data; it divides
the set of data into two halves, one half comprising observations
greater than it and the other half observations smaller than it.
More precisely, the median is the value at or below which 50%
of the data lie.
The mode is the value that occurs most frequently in a set of
data. A set of data may have more than one mode or no
mode.
Note: The sample mean is denoted by x̄ and the population mean
is denoted by μ.
Example
Given the following ungrouped data
8,9,10,10,10,11,11,11,12,13
Find the mean, median and mode from the
above data
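A quick check of this exercise using Python's standard `statistics` module (`multimode` requires Python 3.8 or later). `multimode` is used instead of `mode` because this data set has two values tied for the highest frequency.

```python
import statistics

data = [8, 9, 10, 10, 10, 11, 11, 11, 12, 13]

mean = statistics.mean(data)          # sum / n = 105 / 10
median = statistics.median(data)      # average of the 5th and 6th ordered values
modes = statistics.multimode(data)    # every value with the highest frequency

print(mean, median, modes)            # 10.5 10.5 [10, 11]
```

The mean and median are both 10.5, and the data are bimodal: 10 and 11 each occur three times.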
