
Chapter One & Two

The document introduces statistics, including its definitions, classification and basic terms. It also covers sources of data, methods of data collection, the organization of data through classification and tabulation, and the presentation of data using frequency distributions.

Uploaded by

Mekonen
Copyright © All Rights Reserved

UNIT ONE

INTRODUCTION TO STATISTICS
1.1. Definition and Classification of Statistics

Definition:

- Statistics is a collection of numerical facts and data.
- Statistics is a mathematical science dealing with the methods of collecting, organizing, presenting, analyzing and interpreting data.
Classification of Statistics
Statistics is broadly divided into two branches according to how the collected data are used.

1. Descriptive Statistics
- deals with describing data without attempting to infer anything beyond the given set of data;
- consists of the collection, organization, summarization and presentation of data.

2. Inferential Statistics
- deals with drawing inferences and/or conclusions about a population based on data obtained from a limited sample of observations;
- consists of performing hypothesis tests, determining relationships among variables and making predictions.
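Examples:

a) From past figures, it has been predicted that 31% of registered voters will vote in the November election. (inferential statistics: a prediction about the population)
b) The average age of a student in Hawassa University is 20.1 years. (descriptive statistics: a summary of the given data)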

1.2. Definition of some Basic Terms


Lesson Objective: Demonstrate knowledge of statistical terminologies.
a) Population: the totality (collection) of all objects or items under consideration.
Example: All students in Aksum University.
b) Sample: a part of a population taken so that some generalization about the population can be made. A sample should be representative of the population.
c) Parameter: a descriptive measure of a population, that is, a summary value calculated from a population. Examples: average, range, proportion, variance, etc.
d) Statistic: a descriptive measure of a sample, that is, a summary value calculated from a sample.
Examples: average, range, proportion, variance, etc.
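To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of ages is hypothetical (it is not from the text), and the sample is drawn with Python's `random` module using a fixed seed so that the run is reproducible.

```python
import random
import statistics

# Hypothetical population: ages (in years) of all 12 members of a small group.
population = [18, 19, 19, 20, 20, 20, 21, 21, 22, 23, 24, 25]

# Parameter: a summary value computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: the same summary value computed from a sample of the population.
rng = random.Random(1)              # fixed seed so the sample is reproducible
sample = rng.sample(population, 4)  # 4 members drawn without replacement
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean, "from sample", sample)
```

The parameter is one fixed number for the population, while the statistic generally changes from one sample to another.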
1.3. Application and limitation of statistics
Statistics can be applied in any field of study that seeks quantitative evidence. In engineering, for instance, it is used:
- to compare the breaking strength of two types of materials;
- to determine the reliability of a product;
- to control the quality of products in a given production process;
- to compare the improvement in yield due to certain additives (fertilizers, herbicides (weedicides), etc.).
However, statistics has the following limitations.
a) It does not study qualitative characteristics directly. Examples: beauty, honesty, poverty and standard of living.
b) It does not study a single individual; it deals with aggregates of facts. Example: the population size of a country for one given year, on its own, is of no help for comparative studies.
c) Statistical results are true only on average. Examples: the probability of getting a head in tossing a coin is 1/2, and the germination percentage of a given variety of seed is 80%; neither statement tells us the outcome of a single toss or of a single seed.
d) It is liable to misuse. Example: the number of car accidents committed in a city in a particular year by women drivers is 10, while that committed by men drivers is 40; hence, women drivers are safer drivers. This conclusion is unjustified, because the numbers of women and men drivers are not taken into account.
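The misuse in d) can be exposed by comparing accident rates instead of raw counts. This Python sketch assumes hypothetical numbers of women and men drivers (the example gives none); with these assumed figures, the conclusion reverses.

```python
# Accident counts from the example above.
accidents = {"women": 10, "men": 40}

# HYPOTHETICAL numbers of drivers in each group; the example does not give them.
drivers = {"women": 500, "men": 4000}

def accidents_per_1000_drivers(group):
    """Accident rate per 1,000 drivers, which adjusts for the size of each group."""
    return 1000 * accidents[group] / drivers[group]

rates = {group: accidents_per_1000_drivers(group) for group in accidents}
print(rates)  # {'women': 20.0, 'men': 10.0}
```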

1.4. Types of Variables and Measurement Scales


A variable is a characteristic of an object that can take different possible values.
There are two types of variables.
a) Quantitative variables: variables that can be quantified, i.e., that take numerical values.
Examples: height, area, income, temperature, etc.
b) Qualitative variables: variables that cannot be quantified directly. Examples: color, beauty, sex, location. Qualitative variables are also called categorical variables. Accordingly, we have two types of data: quantitative and qualitative data.
Quantitative variables can be further classified as
- discrete variables, and
- continuous variables.
a) Discrete variables are variables whose values are counts.
Examples: number of students, number of households, family size, number of pages of a book.
b) Continuous variables are variables that can take any value within an interval.
Examples: weight, length, volume, etc.
There are four types of measurement scales for variables.
1. Nominal scale: "Nominal" comes from the Latin word for "name". This is a scale for grouping individuals into different categories.
Examples: red, brown, black, short, tall, pass, fail
- In this scale, one category is simply different from the other.
- Addition, subtraction, multiplication and division are impossible, and
- comparison is impossible.
2. Ordinal scale: Data consisting of an ordering or ranking of measurements are said to be on an ordinal scale of measurement.
Examples: faster, taller, shorter, military ranks, ranks in a race, etc.
- One is different from, and greater (better) or less than, the other.
- Addition, subtraction, multiplication and division are impossible;
- however, comparison is possible.
Man A weighs more than man B.
Ethiopian athletes got the 1st and 2nd ranks in the women's 10,000 m final in Sydney.
Ordinal scale data contain and convey more information than nominal scale data, because relative magnitudes are known; however, quantitative comparisons are impossible.
3. Interval scale: a measurement scale in which
- there is no true zero point (the zero point is arbitrary);
- the zero point has no physical significance;
- there is a constant interval size between any adjacent units on the measurement scale.
Example: °C and °F (units for measuring temperature)
- Interval scale data convey more information than nominal and ordinal scale data.
4. Ratio scale: a measurement scale in which
- there is a constant interval size between any adjacent units on the measurement scale;
- there exists a zero point on the measurement scale, and this zero point has physical significance.
Examples: m, cm, kg, km/hr, cm/sec, year, hour, second, K (kelvin), m³, etc.
- One value is different from, larger (taller, better) or smaller than another by a definite amount, and is a definite number of times as large as the other.
- This measurement scale provides more information than the interval scale.
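A short Python sketch of why ratios are meaningful on a ratio scale but not on an interval scale. The temperatures and weights are illustrative values, and the factor 2.20462 pounds per kilogram is an approximation.

```python
import math

def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return 9 / 5 * c + 32

KG_TO_LB = 2.20462  # approximate number of pounds in one kilogram

# Interval scale (temperature): the zero point is arbitrary, so the ratio
# of two readings changes when the unit changes.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# Ratio scale (weight): zero means "no weight", so the ratio survives a change of unit.
ratio_kg = 20 / 10
ratio_lb = (20 * KG_TO_LB) / (10 * KG_TO_LB)

print(ratio_celsius, ratio_fahrenheit)   # 2.0 1.36
print(ratio_kg, ratio_lb)                # 2.0 2.0

# Equal differences stay equal on an interval scale (10 °C steps are 18 °F steps).
print(math.isclose(celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20),
                   celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)))  # True
```

So 20 °C is not "twice as hot" as 10 °C, whereas 20 kg is twice as heavy as 10 kg in any unit.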
1.5. Sources of Data and Methods of Data Collection

Not every aggregate of numbers can be called statistical data. An aggregate of numbers is called statistical data when the numbers are
- comparable,
- meaningful, and
- collected for a well-defined objective.
Raw data: data that have been collected but not yet organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: an arrangement of raw numerical data in ascending or descending order of magnitude.
- It makes the range of the data set easy to find, and it also gives some idea about the general characteristics of the distribution.
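Arranging the raw data above into an array takes one call to Python's built-in `sorted`; the range can then be read off the two ends of the array. A minimal sketch:

```python
raw_data = [25, 10, 32, 18, 6, 93, 4]   # raw data from the example above

ascending_array = sorted(raw_data)
descending_array = sorted(raw_data, reverse=True)

# In an array the extreme values sit at the two ends, so the range is immediate.
data_range = ascending_array[-1] - ascending_array[0]

print(ascending_array)   # [4, 6, 10, 18, 25, 32, 93]
print(descending_array)  # [93, 32, 25, 18, 10, 6, 4]
print(data_range)        # 89
```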
Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.
Primary source: a source of data that supplies first-hand information for the immediate purpose.
- Primary data: data originally collected for the immediate purpose.
- Primary data are more expensive to obtain than secondary data.
Secondary source: individuals or agencies that supply data originally collected for other purposes, by them or by others.
- Usually these are published or unpublished materials, records, reports, etc.
- Secondary data: data collected from a secondary source.
The process of data collection from a primary source may be through:
a) field trials
b) laboratory experiments
c) surveys: census surveys and sample surveys.
CHAPTER TWO
METHOD OF DATA ORGANIZATION AND PRESENTATION

2.1 Classification and Tabulation of Data

Classification: the process of arranging items/data into classes or categories according to their similarities and/or differences.

Classification eliminates inconsistency and also brings out the points of similarity and/or dissimilarity among the collected items/data.

Classification is necessary because it is difficult to draw inferences and conclusions directly from a large set of collected raw data.

2.2 Frequency Distributions


Frequency: - is the number of times a certain value or set of values occurs in a specific group.

A frequency distribution is a table that presents data according to some criteria with the
corresponding number of items falling in each class (i.e. with the corresponding frequencies.)

Frequency distributions are of three types:
1. Categorical frequency distribution
2. Ungrouped frequency distribution
3. Grouped frequency distribution

1. Categorical frequency distribution: a type of frequency distribution used for qualitative variables.

Example: A frequency distribution presenting the number of males and females in a class

Sex       Frequency
Male      57
Female    39

For quantitative variables, there are two basic types of frequency distributions: ungrouped and grouped frequency distributions.

2. Ungrouped frequency distribution

An ungrouped frequency distribution is a table of all potential raw score values that could occur in the data, along with their corresponding frequencies. It is often constructed for a small set of data or for a discrete variable.

Example: The following data are the ages (in years) of 20 women who attended health education last year: 30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30 and 41.

Construct a frequency distribution for these data.

STEP 1. Find the range of the data:

$\text{Range} = \text{Maximum observation} - \text{Minimum observation} = 42 - 29 = 13$

STEP 2. Construct a table, tally the data and complete the frequency column. The frequency distribution is as follows.

Age   Tally   Frequency
29    /       1
30    ////    4
31    /       1
32    ///     3
33    /       1
35    //      2
36    //      2
37    /       1
39    /       1
41    ///     3
42    /       1
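The two steps can be reproduced with `collections.Counter` from the Python standard library. A minimal sketch using the 20 ages above, printing the tally column with `/` marks as in the table:

```python
from collections import Counter

# Ages (in years) of the 20 women from the example above.
ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

# STEP 1: range = maximum observation - minimum observation.
age_range = max(ages) - min(ages)

# STEP 2: count how often each distinct value occurs.
frequency = Counter(ages)

print("Range:", age_range)
print("Age  Tally  Frequency")
for age in sorted(frequency):
    count = frequency[age]
    print(f"{age:<4} {'/' * count:<6} {count}")
```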

3. Grouped frequency distribution

When the range of the data is large, the data must be grouped into classes. A grouped frequency distribution is a frequency distribution in which several data values are grouped into one class.

Some Important Definitions

– Raw data: data collected in original form.
– Array: data arranged in ascending or descending order.
– Class: one of the different, non-overlapping groups of data.
– Class limits: separate one class in a grouped frequency distribution from another. The limits could actually appear in the collected data, and there are gaps between the upper limit of one class and the lower limit of the next class.
– Class boundaries: separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the collected data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary (LCB) is found by subtracting half a unit of measurement from the lower class limit (LCL), and the upper class boundary (UCB) is found by adding half a unit of measurement to the upper class limit (UCL). That is,
$LCB = LCL - \tfrac{1}{2}U$ and $UCB = UCL + \tfrac{1}{2}U$
– Class width (W): the difference between the upper and lower boundaries of any class, or between the lower limits of two consecutive classes, or between the upper limits of two consecutive classes.
N.B. The class width is not equal to the difference between the UCL and LCL of the same class.

– Class mark (M): the midpoint of a class interval, i.e.
$M_i = \dfrac{UCB_i + LCB_i}{2}$
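A minimal Python sketch of the boundary, width and class-mark formulas, assuming a unit of measurement U = 1 and using the first class (26–30) of the employee-hours example in this chapter. The function names are illustrative, not standard.

```python
def class_boundaries(lcl, ucl, unit):
    """Return (LCB, UCB): each class limit moved outward by half a unit."""
    return lcl - unit / 2, ucl + unit / 2

def class_mark(lcb, ucb):
    """Midpoint of the class interval."""
    return (lcb + ucb) / 2

# First class of the employee-hours example: limits 26-30, U = 1.
lcb, ucb = class_boundaries(26, 30, 1)
width = ucb - lcb               # boundary difference, not UCL - LCL (which is 4)
mark = class_mark(lcb, ucb)

print(lcb, ucb, width, mark)    # 25.5 30.5 5.0 28.0
```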

– Unit of measurement (U): the smallest difference between any two values of the variable being
measured.
– Cumulative frequency (Cf) less than type: the total frequency of all values (observations) less
than or equal to the upper class boundary for the given class.
– Cumulative frequency (Cf) more than type: The total frequency of all values (observations)
greater than or equal to the lower class boundary for the given class.

A tabular arrangement of class intervals together with their corresponding cumulative frequency
(either less than or more than type; as defined above) is called a cumulative frequency
distribution.

– Relative frequency: the frequency of a class divided by the total frequency (i.e., the sum of all frequencies); multiplied by 100, it gives the percentage of values falling in that class.
$\text{Relative frequency of a class} = \dfrac{\text{Frequency of that class}}{\text{Total frequency}}$

Note:

- The relative frequency shows what fraction or proportion of the total frequency belongs to the corresponding class.
- The sum of all the relative frequencies in a frequency distribution is always 1.

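– Relative cumulative frequency (less than type/more than type): the total of the relative frequencies up to and including a class, counted from the first class (less than type) or from the last class (more than type); equivalently, the cumulative frequency (less than type/more than type) divided by the total frequency. Multiplied by 100, it gives the percentage of values that are less than the upper class boundary (less than type) or greater than the lower class boundary (more than type).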
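A sketch of these definitions with Python's standard library, using the eight class frequencies (5, 5, 5, 9, 7, 1, 2, 6) of the employee-hours example in this chapter:

```python
from itertools import accumulate

# Class frequencies from the employee-hours example.
frequencies = [5, 5, 5, 9, 7, 1, 2, 6]
total = sum(frequencies)                                  # 40

relative = [f / total for f in frequencies]
# Less than type: running total from the first class down to the current one.
relative_cf_less = [c / total for c in accumulate(frequencies)]
# More than type: running total from the last class up to the current one.
relative_cf_more = [c / total for c in accumulate(reversed(frequencies))][::-1]

for r, lt, mt in zip(relative, relative_cf_less, relative_cf_more):
    print(f"{r:.3f}  {lt:.3f}  {mt:.3f}")
```

The relative frequencies add up to 1, and the last "less than" and first "more than" relative cumulative frequencies are both 1.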

Guidelines to construct a grouped frequency distribution

STEP 1. Determine the unit of measurement, U


STEP 2. Find the maximum (Max) and the minimum (Min) observations, and then compute their range, R:
$R = \text{Max} - \text{Min}$
STEP 3. Fix the number of classes desired (k). There are two ways to fix k:
– fix k arbitrarily between 6 and 20, or
– use Sturges' formula, $k = 1 + 3.322\log_{10} N$, where N is the total frequency, and round this value of k up to get an integer.
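Sturges' formula as a small Python function (the name `sturges_classes` is chosen here for illustration):

```python
import math

def sturges_classes(n):
    """Number of classes k = 1 + 3.322 * log10(N), rounded up to an integer."""
    return math.ceil(1 + 3.322 * math.log10(n))

print(sturges_classes(40))   # 1 + 3.322 * log10(40) is about 6.32, rounded up to 7
```

For the 40 observations of the employee-hours example, Sturges' formula suggests 7 classes; that example instead fixes k = 8, which lies within the arbitrary range of 6 to 20.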
STEP 4. Find the class width (W) by dividing the range by the number of classes, and round the result up to get an integer value:
$W = \dfrac{R}{k}$
STEP 5. Pick a suitable starting point less than or equal to the minimum value. This starting point is the lower limit of the first class. Keep adding the class width to this lower limit to get the rest of the lower limits.
STEP 6. Find the upper class limits. To find the upper limit of the first class, subtract one unit of measurement from the lower limit of the second class. Then keep adding the class width to this upper limit to get the rest of the upper limits.
STEP 7. Compute the class boundaries as
$LCB = LCL - \tfrac{1}{2}U$ and $UCB = UCL + \tfrac{1}{2}U$
where LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary and UCB = upper class boundary. The class boundaries also lie halfway between the upper limit of one class and the lower limit of the next class.

STEP 8. Tally the data.


STEP 9. Find the frequencies.
STEP 10. (If necessary) Find the cumulative frequencies (more than and less than types).
Example: The numbers of hours that 40 employees spent on their jobs during the last 7 working days are given below.

62 50 35 36 31 43 43 43

41 31 65 30 41 58 49 41

37 62 27 47 65 50 45 48

27 53 40 29 63 34 44 32

58 61 38 41 26 50 47 37

Construct a suitable frequency distribution for these data using 8 classes.

STEP 1. Unit of measurement: U = 1 hour

STEP 2. Max = 65, Min = 26, so that R = 65 - 26 = 39
STEP 3. The number of classes is already fixed at k = 8.

STEP 4. Class width:
$W = \dfrac{R}{k} = \dfrac{39}{8} = 4.875 \approx 5$
STEP 5. Starting point = 26 = lower limit of the first class. And hence the lower class limits become
26 31 36 41 46 51 56 61

STEP 6. Upper limit of the first class = 31-1 = 30. And hence the upper class limits become
30 35 40 45 50 55 60 65

Class      Class          Tally        Frequency   Cumulative frequency   Cumulative frequency
limits     boundaries                              (less than type)       (more than type)
26 – 30    25.5 – 30.5    /////        5           5                      40
31 – 35    30.5 – 35.5    /////        5           10                     35
36 – 40    35.5 – 40.5    /////        5           15                     30
41 – 45    40.5 – 45.5    ///// ////   9           24                     25
46 – 50    45.5 – 50.5    ///// //     7           31                     16
51 – 55    50.5 – 55.5    /            1           32                     9
56 – 60    55.5 – 60.5    //           2           34                     8
61 – 65    60.5 – 65.5    ///// /      6           40                     6
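The steps can be collected into one function. This is a minimal Python sketch, assuming integer-valued data with U = 1 and a class width rounded up to a whole number as in STEP 4; the function name and its `start` parameter are hypothetical. Run on the 40 observations, it reproduces the table above.

```python
import math

def grouped_frequency_distribution(data, k, unit=1, start=None):
    """Build a grouped frequency distribution following STEPs 2-10 above.

    Assumes integer-valued data with unit of measurement `unit`.
    Returns one tuple per class: (LCL, UCL, LCB, UCB, frequency,
    cumulative frequency 'less than', cumulative frequency 'more than').
    """
    low, high = min(data), max(data)               # STEP 2: Min and Max
    width = math.ceil((high - low) / k)            # STEP 4: W = R / k, rounded up
    first = low if start is None else start        # STEP 5: starting point
    if first > low or first + k * width - unit < high:
        raise ValueError("classes do not cover all observations")
    rows = []
    for i in range(k):
        lcl = first + i * width                    # STEP 5: lower class limits
        ucl = lcl + width - unit                   # STEP 6: upper class limits
        lcb, ucb = lcl - unit / 2, ucl + unit / 2  # STEP 7: class boundaries
        freq = sum(1 for x in data if lcl <= x <= ucl)  # STEPs 8-9: tally, count
        rows.append([lcl, ucl, lcb, ucb, freq])
    running = 0
    for row in rows:                               # STEP 10: cumulative frequencies
        more_than = len(data) - running            # observations >= this LCB
        running += row[4]                          # observations <= this UCB
        row.extend([running, more_than])
    return [tuple(row) for row in rows]

hours = [62, 50, 35, 36, 31, 43, 43, 43,
         41, 31, 65, 30, 41, 58, 49, 41,
         37, 62, 27, 47, 65, 50, 45, 48,
         27, 53, 40, 29, 63, 34, 44, 32,
         58, 61, 38, 41, 26, 50, 47, 37]

table = grouped_frequency_distribution(hours, k=8)
for row in table:
    print(row)   # first row: (26, 30, 25.5, 30.5, 5, 5, 40)
```

The coverage check is there because rounding W up does not always make the last class reach the maximum (for instance, when R/k is already a whole number); in that case the sketch raises an error instead of silently dropping observations.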

2.3. Diagrammatic and Graphic Presentation of Data


Lesson Objective: Represent data in frequency distributions graphically using pie chart, bar
graph, histograms, frequency polygons, ogives and others.
The data that is presented by a frequency distribution can also be displayed diagrammatically or
graphically.

Diagrams and graphs:

- are techniques for presenting data in visual displays using geometric figures;
- are visual aids that give a bird's-eye view of a given set of numerical data;
- are more attractive than mere figures (numbers);
- facilitate comparison of data;
- are easily understood by anyone, even without a statistical background.

Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for
presenting continuous types of data.

There are three common diagrammatic presentations of data: bar-diagram/charts, pie-chart and
pictograms, as well as three common graphic presentations of data: histogram, frequency
polygon, and cumulative frequency polygon (ogive).

I. Bar-diagrams/ Bar-charts

- A bar-diagram is a series of equally spaced bars of equal width, with the length (height) of each bar representing the magnitude or frequency of the observations in each group.
- Bar-diagrams are usually used to represent a one-way (simple) frequency distribution.
- Bar-diagrams can be drawn either horizontally or vertically. Usually, horizontal bar-diagrams are used for qualitatively classified data, whereas vertical bar-diagrams are used for quantitatively classified data.

1. Simple bar-diagrams
Simple bar-diagrams are used to depict data on a single variable (one-way classification).

Example: The following frequency distribution shows the sales (in million birr) of four products for the 2004 production year.

Product   Sales (in million birr)
A         14
B         21
C         9
D         17

The bar-diagram presentation for these data is given below.

[Figure: vertical bar-diagram of sales by product. Horizontal axis: Product (A, B, C, D); vertical axis: sales (in million birr).]
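A plain-text stand-in for the bar-diagram, as a Python sketch: each bar's length in `#` characters equals the sales figure. The bars are drawn horizontally here, which suits qualitatively classified data such as product names; an actual drawn chart would normally be made with a plotting library.

```python
# Sales (in million birr) of the four products from the table above.
sales = {"A": 14, "B": 21, "C": 9, "D": 17}

def text_bar_diagram(data, symbol="#"):
    """Return one line per category; each bar's length equals its value."""
    width = max(len(label) for label in data)
    return [f"{label:<{width}} | {symbol * value} {value}"
            for label, value in data.items()]

for line in text_bar_diagram(sales):
    print(line)
```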

II. Pie-charts
A pie-chart is a circle that is divided into sections or wedges according to the percentages of frequencies in each category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio of the frequency of the class to the total frequency by 360°, i.e.

$\text{Sector angle of a class} = \dfrac{\text{Frequency of the class}}{\text{Total frequency}} \times 360^\circ$

Note that pie-charts are usually used for depicting nominal level data.

Example: A survey showed that a car owner spends birr 2,950 per year on operating expenses.
Below is the breakdown of the various expenditure items. Draw an appropriate chart to portray
the data.

Expenditure item          Amount (in birr)
Fuel                      603
Interest on car loan      279
Repairs                   930
Insurance and license     646
Depreciation              492
Total                     2,950

How to draw a pie-chart

- First, find the percentage for each class.
- Next, calculate the degree measure for each class.
- Finally, using a protractor, mark each sector (degree measure) on a circle and give a key for explanation.

Expenditure item          Amount (in birr)   Percentage (approx.)   Degree (approx.)
Fuel                      603                20                     74
Interest on car loan      279                9                      34
Repairs                   930                32                     113
Insurance and license     646                22                     79
Depreciation              492                17                     60
Total                     2,950              100                    360
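The percentage and degree columns can be reproduced with a few lines of Python. This sketch applies the sector-angle formula to each expenditure item and prints unrounded values, which round to the table entries.

```python
# Annual operating expenses (in birr) from the table above.
expenses = {
    "Fuel": 603,
    "Interest on car loan": 279,
    "Repairs": 930,
    "Insurance and license": 646,
    "Depreciation": 492,
}
total = sum(expenses.values())  # 2,950 birr

percentages = {item: 100 * amount / total for item, amount in expenses.items()}
angles = {item: 360 * amount / total for item, amount in expenses.items()}  # degrees

for item in expenses:
    print(f"{item:<22} {expenses[item]:>5} {percentages[item]:5.1f}% {angles[item]:6.1f}°")
```

Rounded values do not always add up to exactly 100 and 360; here they happen to.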

Now we can draw the pie-chart for the data.

[Figure: pie-chart of the car owner's annual operating expenses. Key: Fuel (20%), Interest on car loan (9%), Repairs (32%), Insurance and license (22%), Depreciation (17%).]

III. Histogram

A histogram is another way of data presentation which is more suitable for frequency
distributions with continuous classes.

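In drawing a histogram, we put the class boundaries of each class on the horizontal axis and its respective frequency on the vertical axis, and draw a bar over each class. Adjacent bars touch, since there is no gap between the upper boundary of one class and the lower boundary of the next.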

Example: Draw a histogram presenting the following data.

Class          Class   Frequency   Cumulative frequency   Cumulative frequency
boundaries     mark                (less than type)       (more than type)
5.5 – 11.5     8.5     2           2                      20
11.5 – 17.5    14.5    2           4                      18
17.5 – 23.5    20.5    7           11                     16
23.5 – 29.5    26.5    4           15                     9
29.5 – 35.5    32.5    3           18                     5
35.5 – 41.5    38.5    2           20                     2
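A sketch of the histogram's geometry in Python: each bar spans its class boundaries and has a height equal to the class frequency. The check confirms that adjacent bars share an edge, and the bars are printed as rows of `*` characters as a text stand-in for the drawn histogram.

```python
# Class boundaries and frequencies from the table above.
boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
frequencies = [2, 2, 7, 4, 3, 2]

# Each bar is (left edge, right edge, height).
bars = [(lcb, ucb, f) for (lcb, ucb), f in zip(boundaries, frequencies)]

# Adjacent bars share an edge: the UCB of one class is the LCB of the next.
assert all(bars[i][1] == bars[i + 1][0] for i in range(len(bars) - 1))

for lcb, ucb, f in bars:
    print(f"{lcb:5.1f}-{ucb:5.1f} | {'*' * f}")
```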

IV. Frequency Polygon

A frequency polygon is a line graph drawn by plotting the frequencies of the classes along the vertical axis against their respective class marks along the horizontal axis, and then joining the plotted points with straight line segments.

Example: Present the data in the previous example using a frequency polygon.

[Figure: frequency polygon. Horizontal axis: Class Marks (8.50, 14.50, 20.50, 26.50, 32.50, 38.50); vertical axis: Frequency.]
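The vertices of the frequency polygon are the (class mark, frequency) pairs. This Python sketch derives the class marks from the class boundaries of the previous table.

```python
# Class boundaries and frequencies from the histogram example.
boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
frequencies = [2, 2, 7, 4, 3, 2]

# Class mark = midpoint of the class boundaries.
class_marks = [(lcb + ucb) / 2 for lcb, ucb in boundaries]

# Points to plot, joined in order by straight line segments.
vertices = list(zip(class_marks, frequencies))
print(vertices)
```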

V. Cumulative Frequency Polygon (Ogive)

A cumulative frequency polygon can be drawn on a less than or a more than cumulative frequency basis. Place the class boundaries along the horizontal axis (upper class boundaries for the less than type, lower class boundaries for the more than type) and the corresponding cumulative frequencies (either less than or more than type) along the vertical axis. Then join the plotted points with straight line segments.

Example: The data in the previous example can be presented using either a less than or a more than cumulative frequency polygon, as shown in (i) and (ii) below, respectively.

(i) Less than type cumulative frequency polygon

[Figure: horizontal axis: upper class boundaries (11.50, 17.50, 23.50, 29.50, 35.50, 41.50); vertical axis: cumulative frequency (less than type).]

(ii) More than type cumulative frequency polygon

[Figure: horizontal axis: lower class boundaries (5.50, 11.50, 17.50, 23.50, 29.50, 35.50); vertical axis: cumulative frequency (more than type).]
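The points plotted in (i) and (ii) can be computed from the same table: less than cumulative frequencies pair with upper class boundaries, and more than cumulative frequencies pair with lower class boundaries. A minimal Python sketch:

```python
from itertools import accumulate

# Class boundaries and frequencies from the histogram example.
boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
frequencies = [2, 2, 7, 4, 3, 2]

less_than = list(accumulate(frequencies))        # 2, 4, 11, 15, 18, 20
total = less_than[-1]
# More than type: observations in this class and all later classes.
more_than = [total - c + f for c, f in zip(less_than, frequencies)]

less_than_points = [(ucb, c) for (_, ucb), c in zip(boundaries, less_than)]
more_than_points = [(lcb, c) for (lcb, _), c in zip(boundaries, more_than)]

print(less_than_points)
print(more_than_points)
```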
