AUCA Descriptive Statistics Lecture 1

Data Variable Organization and Presentation

Summarizing Data Variable

DESCRIPTIVE STATISTICS

Dr. Hategekimana Fidele

ADVENTIST UNIVERSITY OF CENTRAL AFRICA (AUCA)

Sept 11, 2024

Dr. Hategekimana Fidele DESCRIPTIVE STATISTICS



1. Introduction

What is meant by statistics?


“Statistics is a field of study (the science of numerical data) concerned with
collecting, organizing, summarizing, analyzing, and interpreting data in order to
draw meaningful inferences about a body of data when only part of it is observed.”


Subdivisions of Statistics

Statistics has two main divisions:


Descriptive statistics: Descriptive statistics is the science of the methods and
procedures used in collecting, organizing, presenting, summarizing, analyzing, and
interpreting numerical data.
Inferential statistics: Inferential statistics is the field that uses analytical tools to
draw conclusions about a population by examining random samples.


Cont.

Examples:
1. Based on sample information, the pollster predicted that Demosthenes would be
elected. (Inferential statistics)
2. The population of Rwanda in 1984 was 5 million. (Descriptive statistics)
3. According to the poll forecast, Demosthenes would get 54.3 percent of the
votes cast. (Inferential statistics)
4. An engaged approach to learning works: the 29 students who generated
summaries and inferences performed 20 percent better than those who only
memorized. (Descriptive statistics)


Terms and Vocabulary

Population: The universe (the set) of all potential observations having a common
characteristic that is being studied and about which the experimenter wishes to
make general statements or inferences.
Sample: A census is practically impossible for an infinite population or for a very
large one. In such cases, the enumeration is restricted to a limited number of
individuals of the population, called a sample.
Experimental unit: A person, thing, event, or any other item involved in a
statistical study.
E.g., for the height of people, the duration of an exam, and the distance from the
school to my house, the experimental units are the people, the exam, and the
road from the school to my house, respectively.

Cont.

Sampling: The methods and rules applied to select a typical (representative) sample
from the population are called sampling, also known as sampling methods or
sampling procedures. The main sampling methods are:
1. Simple random sampling: Draw a sample of size $n$ from a population of size $N$ in
such a way that every sample of size $n$ has the same chance of being selected. Such a
sample may be drawn with or without replacement;
2. Systematic sampling: Take an initial value $x_i$ at the $i$th position, then the values
$x_{i+kh}$ at positions $i + kh$, for $k = 1, 2, \dots, n-1$, where $h$ is the sampling
period;
3. Stratified random sampling: Subdivide the population into $k$ strata according to
given criteria, then select from each stratum a simple random sample of size $n'$.
These samples are pooled into a final sample of size $n = kn'$.
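The three sampling methods above can be sketched in Python with the standard `random` module. This is a minimal illustration, not the lecture's procedure: the population of 100 numbered units, the three strata, and the seed are hypothetical, and positions are counted from 0 as Python does.

```python
import random

def simple_random_sample(population, n, replace=False, rng=random):
    """Simple random sample of size n, drawn with or without replacement."""
    if replace:
        return [rng.choice(population) for _ in range(n)]
    return rng.sample(population, n)

def systematic_sample(population, n, h, start):
    """Value at position `start`, then the values at positions start + k*h, k = 1..n-1."""
    positions = [start + k * h for k in range(n)]
    if positions[-1] >= len(population):
        raise ValueError("population too small for this start, period h and size n")
    return [population[p] for p in positions]

def stratified_sample(strata, n_per_stratum, rng=random):
    """Simple random sample of size n' from each stratum, pooled into n = k * n'."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

rng = random.Random(2024)                 # hypothetical seed, for reproducibility
population = list(range(1, 101))          # hypothetical population of N = 100 units
srs = simple_random_sample(population, 10, rng=rng)
sys_s = systematic_sample(population, n=10, h=10, start=rng.randrange(10))
strata = {                                # hypothetical strata (k = 3)
    "year1": population[:40],
    "year2": population[40:70],
    "year3": population[70:],
}
strat = stratified_sample(strata, n_per_stratum=5, rng=rng)
print(srs, sys_s, strat, sep="\n")
```

Choosing the random start in `0..h-1` keeps every systematic sample inside the population; the stratified sample always has exactly $k n' = 3 \times 5 = 15$ units.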


Population size: The total number of items in a population. It is denoted by $N$,
whereas the size of a sample is denoted by $n$.
Census: An enumeration or evaluation of every member of a population.
Parameter: A descriptive measure calculated from the numerical data of the
population, i.e., any constant value calculated from the population, for example the
population mean, proportion, or variance.
Statistic: A descriptive measure calculated from the numerical data of a sample,
i.e., any value calculated from the sample, for example the sample mean,
proportion, or variance.


Data Variable and Scale of Measurements

A variable is an observable characteristic common to every experimental unit in a
statistical study. This characteristic may take different values on different
experimental units.
Examples:
diastolic blood pressure of a patient
heart rate of a cat
the heights of adult males
the weights of preschool children
color of the T-shirt
identification number of the student


Cont. Variable

Statistics distinguishes two types of variables: qualitative and quantitative variables.


Quantitative variables: A quantitative variable is one that can be assigned numerical
values by measuring with an instrument (continuous variable) or by counting
(discrete variable).
Example:
height of adult males
weight of preschool children
age of patients seen in a dental clinic
Note: Measurements made on quantitative variables convey information regarding
amount.


Qualitative Variables:
Some characteristics cannot be measured in the sense that height, weight, and age
are measured. They can only be categorized, or identified by a number. Such
characteristics are called qualitative variables.
Examples:
health status of a patient
color of the T-shirt
gender of a student
nationality of the person
academic performance (grand distinction, distinction, satisfaction, and failure)
Measurements made on qualitative variables convey information regarding attributes.


Scale of measurements

Definition:
Measurement: Measurement is defined as the assignment of numerical values to
experimental units in conformity with a set of rules. Because measurement may be
carried out under different sets of rules, various scales of measurement result.
There are four scales of measurement of variables in statistics:
Nominal scale
Ordinal scale
Interval scale
Ratio scale


The Nominal Scale: The lowest measurement scale is the nominal scale. As the
name implies, it consists of “naming” observations or classifying them into
mutually exclusive and collectively exhaustive categories.
Example:
color of the T-shirt (blue = 1, green = 2, dark = 3, red = 4)
gender of a student (male = 1, female = 2)
nationality of the person (Rwandan = 1, Ugandan = 2, Kenyan = 3, Gabonese = 4,
Chadian = 5, Zambian = 6)


The Ordinal Scale: Whenever observations not only differ from category to
category but can also be ranked according to some criterion, they are said to be
measured on an ordinal scale.
Example:
Convalescing patients may be characterized as (unimproved = 0, improved = 1,
much improved = 2)
Individuals may be classified according to socioeconomic status as (low = 1,
medium = 2, or high = 3)
The intelligence of children may be (above average = 2, average = 1, or below
average = 0)
academic performance (grand distinction = 1, distinction = 2, satisfaction = 3,
and failure = 4)
Note: The function of numbers assigned to ordinal data is to order (or rank) the
observations from lowest to highest and, hence, the term ordinal.

The Interval Scale: The interval scale is more sophisticated than the nominal or
ordinal scale: with this scale, it is possible not only to order measurements but also
to know the distance between any two measurements.
We know, say, that the difference between a measurement of 20 and a measurement of
30 is equal to the difference between measurements of 30 and 40.
This requires arbitrarily choosing two points of reference for measuring:

a zero point (0);
a unit distance (1).
Note: The selected zero point is not necessarily a true zero, in that it does not have to
indicate a total absence of the quantity being measured.


Example: The following variables are measured on an interval scale:


Temperature of the body (zero degrees Celsius does not mean the absence of heat
in the experimental unit)
price of the commodity on the market

The Ratio Scale: The highest level of measurement is the ratio scale. This scale is
characterized by the fact that equality of ratios, as well as equality of intervals, may be
determined. Fundamental to the ratio scale is a true zero point. The measurement of
such familiar traits as height, weight, and length makes use of the ratio scale.
Example:
amount of money in the pocket (zero means the total absence of money, i.e., an
empty pocket)
number of students per classroom (zero students means the total absence of
students in the classroom)
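The difference between the interval and ratio scales can be checked numerically: converting temperatures from Celsius to Fahrenheit changes both the unit and the zero point, so equal intervals stay equal but ratios do not, while converting money changes only the unit. A minimal Python sketch; the exchange rate `RATE` is a hypothetical value chosen only for illustration.

```python
import math

def celsius_to_fahrenheit(c: float) -> float:
    # Affine change of scale: both the unit AND the zero point change
    return c * 9 / 5 + 32

def convert_money(amount: float, rate: float) -> float:
    # Pure change of unit: an amount of 0 stays 0 (true zero)
    return amount * rate

# Interval scale: temperatures of 20, 30 and 40 degrees Celsius
temps_c = [20.0, 30.0, 40.0]
temps_f = [celsius_to_fahrenheit(t) for t in temps_c]   # [68.0, 86.0, 104.0]

# Equal intervals remain equal after the change of scale ...
equal_intervals = math.isclose(temps_f[1] - temps_f[0], temps_f[2] - temps_f[1])
# ... but the ratio does not survive: 40 degrees C is not "twice as hot" as 20 degrees C
ratio_c = temps_c[2] / temps_c[0]        # 2.0
ratio_f = temps_f[2] / temps_f[0]        # 104 / 68, about 1.53

# Ratio scale: the ratio of two amounts of money survives any change of unit
RATE = 0.0007                            # hypothetical exchange rate
ratio_money = convert_money(2000, RATE) / convert_money(1000, RATE)

print(equal_intervals, ratio_c, round(ratio_f, 2), round(ratio_money, 6))
```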

2. Data Variable Organization and Presentation


2.1 Frequency Distribution
The following terms need to be known for a better understanding of the frequency
distribution.
Array data: the values of a data variable sorted in ascending or descending order.
E.g., for the data 1, 2, 3, 4, 4, 5, 4, 2, 3, 6, 7, 7,
the corresponding array is 1, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7, 7.
Frequency: The frequency of the observed value $x_i$ is the number of times it
appears in the data set. It is denoted by $f_i$, although some books use $n_i$.
E.g., in the data 1, 2, 3, 4, 4, 5, 4, 2, 3, 6, 7, 7, the frequency of 1 is 1, the
frequency of 2 is 2, the frequency of 3 is 2, the frequency of 4 is 3, the frequency
of 5 is 1, the frequency of 6 is 1, and the frequency of 7 is 2.
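Building the array and counting the frequencies can be sketched in Python: `sorted` gives the array in ascending order and `collections.Counter` counts how many times each value $x_i$ occurs. A minimal illustration on the example data.

```python
from collections import Counter

data = [1, 2, 3, 4, 4, 5, 4, 2, 3, 6, 7, 7]

array = sorted(data)                  # array data, ascending order
freq = Counter(data)                  # f_i: number of times each x_i occurs

print(array)                          # [1, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7, 7]
for x in sorted(freq):
    print(x, freq[x])                 # distinct value x_i and its frequency f_i
```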

Frequency distribution: a simple two-row table listing the distinct observed values
$x_i$ with their respective frequencies $f_i$.
E.g., the frequency distribution of the 7 distinct values of the previous data is
the table below:
xi 1 2 3 4 5 6 7
fi 1 2 2 3 1 1 2
This frequency distribution is appropriate only for a discrete variable whose number
$k$ of distinct values is at most 12, i.e., $k \le 12$. Otherwise, the values are grouped
into class intervals. A frequency distribution with class intervals is called a
“grouped frequency distribution”.


Grouped Frequency Distribution: a two-row table; the first row lists the class
intervals into which the observations fall, and the second gives the frequency of
each class interval.
E.g., find the grouped frequency distribution of the following data: 69 84 52
93 81 74 89 85 88 63 87 64 67 72 74 55 82 91 68 77
Note that this data set has more than 12 distinct values, so the values
have to be grouped into class intervals.
Let us group them into class intervals of length $c = 10$ starting from the value 50
(each class includes its lower limit and excludes its upper limit).
These 20 observed values fall into 5 class intervals as follows:
a-b 50 − 60 60 − 70 70 − 80 80 − 90 90 − 100
fi 2 5 4 7 2
Remark: A grouped frequency distribution is recommended for a continuous data
variable.
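The grouping into classes of width 10 can be reproduced with a short Python sketch. It assumes the convention that each class $[a, b)$ includes its lower limit and excludes its upper limit; the function name `grouped_frequency` is illustrative only.

```python
def grouped_frequency(data, start, width, k):
    """Count observations in k classes [start + j*width, start + (j+1)*width)."""
    counts = [0] * k
    for x in data:
        j = int((x - start) // width)           # index of the class containing x
        if not 0 <= j < k:
            raise ValueError(f"{x} falls outside the classes")
        counts[j] += 1
    classes = [(start + j * width, start + (j + 1) * width) for j in range(k)]
    return list(zip(classes, counts))

scores = [69, 84, 52, 93, 81, 74, 89, 85, 88, 63,
          87, 64, 67, 72, 74, 55, 82, 91, 68, 77]
table = grouped_frequency(scores, start=50, width=10, k=5)
for (a, b), f in table:
    print(f"{a} - {b}: {f}")
```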


Some important elements should be known before constructing a grouped frequency
distribution. These are:
The minimum number $k$ of class intervals required for a data variable of size $n$,
given by Sturges's rule: $k = 1 + 3.322 \log_{10} n$.
E.g., find the minimum number $k$ of class intervals needed to represent the
following data by a grouped frequency distribution: 69 84 52 93 81 74 89 85 88
63 87 64 67 72 74 55 82 91 68 77
Applying Sturges's rule with $n = 20$, we have
$k = 1 + 3.322 \log_{10} 20 = 1 + 3.322 \times 1.3010299957 = 5.322022$. We round $k$ up
to the next natural number, i.e., $k = 6$. Since also $k \le 12$, the admissible values are
$k = 6, 7, 8, \dots, 12$. We therefore have to modify the previous grouped
frequency distribution so as to have at least 6 class intervals.
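Sturges's rule with the rounding-up step can be written as a one-line Python function; `sturges_k` is a hypothetical name used only for this sketch.

```python
import math

def sturges_k(n: int) -> int:
    """Sturges's rule k = 1 + 3.322 * log10(n), rounded up to the next whole number."""
    return math.ceil(1 + 3.322 * math.log10(n))

raw = 1 + 3.322 * math.log10(20)
print(round(raw, 6), sturges_k(20))   # 5.322022 6
```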


Suppose now we take $k = 6$ class intervals. The length of each class interval,
called the class width $c$, is the difference between the largest and the smallest
values divided by the number of class intervals, i.e., $c = (\max - \min)/k$, adjusted
so that all values of the data variable are accommodated.
Find the class width $c$ needed to group the following data 69 84 52 93
81 74 89 85 88 63 87 64 67 72 74 55 82 91 68 77 into 6 class intervals.
Solution: $c = \frac{93 - 52}{6} = 6.833333$. Since the smallest value 52 is even, we round up
to an even number; thus $c = 8$.


The modified grouped frequency distribution with 6 class intervals is:

a-b 52 − 60 60 − 68 68 − 76 76 − 84 84 − 92 92 − 100
fi 2 3 5 3 6 1
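The class-width computation and the regrouping into 6 classes starting at the smallest value can be checked with a short Python sketch. Rounding the width up to an even number follows the choice made in the solution; classes are assumed to include their lower limit and exclude their upper limit.

```python
import math

scores = [69, 84, 52, 93, 81, 74, 89, 85, 88, 63,
          87, 64, 67, 72, 74, 55, 82, 91, 68, 77]
k = 6
raw_c = (max(scores) - min(scores)) / k     # 41 / 6 = 6.8333...
c = math.ceil(raw_c / 2) * 2                # round up to an even width: 8
start = min(scores)                         # first class starts at 52
assert start + k * c > max(scores), "classes must cover every value"

counts = [0] * k
for x in scores:
    counts[(x - start) // c] += 1           # class j is [start + j*c, start + (j+1)*c)

for j, fj in enumerate(counts):
    print(f"{start + j * c} - {start + (j + 1) * c}: {fj}")
```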


Extended Frequency Distribution Table:


An extended frequency distribution table is a table with $k$ rows and the headings
$i$: the counter of the distinct values; $x_i$: the $i$th observed value; $f_i$: the frequency
of the $i$th observed value; and $cf_i$: the cumulative frequency of the $i$th observed value
(the position of the last occurrence of $x_i$ in the array sorted in ascending order).
Find the extended frequency distribution table that corresponds to the following table:
xi 1 2 3 4 5 6 7
fi 1 2 2 3 1 1 2


Note:
The cumulative frequency of $x_i$ is given by $cf_1 = f_1$ for the first value
$x_1$ and $cf_i = cf_{i-1} + f_i$ for all $i > 1$.
Example:
Find the cumulative frequencies of the values presented by the previous frequency
distribution:
$cf_1 = f_1 = 1$,
$cf_2 = cf_1 + f_2 = 1 + 2 = 3$,
$cf_3 = cf_2 + f_3 = 3 + 2 = 5$,
$cf_4 = cf_3 + f_4 = 5 + 3 = 8$,
...


Extended table for the simple frequency distribution:


i    xi   fi   cfi
1    1    1    1
2    2    2    3
3    3    2    5
4    4    3    8
5    5    1    9
6    6    1    10
7    7    2    12
Σ         12
Note that the cumulative frequency indicates the position of the observed value $x_i$
in the array sorted in ascending order. E.g., the single value 1 is in position 1, the
second value 2 is in position 3, the second value 3 is in position 5, the third value 4
is in position 8, the single values 5 and 6 are in positions 9 and 10, and the second
value 7 is in position 12.
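The recurrence $cf_1 = f_1$, $cf_i = cf_{i-1} + f_i$ is a running sum, which Python's `itertools.accumulate` computes directly. A minimal sketch that prints the extended table:

```python
from itertools import accumulate

x = [1, 2, 3, 4, 5, 6, 7]
f = [1, 2, 2, 3, 1, 1, 2]
cf = list(accumulate(f))           # cf_1 = f_1, cf_i = cf_{i-1} + f_i

print("i  xi  fi  cfi")
for i, (xi, fi, cfi) in enumerate(zip(x, f, cf), start=1):
    print(f"{i:<3}{xi:<4}{fi:<4}{cfi}")
print("Total", sum(f))             # equals the last cumulative frequency
```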

Extended Grouped Frequency Distribution Table:


This table differs from the previous one: in addition to the counter, the class
intervals, the frequency, and the cumulative frequency, it has a column for the
midpoints $m_i$, inserted just after the class-interval column.
The following table presents the extended grouped frequency distribution. Note
that the midpoint of the $i$th class interval $a - b$ is $m_i = \frac{a+b}{2}$.
i    a-b        mi   fi   cfi
1    52 − 60    56   2    2
2    60 − 68    64   3    5
3    68 − 76    72   5    10
4    76 − 84    80   3    13
5    84 − 92    88   6    19
6    92 − 100   96   1    20
Σ                    20
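The extra midpoint column $m_i = (a+b)/2$ and the cumulative frequencies of the grouped table can be generated as follows (a minimal Python sketch using the six classes of width 8):

```python
from itertools import accumulate

classes = [(52, 60), (60, 68), (68, 76), (76, 84), (84, 92), (92, 100)]
f = [2, 3, 5, 3, 6, 1]
m = [(a + b) / 2 for a, b in classes]       # midpoints m_i = (a + b) / 2
cf = list(accumulate(f))                    # cumulative frequencies

for i, ((a, b), mi, fi, cfi) in enumerate(zip(classes, m, f, cf), start=1):
    print(f"{i}  {a} - {b}  {mi:g}  {fi}  {cfi}")
```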

Activity II
1. Represent the following data on the ages of 62 people who live in a certain
neighborhood by an appropriate frequency distribution, and construct the
corresponding extended frequency distribution table:
2, 5, 6, 12, 14, 15, 15, 16, 18, 19, 20, 22, 23, 25, 27, 28, 30, 32, 33, 35, 36, 36,
37, 38, 39, 40, 40, 41, 42, 43, 43, 44, 44, 45, 45, 46, 47, 47, 48, 49, 50, 51, 56,
57, 58, 59, 59, 60, 62, 63, 65, 65, 67, 69, 71, 75, 78, 80, 82, 84, 90, 96
2. Octane levels for various gasoline blends are given below: 87.9 84.2 86.9 87.7 91.7
88.8 95.3 93.5 94.3 88.1 90.2 91.4 91.3 93.9
Represent these data by an appropriate extended frequency distribution table.
Explain why you made such a choice.
3. The following data are the numbers of students per classroom in the AUCA Faculty
of Education. Represent them by an appropriate frequency distribution table:
14 11 10 8 12 13 11 10 16 11 11 9 9 7 14 12 9 10 11 6 13 8 11 11 9 8 13 16 10
11 9 8 12 11 10

2.2. Graphical Presentation of Data Variable

Statistical data variables are often presented by a graph or chart. The type of chart
depends on the nature of the variable it represents.
Qualitative variables are represented by either a pie chart or a bar chart.
Quantitative discrete variables are represented by a rod/spike chart, a frequency
polygon, and a cumulative frequency chart (ogive in stair-step form).
Quantitative continuous variables are represented by a histogram, a frequency
polygon, and an ogive in continuous-curve form.


2.2.1. Graphical Presentation of Qualitative Data Variable


Pie Chart:
A pie chart is often used to indicate relative frequencies when the data are not
numerical in nature (categorical or nominal variables). A circle is drawn
and then divided into sectors, one for each distinct data value. The
relative frequency of a data value is indicated by the area of its sector, which is
determined by a sector angle $\theta_i$ proportional to the relative frequency $F_i = f_i/N$
of the data value: $\theta_i = 360^\circ \times F_i$.
Example:
The following data relate to the types of cancer affecting the 200 most
recent patients to enroll at a clinic specializing in cancer.

Type        Lung  Breast  Colon  Prostate  Melanoma  Bladder
No. cases   42    50      32     55        9         12


Solution:
Generate the extended frequency distribution table:

i  category  fi   Fi = fi/N  θi = 360° × Fi
1  Lung      42   0.21       75.6
2  Breast    50   0.25       90
3  Colon     32   0.16       57.6
4  Prostate  55   0.275      99
5  Melanoma  9    0.045      16.2
6  Bladder   12   0.06       21.6
Σ            200  1          360
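The relative frequencies $F_i = f_i/N$ and sector angles $\theta_i = 360^\circ \times F_i$ of the pie chart can be computed with a few lines of Python (a minimal sketch; no plotting library is assumed):

```python
cases = {"Lung": 42, "Breast": 50, "Colon": 32,
         "Prostate": 55, "Melanoma": 9, "Bladder": 12}
N = sum(cases.values())                                    # 200 patients

rel_freq = {cat: fi / N for cat, fi in cases.items()}      # F_i = f_i / N
angles = {cat: 360 * Fi for cat, Fi in rel_freq.items()}   # theta_i in degrees

for cat in cases:
    print(f"{cat:<9} {cases[cat]:>3} {rel_freq[cat]:<6.3f} {angles[cat]:6.1f}")
```

The angles always add up to 360 degrees because the relative frequencies add up to 1.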


Bar Chart:
A bar chart is another way to represent a qualitative data variable. It
consists of a sequence of equally spaced vertical rectangles whose heights are
proportional to the frequencies $f_i$ of the values $x_i$, drawn in the XY-plane.
Example:
The frequency distribution of the enrollment in four classes of a high school is given
in the following table.

i  Class    fi   Fi = fi/N  height (cm)
1  Algebra  26   0.26       8.67
2  English  30   0.30       10
3  Physics  19   0.19       6.33
4  Biology  24   0.24       8
Σ           99   ≈ 1

Note: The bar chart is fitted within the available space by a scale defined by the
distance at which the largest frequency is placed, i.e., at x cm from the origin of the
y-axis (the axis of frequencies). Here:
30 frequency units → 10 cm,
1 frequency unit → 10/30 ≈ 0.33 cm.
The resulting heights are given in the last column of the extended frequency
distribution table above.
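The scaling rule (largest frequency placed at 10 cm, so each bar is $f_i \times 10/30$ cm high) can be sketched in Python. Multiplying before dividing keeps the floating-point arithmetic exact for these values; the relative frequencies are rounded to two decimals as in the table.

```python
enrollment = {"Algebra": 26, "English": 30, "Physics": 19, "Biology": 24}
N = sum(enrollment.values())                 # 99 students
max_cm = 10                                  # length given to the largest frequency
f_max = max(enrollment.values())             # 30

# height_i = f_i * max_cm / f_max, i.e. 1 frequency unit = 10/30 cm
heights = {cls: round(fi * max_cm / f_max, 2) for cls, fi in enrollment.items()}
rel_freq = {cls: round(fi / N, 2) for cls, fi in enrollment.items()}

print(heights)   # {'Algebra': 8.67, 'English': 10.0, 'Physics': 6.33, 'Biology': 8.0}
print(rel_freq)
```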

2.2.2. Graphical Representation of Quantitative Discrete Data Variable

Rod/Spike Chart:
This chart is drawn in the XY-plane, with the distinct values $x_i$ on the X-axis
and their frequencies $f_i$ on the Y-axis. It consists of vertical line segments
rising from the X-axis at each point $x_i$, with heights proportional to the
frequencies $f_i$.
Example:
Represent the following data variable by a rod/spike chart:

xi 1 2 3 4 5 6 7
fi 1 2 2 3 1 1 2


The extended frequency distribution table of the given example is:

i  xi  fi  cfi  location (cm)
1  1   1   1    3
2  2   2   3    6
3  3   2   5    6
4  4   3   8    9
5  5   1   9    3
6  6   1   10   3
7  7   2   12   6
Σ      12

Let us place the largest frequency, 3, at 9 cm on the Y-axis (the axis of frequencies).
The locations of the other frequencies follow from the correspondence
3 frequency units → 9 cm, which gives 1 frequency unit → 9/3 = 3 cm (one unit of
frequency is marked 3 cm from the origin 0 on the Y-axis).

Finally, we have the following Rod/Spike Chart:


Frequency Polygon:
The frequency polygon is obtained by joining the tops of every two consecutive
spikes of the rod/spike chart with a line segment. Here below is the frequency
polygon generated from the previous rod/spike chart.


Cumulative frequency chart (Ogive):


The ogive is the chart, drawn in the XY-plane, of the step function
$$f(x) = \begin{cases} 0, & x < x_1 \\ cf_i, & x_i \le x < x_{i+1}, \quad i = 1, 2, \dots, k-1 \\ N = 12, & x \ge x_k \end{cases}$$
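The step function of the ogive can be evaluated at any point $x$ by counting how many distinct values are $\le x$ and reading off the matching cumulative frequency. A minimal Python sketch using `bisect.bisect_right`; the function name `ogive` is illustrative.

```python
from bisect import bisect_right
from itertools import accumulate

x = [1, 2, 3, 4, 5, 6, 7]
f = [1, 2, 2, 3, 1, 1, 2]
cf = list(accumulate(f))                # [1, 3, 5, 8, 9, 10, 12]

def ogive(t: float) -> int:
    """0 before x_1, cf_i on [x_i, x_{i+1}), and N = cf_k from x_k on."""
    i = bisect_right(x, t)              # number of distinct values <= t
    return 0 if i == 0 else cf[i - 1]

print([ogive(t) for t in (0, 1, 1.5, 4, 6.9, 7, 10)])   # [0, 1, 1, 8, 10, 12, 12]
```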


2.2.3. Graphical Representation of Quantitative Continuous Data Variable


In this part, we shall use the following extended grouped frequency distribution table
that presents the lifetimes of 200 incandescent lamps.


Histogram:
A histogram is a series of contiguous rectangles of equal width, drawn in the
XY-plane, whose heights are proportional to the frequencies of the class intervals.
Remember that to draw the histogram, you must first fix the largest frequency
at a specified distance x (in units of length) from the origin 0 of the axis of frequencies.
The next slide shows a histogram of the lifetimes of 200 incandescent
lamps. We fit the chart within 10 cm, i.e., we place the frequency 58 at 10 cm from
the origin of the axis of frequencies.


Frequency polygon:
The frequency polygon of a grouped frequency distribution is obtained by joining the
midpoints of the upper bases of consecutive rectangles of the histogram with line
segments.


Cumulative frequency (Ogive):


The cumulative frequency curve (ogive) representing a grouped frequency distribution
is a smooth curve drawn through the points plotted above the class midpoints, starting
from the first midpoint and passing through all the following midpoints in order.


2.2.4. Data Presentation by Stem-and-Leaf Graphic (Chart)


One simple graph, the stem-and-leaf plot or stemplot, comes from exploratory data
analysis. It is a good choice for small data sets or for grouped data.
To create the chart, divide each observation into a stem and a leaf. The leaf
is the last significant digit of the observed value, and the stem is the
remaining part of that value.
For example:
23 has stem 2 and leaf 3 (3 = the last significant digit)
432 has stem 43 and leaf 2 (2 = the last significant digit)
5,432 has stem 543 and leaf 2 (2 = the last significant digit)
9.3 has stem 9 and leaf 3
A stemplot is a two-column table: the first column contains all the stems in
ascending order, and the second column contains their respective leaves, also
sorted in ascending order.

Example:
Generate the stemplot corresponding to the following scores on the final exam in
descriptive statistics:
33 42 49 49 53 55 55 61 63 67 68 68 69 69 72 73 74 78 80 83 88 88 88 90 92 94 94
94 94 96 100
The following is the stem-and-leaf graphic of the above data:

Stem | Leaves
3    | 3
4    | 2 9 9
5    | 3 5 5
6    | 1 3 7 8 8 9 9
7    | 2 3 4 8
8    | 0 3 8 8 8
9    | 0 2 4 4 4 4 6
10   | 0
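The construction of a stemplot can be automated for non-negative integer data, taking the last digit as the leaf and the remaining digits as the stem. A minimal Python sketch; the function `stemplot` is hypothetical and, unlike a hand-drawn plot, it omits stems that have no leaves.

```python
from collections import defaultdict

scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

def stemplot(data):
    """Return 'stem | leaves' rows; assumes non-negative integers (leaf = last digit)."""
    leaves = defaultdict(list)
    for value in sorted(data):               # sorting keeps each row's leaves ascending
        leaves[value // 10].append(value % 10)
    return [f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves[stem])}"
            for stem in sorted(leaves)]

rows = stemplot(scores)
print("\n".join(rows))
```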


3. Summarizing Data Variable

In statistics, data variables are summarized by values computed (or determined)
from the values of the data variable. These values are often called statistical
descriptive measures of the data variable.
The descriptive measures are classified into two categories: measures of
central tendency and measures of spread. The measures of central tendency
comprise the mean, the median, the mode, and the quantiles, while the range,
variance, standard deviation, and coefficient of variation express the spread
(or volatility) of the data variable.


3.1. Measures of Central Tendency

One of the important objectives of statistical analysis is to determine numerical
measures that describe the inherent characteristics of a frequency distribution. These
measures are called averages. They condense a set of numerical data into a single
numerical value that represents the entire distribution. Averages lie between the
smallest and the largest observations of the distribution, and they reflect the
concentration of the values in the central part of the distribution. Averages
are useful for the following purposes:
1. Describing the distribution in a concise manner;
2. Comparing different distributions;
3. Computing various other statistical measures such as correlation, regression,
dispersion, skewness, and other basic characteristics of a mass of data.


3.1.1. The mean


The mean is a value representative of the other values. It is involved in a good
number of the calculations needed for the statistical analysis of data variables. In
statistics, there are different kinds of means: the arithmetic mean, geometric
mean, harmonic mean, quadratic mean, and weighted mean.
Arithmetic mean:
The arithmetic mean of N observed values equals the sum of these values
divided by their total number N.
The arithmetic mean of the statistical variable x is denoted by µ (for the population)
and x̄ (for the sample), and is given by the direct formula:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

This formula is used when the data are given as raw data.

Cont.
Example:
Find the arithmetic mean of the following data of the variable x: 24 39 7 48 16 29 34
20 43 18
$$\bar{x} = \frac{1}{10}\sum_{i=1}^{10} x_i = \frac{24 + 39 + 7 + 48 + 16 + 29 + 34 + 20 + 43 + 18}{10} = \frac{278}{10} = 27.8$$
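The direct formula for raw data is simply the sum divided by the count. A minimal Python check, which also compares the result with `statistics.fmean` from the standard library:

```python
from statistics import fmean

x = [24, 39, 7, 48, 16, 29, 34, 20, 43, 18]
N = len(x)
mean_direct = sum(x) / N            # (1/N) * (x_1 + ... + x_N)
print(sum(x), N, mean_direct)       # 278 10 27.8
print(fmean(x))                     # same value from the standard library
```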

Arithmetic mean for data in a simple frequency distribution:


If the data are given in a frequency distribution, the previous formula must include
the frequency of each distinct observed value $x_i$. The arithmetic mean is given by
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{k} x_i f_i, \qquad N = \sum_{i=1}^{k} f_i$$

Example:
Find the arithmetic mean of the data variable x represented by the following frequency
distribution:
xi 1 2 3 4 5 6 7
fi 1 2 2 3 1 1 2
Solution:
Add a column for $x_i f_i$ to the extended frequency distribution table:
i  xi  fi  cfi  xi fi
1  1   1   1    1
2  2   2   3    4
3  3   2   5    6
4  4   3   8    12
5  5   1   9    5
6  6   1   10   6
7  7   2   12   14
Σ      12       48
$$\bar{x} = \frac{1}{12}\sum_{i=1}^{7} x_i f_i = \frac{48}{12} = 4$$
Note:
When the distinct values $x_i$ and the frequencies $f_i$ have many digits, the products
$x_i f_i$ may take time to compute when filling in the columns of the extended frequency
distribution table. To simplify the task, we can use the following formula, called the
short-cut formula, for the calculation of the arithmetic mean:
$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{k} d_i f_i$$
where $A$ is the assumed mean (one of the $x_i$ values taken from the central part of the
column of $x_i$; for the previous example, $A = 4$, the 4th observed value),
and $d_i$ is the deviation of the value $x_i$ from the assumed mean $A$, i.e., $d_i = x_i - A$.
The formula follows from the direct one by substituting $x_i = A + d_i$ and using $\sum f_i = N$.

Find the arithmetic mean of the previous example using the short-cut formula:
i  xi  fi  cfi  di = xi − 4  di fi
1  1   1   1    −3           −3
2  2   2   3    −2           −4
3  3   2   5    −1           −2
4  4   3   8    0            0
5  5   1   9    1            1
6  6   1   10   2            2
7  7   2   12   3            6
Σ      12                    0
$$\bar{x} = 4 + \frac{1}{12}\sum_{i=1}^{7} d_i f_i = 4 + \frac{0}{12} = 4$$
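The direct formula and the short-cut formula can be compared side by side in Python. The sketch also illustrates that the result does not depend on which value is chosen as the assumed mean $A$; the function names are hypothetical.

```python
x = [1, 2, 3, 4, 5, 6, 7]
f = [1, 2, 2, 3, 1, 1, 2]

def mean_direct(x, f):
    # x_bar = (1/N) * sum of x_i * f_i, with N = sum of f_i
    return sum(xi * fi for xi, fi in zip(x, f)) / sum(f)

def mean_shortcut(x, f, A):
    # x_bar = A + (1/N) * sum of d_i * f_i, with d_i = x_i - A
    return A + sum((xi - A) * fi for xi, fi in zip(x, f)) / sum(f)

print(mean_direct(x, f))            # 48 / 12 = 4.0
print(mean_shortcut(x, f, A=4))     # 4 + 0/12 = 4.0
print(mean_shortcut(x, f, A=2))     # 2 + 24/12 = 4.0
```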

Arithmetic mean for data in a grouped frequency distribution


When the data variable x is represented by a grouped frequency distribution, the
arithmetic mean $\bar{x}$ is obtained by the formula
$$\bar{x} = A + \frac{c}{N}\sum_{i=1}^{k} D_i f_i$$

where
k: the number of class intervals;
A: the assumed mean, one of the midpoints $m_i$ picked from the central
part of the column of midpoints;
$D_i$: the reduced deviation of the midpoint $m_i$ from the assumed mean A, i.e.,
$D_i = \frac{m_i - A}{c}$;
c: the class width, or length of each class interval.
Note: The above formula is called the step-deviation formula. It gives an
approximate value of the arithmetic mean, because each observation is represented
by the midpoint of its class.

Find the arithmetic mean of the data variable represented by the following grouped frequency distribution, whose first class interval is 0 − 10 (all classes have width 10):
class 1 2 3 4 5 6 7 8
fi 5 10 25 30 20 10 5 5
Solution:
i   class a − b   mi   fi   cfi   Di = (mi − 35)/10   Di fi
1   0 − 10        5    5    5     −3                  −15
2   10 − 20       15   10   15    −2                  −20
3   20 − 30       25   25   40    −1                  −25
4   30 − 40       35   30   70     0                    0
5   40 − 50       45   20   90     1                   20
6   50 − 60       55   10   100    2                   20
7   60 − 70       65   5    105    3                   15
8   70 − 80       75   5    110    4                   20
Total (i = 1 to 8): Σ fi = 110, Σ Di fi = 15
Finally, we obtain the approximate value of the arithmetic mean:

$$\bar{x} = A + \frac{c}{N}\sum_{i=1}^{k} D_i f_i = 35 + \frac{10 \times 15}{110} = 35 + \frac{15}{11} \approx 36.36$$
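The step-deviation calculation can be sketched in Python as follows. This is a minimal illustration, not the lecture's own method of computation: the function name `step_deviation_mean` is hypothetical, and it assumes all classes have the same width c, so each mid-point is the lower limit plus c/2.

```python
def step_deviation_mean(lower_limits, width, freqs, assumed_mean):
    """Approximate grouped mean: x̄ ≈ A + (c/N) * sum(D_i * f_i)."""
    n = sum(freqs)  # N, the total frequency
    # Mid-points m_i, assuming equal class width c = width
    midpoints = [a + width / 2 for a in lower_limits]
    # Reduced deviations D_i = (m_i - A) / c
    reduced = [(m - assumed_mean) / width for m in midpoints]
    return assumed_mean + width * sum(d * f for d, f in zip(reduced, freqs)) / n


# Grouped distribution of the example: classes 0-10, 10-20, ..., 70-80
lower = [0, 10, 20, 30, 40, 50, 60, 70]
f = [5, 10, 25, 30, 20, 10, 5, 5]
print(round(step_deviation_mean(lower, 10, f, 35), 2))  # 36.36
```

Choosing A = 45 instead of 35 gives the same value, which equals the mid-point mean $\frac{1}{N}\sum m_i f_i = 4000/110$.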

Properties of the arithmetic mean:


The arithmetic mean of the data variable x satisfies the following properties:

3.1.2. Other Types of Mean

Apart from the arithmetic mean, central tendency includes other measures that also go under the general name of mean:

Solve the problems of the activity below using the theory of the weighted mean.

3.1.2. The Mode of the Data Variable

The mode of data in a grouped frequency distribution


When data are presented in a grouped frequency distribution, the mode of the data variable is the value $M_o$ defined by the formula:

$$M_o = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c$$

where
L: the lower limit of the modal class interval (the modal class interval is the one with the highest frequency)
$f_0$: the frequency of the class interval just before the modal class interval
$f_1$: the frequency of the modal class interval
$f_2$: the frequency of the class interval just after the modal class interval
c: the class width
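The mode formula can be sketched in Python as below. This is a minimal illustration with a hypothetical function name `grouped_mode`; it assumes equal class widths and, as an assumption not stated in the lecture, treats a missing neighbouring class (when the modal class is the first or last one) as having frequency 0. Applied, for illustration only, to the grouped distribution of the arithmetic-mean example above, the modal class is 30 − 40 with $f_0 = 25$, $f_1 = 30$, $f_2 = 20$.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mo = L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * c for grouped data."""
    k = freqs.index(max(freqs))  # modal class: highest frequency (first one if tied)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0  # assumption: no class before -> frequency 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # assumption: no class after -> 0
    denom = (f1 - f0) + (f1 - f2)
    if denom == 0:
        raise ValueError("the formula is undefined when f0 = f1 = f2")
    return lower_limits[k] + (f1 - f0) / denom * width


lower = [0, 10, 20, 30, 40, 50, 60, 70]
f = [5, 10, 25, 30, 20, 10, 5, 5]
print(round(grouped_mode(lower, 10, f), 2))  # 33.33
```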

3.1.3. The Percentiles and Quartiles.

Given a set of numerical observations, we may transform it into an array of data, i.e. sort the data in ascending order. Percentiles play an important role in statistics. They are positional values: the p-th percentile is the value at or below which p% of the data in the array lie.
Example:
If a student's grade is at the 90th percentile, this means that 90% of the grades in his or her class are less than or equal to that grade.
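There are several conventions for computing percentiles; the sketch below uses the nearest-rank method, which is an assumption on our part and may differ from the formula used later in this lecture. It returns the smallest value in the sorted array such that at least p% of the observations are less than or equal to it. The grades and the function name `percentile_nearest_rank` are hypothetical. The quartiles Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles.

```python
import math


def percentile_nearest_rank(data, p):
    """Smallest value in the sorted data with at least p% of observations <= it."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    array = sorted(data)  # the array of data: ascending order
    rank = math.ceil(p * len(array) / 100)  # 1-based position in the array
    return array[rank - 1]


grades = [55, 62, 70, 48, 90, 75, 81, 66, 59, 73]  # hypothetical class grades
print(percentile_nearest_rank(grades, 90))  # 81
print([percentile_nearest_rank(grades, p) for p in (25, 50, 75)])  # [59, 66, 75]
```

Interpolating methods (for example, the default of many statistics libraries) can return values between two observations, so results may differ slightly from this sketch.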
