STATISTICS
BSTA01-01_E
MASTHEAD
Publisher:
IU Internationale Hochschule GmbH
IU International University of Applied Sciences
Juri-Gagarin-Ring 152
D-99084 Erfurt
Mailing address:
Albert-Proeller-Straße 15-19
D-86675 Buchdorf
[email protected]
www.iu.de
BSTA01-01_E
Version No.: 001-2023-1103
Concept: IU Internationale Hochschule GmbH
Author(s): Heike Bornewasser-Hermes
Translation: Heike Bornewasser-Hermes and Nazli Andjic
TABLE OF CONTENTS
STATISTICS
Introduction
Signposts Throughout the Course Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Suggested Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Unit 1
Introduction 9
Unit 2
Analysis Methods of One-Dimensional Data 19
Unit 3
Analysis Methods of Two-Dimensional Data 57
Unit 4
Linear Regression 87
Unit 5
Fundamentals of Probability Theory 103
Unit 6
Special Probability Distributions 135
Unit 7
Statistical Estimation Methods 161
Unit 8
Hypothesis Testing 175
Appendix
List of References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
List of Tables and Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
INTRODUCTION
WELCOME
SIGNPOSTS THROUGHOUT THE COURSE BOOK
This course book contains the core content for this course. Additional learning materials
can be found on the learning platform, but this course book should form the basis for your
learning.
The content of this course book is divided into units, which are divided further into sections. Each section contains only one new key concept to allow you to quickly and efficiently add new learning material to your existing knowledge.
At the end of each section of the digital course book, you will find self-check questions. These questions are designed to help you check whether you have understood the concepts in each section.
For all modules with a final exam, you must complete the knowledge tests on the learning
platform. You will pass the knowledge test for each unit when you answer at least 80% of
the questions correctly.
When you have passed the knowledge tests for all the units, the course is considered finished and you will be able to register for the final assessment. Please ensure that you complete the evaluation prior to registering for the assessment.
Good luck!
SUGGESTED READINGS
GENERAL SUGGESTIONS
Freeman, J., Shoesmith, E., Sweeney, D., Anderson, D., & Williams, T. (2017). Statistics for business and economics (4th ed.). Cengage Learning. http://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=cat05114a&AN=ihb.45823&site=eds-live&scope=site

Larsen, R. J., & Marx, M. L. (2012). An introduction to mathematical statistics and its application (5th ed.). Prentice Hall.
LEARNING OBJECTIVES
The term “statistics” generally describes two phenomena: (1) the tabular and/or graphical
preparation of data as well as (2) the statistical methods used to collect, prepare, and
draw conclusions from data. From this, it follows that our confrontation with statistics is
omnipresent – whether it is in our studies (e.g., consider how statistical methods are
applied in related courses or bachelor theses) or everyday professional life (e.g., managers
are confronted on a daily basis with statistical evaluations that they have to understand
and interpret).
The aim of your Statistics course book is to teach you the most essential elements of statistical procedures. Methodologically, the course book comes in three parts. First, you will be presented with the theoretical foundations of individual statistical methods, which will then be deepened through the use of small examples and illustrations. Second, you will apply and run the methods learned on application cases. Third, you will learn how to create and interpret conclusions drawn from a "sample."
UNIT 1
INTRODUCTION
STUDY GOALS
Introduction
A business owner wonders how their turnover will develop in the future. The head of a hospital wants to know what its capacities look like in terms of free beds. A private household needs to compare its monthly income and expenditure. All three use statistical analyses – no matter how simple – to get one step closer to their desired insights. The unit discusses the most important goals of statistics, the basic concepts, and the process of statistical investigations.
In the medical field, it is common to investigate whether a newly developed drug actually has the desired effect on patients. For this purpose, experts often conduct experiments in which one group of patients takes a drug for a certain period (i.e., the experimental group), while another group of patients takes a placebo (i.e., the control group). Here, the aim is to determine whether the health status of the experimental group improves in comparison to that of the control group. Numerous data are collected over the course of such an experiment. The health status is always recorded at the beginning of the study, and further data are obtained during the experiment.
Studies in the field of psychology also focus on gaining new insights. For instance, many
psychologists want to increase their knowledge of how much stress different individuals
experience in certain situations. For this purpose, they often carry out experiments to
obtain reliable data on human behavior.
In essence, statistical procedures are necessary to collect findings and draw appropriate
conclusions from different research inquiries – such as those described above – and their
associated data.
To begin this section, let's take a look at a situation from autumn 2022. During this time, numerous parts of the health system were severely tested by the ongoing COVID-19 pandemic. Many hospitals were pushed to their capacity, nurses and doctors were at the limit of their ability to cope, and many employees experienced great psychological strain due to social isolation and the constant need to work from home. Statisticians were essential in obtaining and maintaining an overview of the situation. With this in mind, let's look at some specific examples that illustrate the importance of statistics.
At the time, it was crucial to know the current number of available hospital beds. Thus, in
many regions, the number of free and occupied intensive care beds was analyzed on a
daily basis. Likewise, the proportion of patients infected with COVID-19 in the intensive
care section was also examined to determine the available capacities for patients not
infected with COVID-19. With the help of this information and the necessary statistical
know-how, it was possible to estimate both how long the available capacities would last
and when, in the worst case, medical care would collapse.
Notably, there was also a great interest in observing the interaction between the incidence
of mental illness and the demand for therapists. A researcher at the time might have found
themselves with questions like the following ones:
• Is the pandemic encouraging a trend in mental illnesses that were not so much in the
spotlight before?
• Is the supply of psychotherapeutic services sufficient to support these illnesses?
• How long on average does it take for a sick person to obtain a place in therapy?
Reliable data have proven to be indispensable in investigating all of these questions. Only
with such data could experts determine when the situation was under control. Together,
these three examples should demonstrate how important the field of statistics is. Based
on new findings, it enables us to take the best actions.
Suppose, then, that we want to carry out a study of German hospitals. Among other things, we might ask:
• How many intensive care beds are occupied?
• How many beds in the normal wards are occupied?
• On a scale from "very good" to "very bad," how well are the patients cared for and treated?
These and many other points could be useful to help us obtain an overview of German
hospitals and their care services. With the help of this example, let’s go through the most
important terminology that is repeatedly used in statistical contexts.
In our example, every German hospital that can potentially participate in the study represents an object. Individuals, companies, and other institutions can also be objects. In short, an object is anyone or anything that you would like to gain knowledge about. At the beginning of a study, it is always crucial to know and be able to name the totality of all objects. This defines the population, and in the present study, it includes all carriers of the characteristics that we want to investigate. In this case, all German hospitals form the population of our study.

Object: a person or subject of interest about which one wants to gain information.
Population: the population includes all objects.

It would be desirable to consider the whole population within the framework of our study. However, for reasons of time and cost, you often have a much smaller number of objects at your disposal in practice. A sample consists of the objects that are finally taken into account. In our case, our sample consists of the hospitals that we examine and can make statements about.

Sample: the sample includes all actually examined objects.
Differentiation of Samples

There are many ways to draw a sample from the population. The first one is to draw a random sample. Here, we make a distinction between three types of random sampling. In simple random sampling, the objects are selected randomly, and each entity has an equal chance of being selected. In the previous example, each hospital has the same probability of being included in the sample. This procedure is only possible if the entire population is known.

Simple random sampling: a simple random sample provides the best basis for meaningful results.
Stratified sampling is another type of random sampling. Here, the population is first divided into subpopulations according to relevant variables. Samples are then drawn from the subpopulations. In order to ensure that hospitals of all sizes are included in the example, you could first form three strata of hospitals, such as (1) hospitals with up to 500 patients, (2) those with more than 500 patients and up to 1,000 patients, and (3) those with more than 1,000 patients. Then, hospitals are randomly selected from all three strata.
The final type of random sample discussed here is a cluster sample (sometimes called a lump sample). In this case, naturally existing subsets of the population are randomly selected and then fully investigated. For example, to ensure that hospitals from all areas of Germany are included in the study, you could randomly select some of the 294 German counties. Within the selected counties, all hospitals are then included (Halfens & Meijers, 2013, pp. 249–250).
Note that only random samples are considered to form a suitable basis for statistical analyses that you want to leverage to draw conclusions about the general public. Some inquiries use ad hoc samples, in which the objects that are available at the time of data collection are selected. This is often the case for street surveys that collect opinions on current topics. Suppose that a reporter goes to the center of a major city and asks the people passing by what they think about a new political leader. They might interview everyone who is willing to provide a statement and then include them in the sample, making this an ad hoc sample. Overall, such forms of sampling create insufficient bases for making generally valid statements (Raithel, 2008, pp. 56–57).
Returning to our hospital study, we are interested in a lot of different information about the hospitals in question. For example, we would like to know whether the hospitals are university hospitals, how many intensive care beds are still available, and whether the patients are satisfied with the hospital. All these characteristics that can be measured are known as variables. The values that a variable can assume are, indeed, called values. In other words, a value is a possible observation of a variable. For example, as we have outlined it, the environment of a hospital has two possible values: urban or rural. In contrast, the number of free intensive care beds can contain any number as its value, whereas the general satisfaction of the patients with the hospital has values that range from "very good" to "very poor."

Variable: a variable is a property of interest of an object.
Value: a value is a possible observation of a variable.

The type of value is of decisive importance, and different statistical procedures may be used for individual variables. If the values consist of numbers, far more statistical calculations can be carried out than if they are only described using words. For example, we can calculate how high the minimum number of free intensive care beds is, how many beds are still free on average across all hospitals, and many other measures. However, in terms of the environment, we can only determine how many of the hospitals are in urban areas and how many are in rural areas.
Scales of Measurement

To differentiate between the types of statistical analyses that can be carried out for various characteristics, the variables are assigned to scales of measurement. Scales of measurement reflect how variables are defined. A distinction is generally made between three different scale levels.

Scale of measurement: the scale of measurement determines the possible statistical analyses.
The weakest scale, in terms of evaluation possibilities, is the nominal scale. Nominal scale variables are those whose values are actually names or categories, and they cannot be placed in a meaningful order. For instance, the type of hospital ward is a nominal scale variable. A hospital ward might be, for example, an oncology ward, a trauma surgery ward, or an orthopedics ward. These categories cannot be meaningfully ordered. A special kind of nominal scale variable is a dichotomous variable. A dichotomous variable has exactly two possible values. For instance, if we treat the environment of a hospital as either urban or rural, then the environment is a dichotomous variable. Likewise, we could investigate whether a hospital is a university hospital or not.

Nominal scale: the nominal scale allows the least statistical evaluation possibilities.
The next strongest scale of measurement is the ordinal scale. Ordinal scale variables also measure non-numeric concepts, like names and categories. In contrast to the nominal scale, however, these categories can be placed in a meaningful order. For instance, in our example, the general satisfaction of the patients with the hospital is an ordinal scale variable. If we assume that the options that a patient can select are "very good," "good," "average," "bad," and "very bad," then these are categories that can be sorted in a meaningful way. However, note that the difference between the values cannot be interpreted in a mathematically meaningful way. For example, one cannot say that a "very good" assessment of a hospital is twice as good as a "good" assessment. Both nominal and ordinal scale variables can be referred to as qualitative variables.

Ordinal scale: the ordinal scale allows more statistical evaluation possibilities than the nominal scale but less than the cardinal scale.
The strongest scale of measurement is the cardinal scale, whose values are numbers. We can distinguish between types of cardinal scales in two ways. The first way is to make a distinction between an interval scale and a ratio scale. An interval scale applies to cardinal scale variables that do not have a natural zero point, while a ratio scale applies to variables that do have a natural zero point.

Interval scale: the interval scale has no natural zero point.
Ratio scale: the ratio scale has a natural zero point.

But what do we mean by a natural zero point? Let's consider the examples of temperature and credit balance. Temperature is typically measured in degrees Celsius (°C) or Fahrenheit (°F). Since 0°C does not describe the same degree of heat or cold as 0°F, there is no natural zero point. Therefore, temperature is an interval scale variable because the number zero does not have the same meaning in all its units. If, conversely, you look at the asset or credit balance in your account, it is irrelevant which currency it is measured in. If your credit balance is €0 or $0, it means that there is simply no credit balance. If the number zero has one and the same meaning in all conceivable units, we have a ratio scale variable (Bortz & Schuster, 2010, pp. 12–14).
The cardinal scale is further subdivided into discrete and continuous variables based on the number of different values. A discrete variable refers to countable and distinct values (i.e., a finite or countably infinite number of values). An example of a discrete variable is the number of people living in a household. In the hospital example, the number of free intensive care beds is also a discrete variable. Even though there may only be a few free intensive care beds, the number can be determined by counting each bed. A continuous variable, conversely, refers to a variable that takes on any value within a range, meaning that the number of possible values within that range is infinite. Such variables are able to assume all conceivable intermediate values (Handl & Kuhlenkasper, 2018, p. 10). In the context of the hospital study, suppose that we want to consider the expenditure of each hospital. An infinite number of values is possible simply by measuring the exact amount. Cardinal scale variables can be referred to as quantitative variables. The following figure summarizes the different scale levels discussed in this unit.

Discrete variable: a discrete variable has only a few different values.
Continuous variable: a continuous variable has very many different values.
Data Collection
The first step is to collect data. The data can be primary or secondary. If the data are collected using written or oral surveys or experiments, then we are collecting primary data. In our example, if we involve the selected hospitals in a survey, then we are collecting primary data. However, it is possible to use already-existing data, which is known as secondary data. In the present context, we might be able to use data already collected from the previous year.
Data collection can look very different depending on the temporal dimension. Here, we make a basic distinction between a cross-sectional design and a longitudinal design. If our survey of hospitals is only carried out at one point in time or within a short period of time (approximately two to four weeks), then we have carried out a cross-sectional design survey. However, if we want to find out how the situation changes over time, we should collect data according to a longitudinal design, which means that we repeatedly collect data on the same variables at several successive points in time.

Cross-sectional design: in a cross-sectional design, data are collected only over a short period of time.
Longitudinal design: in a longitudinal design, the same data are repeatedly collected at several successive points in time.

When we are discussing longitudinal design, a distinction is made between trend design and panel design. If we collect data according to a trend design, we repeatedly examine hospitals at regular intervals to investigate our specific questions, and it is not necessary to study the same hospitals at each point in time. However, if we use a panel design, we must precisely fulfill the requirement of evaluating the same objects. In this case, we would need to involve the same hospitals in the study at intervals of, say, half a year.

Panel design: the panel design produces the results with the most information.

The greatest advantage of a panel design is that intraindividual changes can be observed over time. For example, it is possible to determine how the utilization of intensive care beds changes over time. The disadvantages of this design are the panel effects or learning effects (which are less of a problem in the present example) as well as panel mortalities.

Panel effects: this means that the same answers are given over and over again.
Panel mortalities: over the course of time, some participants (e.g., hospitals) may drop out of the survey for various reasons.

Ultimately, studies that are conducted according to a panel design provide the highest information content, followed by trend design and, finally, cross-sectional design (Raithel, 2008, pp. 50–51).

Data Handling
Once the data have been collected, such as through a survey or observation, data handling must be carried out. This means that the collected data are prepared with the help of statistical software such as Excel, SPSS, R, and Stata so that they can be evaluated statistically. This is a very important step because it lays the foundation for the possibility of a clean evaluation. Often, important data are missing or transferred incorrectly. A precise examination of the data, here in this second step, is, therefore, indispensable.
Data Analysis

Once all the data have been carefully entered and processed in a statistical program, the analysis of the data can begin. A distinction is made between three major areas of statistics. The first area is descriptive statistics, where the collected data are first described. Tables, graphs, or measures are used for this purpose. Descriptive statistics, thus, serve the purpose of summarizing the data in a condensed form. For example, we can use the mean value to determine the average number of occupied intensive care beds across all hospitals.

Descriptive statistics: descriptive statistics are used to describe the data collected.

The second important area is inferential statistics. These statistics check the transferability of the descriptive results to the population. Inferential statistics are used when, for example, not all German hospitals can be included in the hospital study. When we only have a selection of them, it becomes important to check whether the descriptive results for the selected hospitals can be generalized to all hospitals. For this purpose, hypotheses are usually formulated that need to be tested within the framework of inferential statistics. For example, we might test a hypothesis about whether university hospitals have a larger catchment area than non-university hospitals.

Inferential statistics: inferential statistics check the transferability of the descriptive results to the population.
The third area is exploratory statistics. These statistics explore new, under-researched areas (i.e., they are used precisely when there is little or no knowledge of the planned research area). At the beginning of the COVID-19 pandemic, all statistical analyses were initially of an exploratory nature, since all researchers were moving into an unexplored area.

Exploratory statistics: exploratory statistics explore new, under-researched areas.
In order to understand the basics of statistics, we must, without exception, deal with cross-sectional data. We will analyze cross-sectional data on paper – without any statistical programs – and examine both descriptive statistics and inferential statistics intensively.
SUMMARY
Statistics involves the analysis of data with the aim of gaining new insights. Statistical work is divided into the three steps of (1) data collection, (2) data preparation, and (3) data analysis. In the last step, a distinction is made between descriptive, inferential, and explorative analyses. A panel design offers the most informative data design. In this case, the same characteristic bearers are repeatedly interviewed at regular intervals on one and the same topic.

As a rule, you only examine data from a sample. However, the composition of the sample should be as representative as possible of the population so that we can draw meaningful conclusions for the public. Within the framework of such studies, the objects are examined based on a wide range of variables. The values of these variables are, in turn, decisive in terms of the types of statistical evaluations that are permitted. Different variables are each assigned to a possible scale of measurement to ensure a suitable differentiation between them.
UNIT 2
ANALYSIS METHODS OF ONE-DIMENSIONAL DATA
STUDY GOALS
Introduction
Let’s assume that we have conducted a small survey in a hospital. We are interested in
finding out how satisfied the patients are with the nursing robots and how many times
they have encountered such robots before this hospital stay. In addition, we asked about
the gender and age of the patients. Before we begin the analyses and, if necessary, test the
previously formulated hypotheses, it is crucial to describe the collected data. This is done
in the context of descriptive statistics.
In this section, we will first discuss univariate analysis. We will examine only one variable at a time and use tables, graphs, and various measures for this purpose. In the next section, we will carry out a bivariate analysis with the help of various correlation analyses. In all these statistical analyses, the scale level is of decisive importance. Depending on the available scale level, we will explain which analyses are feasible and which are not.

Univariate analysis: the univariate analysis examines exactly one variable.
Bivariate analysis: the bivariate analysis examines the relationship between two variables.
Throughout this unit, we will work with a survey of 25 hospital patients who were asked the following four questions:

1. What is your gender? The options are "female," "male," and "diverse."
2. How good are the nursing robots in your opinion? The options are "very good," "good," "satisfactory," "sufficient," and "poor."
3. How many times have you encountered similar nursing robots before? The options are "0," "1," "2," "3," and so on.
4. How old are you? This is given in years.
Table 1: Results of Patient Survey

Patient  Gender  Satisfaction  Previous contact  Age
1        female  good          1                 16
2        female  good          5                 -
3        female  good          0                 50
4        female  good          0                 35
5        male    satisfactory  1                 -
6        female  satisfactory  1                 47
7        female  satisfactory  2                 15
8        female  satisfactory  1                 20
9        male    good          1                 47
10       male    satisfactory  1                 48
11       female  satisfactory  1                 44
12       male    -             1                 -
13       female  good          2                 55
14       female  good          1                 56
15       female  good          0                 35
16       female  sufficient    3                 48
17       female  good          1                 -
18       female  good          1                 52
19       male    very good     0                 49
20       female  sufficient    3                 -
21       female  good          0                 68
22       female  satisfactory  1                 17
23       female  good          1                 26
24       female  satisfactory  2                 39
25       female  satisfactory  1                 -
Each row of this table contains the answers of one patient. For example, the first patient is female, she finds the nursing robots good, she has encountered similar robots once before, and she is 16 years old. It is not uncommon that (1) the interviewees do not want to or cannot answer some questions or (2) they simply forget to answer. For example, the second patient did not write her age, and the 12th patient did not provide any information on his satisfaction with the nursing robots. As you can see, there are also several more gaps in the table.
This sample data set is the basis for the entire lesson. We will look at each of the four questions above separately and use various measures to present the collected data in a clearer way. It should be noted that this compilation of patients forms the sample and not the population.
The scale level is of decisive importance in all forms of statistical analysis, including univariate analysis. The scale level determines which statistical analyses are permitted and which ones are not. Let's now go through the four characteristics of the patients:

• Gender is nominally scaled. The expressions are categories that cannot be meaningfully ordered.
• Satisfaction with nursing robots is ordinally scaled. The expressions are categories that can be meaningfully ordered.
• Both previous contact frequency with similar robots and age are cardinally scaled. The expressions, and the intervals between them, are numbers. We will also clarify whether these variables are discrete or continuous.
  ◦ Previous contact frequency with similar robots only had five different responses (0, 1, 2, 3, and 5). For this reason, this variable should be treated as a discrete one.
  ◦ Age had 16 different values among a total of 19 responses. In such cases, it is recommended to treat this variable as a continuous one and form age categories from the given ages. In the following analyses, we will observe that such a procedure provides much clearer results.
Before starting the evaluation, we must introduce a little notation. One refers to the initial data, which are available with respect to a variable x, as the primal list or raw data. In general terms, the primal list is notated as follows (Bamberg et al., 2022, p. 23):

x_1, x_2, …, x_n

x_1 describes the value for the first person, x_2 the value for the second person, and x_n the value for the nth person. n stands for the total number of persons or the sample size. For the individual persons, a person index i = 1, …, n is created, which runs through all persons. The primal list can also be denoted by x_i for i = 1, …, n. As a starting point for a data analysis, this original list is available for any scale level.

Primal list: the primal list (or raw data) contains the collected data of a variable.
In the table above, each of the four variable columns is actually a primal list. The column "Gender" represents the primal list of gender. All 25 patients provided an indication of their gender, so, in this case, n = 25. The assessment of the nursing robots was skipped by one person, so there are responses for n = 24 patients here. The frequency of previous contact with similar nursing robots was, again, answered by all patients (n = 25), whereas age was only answered by 19 patients (n = 19).
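As a small illustration (a sketch, not part of the course book's material), the sample size n per variable can be determined by counting the non-missing entries of each column; the patient data below are abbreviated, and missing answers are represented by None.

```python
# Abbreviated excerpt of the survey: one tuple per patient
# (gender, satisfaction, previous contact, age); None marks a missing answer.
patients = [
    ("female", "good", 1, 16),
    ("female", "good", 5, None),
    ("male", None, 1, None),
    ("female", "sufficient", 3, 48),
]

columns = ["gender", "satisfaction", "previous contact", "age"]
for idx, name in enumerate(columns):
    n = sum(1 for row in patients if row[idx] is not None)
    print(f"{name}: n = {n}")
```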
Tables and Graphics

Tables and graphics are commonly used to present the collected data in a clear fashion. Both serve the purpose of summarizing the data of a variable. A frequency table is one of the most important tools used to present the collected data of a variable in a clear form. Such a table summarizes which values a variable can take on, how often they occur in a sample, and what proportion they make up of the total number of all the attribute holders in the sample (Bamberg et al., 2022, p. 23). When setting up a frequency table, the important subdivision of variables based on their scale levels becomes noticeable. For this reason, we will discuss the frequency table for each individual scale level.

Frequency table: a frequency table summarizes the collected data of a variable in a compressed form.
Nominal scale variables

It is important to begin here, as when discussing the other scale levels, by clarifying our notation. First of all, a value index

j = 1, …, k

is applied to each individual value. In general, a variable has k different values. The individual values themselves are identified as

a_j with j = 1, …, k.
If we count in the original list the number of variable carriers that assume the individual values, we obtain the absolute frequencies:

n_j with j = 1, …, k

Absolute frequencies: the absolute frequencies count the occurrences of the individual variable values.

If we relate the individual absolute frequencies to the total number of variable carriers in the sample, we obtain the relative frequencies for the respective variable values:

f_j with j = 1, …, k

Relative frequencies: the relative frequencies reflect the proportions of the individual variable values.
Note that the relative frequencies can only take values from 0 to 1. The sum of all relative frequencies f_1 + f_2 + … + f_k must always be 1. If the individual relative frequencies are multiplied by 100, the corresponding percentages are obtained. This information is summarized in the following frequency table. For nominal scale variables, this table basically consists of four columns:

j  a_j  n_j  f_j
1  a_1  n_1  f_1
2  a_2  n_2  f_2
⋮  ⋮    ⋮    ⋮
k  a_k  n_k  f_k
Σ       n    1
Each row of the frequency table summarizes the most important information about one value of the variable.

For the nominal scale variable gender, the original list of the 25 patients reads (f = female, m = male):

f; f; f; f; m; f; f; f; m; m; f; m; f; f; f; f; f; f; m; f; f; f; f; f; f

This yields the following frequency table:

j  a_j     n_j  f_j
1  male     5   0.2
2  female  20   0.8
Σ          25   1
Counting from the original list, we found out that five male patients were interviewed. Dividing the five male patients by the total of all 25 patients results in a relative frequency of 0.2 or 20%. Consequently, 20 female patients remain with a proportion of 0.8 or 80%. The total of all respondents (here, 25) is always the sum under the column of absolute frequencies. The sum of all relative frequencies in the amount of 1 or 100% is noted below the column of relative frequencies.
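The counting behind such a frequency table can be sketched in a few lines of Python (an illustration, not the course book's own material), using the gender primal list from above.

```python
from collections import Counter

# Primal list of gender for the 25 patients (f = female, m = male).
gender = list("ffffmfffmmfmffffffmffffff")

absolute = Counter(gender)                            # absolute frequencies n_j
n = len(gender)
relative = {a: nj / n for a, nj in absolute.items()}  # relative frequencies f_j

for value in sorted(absolute):
    print(value, absolute[value], round(relative[value], 2))
# f appears 20 times (f_j = 0.8), m appears 5 times (f_j = 0.2)
```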
Ordinal scale variables

For ordinal scale variables, just like cardinal scale variables, the frequency table is extended by a fifth column. In this column, the cumulative frequencies are entered, which add up all relative frequencies from the first to the mth relative frequency for any value j = m:

F_m = f_1 + f_2 + … + f_m = Σ_{j=1}^{m} f_j

Cumulative frequencies: the cumulative frequencies sum up the relative frequencies.
j  a_j  n_j  f_j  F_j
1  a_1  n_1  f_1  F_1
2  a_2  n_2  f_2  F_2
⋮  ⋮    ⋮    ⋮    ⋮
k  a_k  n_k  f_k  F_k
Σ       n    1
For the ordinal scale variable "satisfaction with a nursing robot," we can construct the following frequency table based on the original list (vg = very good, g = good, s = satisfactory, and sf = sufficient):

j  a_j           n_j  f_j    F_j
1  very good      1   1/24   1/24
2  good          12   12/24  13/24
3  satisfactory   9   9/24   22/24
4  sufficient     2   2/24   1
Σ                24   1
We can see that 12 patients (half of the respondents) had a good impression of the nursing robots. This represents a proportion of 12/24, or 50%. Let's now look at the cumulative frequencies: In the first row of the table, we find the number of patients who rated the nursing robots "very good," i.e., only one patient. This makes a relative and cumulative proportion of 1/24, or 4.2%. In the cumulative frequency of the second row, the patients who found the robots "very good" or "good" are accounted for. Thus, the sum 1/24 + 12/24 = 13/24 comes up. Approximately 54.2% of the interviewed patients rated the nursing robots at least "good."
In the last row of the cumulative frequency column, we always arrive at 1 or 100%. It should be noted that the frequency table could have started with the worst category ("sufficient"). In most cases, however, one starts with the best expression.
Discrete cardinal variables

The frequency table for discrete cardinal variables takes the same form as the table for ordinal scale variables. Here, too, cumulative frequencies can be calculated, since the nature of the values in the form of numbers allows sorting from the smallest to the largest number. This is exactly the order that is always chosen for cardinal scale variables: One starts with the smallest value and ends with the largest one. The shape of the frequency table repeats that for ordinal scale variables.
j  a_j  n_j  f_j  F_j
1  a_1  n_1  f_1  F_1
2  a_2  n_2  f_2  F_2
⋮  ⋮    ⋮    ⋮    ⋮
k  a_k  n_k  f_k  F_k
Σ       n    1
For the discrete cardinal variable "previous contact," the original list of the 25 patients reads:

1; 5; 0; 0; 1; 1; 2; 1; 1; 1; 1; 1; 2; 1; 0; 3; 1; 1; 0; 3; 0; 1; 1; 2; 1

This yields the following frequency table:

j  a_j  n_j  f_j   F_j
1   0    5   0.2   0.2
2   1   14   0.56  0.76
3   2    3   0.12  0.88
4   3    2   0.08  0.96
5   5    1   0.04  1
Σ       25   1
It can be seen here that the answer "4" previous contacts with such nursing robots does not occur among the answers. For this reason, the frequency table goes directly from "3" to "5." Aside from this, the frequency table can be understood in the same way as the previous one. For example, let's look at the second row of the frequency table: 14 patients had contact with similar nursing robots once before their current hospital stay. This accounts for 14/25 = 0.56, or 56%, whereas 76% (20% + 56%) had such contact at most once.
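A short sketch (illustrative only, not part of the course book) of how the f_j and F_j columns can be computed for the previous-contact data:

```python
from collections import Counter
from itertools import accumulate

contacts = [1, 5, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 0, 3, 1, 1, 0, 3, 0, 1, 1, 2, 1]

n = len(contacts)
counts = Counter(contacts)
values = sorted(counts)                       # a_j sorted from smallest to largest
relative = [counts[a] / n for a in values]    # f_j
cumulative = list(accumulate(relative))       # F_j

for a, f, F in zip(values, relative, cumulative):
    print(a, counts[a], round(f, 2), round(F, 2))
# e.g., value 1: n_j = 14, f_j = 0.56, F_j = 0.76
```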
Continuous cardinal variables

The frequency table for continuous cardinal variables differs from that for discrete variables in the second column. Where individual values were previously listed in each row, there are now classes of values. This means that several individual values are summarized in each class. The first column no longer numbers the individual values but rather the individual classes from 1 to k, so each class is identified by its class number j. Each class is characterized by a lower and an upper limit.

Let the lower limit of a class j be denoted by x*_{j−1}, where * indicates a class boundary and the index j − 1 marks the lower boundary. The upper bound of a class j is denoted by x*_j. If there are k classes in general, then the lower and upper bounds of these classes are as follows:

j = 1: x*_0; x*_1
j = 2: x*_1; x*_2
…
j = k: x*_{k−1}; x*_k
For a concrete variable, these generally formulated class limits are replaced by concrete numbers. We see, for example, that the upper limit of the first class, x*_1, is equal to the lower limit of the second class. If, for age, x*_1 = 20 years, then the first class ends at 20 and the second starts at 20.

However, the question of which class a 20-year-old person would be classified into remains open. This problem is solved by placing different brackets at the class boundaries. At the upper limit of a class, a square bracket "]" is placed. This indicates that this upper limit belongs to the corresponding class. The lower boundary of a class, except for the first lower boundary, is provided with a round bracket "(". This means that the class starts at the next larger number than the lower limit itself. Referring to the example above, a 20-year-old person would be sorted into the first class. A person who is even marginally older than 20 (say, 20.0001 years old) would then be assigned to the second class. Overall, the structure of the frequency table is then as follows:
j  x*_{j−1}, x*_j  n_j  f_j  F_j
1  x*_0, x*_1      n_1  f_1  F_1
2  x*_1, x*_2      n_2  f_2  F_2
⋮  ⋮               ⋮    ⋮    ⋮
k  x*_{k−1}, x*_k  n_k  f_k  F_k
Σ                  n    1
Each row of a frequency table now represents a class of summarized variable values. This means that the absolute frequencies summarize the number of people who fall into this interval. Accordingly, the relative frequency indicates the proportion of those who belong to this interval. The cumulative frequency gives the proportion of people whose value is at most the upper limit of the class.
Finally, the way the class boundaries should be chosen must be clarified. In principle, the class division should be chosen so that each class contains a reasonable share of the observations. This may result in all classes having the same width. However, the classes can also have different widths. The latter variant often makes sense if there are only a few very small and/or very large values. In the ranges with few observations, wider classes are then formed so that these few observations can be grouped into one class. If one works with statistics programs, the values can be divided into classes automatically.
RUNNING EXAMPLE: SURVEY ON CARE ROBOTS
As an example, we will now work with the following class division of age: [15; 30], (30; 45], (45; 50], and (50; 70]. We see that the classes have different widths. The first two classes each cover an age range of 15 years. The third class includes a range of only 5 years, and the last class a wider range of 20 years. Consider the original list:

16; 50; 35; 47; 15; 20; 47; 48; 44; 55; 56; 35; 48; 52; 49; 68; 17; 26; 39

Counting the observations in each class yields the following frequency table:

j  x*_{j−1}, x*_j  n_j  f_j    F_j
1  [15; 30]         5   5/19    5/19
2  (30; 45]         4   4/19    9/19
3  (45; 50]         6   6/19   15/19
4  (50; 70]         4   4/19    1
Σ                  19   1
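The class counts above can be reproduced with a small sketch (illustrative only; the class boundaries are hard-coded from the example, and only the first class also contains its lower boundary):

```python
ages = [16, 50, 35, 47, 15, 20, 47, 48, 44, 55, 56, 35, 48, 52, 49, 68, 17, 26, 39]

# Classes (lower, upper]; only the first class also includes its lower boundary.
classes = [(15, 30), (30, 45), (45, 50), (50, 70)]

n = len(ages)
cumulative = 0.0
for j, (lower, upper) in enumerate(classes, start=1):
    if j == 1:
        n_j = sum(1 for x in ages if lower <= x <= upper)
    else:
        n_j = sum(1 for x in ages if lower < x <= upper)
    f_j = n_j / n
    cumulative += f_j
    print(j, (lower, upper), n_j, round(f_j, 3), round(cumulative, 3))
```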
Graphical Representation
For nominal scale variables and their relative frequencies, there are three possibilities for
graphical representation:
• pie chart
• bar chart
• Pareto chart
In a pie chart, the individual values are given a specific area in the circle according to their share in the sample. Therefore, while drawing the circle diagram, it is important to determine which area the individual variable values occupy in the circle. Since a circle has a total angle of 360°, the individual angles α_j (alpha) for the values a_j are determined by

α_j = f_j · 360° for j = 1, …, k

Pie chart: a pie chart shows the frequency distribution of a variable.

The relative frequency is, therefore, multiplied by 360° to obtain the corresponding angle. In this case, the following angles are obtained with the help of the frequency table: α_male = 0.2 · 360° = 72° and α_female = 0.8 · 360° = 288°.
Figure 2: Pie Chart for Gender
We can see that most of the 25 patients are female, since this variable expression occupies
the largest area in the circle.
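As a small aside, the conversion of relative frequencies into pie chart angles is a one-line computation. The following sketch (not from the course book) reproduces the 72° and 288° for gender and could be reused for any frequency table:

```python
relative_frequencies = {"male": 0.2, "female": 0.8}

# Each value receives f_j * 360 degrees of the circle.
angles = {a: f * 360 for a, f in relative_frequencies.items()}
print(angles)  # {'male': 72.0, 'female': 288.0}
```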
In a bar chart, which is also known as a column chart, the variable values are plotted on the x-axis and the relative frequencies on the y-axis. Finally, a bar is drawn over each value with a height equal to its relative frequency.

Bar chart: a bar chart displays the frequency distribution of a variable in the form of bars or rods.

Using the data from our example, we obtain the following bar chart. Of course, the same result can be read from the bar chart as from the pie chart.
A Pareto chart is a special form of the bar chart. It simply arranges the bars in the chart according to the height of their relative frequencies. This arrangement can be ascending or descending. For the present variable of gender, the Pareto chart is not particularly useful, as there are only two values.

Pareto chart: a Pareto chart orders the values according to the size of their occurrence.

For ordinal scale variables, two types of charts are used:

• pie charts
• bar charts

A Pareto chart is not used for this scale level, since the meaningful order of the variable values should not be changed. To create the pie chart, the angles in the circle for the individual variable values must first be determined. This is done in the same way as in the nominal scale example above.
This results in the following pie chart (α_1 = 15°, α_2 = 180°, α_3 = 135°, α_4 = 30°).

As we can see, most of the inpatients rated their satisfaction with the nursing robots "good" or "satisfactory," as these answers occupy the largest areas in the chart.
The construction of the bar chart is also identical to that of nominal scale variables. Consequently, we obtain the following diagram.
Figure 5: Bar Chart for Satisfaction
This bar chart also gives us the same insights, given that the two highest bars are for
“good” and “satisfactory,” respectively.
For the graphical representation of a discrete cardinal variable, only the bar chart is used.
This is done as follows for the variable “previous contact.” As a rule, a pie chart is not used
if the values are numbers.
Figure 6: Bar Chart for Previous Contact
Most of the patients have had contact with similar nursing robots once before, followed by
those who have had no such contact at all. Very few patients have had contact with similar
nursing robots twice or even more.
A histogram is used only for continuous variables. As we will see, the histogram is a completely different kind of diagram from those we have seen for the previous scale levels (Fahrmeir et al., 2016, p. 38). The reason for this is the class formation. The histogram ensures that the individual classes can be compared with each other even though they are often (as in the present example) of different widths.

Histogram: a histogram is drawn only for continuous variables.

For each class, a so-called density is calculated:

f(x) = f_j / Δ_j for all j = 1, …, k

where Δ_j (delta) is the class width. The histogram is finally created by carrying out this calculation of the densities for all classes.

To draw a histogram, the densities are calculated in the first step. Next, a rectangle is drawn over each class at the height of the calculated density. The area of each rectangle is characterized by the fact that it reflects the relative frequency f_j of the corresponding class. Thus, the total area under the histogram must be 1.
j  x*_{j−1}, x*_j  n_j  f_j    F_j    f(x)
1  [15; 30]         5   5/19    5/19  (5/19)/15 ≈ 0.018
2  (30; 45]         4   4/19    9/19  (4/19)/15 ≈ 0.014
3  (45; 50]         6   6/19   15/19  (6/19)/5 ≈ 0.063
4  (50; 70]         4   4/19    1     (4/19)/20 ≈ 0.011
Σ                  19   1
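The density column can be checked with a short sketch (illustrative only; the class boundaries and counts are hard-coded from the age example):

```python
# (lower, upper, absolute frequency) for each age class from the example.
age_classes = [(15, 30, 5), (30, 45, 4), (45, 50, 6), (50, 70, 4)]
n = sum(n_j for _, _, n_j in age_classes)  # 19 patients with an age response

for lower, upper, n_j in age_classes:
    width = upper - lower          # class width Δ_j
    f_j = n_j / n                  # relative frequency
    density = f_j / width          # height of the histogram bar
    area = density * width         # equals f_j again: the areas sum to 1
    print(f"({lower}, {upper}]: density = {density:.3f}, area = {area:.3f}")
```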
Figure 7: Histogram for Age
As we can see, most of the 19 patients are older than 45 and at most 50 years old. The absolute frequency is highest in this class (6), but such a clear difference from the other classes only becomes apparent in the histogram. Due to the small width of this class in relation to the other classes, a relatively high density results. Now, let's consider this class and clarify once again why the area of this rectangle is equal to the relative frequency. The area of a rectangle is obtained by multiplying the width and the height. In this case, the width is 5 and the height is 0.063 according to the density. With 5 · 0.063, we get 0.315, or approximately 6/19.
Location Parameters

Table 11: Location Parameters and Their Possible Applications

Scale of measurement            Mode  Quantiles  Mean value
Nominal                          ✓     -          -
Ordinal                          ✓     (✓)        -
Cardinal (discrete/continuous)   ✓     ✓          ✓

While the mode can be determined for any scale level, quantiles and the mean place certain requirements on the scale level. In the following sections, we will present the individual measures. It will be shown what the individual measures mean and how they can be determined and interpreted.
Mode

The mode, which is also known as the modal value, is the value of a variable that occurs most frequently in the sample (Bamberg et al., 2022, p. 16). The mathematical abbreviation of the mode is

x_mod.

Mode: the mode represents the most frequent variable value.

A variable can have one mode or several modes. If there is only one mode, it is called a unimodal distribution. A distribution with two modes is called bimodal, and one with more than two modes is called multimodal. We will discuss the determination of the mode or modes for the different scale levels.
Nominal scale variables

The mode can be determined using the frequency table. To do this, look at the row with the highest absolute or relative frequency and read off the corresponding value. Alternatively, you can look at the pie chart or the bar chart to determine the mode. The value with the largest area in the circle or the largest bar represents the mode. Recall the frequency table, pie chart, and bar chart for the gender of our 25 patients. The value "female" is the most frequently represented one among the patients. Consequently, x_mod = female.

Ordinal scale variables

For the ordinal scale variable of satisfaction with the nursing robots, let's look back at the table and two graphs. The table and the graphs all show that the value "good" was the one most frequently chosen by the patients. Accordingly, x_mod = good.
Discrete cardinal variables

The analysis of the number of previous contacts with similar nursing robots showed that a single previous contact was mentioned most frequently (by 14 patients, a share of 56%). Therefore, x_mod = 1.
Continuous cardinal variables

Only for continuous variables must we determine the mode in a slightly different way. Because of the different widths of the classes, we cannot simply fall back on the absolute or relative frequencies; we should determine the mode on the basis of the densities. The class with the greatest density is the modal class. The frequency table for age including densities shows that the density is greatest in the third class at 0.063. Consequently, the mode is x_mod = (45; 50]. The mode is, therefore, a whole class in this case.

With the help of the histogram, we would also obtain this result because the rectangle over the class 45 to 50 is the highest compared to the other classes. In some literature, the mode for a continuous variable is also given in the form of a single number. In that case, the middle of the class with the highest density is given as the mode. The middle between 45 and 50 would be (45 + 50)/2 = 47.5.
Quantiles
Another measure of the position of a variable is a quantile. A quantile is a variable value that is not exceeded by a certain proportion of objects (Fahrmeir et al., 2016, p. 60). This proportion can be chosen arbitrarily. Mathematically, one generally notates a quantile by

x_p.

It indicates the variable value that is not exceeded by p · 100 percent of the variable carriers. The remaining (1 − p) · 100 percent of the values are, therefore, at least as large as the quantile.

Quantile: a quantile is determined by a value that is not exceeded by p · 100 percent of the trait carriers.
There are three quantiles that are considered particularly important in statistics. One is the median x_0.5. This lies exactly in the middle of the ordered data set: 50% of the variable values are at most as large as x_0.5. Accordingly, 50% of the variable values are at least as large as x_0.5. Two other important quantiles are the quartile x_0.25 (i.e., the lower quartile: the value not exceeded by 25% of the objects) and the quartile x_0.75 (i.e., the upper quartile: the value not exceeded by 75% of the objects). Together with the median, these divide the sorted data set into four equal sections.

Median: the median is the most important quantile and forms the center of the ordered data set.
Quartiles: quartiles, together with the median, provide four equally sized ranges in the ordered data set.
In principle, quantiles cannot be determined for nominal scale variables because their calculation requires sorting of the observations. As we will see later, calculating them for ordinal scale variables fails in some places. For cardinal scale variables, whether discrete or continuous, quantiles can be computed in any case. We can determine the quantiles either based on a primal list or a frequency table. Let's start with the determination from a primal list.
The determination of quantiles from a primal list is done in the same way for all possible scale levels. We start from the primal list

x_1, x_2, …, x_n,

which contains, in an unsorted order, all values of the n variable carriers of a sample. This list must be sorted in the first step from small to large. Thus, the following ordered data set is created:

x_(1), x_(2), …, x_(n)

x_(1) stands for the smallest and x_(n) for the largest observation. The individual quantiles can now be determined according to a general procedure, independent of p. The aim is to find the position in the ordered data set at which the sought-after quantile is located:

x_p = x_(⌈n·p⌉)                    if n·p is not an integer
x_p = (x_(n·p) + x_(n·p + 1)) / 2  if n·p is an integer

Here, ⌈n·p⌉ denotes n·p rounded up to the next integer.
In the first step, depending on the quantile x_p to be calculated, n · p is calculated with the help of the sample size. Let's assume that we are looking for the quantile x_0.4 and n = 11. This would mean that n · p = 11 · 0.4 = 4.4. This is not an integer. Accordingly, following the first line of the above formula, the next integer after 4.4 is used as the position of the quantile we are looking for. Thus, the quantile x_0.4 is at the fifth position in an ordered data set with 11 observations.
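For numeric data, the rule above translates directly into a short function (a sketch, not the course book's own notation); math.ceil plays the role of "the next integer."

```python
import math

def quantile(values, p):
    """Quantile x_p of a primal list, following the rule described in the text."""
    data = sorted(values)              # ordered data set x_(1), ..., x_(n)
    n = len(data)
    position = n * p
    if position != int(position):      # n*p is not an integer: take the next position up
        return data[math.ceil(position) - 1]
    k = int(position)                  # n*p is an integer: average two neighbours
    return (data[k - 1] + data[k]) / 2

contacts = [1, 5, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 0, 3, 1, 1, 0, 3, 0, 1, 1, 2, 1]
print(quantile(contacts, 0.25), quantile(contacts, 0.5), quantile(contacts, 0.75))  # 1 1 1
```

Applied to the 19 age values from the survey, the same function reproduces the quartiles 26, 47, and 50 that are derived later in this section.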
If we have a frequency table, we must distinguish which scale level is available for the determination of quantiles. So, let's look at our individual variables according to the scale of measurement and determine the most important quantiles, namely the median as well as the two quartiles, both from a primal list and from the corresponding frequency table. As already mentioned, a determination for our nominal scale variable is not possible.

For the variable "satisfaction with nursing robots," we take the unsorted values from the satisfaction column of the patient survey (n = 24 patients). We first put these into an ordered form, starting with the best score and ending with the worst one:
vg; g; g; g; g; g; g; g; g; g; g; g; g; s; s; s; s; s; s; s; s; s; sf; sf
To determine the median for "satisfaction with nursing robots," we multiply the number of patients, 24, by 0.5, which gives us 12. Since 12 is an integer, we take the values at the 12th and the following 13th position of the sorted data set and determine the median as the average of these two:

24 · 0.5 = 12 (integer)
x_0.5 = (x_(12) + x_(13)) / 2 = (good + good) / 2 = good

Here, it has already become recognizable why the determination of quantiles for ordinal scale variables is quite questionable: No average can be formed from two words. However, it seems possible at this point, since both positions contain the value "good." Overall, the result means that 50% of the patients rated their satisfaction with the nursing robots as at most "good." The remaining 50% rated it as at least "good."
According to this procedure, both the lower and the upper quartile can be determined:

24 · 0.25 = 6 (integer)
x_0.25 = (x_(6) + x_(7)) / 2 = (good + good) / 2 = good

24 · 0.75 = 18 (integer)
x_0.75 = (x_(18) + x_(19)) / 2 = (satisfactory + satisfactory) / 2 = satisfactory

25% of the respondents rated their satisfaction with the nursing robots as at most "good," while 75% rated it as at most "satisfactory."
If we had a frequency table available, the determination of quantiles would be much easier. For any quantile x_p, we have to check in the column of cumulative frequencies where the cumulative frequency exceeds p for the first time.

j  a_j           n_j  f_j    F_j
1  very good      1   1/24   1/24
2  good          12   12/24  13/24
3  satisfactory   9   9/24   22/24
4  sufficient     2   2/24   1
Σ                24   1
For the median, for example, we go through the cumulative frequencies. In the first row, the cumulative frequency of 1/24 is still less than 0.5. In the second row, it is greater than 0.5 for the first time, with 13/24 = 0.542, whereupon the value in the second row equals the median. The procedure for the two quartiles is analogous:

x_0.5 = good, since 13/24 > 0.5; x_0.25 = good, since 13/24 > 0.25
x_0.75 = satisfactory, since 22/24 > 0.75
The interpretation is, of course, identical to that for the original list. It should also be mentioned that, according to the same principle, all other quantiles can be determined both from the original list and from the frequency table.

All 25 patients provided information on how many times they have had contact with similar nursing robots before. The following original list

1; 5; 0; 0; 1; 1; 2; 1; 1; 1; 1; 1; 2; 1; 0; 3; 1; 1; 0; 3; 0; 1; 1; 2; 1

is first sorted from small to large for the determination of the three most important quantiles:

0; 0; 0; 0; 0; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 2; 2; 2; 3; 3; 5
These can now be determined in the same way as for ordinal scale variables. What is different here is that, for example, in the case of the median, multiplying 0.5 by 25 results in the number 12.5, which is not an integer. We therefore take the next largest integer (13) and read off the median at the 13th position:

25 · 0.5 = 12.5 (not an integer)
x_0.5 = x_(⌈12.5⌉) = x_(13) = 1

For the two quartiles, the calculation of the correct position is done in the same way:

25 · 0.25 = 6.25 (not an integer)
x_0.25 = x_(⌈6.25⌉) = x_(7) = 1

25 · 0.75 = 18.75 (not an integer)
x_0.75 = x_(⌈18.75⌉) = x_(19) = 1
What is striking here is that all three quantiles take on the same value. This always happens when one value occurs particularly frequently in the data set. For the determination from the frequency table, we proceed in the same way as for ordinal scale variables.
Table 13: Frequency Table for the Previous Contact (2)
j  a_j  n_j  f_j   F_j
1   0    5   0.2   0.2
2   1   14   0.56  0.76
3   2    3   0.12  0.88
4   3    2   0.08  0.96
5   5    1   0.04  1
Σ       25   1
This gives us the following results, which are identical to those from the original list:

x_0.5 = 1, since 0.76 > 0.5; x_0.25 = 1, since 0.76 > 0.25; x_0.75 = 1, since 0.76 > 0.75
For the continuous variable age, the original list of the 19 responses

16; 50; 35; 47; 15; 20; 47; 48; 44; 55; 56; 35; 48; 52; 49; 68; 17; 26; 39

is again sorted from small to large:

15; 16; 17; 20; 26; 35; 35; 39; 44; 47; 47; 48; 48; 49; 50; 52; 55; 56; 68
Based on this sorted data set, the determination of the quantiles is done as already known:

19 · 0.5 = 9.5 (not an integer)
x_0.5 = x_(⌈9.5⌉) = x_(10) = 47

19 · 0.25 = 4.75 (not an integer)
x_0.25 = x_(⌈4.75⌉) = x_(5) = 26

19 · 0.75 = 14.25 (not an integer)
x_0.75 = x_(⌈14.25⌉) = x_(15) = 50

Therefore, 25% of the patients are 26 years old or less, 50% are at most 47 years old, and 75% are not older than 50.
The procedure for determining quantiles for continuous variables is completely different if only a frequency table with classes is available. If we want to determine an arbitrary quantile x_p, we first look for the row in which the cumulative frequency is greater than p for the first time. Once the row is found, we know in which class the quantile we are looking for must be located. Then, the lower limit x*_{j−1} of the class found, the relative frequency f_j of this same class, the width of the class Δ_j, the cumulative frequency of the previous classes F(x*_{j−1}), and the cumulative proportion p given by the quantile are used to calculate the quantile (Fahrmeir et al., 2016, p. 55):

x_p = x*_{j−1} + ((p − F(x*_{j−1})) / f_j) · Δ_j

With this in mind, let's look at the determination based on the following frequency table.
j  x*_{j−1}, x*_j  n_j  f_j    F_j
1  [15; 30]         5   5/19    5/19
2  (30; 45]         4   4/19    9/19
3  (45; 50]         6   6/19   15/19
4  (50; 70]         4   4/19    1
Σ                  19   1
Now, let’s take a closer look at the median x0,5: Only in the third row the cumulative fre-
quency is greater than 0.5 for the first time; so, we already know that our result must lie
somewhere between 45 and 50. For the calculation, we start with 45 as the lower limit.
Next, we add a certain fraction. The numerator of the fraction starts with the proportion
given by the median in the amount of 0.5, from which the cumulative frequency of the
previous class in the amount of 9/19 is subtracted. The denominator is the relative
frequency of the relevant class, which is 6/19 here. We multiply this fraction by the width
of the class in the amount of 5 (50 − 45 = 5):
x0.5 = 45 + ((0.5 − 9/19) / (6/19)) · 5 = 45.42
This gives us a median of 45.42. We proceed in the same way to determine the two
quartiles. It is worth mentioning that the lower quartile falls directly into the first class,
since 5/19 is already greater than 0.25. So, when the cumulative frequency of the previous
class has to be subtracted in the numerator of the fraction, the result is simply 0 (i.e., there
is no one younger than 15 years old). We get the following results for the two quartiles:
x0.25 = 15 + ((0.25 − 0) / (5/19)) · 15 = 29.25
x0.75 = 45 + ((0.75 − 9/19) / (6/19)) · 5 = 49.375
The interpretation of the content is the same as in the original list. However, the results
are not identical with those from the original list. This is common in the context of contin-
uous features. By forming classes, the original observations are no longer available. We
only know how many persons are in each class. The exact age of each person is no longer
considered. We only get the exact results from the original list. If this is available, it is
always advisable to use it for the calculation of individual measures.
Mean Value
The classic and probably best-known measure for describing a characteristic is the mean
value, which is also called the average value or arithmetic mean (note that x̄ is read as
"x bar"). The mean value indicates which variable value is assumed on average by the
objects (Fahrmeir et al., 2016, p. 50); it summarizes all the data of a variable in a single
value. The determination of x̄ is possible for the cardinal scale without exception. It can
again be calculated from both an original list and a frequency table. We go through both
variants for discrete and continuous variables.
Let’s start by determining the average previous contact with nursing robots based on the
primal list. To do this, we use the following formula:
x̄ = (1/n) · ∑ xi (summing over i = 1, …, n)
Only the variable values of all n objects are summed up and divided by n. For our
25 patients with the data
1; 5; 0; 0; 1; 1; 2; 1; 1; 1; 1; 1; 2; 1; 0; 3; 1; 1; 0; 3; 0; 1; 1; 2; 1,
x̄ = (1/25) · (1 + 5 + … + 2 + 1) = 1.24
Therefore, 1.24 patients have been in contact with similar nursing robots before on aver-
age. If we do not have the original list but the frequency table instead, the mean value can
be calculated using either the absolute or relative frequencies. Both variants must lead to
the same result. Of course, it is sufficient to calculate only one:
x̄ = (1/n) · ∑ aj · nj = ∑ aj · fj (summing over j = 1, …, k)
If the individual values aj are multiplied by the number of their occurrences nj, then the sum
must be divided by n. If the values are weighted directly with their proportions fj, then the
division by the sample size is omitted. So, let's look at this for our example.
j aj nj fj Fj
1 0 5 0.2 0.2
2 1 14 0.56 0.76
3 2 3 0.12 0.88
4 3 2 0.08 0.96
5 5 1 0.04 1
Σ 25 1
The average previous contact with nursing robots also leads to a result of 1.24 based on
the frequency table:
x̄ = (1/25) · (0 · 5 + 1 · 14 + … + 5 · 1) = 1.24
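As a quick cross-check, the following Python sketch computes the mean once from the original list and once from the frequency table, using either the absolute or the relative frequencies; the variable names are illustrative only.

contacts = [1, 5, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 2,
            1, 0, 3, 1, 1, 0, 3, 0, 1, 1, 2, 1]

# Mean from the original list
mean_list = sum(contacts) / len(contacts)                # -> 1.24

# Mean from the frequency table: values a_j with absolute frequencies n_j
table = [(0, 5), (1, 14), (2, 3), (3, 2), (5, 1)]
n = sum(nj for _, nj in table)
mean_abs = sum(aj * nj for aj, nj in table) / n          # -> 1.24
mean_rel = sum(aj * (nj / n) for aj, nj in table)        # -> 1.24
print(mean_list, mean_abs, mean_rel)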
For the age of the 19 patients with the original list
16; 50; 35; 47; 15; 20; 47; 48; 44; 55; 56; 35; 48; 52; 49; 68; 17; 26; 39
the average age can be calculated in the same way as for the discrete variable:
x̄ = (1/19) · (16 + 50 + … + 26 + 39) = 40.368
Therefore, the patients surveyed are 40.368 years old on average. If we have a frequency
table, we can work again with either the absolute or relative frequencies. What is impor-
tant and different, however, is that these are now multiplied by the respective class mean.
x̄ = (1/n) · ∑ mj · nj = ∑ mj · fj with mj = (x*j−1 + x*j) / 2 (summing over j = 1, …, k)
If we look at the frequency table, we can determine, for example, the middle of the first
class from 15 to 30 by taking the average of these two limits: (15 + 30) / 2 = 22.5.
j    x*j−1 ; x*j    nj    fj      Fj
1    [15; 30]       5     5/19    5/19
2    (30; 45]       4     4/19    9/19
3    (45; 50]       6     6/19    15/19
4    (50; 70]       4     4/19    1
Σ                   19    1
With this knowledge, we arrive at the following average age based on the table:
x̄ = (1/19) · (22.5 · 5 + 37.5 · 4 + 47.5 · 6 + 60 · 4)
  = 22.5 · (5/19) + … + 60 · (4/19) = 41.447
As with the quantiles, the average value just calculated differs from that from the original
list. We already know the reason for this. Here, too, it is recommended to use the original
list for the calculation of the average value if it is available.
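A corresponding Python sketch for the grouped case, again assuming the class limits as shown in the frequency table above (the last class running from 50 to 70):

# Mean of a continuous variable from a frequency table: use the class midpoints m_j
age_classes = [(15, 30, 5), (30, 45, 4), (45, 50, 6), (50, 70, 4)]
n = sum(nj for _, _, nj in age_classes)
mean_grouped = sum(((lo + hi) / 2) * nj for lo, hi, nj in age_classes) / n
print(round(mean_grouped, 3))   # -> 41.447, slightly above the 40.368 from the original list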
Finally, let’s look at a special property of the mean value. We will now consider the original
list of ages again. We have calculated an average age of x̄ = 40.368 for these 19 individu-
als. Imagine we want to study this group of patients again two years later. All 19 individu-
als have, thus, become two years older. We can note that xi + 2 holds for all i = 1, …, n.
What does this do to the mean? If everyone has become two years older, the original list is
updated as follows:
18; 52; 37; 49; 17; 22; 49; 50; 46; 57; 58; 37; 50; 54; 51; 70; 19; 28; 41
x̄ = (1/19) · (18 + 52 + … + 28 + 41) = 42.368
The average age has, therefore, increased by two years compared with the previous aver-
age age. This is not a coincidence but the rule: If all values of a sample are increased or
decreased by a certain value, the mean value also increases or decreases by this value.
Suppose that all ages were doubled (xi · 2 for all i = 1, …, n). So, the first person is
32 years old rather than 16. The average value would double, too. Overall, if the original
data are transformed linearly (i.e., a value a is added or subtracted and/or the data are
multiplied by a value b) such that new data yi are created, then the aver-
age value of the new data changes by the exact amount of the addition, subtraction, or
multiplication:
yi = a + b · xi    ȳ = a + b · x̄
Referring to the examples above, a would represent the fact that all participants are now
two years older (a = 2). In contrast, b would stand for the doubling of the age (b = 2) and,
thus, the doubling of the mean value.
We will now take a closer look at the mean and the median. Often a question arises about
which of them is better suited to describe the location of a variable. This critically depends
on the data set.
Imagine we are looking at the ages of 20 people. Suppose that 19 of them are between 30
and 35 years old and only one person is 63. The median does not consider the 63 year old
person. It focuses on the middle of the data set, i.e., somewhere between 30 and 35. Thus,
it is generally considered robust. The mean, conversely, takes all ages into account when
summing them up. Thus, the 63 year old person will cause the mean to be pulled up and
to be even higher than 35. Therefore, we would have an average age that is higher than
that of 19 respondents (out of a total of 20). Accordingly, the mean is very sensitive to
outliers (an outlier is a measurement value that does not fit into an expected series of
measurements or generally does not meet expectations).
We can state that the mean is always well suited when the data set is not affected by
extreme outliers. The median is not affected by such situations. Even if each of the two
measures is not always well suited on its own, the comparison of the two helps enormously
to assess the distribution of a variable from a statistical point of view. Thus, we distinguish
between symmetrical and asymmetrical distributions (Fahrmeir et al., 2016, p. 56):
x̄ ≈ x0.5 : symmetrical distribution
x̄ > x0.5 : right skewed distribution
x̄ < x0.5 : left skewed distribution
A symmetrical distribution is always present when the mean and median are approxi-
mately equal. Graphically, this shows up in a bar chart or histogram in such a way that we
can see an even slope on both sides of the highest bar.
A skewed distribution can occur in two ways. If the mean is greater than the median, we
are dealing with a right skewed distribution. Consequently, the bars decrease towards the
right. A left skewed distribution, on the other hand, is when the mean is smaller than the
median. The distribution then decreases to the left in the diagram. Sometimes, a distribu-
tion cannot be sorted directly into one of the three categories, for example, when the two
largest bars are of approximately the same size.
Since mean values can only be calculated for cardinal scale variables, this division into
symmetry and asymmetry also only applies to that scale level.
2.3 Measures of Dispersion
In this section, we discuss the measures of dispersion in the context of univariate data
analysis. The aim here is to find out whether the surveyed objects are similar with respect
to a variable or whether they differ from each other. If, in the context of an age survey, all
participants are very similar in age, there will be only a small amount of dispersion. If,
however, they differ very much in age, the dispersion will be correspondingly larger. Meas-
ures of dispersion can only be determined for cardinal scale variables both from an origi-
nal list and a frequency table. We now explain the measures of dispersion range and inter-
quartile range as well as the two most important ones of sample variance and standard
deviation.
Range
The range R is probably the simplest measure to describe the dispersion of a variable
(Bamberg et al., 2022, p. 20); it shows the distance from the smallest to the largest
expression. We only have to subtract the smallest expression x(1) from the largest
expression x(n):
R = x(n) − x(1)
For the number of previous contacts with similar nursing robots, we find the largest obser-
vation on the far right and the smallest on the left in the sorted data set:
0; 0; 0; 0; 0; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 2; 2; 2; 3; 3; 5
R=5−0=5
Those with the most contact had five more contacts than those with the least contact. The
result could also be read from the frequency table (i.e., value in the last line - value in the
first line).
It is the same for the age of the respondents. For the sorted data set
15; 16; 17; 20; 26; 35; 35; 39; 44; 47; 47; 48; 48; 49; 50; 52; 55; 56; 68
we obtain
R = 68 − 15 = 53.
There is a 53-year difference between the youngest and oldest person. Note that it may
not be advisable here to determine the range from the frequency table, as the smallest
class lower limit and the largest class upper limit do not necessarily have to correspond to
the smallest and largest expressions.
Interquartile Range
The second measure of dispersion, the interquartile range IQR, is more suitable in the case
of existing outliers; it shows the distance covered by the central 50% of the objects. It
focuses on the variation in the central 50% of observations (Fahrmeir et al., 2016, p. 61).
Sorting the data from small to large, the interquartile range marks the distance from the
lower quartile to the upper quartile:
IQR = x0.75 − x0.25
To calculate the interquartile range, it is, therefore, necessary to calculate the two
quartiles. Since we have already discussed this in the previous section, we only explain the
calculation of the IQR briefly.
Regarding previous contact with nursing robots, we determined a value of 1 for both the
lower and upper quartiles (original list identical to frequency table). The interquartile
range is, therefore, calculated as follows:
IQR = 1 − 1 = 0 .
The central 50% of persons do not differ from each other in any way in their frequency of
previous contact. There is no dispersion among them.
For the age of the patients, we determined quartiles of x0.25 = 26 and x0.75 = 50 from the
original list, so that
IQR = 50 − 26 = 24
The central 50% of the patients sorted by age, therefore, differ from each other by a maxi-
mum of 24 years. Using the results of the frequency table, we arrive at a result of
IQR = 49.375 − 29.25 = 20.125. Here, too, the results based on the original list should
be preferred.
Sample Variance and Standard Deviation
The sample variance s² and the standard deviation s are probably the most important
measures to describe dispersion (Fahrmeir et al., 2016, p. 65). The standard deviation
indicates the average deviation from the mean, and the sample variance is required to
obtain the standard deviation. Compared to the other two measures, they involve every
single observation in the calculation. They check how far each individual observation is
from the mean. In the first step, the sample variance is always calculated by
s² = (1/(n − 1)) · ∑ (xi − x̄)² = (n/(n − 1)) · (x2 − x̄²),
where the sums run over i = 1, …, n, x̄² is the squared mean, and x2 denotes the mean of
the squared observations.
The first variant to the right of the first equal sign shows exactly what has just been descri-
bed. The mean value x̄ is subtracted from each observation xi. This difference is squared.
After this has been done for all observations and these squared differences have been
summed up, we divide the result by n − 1 (to know the reason why it is divided by n − 1,
please refer to Fahrmeir et al., 2016, p. 65).
An alternative calculation form is offered to the right of the second equal sign. For this var-
iant, only two mean values must be calculated. First, the simple mean, which is finally
squared: x̄². Second, the mean of all squared observations must be calculated using
x2 = (1/n) · ∑ xi² (summing over i = 1, …, n). Each individual observation is squared for
this purpose, and the mean is finally calculated from the squared observations.
The second variant is highly recommended for computation by hand, since it is considered
to be less prone to error. For this reason, we restrict ourselves to exactly this in the follow-
ing explanations. The sample variance cannot be interpreted due to squaring. For this rea-
son, the root of the sample variance is taken to get the standard deviation:
s = √s²
This can be interpreted very well in terms of content in conjunction with the mean: The
average of the characteristic is x̄ ± s. Let's now explain the calculation for the two cardinal
scale variables from both the original list and the frequency table.
For the previous contact with nursing robots with the original list
1; 5; 0; 0; 1; 1; 2; 1; 1; 1; 1; 1; 2; 1; 0; 3; 1; 1; 0; 3; 0; 1; 1; 2; 1
we obtain
x̄ = (1/25) · (1 + 5 + … + 2 + 1) = 1.24
x2 = (1/25) · (1² + 5² + … + 2² + 1²) = 2.76
s² = (25/24) · (2.76 − 1.24²) = 1.273
s = √1.273 = 1.128
It is recommended to calculate the sample variance in the order given above. One should
start with the simple mean value 1.24, which enters the formula of the sample variance in
squared form as 1.24². The mean of the squared observations in the second row requires
squaring each observation. The resulting 2.76 also enters the formula for s². Finally, for s²,
we need the sample size of 25, which enters the formula with 25 in the numerator and
25 − 1 = 24 in the denominator. The result 1.273 should not be interpreted.
Only by forming the standard deviation by taking the root from 1.273 do we get an inter-
pretable result of 1.128. The average previous contact with similar nursing robots is
1.24 ± 1.128 times. Whether this spread is large or small usually depends on the context
as well as the units measured. If we had a frequency table, we would only have to adjust
the way the mean is calculated in the first two steps:
x̄ = 0 · 0.2 + 1 · 0.56 + … + 5 · 0.04 = 1.24
x2 = 0² · 0.2 + 1² · 0.56 + … + 5² · 0.04 = 2.76
Here, the mean values were calculated based on the relative frequencies. Care must be
taken when calculating x2; the values must be squared in each case. Since the two subse-
quent calculation steps are analogous, no further illustration is given.
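The shortcut formula for the sample variance can also be sketched in Python; the function name is an illustrative choice, and the printed results agree with the calculation by hand.

from math import sqrt

def sample_variance(data):
    n = len(data)
    mean = sum(data) / n                       # simple mean
    mean_sq = sum(x * x for x in data) / n     # mean of the squared observations
    return n / (n - 1) * (mean_sq - mean ** 2)

contacts = [1, 5, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 2,
            1, 0, 3, 1, 1, 0, 3, 0, 1, 1, 2, 1]
s2 = sample_variance(contacts)
print(round(s2, 3), round(sqrt(s2), 3))        # -> 1.273 1.128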
Since the procedure for the age based on the original list is similar to the one above, we
proceed with the following data:
16; 50; 35; 47; 15; 20; 47; 48; 44; 55; 56; 35; 48; 52; 49; 68;
17; 26; 39
x̄ = (1/19) · (16 + 50 + … + 26 + 39) = 40.368
x2 = (1/19) · (16² + 50² + … + 26² + 39²) = 1851
s² = (19/18) · (1851 − 40.368²) = 233.690
s = √233.690 = 15.287
The mean age of the 19 patients is 40.368 ± 15.287 years. From the frequency table, as
with the rest of the measures, we arrive at a slightly different result.
j    x*j−1 ; x*j    nj    fj      Fj
1    [15; 30]       5     5/19    5/19
2    (30; 45]       4     4/19    9/19
3    (45; 50]       6     6/19    15/19
4    (50; 70]       4     4/19    1
Σ                   19    1
x̄ = 22.5 · (5/19) + … + 60 · (4/19) = 41.447
x2 = 22.5² · (5/19) + … + 60² · (4/19) = 1899.671
s² = (19/18) · (1899.671 − 41.447²) = 191.918
s = √191.918 = 13.85
Finally, the calculation of the mean value x2 should be discussed again. The class centers
must now be squared in each case. With slightly different mean values we consequently
also come to slightly different results for s2 and s.
We recall the special property of the mean: Under a linear transformation of all the data, it
changes exactly by the added/subtracted and/or multiplied value. In the case of the sample
variance and standard deviation, this looks somewhat different. Here, too, we consider the
original list of the ages again for this purpose:
16; 50; 35; 47; 15; 20; 47; 48; 44; 55; 56; 35; 48; 52; 49; 68;
17; 26; 39
Based on this, we have obtained a sample variance of s² = 233.690 and a standard devia-
tion of s = 15.287. If we now look at all persons again two years later, meaning that xi + 2
applies to all i = 1, …, n, then this does not affect the sample variance and standard
deviation at all. According to the modified original list, all patients have become two years
older, and the individual numbers have changed:
18; 52; 37; 49; 17; 22; 49; 50; 46; 57; 58; 37; 50; 54; 51; 70; 19; 28; 41
However, the distances between the individual people and, thus, the variation/scatter
between them does not change.
In summary, this means that increasing or decreasing all values by a certain amount a
does not lead to any change in the sample variance or standard deviation. The situation is
different if all values of the sample are multiplied by a certain number b. The original pri-
mal list above, if all values were doubled (xi · 2 for all i = 1, …, n), would result in
32; 100; 70; 94; 30; 40; 94; 96; 88; 110; 112; 70; 96; 104; 98; 136; 34; 52; 78.
Here, we can see very well that not only have all values changed, but their distances to
each other and, thus, the dispersion have also taken on a completely different form. There
are now suddenly much larger differences between the individual values. Whereas in the
original list there were 34 years between the first (16 years) and the second (50 years)
person, there are now 68 years (32 and 100 years) between them after the doubling. This
leads to the following basic rule: Multiplying all values by b leads to a change in the sample
variance by the factor b² and in the standard deviation by the factor b (more precisely, by
the absolute value |b| if b is negative). Overall, we hold the following:
yi = a + b · xi    sy² = b² · sx²    sy = |b| · sx
If all original data xi are increased or decreased by a, this has no effect on the sample
variance sy² or standard deviation sy of the new data yi. If, conversely, all original values xi
are multiplied by b, then the new sample variance sy² changes by the factor b² compared to
the previous sx². For the new standard deviation sy, this results in a change by the factor |b|.
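A short numerical check of these transformation rules, here using Python's statistics module (which also uses the n − 1 denominator for the sample variance); the variable names are illustrative.

from statistics import stdev, variance

ages = [16, 50, 35, 47, 15, 20, 47, 48, 44, 55, 56, 35, 48, 52, 49, 68, 17, 26, 39]
shifted = [x + 2 for x in ages]     # everyone two years older
doubled = [2 * x for x in ages]     # all ages doubled

print(round(variance(ages), 3), round(variance(shifted), 3))   # identical values
print(round(variance(doubled), 3))                             # four times as large
print(round(stdev(ages), 3), round(stdev(doubled), 3))         # the second is doubled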
SUMMARY
Descriptive univariate (i.e., one-dimensional) analysis is a very impor-
tant part of statistical analysis. It is used to describe each variable collec-
ted before going into deeper analysis. We used frequency tables, graphs,
and measures to describe them in more detail. Recall that we covered
popular visualization methods such as pie charts, bar charts, and Pareto
charts using our example of a hospital with nursing robots.
In all the analyses we used, the scale level of the variables considered
was the main focus. This is because depending on the nature of a varia-
ble, certain statistical analyses may or may not be permitted.
UNIT 3
ANALYSIS METHODS OF TWO-
DIMENSIONAL DATA
STUDY GOALS
Introduction
Is there a correlation between gender and income? Is there a dependency between hours
worked and cigarette consumption? These and other questions are what we want to
answer in this unit. When we look at two variables together, such as gender and income or
hours worked and cigarette consumption, and want to uncover a possible relationship, we
use bivariate analysis. This is also referred to as analyzing two-dimensional data. As in uni-
variate analysis, the scale level of the two variables under investigation is crucial in bivari-
ate analysis. Therefore, the following cases must be distinguished: both variables are nominally scaled, both are ordinally scaled, both are cardinally scaled, or the two variables have different scale levels.
We will first explain the three cases in which the scale levels of the two variables are iden-
tical. Then, we will describe the procedure for two different scale levels.
Table 18: Initial Data for Two Nominal Scale Variables (i, gender, smoking behavior; f = female, m = male, S = smoker, N = non-smoker)
1 f N 2 f N 3 f N 4 f S
5 m S 6 f N 7 f N 8 f N
9 m N 10 m S 11 f S 12 m N
13 f N 14 f N 15 f S 16 f N
17 f N 18 f N 19 m N 20 f S
21 f N 22 f N 23 f N 24 f N
25 f N 26 m S 27 m N 28 m N
29 f N 30 f S 31 f N 32 m S
33 f N 34 f N 35 f N 36 m N
37 m S 38 m S 39 m N 40 f N
41 f S 42 f S 43 f N 44 f N
45 f N 46 m N 47 f N 48 f N
From the answers, we know that, for instance, the first respondent is a female
non-smoker and the 27th respondent is a male non-smoker. We now want to
find out whether there is a correlation or dependence between gender and
smoking behavior in this sample. Therefore, we need to examine A: gender and
B: smoking behavior (please note that the order is arbitrary). The variable val-
ues of the two features are as follows:
• A: female (A1), male (A2)
• B: smoker (B1), non-smoker (B2)
Both A and B have I = 2 and J = 2 variable values. The order in which the val-
ues are arranged (first female and then male or vice versa) is unimportant. The
aim of the bivariate analysis is to consider the two variables together. When they
are considered together, I · J different variable values result in
Ai, Bj with i = 1, 2, …, I; j = 1, 2, …, J .
In the present example, there are the following four combinations of the values: female smoker (A1, B1), female non-smoker (A1, B2), male smoker (A2, B1), and male non-smoker (A2, B2).
The first step of the data analysis is to summarize the collected data in a more compressed
way with regard to the question. This is done by summarizing the absolute frequencies nij
for the individual expressions (Ai, Bj) in a contingency table for absolute frequencies; this
table forms the counterpart to the frequency table in the univariate case. The structure of
this contingency table is as follows (Bamberg et al., 2022, p. 30):
Table 19: General Structure of a Contingency Table With Absolute Frequencies
A/B    B1     B2     …    BJ
A1     n11    n12    …    n1J    n1.
A2     n21    n22    …    n2J    n2.
⋮      ⋮      ⋮      ⋱    ⋮      ⋮
AI     nI1    nI2    …    nIJ    nI.
       n.1    n.2    …    n.J    n
The rows of the table contain the values of the variable A. The table, therefore, has as
many rows as the values of the variable A. The individual columns of the table mark the
values of the variable B. The contingency table is composed of two areas regarding the
absolute frequencies. In the center of the table are the absolute frequencies nij of the vari-
able values Ai, Bj for all combinations i = 1, 2, …, I; j = 1, 2, …, J. The absolute
frequency n21, thus, stands for the number of people who have the variable values A2 and
B1. The margin of the table forms the second part of the contingency table where the mar-
ginal frequencies ni . or n . j are placed. Next, ni . sums up the absolute frequencies of the
row i:
60
J
ni . = ∑ nij
j=1
I
n . j = ∑ nij
i=1
Finally, the lower right corner of the contingency table with n always contains the total
number of people involved in the sample.
Table 20: Contingency Table With Absolute Frequencies for the Variables of Gender and Smoking Behavior
A/B       Smoker (B1)    Non-smoker (B2)
f (A1)    7              27                 34
m (A2)    6              8                  14
          13             35                 48
Looking at the initial table, we know that there are seven female smokers and 27
female non-smokers. Therefore, there are a total of 34 female nurses, which are
divided into smokers and non-smokers according to the above values. This
information fills the first row. The second row contains the information about
male nurses. Based on the initial table, there are six smokers and eight non-
smokers among them, which brings us to a total of 14 male nurses.
The sum of the first column and, thus, the total number of smokers is 13. There-
fore, the total number of non-smokers is 35. If we did not already know it, we
could calculate the total of all nurses in three different ways. All four values in
the center of the table must add up to the 48 present here. The sum of female
and male nurses and the sum of smokers and non-smokers both must also add
up to a total of 48 nurses.
We can see that among both female and male nurses, there are more non-smok-
ers (27 and eight, respectively) than smokers (seven and six, respectively). How-
ever, it should also be noted that there are many more female than male nurses
in our sample.
Based on this contingency table, the question that now arises is whether there is
a correlation between gender and smoking behavior. Does it make a difference if
one is male or female in terms of smoking behavior? This would be the case if,
for example, there were more female smokers than non-smokers and, con-
versely, more male non-smokers than smokers. Since there are more non-smok-
ers than smokers in both sexes, the tendency is that there is little or no correla-
tion.
In order to answer this question in a concrete and meaningful way, a measure called the
corrected contingency coefficient is used (Bamberg et al., 2022, pp. 36–37). This coefficient
quantifies the relationship between two variables, at least one of which is nominally scaled,
and requires four calculation steps.
Step 1: Calculation of the expected frequencies
In the first step, the expected frequencies or the absolute frequencies are determined
under descriptive independence. These are the absolute frequencies for which the two
variables are not related in any way. For instance, the proportion of smokers among
women would be exactly equal to the proportion of smokers among men. The expected
frequency for the variable values Ai and Bj is denoted by ñij and calculated by
ñij = (ni. · n.j) / n
Thus, one takes the product of the marginal frequencies and divides this by the sample
size. In the case of descriptive independence, nij = ñij holds for all combinations of i and j.
To show that the two variables are dependent on each other in some way, nij ≠ ñij must
hold for at least one pair (Ai, Bj).
Table 21: Contingency Table With Only Marginal Frequencies for the
Variables of Gender and Smoking Behavior
A/B       Smoker (B1)    Non-smoker (B2)
f (A1)                                       34
m (A2)                                       14
          13             35                  48
ñ11 = (34 · 13) / 48 = 9.208
ñ12 = (34 · 35) / 48 = 24.792
ñ21 = (14 · 13) / 48 = 3.792
ñ22 = (14 · 35) / 48 = 10.208
We now explain the first expected frequency ñ11. Consider the first row (female)
and first column (smokers). We take the sum of the row (34 female nurses), mul-
tiply it by the sum of the column (13 smokers), and divide the answer by the
total sum of individuals (48). The other three fields are calculated in the same
way: The row sum is multiplied by the column sum and divided by the total
number of individuals.
Step 2: Calculation of the distances between the absolute and expected frequencies
The second step is to see how far the actual absolute frequencies are from the expected
ones. With the expected frequencies, we now have a reference value that we know how to
interpret. If they are available, independence between the two characteristics applies.
Therefore, the further away the actual frequencies are from the expected frequencies, the
stronger the correlation must be. To measure the distance of these frequencies, the χ2
(Greek letter: Chi) is calculated by
χ² = ∑ ∑ (nij − ñij)² / ñij (summing over i = 1, …, I and j = 1, …, J)
Thus, in each field of the center of the contingency table, the expected frequency is sub-
tracted from the absolute one. This difference is squared since some distances may be
negative and others positive, which could cause them to cancel each other out to a 0.
Finally, the squared difference is divided by the expected frequency to put the distances in
proper relation: If one has only a small sample size, a difference of, let's say, 5 weighs
proportionally more than with a large sample size. This is done for each field of the contingency
table. Subsequently, all summands are added up. If the contingency table consists of two
rows and two columns (i.e., both variables have two values each), then χ2 can be calcula-
ted by
χ² = n · (n11 · n22 − n12 · n21)² / (n1. · n2. · n.1 · n.2)
This makes the first step superfluous because this alternative is based solely on the num-
bers that are in the original contingency table for absolute frequencies.
χ² = (7 − 9.208)²/9.208 + (27 − 24.792)²/24.792 + (6 − 3.792)²/3.792 + (8 − 10.208)²/10.208
   = 0.529 + 0.197 + 1.286 + 0.478
   = 2.49
Alternatively, since both variables have only two values each, the shortcut formula yields the same result:
χ² = 48 · (7 · 8 − 27 · 6)² / (34 · 14 · 13 · 35) = 2.49
How did we proceed from here? The numerator contains the total number of
people (48). In the first step, the two values on the diagonal (7 and 8) are multi-
plied in the parentheses. The product of the two numbers on the secondary
diagonal (27 and 6) is subtracted from the result. The content of the parentheses
must be squared in total. In the denominator, all edge frequencies (34, 14, 13,
and 35) are multiplied together. Since χ² can take any value from 0 to infinity, the number
2.49 is difficult to assess on its own.
Step 3: Calculation of the contingency coefficient
For this reason, the contingency coefficient K is calculated in the third step:
K = √(χ² / (χ² + n))
It can only assume values between 0 and 1. We would obtain a 0 if the expected
frequencies corresponded exactly to the absolute frequencies, and χ2 = 0 would also
apply. The further apart the absolute and expected frequencies are from each other, the
larger χ2 and, thus, K become.
K = √(2.49 / (2.49 + 48)) = 0.222
However, we will not interpret this result yet. There is still one last calculation
step to be done.
Step 4: Calculation of the corrected contingency coefficient
In the last step, the contingency coefficient is corrected. This results in K*. This is because
the more expressions I and J that the two variables A and B have, the larger the number
of cells in the contingency table is and the larger χ2 becomes automatically (since χ2 con-
sists of several summands and one would always add up more). This influence is elimina-
ted here in the last step by
K* = K / Kmax with Kmax = √((M − 1) / M) and M = min(I, J)
The function min is a minimum function and selects the smaller number of columns or
rows. This results in M , which is used for Kmax and finally used to calculate K*.
M = min(2, 2) = 2
Kmax = √((2 − 1) / 2) = 0.707
K* = 0.222 / 0.707 = 0.314
Let’s explain it now. Start with the number of rows (2) and columns (2) of the
contingency table. If these two numbers are in curved brackets, you choose the
smaller number of the two. Since we are dealing with the same number twice,
the decision is trivial. If, however, we included occasional smokers in the smok-
ing behavior, the contingency table would consist of three columns. One would
then choose the smaller of 2 and 3. Once 2 is selected, it is substituted
for M in the formula for Kmax. This gives the above 0.707, and the contingency
coefficient from the third step, 0.222, is finally divided by this 0.707.
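All four calculation steps can be combined into one small Python sketch. The function below is an illustrative implementation for an arbitrary contingency table of absolute frequencies (not part of the course material); applied to our example, it reproduces the value 0.314.

from math import sqrt

def corrected_contingency(table):
    # table: nested list of absolute frequencies n_ij (rows = values of A)
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    # Steps 1 and 2: expected frequencies and chi-square distance
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, nij in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (nij - expected) ** 2 / expected
    # Step 3: contingency coefficient K
    K = sqrt(chi2 / (chi2 + n))
    # Step 4: correction with M = min(I, J)
    M = min(len(row_sums), len(col_sums))
    K_max = sqrt((M - 1) / M)
    return K / K_max

smoking = [[7, 27],    # female: smokers, non-smokers
           [6, 8]]     # male:   smokers, non-smokers
print(round(corrected_contingency(smoking), 3))   # -> approx. 0.314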
This leaves only the interpretation of the corrected contingency coefficient. The values
that K* can assume lie between 0 and 1, inclusive. It assumes the value 0 precisely when
there is descriptive independence; the two variables are therefore not related in any way.
The higher the value for K*, the stronger the correlation. For the interpretation, the
strength of the correlation is assessed using a classification scheme.
RUNNING EXAMPLE: SMOKING HABITS AND GENDER
Since K* = 0.314 applies in the present example, there is a weak dependence
or correlation between gender and smoking behavior. Even if we had found a
strong correlation here, the mere number would not tell us what the nature of the
correlation is. So, whether male nurses are likely to smoke more than female nurses
or vice versa can only be read from the original contingency table.
We now turn to the case in which both variables are ordinally scaled. The two ordinal scale variables are denoted by X and Y in this context. For each object
carrier i in the sample, there is an observation xi for the variable X and an observation yi
for the variable Y . For each object, there is also a pair of points
xi, yi for i = 1, …, n .
Table 23: Initial Data on the Relationship Between Satisfaction With
Care Robots and Satisfaction With Health Status
i    xi            yi
1    good          very good
2    satisfied     satisfied
3    poor          insufficient
4    very good     good
While, for example, the first person rated their satisfaction with the robots as
“good” and satisfaction with their health status as “very good,” the third person
rated both as “poor” and “insufficient,” respectively. The second person rated
both as “satisfactory,” which is average, and the fourth person was in the upper
range regarding satisfaction both with the robots and their own health status.
Now, the question arises how to measure a correlation between two ordinal scale varia-
bles. We have the advantage here that the variable values can be placed in a meaningful
order. This makes it possible, for example, to check whether people who are very satisfied
with the nursing robots are also highly satisfied with their health status. The two variables
can, therefore, be tested for monotonicity, which can mean two things (monotonicity can be
understood as either a correlation in the same direction or a correlation in the opposite
direction):
1. x1 < x2 and y1 < y2
2. x1 < x2 and y1 > y2
Let's go over these relationships in detail. The first type of monotonicity involves a
correlation in the same direction, i.e., a monotonically increasing correlation. This means
that if Person 1 has a smaller value than Person 2 with respect to variable X, then they also
have a smaller value with respect to variable Y. Regarding our example, if satisfaction with
robots is high, so is satisfaction with health status and vice versa.
The second type of monotonic relationship is an opposite or monotonically decreasing
one. In this case, Person 1 has a lower score on one value than Person 2 but a higher score
on the other. Regarding our example, this means that, for example, there is a high level of
satisfaction with the robots but a rather low level with health status.
Now, how can we test whether the two variables are monotonically related to each other?
For this purpose, the observations are replaced by ranks. Within a variable, ranks are
assigned to the characteristic carriers. The one with the best expression usually gets the
rank of 1. The characteristic carrier with the worst expression gets the last rank n. The
ranks for the observations xi of the variable X are denoted by ri. The ranks for the obser-
vations yi of the feature Y are called si.
For the expressions xi ("satisfaction with the care robots"), the person with the best
rating receives rank 1 and the person with the worst rating receives rank 4, resulting
in the ranks ri.
i    xi            yi             ri    si
1    good          very good      2     1
2    satisfied     satisfied      3     3
3    poor          insufficient   4     4
4    very good     good           1     2
For the expressions yi (“satisfaction with health status”), the ranks si are
assigned analogously. It would have been possible for two or more patients to give
the same rating for satisfaction with the robots or health status. In this case, there
would be so-called ties, and average ranks would have to be assigned. However, we
are not going to use such a procedure in this script (if you are interested, please
refer to Fahrmeir et al., 2016, pp. 133–134).
The measure that is based on the ranks and allows a statement about a monotonic corre-
lation is called Spearman's rank correlation coefficient rS and can be calculated as fol-
lows (Bamberg et al., 2022, p. 35):
rS = ∑ (ri − r̄) · (si − s̄) / √( ∑ (ri − r̄)² · ∑ (si − s̄)² )
with all sums running over i = 1, …, n.
It is used to assess the relationship between two variables that are both at least ordinal
scaled in nature. The following is calculated in the numerator: For both the feature X and
the feature Y, we look at how far the respective ranks of each feature carrier are from the
average rank r̄ and s̄, respectively. For each object i, the product of the differences is
formed. These individual products are finally summed over all individuals. For the denom-
inator, the differences already calculated for the numerator are squared and summed. If
there are no ties, the rank correlation coefficient can also be determined using
rS = 1 − (6 · ∑ di²) / (n · (n² − 1)) with di = ri − si (summing over i = 1, …, n)
This variant saves a lot more time and is strongly recommended if there are no ties. We
must calculate only the difference di between the two ranks for each feature carrier. Since
we consider just the cases without ties, we can use the second formula at any time.
r̄ = (2 + 3 + 4 + 1) / 4 = 2.5
s̄ = (1 + 3 + 4 + 2) / 4 = 2.5
Since the same ranks from 1 to n are assigned for both characteristics, the aver-
age rank for both is always identical. Here, this is 2.5 for each case. Subse-
quently, it makes sense to create the following auxiliary table.
Table 25: Auxiliary Table for the Calculation of the Rank Correlation Coefficient (1)
i    ri    si    ri − r̄    si − s̄    (ri − r̄) · (si − s̄)    (ri − r̄)²    (si − s̄)²
1    2     1     −0.5      −1.5      0.75                    0.25         2.25
2    3     3     0.5       0.5       0.25                    0.25         0.25
3    4     4     1.5       1.5       2.25                    2.25         2.25
4    1     2     −1.5      −0.5      0.75                    2.25         0.25
Σ                                    4                       5            5
rS = ∑ (ri − r̄) · (si − s̄) / √( ∑ (ri − r̄)² · ∑ (si − s̄)² ) = 4 / √(5 · 5) = 0.8
Table 26: Auxiliary Table for the Calculation of the Rank Correlation Coefficient (2)
i    xi            yi             ri    si    di = ri − si
1    good          very good      2     1     1
2    satisfied     satisfied      3     3     0
3    poor          insufficient   4     4     0
4    very good     good           1     2     −1
Here, only the two ranks must be subtracted from each other for each object. For
the first person, this is 2 − 1 = 1. If we calculate these differences for all objects,
we obtain the following for the rank correlation coefficient:
rS = 1 − (6 · (1² + 0² + 0² + (−1)²)) / (4 · (4² − 1)) = 1 − 12/60 = 0.8
The result is, of course, the same as in the previous formula. However, we will
get it faster with this variant.
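For data sets without ties, the shortcut formula translates directly into a few lines of Python; the function name and the rank lists are illustrative choices.

def spearman_no_ties(ranks_x, ranks_y):
    # Shortcut formula, valid only if there are no ties in the ranks.
    n = len(ranks_x)
    d_squared = sum((r - s) ** 2 for r, s in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

r = [2, 3, 4, 1]   # ranks for satisfaction with the nursing robots
s = [1, 3, 4, 2]   # ranks for satisfaction with the health status
print(spearman_no_ties(r, s))   # -> 0.8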
We now interpret our result. rS can only assume values between −1 and +1, inclusive. We
obtain positive values if there is a monotonically increasing correlation or a correlation in
the same direction between the two variables under consideration. If, on the other hand,
rS is in the negative range, a monotonically decreasing or opposite relationship applies.
For rS = 0, there is no monotonic correlation. This does not have to mean that there is
generally no correlation between the considered variables. It is only certain from this
result that the correlation is not monotonic. For any other value between 0 and +1 or
between 0 and −1, a classification scheme helps you interpret the strength of the result.
In our example, rS = 0.8 indicates a monotonically increasing correlation: The more satisfied
the patients are with the nursing robots, the more satisfied they tend to be with their health
status and vice versa. In this interpretation, it is of enormous importance to mention "and
vice versa." If we omitted it from this formulation, we might have gotten the impression that
satisfaction with nursing robots influences satisfaction with health status. However, we do
not know that at all. A correlation only assumes that the variables can influence each other.
A mutual influence or dependence is examined with a correlation, and this is what is
confirmed in the present case.
We now turn to the case in which both variables are cardinally scaled. Again, for each object there is a pair of observations
(xi, yi) for i = 1, …, n
Table 27: Initial Data for the Relationship Between the Age of the
Mothers and Fathers of Young Patients
i xi yi
1 56 60
2 49 55
3 48 46
4 46 52
5 47 56
6 56 51
7 57 71
8 53 60
9 58 61
10 54 58
11 47 49
12 53 65
For instance, the first child has a mother aged 56 years and a father aged 60
years. We now want to ask whether there is a connection between the age of the
mother and the age of the father. Is a comparatively older mother associated
with a relatively older father? Or is an older mother associated with a younger
father and vice versa?
Scatter Plot
To get a first impression of the correlation of the variables, a scatter plot is drawn (Bam-
berg et al., 2022, p. 32). A scatter plot is a coordinate system in which the two characteris-
tics form the axes. It visualizes the relationship between two cardinal scale variables. The
x-axis is described by the variable X and the y-axis by the variable Y. The pairs of points
of the individual persons are drawn as dots, circles, or asterisks in the scatter diagram.
Based on the arrangement of the points in their entirety, it can finally be seen whether
there is a positive or negative correlation.
Figure 9: Scatter Plot for the Correlation Between the Age of the
Mothers and Fathers of Young Patients
The scatter plot shows that the older a child’s mother is (moved further to the
right on the x-axis), the older the father is (moved further up on the y-axis) and
vice versa. Accordingly, a rather younger mother is associated with a younger
father and vice versa. Overall, the scatter plot indicates a positive correlation.
Regarding the interpretation, it is helpful to consider the mean values of the two variables
in the scatter plot. x̄ is entered as a vertical line and ȳ as a horizontal line in the scatter
plot. This results in four quadrants:
• In Quadrant I, there are those people who have an above-average value for both varia-
bles. Thus, the differences to the mean value are positive in each case; both parents are
older than the average.
• Quadrant II contains those objects who are below average with respect to X and above
average with respect to Y. The differences between the individual values of variable X
and the mean value are, therefore, negative (i.e., the mothers are younger than the
average). The differences for the values of Y are positive (i.e., the fathers are older
than the average).
• Quadrant III contains objects who are below the mean for both variables so that the
differences to the mean are negative (i.e., both parents are younger than the average).
• In Quadrant IV, the observations have an above-average value for variable X and a below-
average value for variable Y. Thus, for the values of variable X, positive differences to the
mean are obtained (i.e., the mothers are older than the average). For variable Y, negative
differences are obtained (i.e., the fathers are younger than the average).
If most of the observations are in Quadrants I and III, there is a positive correlation
because both variables tend to the same direction: Either both are below average or above
average. If, conversely, most of the observations are in Quadrants II and IV, there is a nega-
tive correlation. The two variables behave opposite to each other. One variable is below,
and the other above the average.
For our example, the two mean values are x̄ = 52 and ȳ = 57.
Figure 11: Scatter Plot for the Correlation Between the Age of the
Mothers and Fathers of Young Patients Divided Into Quadrants
Eleven of the 12 observations are in Quadrants I and III. This means that there is
a positive correlation between the age of the mother and that of the father.
With the scatter plot, we could see if the correlation between the two variables under con-
sideration was positive or negative. We now need to clarify how strong the correlation is
and which type of correlation can be measured at all. For two nominal scale variables, we
have only checked whether there is a correlation or not. Since ordinal scale variables can
be placed in a meaningful order, it is possible to check for a monotonic correlation. The
distances between two expressions can be interpreted in a mathematically meaningful
way for cardinal scale variables. This means that even stronger analyses can be performed
at this point. Namely, it should now be checked whether the cloud of points in the scatter
plot can be described by a straight line. The closer the points lie to a common straight
line, the stronger the linear relationship is.
A measure which can make a statement in this regard is the Bravais-Pearson correlation
coefficient rx, y (Bamberg et al., 2022, p. 34). It requires a cardinal scale level from both
variables and is defined by
rx, y = ∑ (xi − x̄) · (yi − ȳ) / √( ∑ (xi − x̄)² · ∑ (yi − ȳ)² )
(all sums run over i = 1, …, n)
and, thus, is very similar to Spearman’s rank correlation coefficient. Instead of ranks, the
characteristic expressions themselves are used here. The numerator is also referred to as
covariance (Bortz & Schuster, 2010, p. 153); the covariance measures the relationship
between two cardinal scale variables but is not normalized to a specific range. The
covariance decides the sign of rx, y. If most of the observations are in Quadrants I and III,
the covariance and, thus, rx, y become positive. If, on the other hand, most of the
observations are in Quadrants II and IV, we obtain a negative covariance and a negative
correlation coefficient. In the denominator, we can observe the product of the two sums of
squared deviations under the root. An alternative to the above calculation is the following
formula:
rx, y = (xy − x̄ · ȳ) / √( (x2 − x̄²) · (y2 − ȳ²) )
It is only based on the calculation of mean values and can, therefore, be solved more
quickly. The only new quantity within this formula is the mean xy of the products. It is
calculated by
xy = (1/n) · ∑ xi · yi (summing over i = 1, …, n).
Table 28: Auxiliary Table for the Calculation of the Correlation Coefficient
i    xi    yi    xi − x̄    yi − ȳ    (xi − x̄) · (yi − ȳ)    (xi − x̄)²    (yi − ȳ)²
1 56 60 4 3 12 16 9
2 49 55 -3 -2 6 9 4
3 48 46 -4 -11 44 16 121
4 46 52 -6 -5 30 36 25
5 47 56 -5 -1 5 25 1
6 56 51 4 -6 -24 16 36
7 57 71 5 14 70 25 196
8 53 60 1 3 3 1 9
9 58 61 6 4 24 36 16
10 54 58 2 1 2 4 1
11 47 49 -5 -8 40 25 64
12 53 65 1 8 8 1 64
Σ (sums of the last three columns) 220 210 546
Ultimately, the sums of the last three columns are required for the final determi-
nation of the correlation coefficient. These are substituted into the starting for-
mula as follows:
rx, y = ∑ (xi − x̄) · (yi − ȳ) / √( ∑ (xi − x̄)² · ∑ (yi − ȳ)² ) = 220 / √(210 · 546) = 0.650
The positive covariance 220 is responsible for the fact that the overall result
must also be positive. Since most of the observations are in the first and third
quadrant, such a number must result. The second variant requires the calcula-
tion of the following seven mean values:
x̄ = (1/12) · (56 + 49 + … + 53) = 52
ȳ = (1/12) · (60 + 55 + … + 65) = 57
x̄² = 52² = 2704
ȳ² = 57² = 3249
x2 = (1/12) · (56² + 49² + … + 53²) = 2721.5
y2 = (1/12) · (60² + 55² + … + 65²) = 3294.5
xy = (1/12) · (56 · 60 + 49 · 55 + … + 53 · 65) = 2982.33
The only mean value that we have not calculated in advance is the mean value
xy. To do this, we go through all pairs of parents and multiply the age of the
mother by that of the father. The sum is formed from all pairs of parents and
then divided by the total number of pairs of parents, which is 12.
Using the seven calculated values, we arrive at the same result for the correla-
tion coefficient
rx, y = (xy − x̄ · ȳ) / √( (x2 − x̄²) · (y2 − ȳ²) )
      = (2982.33 − 52 · 57) / √( (2721.5 − 2704) · (3294.5 − 3249) )
      = 18.33 / √(17.5 · 45.5) = 0.650
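The calculation via these mean values can be sketched in Python as follows; the function and variable names are illustrative choices, not part of the course material.

from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                  # simple means
    mxy = sum(a * b for a, b in zip(x, y)) / n       # mean of the products
    mx2 = sum(a * a for a in x) / n                  # mean of the squared x values
    my2 = sum(b * b for b in y) / n                  # mean of the squared y values
    return (mxy - mx * my) / sqrt((mx2 - mx ** 2) * (my2 - my ** 2))

mothers = [56, 49, 48, 46, 47, 56, 57, 53, 58, 54, 47, 53]
fathers = [60, 55, 46, 52, 56, 51, 71, 60, 61, 58, 49, 65]
print(round(pearson(mothers, fathers), 3))   # -> 0.65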
Let’s now interpret the result rx, y. Just like rS, rx, y can assume values from −1 to +1. If
the coefficient is positive, there is a positive linear relationship between the two variables
X and Y . If one variable assumes large values, this tends to apply to the other variable. A
negative value is obtained for rx, y if there is a negative linear relationship between the
two variables. The two variables behave in opposite ways to each other. If rx, y = 1 for the
correlation coefficient, there is a perfect positive linear relationship. As can be seen in the
following figure, all points can now be connected by a straight line.
Figure 12: Scatter Plot With a Correlation Coefficient of +1
Whether the straight line begins at the origin or not is completely irrelevant. The only
important thing is that all points can be connected by a straight line.
Analogously, for rx, y = −1, there is a perfect negative linear relationship; all points then lie
on a straight line falling from the top left to the bottom right.
Figure 13: Scatter Plot With a Correlation Coefficient of -1
For rx, y = 0, there is no linear connection between the considered variables. It is also said
that the variables are uncorrelated. Note that this only means that there is no linear
relationship; there can still be a relationship that is, for instance, quadratic or exponential. This can
be seen in the following left figure. The points can be linked perfectly, but the connection
is not a linear one. Also, a scatterplot like the one on the right, where the points are dis-
tributed without any fixed negative or positive structure, results in a correlation coefficient
close to 0.
Figure 14: Scatter Plots With a Correlation Coefficient of 0
In the vast majority of cases, the correlation coefficient is somewhere between 0 and 1 or 0
and −1. In analogy to the rank correlation coefficient, a classification scheme is applied to
assess the strength of the relationship.
RUNNING EXAMPLE: AGE OF PARENTS
In our example, the correlation coefficient rx, y = 0.65 for the variables "age of the
mother" and "age of the father" means that there is a strong positive linear relation-
ship between these two variables. So, if the mother is older, this is usually the
case for the father and vice versa.
Let’s now go back to the scatter plot. The points are arranged in ascending form
from bottom left to top right. There is, therefore, a positive connection, which is
why a positive result is also obtained here. It is not surprising that we deviate
slightly from a perfect result of 1, since we cannot connect the points in such a
way to form a straight line.
The three cases that we have discussed so far are those in which the two scale levels of
the variables are identical. If the two variables have different scale levels, the appropriate
measure of association is the one that matches the weaker of the two scale levels.
Problems that can occur with correlations are mainly due to the interpretation of the
results. Basically, one is aware of the fact that correlations are symmetrical in nature.
X and Y (or A and B) are equal and can theoretically influence each other. Therefore, we
cannot read from the result of any association measure that one variable influences the
other and not vice versa.
Spurious correlation
It is often concluded from high correlations that the two variables under consideration are
dependent on one another. However, correlation does not necessarily mean causality. It is
often a matter of spurious correlations, which means that a third variable is responsible
for the high correlation. Suppose there is a high negative correlation between the number
of hairs on the head and income for men. This high correlation is certainly not due to the
fact that these two variables are connected but rather to the fact that a third variable, such
as age, correlates strongly with both variables and, thus, ensures a high correlation
between the number of hairs on the head and income.
Nonsense correlation
The so-called “nonsense correlation” is also closely related to the previous point. One
should never pay too much attention to a high correlation between two totally irrelevant
variables.
Type of correlation
Finally, it should be kept in mind that each measure only captures a particular type of correlation: the corrected contingency coefficient measures association in general, the rank correlation coefficient measures monotonic correlation, and the Bravais-Pearson correlation coefficient measures linear correlation.
SUMMARY
As a second part of the descriptive statistics, a bivariate (i.e., two-dimen-
sional data) analysis should be used if a connection between two varia-
bles is to be checked. This can be employed to find out whether two var-
iables are related in any way. It should be noted that the bivariate
analysis initially assumes that the two variables influence each other.
One should, therefore, never suppose that one variable has a clear direc-
tion of effect on another.
In order to select the right measure of association, the weaker level of
the scale should always be considered as a guide value.
UNIT 4
LINEAR REGRESSION
STUDY GOALS
Introduction
How does age affect income? Does blood alcohol concentration affect reaction time? If so,
how strong is this influence? How do age and IQ affect the ability to concentrate? If we
want to examine how one or more cardinal scale variables affect another cardinal scale
variable, we use linear regression analysis. If the focus is only on the influence of a single
cardinal scale variable on another cardinal scale variable, this is referred to as simple lin-
ear regression. Finally, if we assume that several cardinal scale variables influence another
cardinal scale variable, we are referring to multiple linear regression.
In this unit, we will deal exclusively with simple linear regression. In addition, it should be
made clear that the simple linear regression is only explained in terms of its most impor-
tant features here. It is a very extensive process that must meet numerous requirements
and is, therefore, subject to several tests.
Table 29: Initial Data for Simple Linear Regression
i    xi (blood alcohol concentration in per mille)    yi (reaction time in milliseconds)
1 0 590
2 0.3 581
3 0.5 687
4 0.7 658
5 1 632
6 1.2 645
7 1.4 687
8 1.8 624
9 2.3 702
10 2.5 789
For instance, the first person had 0 alcohol per mille in their blood and achieved
a reaction time of 590 milliseconds. The seventh person, on the other hand, had
1.4 per mille and a much longer reaction time of 687 milliseconds. The inde-
pendent variable X is defined here by the alcohol concentration and the
dependent variable Y by the reaction time because we want to investigate the
influence of alcohol consumption on the reaction time.
In order to get a first impression of the connection between the two variables X and Y , we
look at the data in a scatter plot (like with the correlation analysis). At this point, it is cru-
cial to know which axis in the coordinate system stands for which variable. The independ-
ent variable X is always plotted on the x-axis and the dependent variable Y on the y-axis.
RUNNING EXAMPLE: REACTION TIMES WHILE UNDER THE INFLUENCE
OF ALCOHOL
Regarding our example, we plot the alcohol concentration on the x-axis and the
reaction time on the y-axis. If all objects with their two values are then consid-
ered in the scatter plot, the following form is obtained.
Figure 16: Scatter Plot for the Initial Data of the Simple Linear
Regression
We can see that as alcohol concentration increases (moving further to the right
on the x-axis), reaction time also tends to increase (moving further up on the y-
axis). Indeed, we see a positive correlation here.
If we only wanted to check how strong this connection is, we would use the Bravais-Pear-
son correlation coefficient. This tells us how strong the linear relationship is between the
two variables and how well the points can be described by a linear straight line.
Now, the correlation coefficient of rx, y = 0.742 offers a very good basis for
determining the linear straight line that describes the influence of the alcohol
concentration on the reaction time. This is what simple linear regression does: It
determines the regression line for the relationship between an independent var-
iable X and a dependent variable Y .
The regression line we are looking for has the general form
ŷi = a + b · xi
We know the values for xi and yi; these are shown in the output table or in the scatter plot.
The parameter a is the y-axis intercept of a linear function and b describes the correspond-
ing slope. These two parameters are determined in the linear regression analysis in such a
way that the point cloud can also be described by a linear straight line. Thus, one tries to
find the straight line in which the points in the scatter plot lie as close as possible. The
statistical method used for this is called the least squares method (Handl & Kuhlenkasper,
2018, p. 478).
To understand this better, we will take a closer look at the following figure.
Figure 17: Idea of Simple Linear Regression
There are three points in the figure labeled y1, y2, and y3. These are the exemplary starting
points, which were obtained by combining xi and yi for three arbitrary objects. Specifi-
cally, these are the points (x1, y1) = (2, 2) (at Location 2 on the x-axis and Location 2 on
the y-axis), (x2, y2) = (5, 10), and (x3, y3) = (7, 7). We tried to find a linear line for these
three points; the result is the drawn line. However, none of the three points lies on the
straight line. All of them deviate either upward or downward from it.
For the second point y2, this deviation is drawn in nicely: Actually, one has observed the
point y2, but based on the estimated regression line, one would predict the value ŷ2. Note
that you must always use the roof (hat) on a variable when estimating a value based on a
regression line. Accordingly, one would make an error – also called a residual – if one used
the regression line to forecast y (Bortz & Schuster, 2010, p. 186). A residual is the distance
between the actual y-value and the y-value estimated based on the regression line. The
least squares method aims to minimize the sum of these errors. For the above example
with the three points, the straight line would be selected where the sum of squared
deviations is minimal. We take the squared deviations because some points deviate
upward and others downward from the regression line.
The formulas for a and b, which are obtained by the least squares method, are now as fol-
lows.
We start with the formula for b. This parameter must always be calculated first because we
need the result of b for the calculation of a. Have a look at the formula.
b = (xy − x̄ · ȳ) / (x2 − x̄²)
The numerator should look very familiar: It contains the covariance between the two variables and is, therefore, the same as the numerator of the Bravais-Pearson correlation coefficient. The denominator is not new either; it includes part of the denominator of the correlation coefficient, namely the variance of the independent variable.
Accordingly, to get the result for b, we need to determine five means, as we have already learned. It is recommended to start with the simple means $\bar{x}$ and $\bar{y}$. We can square the mean value $\bar{x}$ directly afterward to get $\bar{x}^2$. In the denominator, we also need the mean $\overline{x^2}$, for which all values of x are squared, summed, and divided by the number of observations. Finally, for the mean $\overline{xy}$, $x_i$ and $y_i$ are multiplied for each individual, all products are added up, and the sum is divided by the number of observations. If all determined mean values are inserted into the above formula, we obtain the coefficient b. This is now a component for calculating the second coefficient a:
$a = \bar{y} - b \cdot \bar{x}$
Once the two coefficients a and b have been determined, the regression line can be drawn.
$\bar{x} = \tfrac{1}{10} \cdot (0.0 + 0.3 + \ldots + 2.5) = 1.17$

$\bar{y} = \tfrac{1}{10} \cdot (590 + 581 + \ldots + 789) = 659.5$

$\bar{x}^2 = 1.17^2 = 1.3689$

$\overline{x^2} = \tfrac{1}{10} \cdot (0.0^2 + 0.3^2 + \ldots + 2.5^2) = 2.001$

$\overline{xy} = \tfrac{1}{10} \cdot (0.0 \cdot 590 + 0.3 \cdot 581 + \ldots + 2.5 \cdot 789) = 805.65$

$b = \dfrac{\overline{xy} - \bar{x} \cdot \bar{y}}{\overline{x^2} - \bar{x}^2} = \dfrac{805.65 - 1.17 \cdot 659.5}{2.001 - 1.3689} = 53.844$

$a = \bar{y} - b \cdot \bar{x} = 659.5 - 53.844 \cdot 1.17 = 596.503$

$\hat{y}_i = 596.503 + 53.844 \cdot x_i$
Again, the hat is used as a symbol for an estimate of the dependent variable, since we can only estimate reaction times based on this equation, which means that we cannot predict them exactly. Often, the two mathematical abbreviations in the equation are replaced by the actual variables. Accordingly, one could also write down the above equation as

estimated reaction time = 596.503 + 53.844 · alcohol concentration
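As an illustration, the least squares formulas can be reproduced with a few lines of Python. This is only a sketch: the full data set is not printed in this course book, so the two lists below are hypothetical stand-ins for the 10 observed alcohol concentrations and reaction times (only the first and last values appear in the text).

# Minimal sketch of the least squares calculation, assuming hypothetical example data
def least_squares(x, y):
    """Return intercept a and slope b using the mean-based formulas from the text."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x2_bar = sum(v ** 2 for v in x) / n                  # mean of the squared x values
    xy_bar = sum(xi * yi for xi, yi in zip(x, y)) / n    # mean of the products x * y
    b = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data with the same structure as the alcohol/reaction-time example
alcohol = [0.0, 0.3, 0.5, 0.8, 1.0, 1.3, 1.5, 1.9, 2.2, 2.5]
reaction = [590, 581, 620, 635, 655, 670, 690, 701, 740, 789]

a, b = least_squares(alcohol, reaction)
print(a, b)            # regression constant and regression coefficient
print(a + b * 0.4)     # estimated reaction time at 0.4 per mille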
If you have found a regression line and, therefore, know a and b, you have to take a closer look at them. The parameter a is the regression constant. It describes the y-axis intercept of the linear regression line, i.e., the value y assumes when x = 0. Considering our example, a describes which reaction time is to be expected if the alcohol concentration is 0 per mille. Often, this number does not make sense in terms of content since negative values may result for the regression constant.
Regression constant: The regression constant describes the y-intercept of the linear regression line.
More important is the parameter b, which is called the regression coefficient (Bortz & Schuster, 2010, p. 188). It describes the slope of the linear regression line and tells us by how many units y changes when x increases by one unit.
Regression coefficient: The regression coefficient describes the slope of the linear regression line.
In our example, b describes the change in reaction time when the alcohol concentration
increases by 1 per mille. This can produce both a positive and a negative result. Positive
values represent a positive influence of the independent variable on the dependent varia-
ble. Consequently, a negative regression coefficient means that a negative influence is
exerted. It is true that the larger the regression coefficient, the steeper the regression line.
A high positive regression coefficient would indicate a steep positive regression line.
Accordingly, a strong negative regression coefficient indicates a steep negative regression
line (from top left to bottom right).
The great advantage of the established regression line is that we can use it to
make predictions. For any value xi, we can estimate the corresponding value of
the dependent variable yi. Imagine if the police stopped a suspected road user
and asked them to take a Breathalyzer test, which resulted in an alcohol concen-
tration of 0.4 per mille of blood. We could estimate the reaction time of the road
user based on the equation. Please note that only “estimating” the reaction time is possible because we can never be completely sure of hitting it exactly; for that, all points in the scatter plot would have to lie perfectly on a straight line. The estimate is obtained by substituting the alcohol concentration of 0.4 for $x_i$:

$\hat{y} = 596.503 + 53.844 \cdot 0.4 = 618.041$

Thus, for a person with such an alcohol concentration, one would expect a reaction time of 618.041 milliseconds based on the estimated regression equation.
Finally, the regression line can be plotted in the scatter plot.
Figure 18: Scatter Plot for the Initial Data Including Regression Line of
the Simple Linear Regression
The regression line drawn above is now the one that best describes the influence of the alcohol concentration on the reaction time. We find the regression constant of 596.503 where the regression line intersects the y-axis. The regression coefficient of 53.844 cannot be read directly from the graph: if one moves one unit to the right from the y-axis intercept (to Position 1 on the x-axis), the line rises by 53.844 milliseconds.
Overall, it should be mentioned that calculating the two coefficients a and b and, thus,
establishing the regression equation for prediction is crucial. Additionally, the interpreta-
tion of the two calculated coefficients is important. The graphical representation of the
data situation as well as the drawing of the straight lines is of secondary importance.
Correlation Coefficient
At the beginning of this unit, we used the Bravais-Pearson correlation coefficient to check whether there is a linear relationship between alcohol concentration and reaction time. Now, we use it again to assess the quality of the regression line in terms of linearity. If the points in the scatter plot do not approximate a linear course, you should not use the calculated regression line to make predictions. The formula of the Bravais-Pearson correlation coefficient is as follows:

$r_{x,y} = \dfrac{\overline{xy} - \bar{x} \cdot \bar{y}}{\sqrt{\overline{x^2} - \bar{x}^2} \cdot \sqrt{\overline{y^2} - \bar{y}^2}}$
Regarding the calculation of $r_{x,y}$, it is of great advantage that we have already calculated five of the seven required mean values for the regression coefficient b. Only the mean value $\bar{y}$ has to be squared once to obtain $\bar{y}^2$. The final relevant mean $\overline{y^2}$ again requires squaring each observation $y_i$, summing them, and dividing the result by the total number of observations.
We know that the correlation coefficient can take values between −1 and +1, inclusive,
and the proximity of the value to +1 or −1 is indicative of a strong linear relationship. In
the context of this linear regression analysis, we should obtain, at least, a correlation coef-
ficient of 0.5 or −0.5 to speak of an adequate linear relationship.
$\bar{x} = \tfrac{1}{10} \cdot (0.0 + 0.3 + \ldots + 2.5) = 1.17$

$\bar{y} = \tfrac{1}{10} \cdot (590 + 581 + \ldots + 789) = 659.5$

$\bar{x}^2 = 1.17^2 = 1.3689$

$\overline{x^2} = \tfrac{1}{10} \cdot (0.0^2 + 0.3^2 + \ldots + 2.5^2) = 2.001$

$\overline{xy} = \tfrac{1}{10} \cdot (0.0 \cdot 590 + 0.3 \cdot 581 + \ldots + 2.5 \cdot 789) = 805.65$

Only the last two mean values $\bar{y}^2$ and $\overline{y^2}$ are missing. We calculate these first:

$\bar{y}^2 = 659.5^2 = 434{,}940.25$

$\overline{y^2} = \tfrac{1}{10} \cdot (590^2 + 581^2 + \ldots + 789^2) = 438{,}271.3$

With this, the Bravais-Pearson correlation coefficient can now finally be determined:

$r_{x,y} = \dfrac{\overline{xy} - \bar{x} \cdot \bar{y}}{\sqrt{\overline{x^2} - \bar{x}^2} \cdot \sqrt{\overline{y^2} - \bar{y}^2}} = \dfrac{805.65 - 1.17 \cdot 659.5}{\sqrt{2.001 - 1.3689} \cdot \sqrt{438{,}271.3 - 434{,}940.25}} = 0.742$
Coefficient of Determination
The coefficient of determination is probably the most important criterion for evaluating a
regression line. But what is meant by the coefficient of determination? We can see that the
reaction times vary for the 10 participants in the previous example. Accordingly, there is a
dispersion in the observations of the dependent variable. With the help of the regression
analysis, we tried to find the reason for the different reaction times in the alcohol concen-
tration. In other words, we tried to explain the reaction time with the help of the alcohol
concentration. The part of the dispersion of the dependent variable (i.e., reaction time)
that can be attributed to the independent variable (i.e., alcohol concentration) is called
the explained dispersion. The remaining part of the dispersion of the dependent variable – which is not caused by the independent variable but can be attributed to other factors or even errors in the measurement – is called the unexplained dispersion. For instance, the age of the participant or the time of day of the reaction test could play a role. Besides these two examples, many other factors could also explain the variation in reaction time, but they were not considered in the regression model. In addition, the reaction times may not be recorded correctly and there may be measurement errors (Benesch, 2013, pp. 119–120). Both dispersions taken together result in the total dispersion of the dependent variable:

total dispersion = explained dispersion + unexplained dispersion

Explained dispersion: The explained dispersion is the part of the dispersion that is due to the independent variable.
Unexplained dispersion: The unexplained dispersion arises from unaccounted variables and measurement error.
Total dispersion: The total dispersion is composed of the explained and unexplained dispersion.
In summary, the reaction times of the 10 test subjects are all different. Therefore, we have
a basic dispersion, the total dispersion. A part of this dispersion can be explained by the
alcohol concentration, which describes the explained dispersion. The rest of the total dis-
persion, which is not due to the alcohol concentration, forms the unexplained dispersion.
The coefficient of determination reflects the explanatory power of a regression model. With the mathematical abbreviation R², it puts the explained dispersion in relation to the total dispersion:

$R^2 = \dfrac{\text{explained dispersion}}{\text{explained dispersion} + \text{unexplained dispersion}} = \dfrac{\text{explained dispersion}}{\text{total dispersion}}$

Coefficient of determination: The coefficient of determination reflects the explanatory power of a regression model.
This results in the proportion of the dispersion of the dependent variable that can be explained by the independent variable (Benesch, 2013, p. 119). One possible way to calculate the coefficient of determination is to compute the explained and unexplained dispersion according to the above principle. Since this can be quite time-consuming under certain circumstances, we use the simplest way to obtain a result for R²:

$R^2 = r_{x,y}^2$

In our example, $R^2 = 0.742^2 = 0.551$.
The coefficient of determination can take on values between 0 and 1, inclusive, and results in a percentage when multiplied by 100. If, for example, we obtain an R² of 0.7, this means that 70% of the dispersion of the dependent variable is due to the independent variable. This is equivalent to saying that the remaining 30% of the dispersion is explained by other factors or errors. If R² = 0, the dispersion of the dependent variable cannot be explained in any way by the independent variable. If R² = 1, we have a complete explanation, since the scatter of the dependent variable can be explained perfectly by the independent variable. In that case, all points in the scatter plot lie on a straight line, so the correlation coefficient is either +1 or −1 and, consequently, the coefficient of determination becomes 1, meaning 100%. In general, it is desirable for R² to be close to 1. In practice, one is already very satisfied with a coefficient of determination of at least 0.3, meaning 30%.
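To see how the correlation coefficient and the coefficient of determination relate, the mean-based formulas can likewise be sketched in Python. As before, the data lists are hypothetical stand-ins for the partially printed example data; only the relationship R² = r² is taken from the text.

# Sketch: Bravais-Pearson correlation coefficient and R², assuming hypothetical data
import math

alcohol = [0.0, 0.3, 0.5, 0.8, 1.0, 1.3, 1.5, 1.9, 2.2, 2.5]   # hypothetical values
reaction = [590, 581, 620, 635, 655, 670, 690, 701, 740, 789]  # hypothetical values

n = len(alcohol)
x_bar = sum(alcohol) / n
y_bar = sum(reaction) / n
x2_bar = sum(v ** 2 for v in alcohol) / n
y2_bar = sum(v ** 2 for v in reaction) / n
xy_bar = sum(x * y for x, y in zip(alcohol, reaction)) / n

r = (xy_bar - x_bar * y_bar) / math.sqrt((x2_bar - x_bar ** 2) * (y2_bar - y_bar ** 2))
r_squared = r ** 2   # coefficient of determination: share of explained dispersion
print(r, r_squared)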
Standard Error
The last way to evaluate the regression line is to use the standard error. It describes the absolute estimation error when using the regression model. If we wanted to use the established regression line to make predictions, we would only succeed in making an exact forecast if all points were on a straight line and we, thus, had obtained a coefficient of determination of 1. In most cases, we do not reach this complete explanatory power. As a result, we make an error in the forecast of the dependent variable. This average error across all forecasts is described by the standard error, which is represented by $\sigma_{x,y}$ (Bortz & Schuster, 2010, p. 190).
Standard error: The standard error describes the absolute estimation error when using the regression model.
We will not explain the calculation of the standard error here (as we need the unexplained dispersion for that). Rather, we will present its interpretation. In our example,

$\sigma_{x,y} = 43.279$
This means that if we were to use the above regression line to predict reaction
times, we would make an error of 43.279 milliseconds on average. Therefore, we
would deviate on average by 43.279 milliseconds upward or downward from the
true value.
The value for the standard error is an absolute value in the unit of measurement of the dependent variable (here, milliseconds). Whether this standard error is large or small entails an assessment that is both very subjective and often difficult. A solution is the calculation of the relative standard error, represented by $\sigma_0$, which describes the percentage deviation of the estimated value from the actual one. It is calculated by

$\sigma_0 = \dfrac{\sigma_{x,y}}{\bar{y}}$,

and, thus, it only requires dividing the standard error obtained above by the mean value $\bar{y}$ of the dependent variable.

$\sigma_0 = \dfrac{43.279}{659.5} = 0.066$

with $\bar{y} = 659.5$. The average deviation of the reaction times estimated based on the regression line from the actual reaction times is 6.6% ($0.066 \cdot 100$).
Relative standard error: The relative standard error indicates the standard error in percentage form.
The percentage deviation is usually easier to assess than the absolute deviation.
The closer we are to 0%, the better the regression equation is suited for forecasts. In the
present case, due to the low percentage error, we can certainly speak of a good prerequi-
site for using the established equation for forecasts.
Overall, it can be stated that the three criteria for assessing the regression line go hand in
hand. If there is a high correlation (whether positive or negative), the result will be a high
coefficient of determination and, consequently, a low standard error and a low relative
standard error. If the correlation is low (no matter whether positive or negative), the result
will be a low coefficient of determination and, thus, a large standard error and a large rela-
tive standard error.
SUMMARY
Simple linear regression is used to examine the influence of a cardinally scaled independent variable X on a cardinally scaled dependent variable Y. Based on prior knowledge, it must be clear that the independent variable linearly influences the dependent variable and in no case the other way around. If these conditions are met, the linear equation can be constructed based on the least squares method. In particular, the regression coefficient contained therein describes the nature of the influence of X on Y. Since the least squares method finds a regression line for every given set of data, it is important to evaluate it in terms of its quality and explanatory power. While the correlation coefficient determines the degree of linearity, the coefficient of determination gives us information about the explanatory content. Finally, the standard error provides information about the average error one would make when using the regression model for forecasts.
UNIT 5
FUNDAMENTALS OF PROBABILITY THEORY
STUDY GOALS
Introduction
What will the weather be like tomorrow? We usually look at the weather app to check the
chance of rain or to see if the sun comes out after all. For such future events, meaning
those that we do not know in advance whether they will occur or not, we want to deter-
mine their probabilities. In this unit, we will concentrate on the basics of probability
theory to understand how such probabilities come about.
In such situations, it is unknown which of several possible outcomes will occur; the outcomes depend on chance. Such processes are called random experiments or random processes (Bortz & Schuster, 2010, p. 49). All possible outcomes of a random process are summarized in the result set Ω or result space (Nachtigall & Wirtz, 2013, p. 18). We will now look at some examples.
Random experiment: A random experiment deals with processes whose outcomes depend on chance.
Result set: The result set describes the set of all results of a random experiment.
EXAMPLE: OUTCOME SETS IN RANDOM EXPERIMENTS
Rolling a six-sided die

Ω = {1, 2, 3, 4, 5, 6}

Tossing a coin

Ω = {heads, tails}

If a coin is thrown twice, the result set will expand a little further:

Ω = {(heads, heads), (heads, tails), (tails, heads), (tails, tails)}

Both the first and the second time, the coin can land on heads (heads, heads) or on tails (tails, tails). A mix of both is also possible in two different orders (heads, tails or vice versa).

Healing success

Imagine that we are all therapists. You are interested in the success of each patient that you have. This can be measured, for example, by “success” or “no success,” which is reflected in the result set in the same way:

Ω = {success, no success}
If you are interested in certain outcomes within a random experiment, you must deal with a special event. Events describe a subset of the result set and include what you are interested in. They are always denoted by a capital letter (e.g., A, B, or C; Nachtigall & Wirtz, 2013, p. 18). We look at individual events for two of the examples mentioned above.
Event: An event describes a subset of the result set and includes what you are interested in.
In the case of the throwing of a die, we are interested in even numbers. Therefore, the event includes only the numbers 2, 4, and 6 and is read as

A = {2, 4, 6}.

Healing success

A = {success}.
There are three special forms of events: (1) sure, (2) impossible, and (3) elementary. A sure event is one that always occurs. This can only refer to the result set, Ω. No matter how often we roll a die, how often we flip a coin, or how many patients we treat, we know that one of the above-described outcomes (e.g., obtaining a number from 1 to 6 in the case of rolling a die) will always occur.
Sure event: A sure event always occurs.
An impossible event can never occur. For instance, it is impossible to obtain the number 7 or 8 when throwing a standard six-sided die. We will discuss impossible events in more detail later.
Impossible event: An impossible event can never occur.
Finally, an elementary event describes every single result of the result set. In the therapy example, both the outcomes “success” and “no success” are elementary events (Bortz & Schuster, 2010, p. 49).
Elementary event: An elementary event is every single result of the result set.
As we have seen, individual result sets and events are sets of possible situations in the
form of numbers or words. Based on these existing sets, the usual set operations can be
carried out because the linking of individual events in turn results in new events. The most
important set operations will be explained by means of examples.
Ω = {1, 2, 3, 4, 5, 6}.

Accordingly, the numbers from 1 to 6 can be rolled if the die is rolled only once. Now, each of the three players needs one of a certain selection of numbers to be able to win the game:

• Player A wins with the numbers 1, 2, or 4: A = {1, 2, 4}
• Player B wins with the numbers 3, 4, or 5: B = {3, 4, 5}
• Player C wins with the numbers 5 or 6: C = {5, 6}

Thus, for each player, one of the numbers they are interested in is enough to win the game.
The first possible set operation is a complementary event (also called a counter event). Assuming an event A, the complementary event for this is denoted by Ā. It describes the situation in which the opposite of A occurs. Accordingly, it includes everything that is contained in the result set Ω but not in A (Handl & Kuhlenkasper, 2018, p. 181).
Complementary event: A complementary event contains exactly the opposite of the actual event.
Such set operations can be illustrated very well using a Venn diagram (Nachtigall & Wirtz, 2013, p. 19). A Venn diagram consists of a rectangular box. This box contains all elements of the result set. Depending on the number of events considered, they are represented by one or more circles (or, as an alternative, rectangles) in the box. These circles contain the results of the individual events. If you deal with only one event at first, one circle in the rectangular box is enough.
Venn diagram: A Venn diagram is used to illustrate event operations.
The box shown above contains all elements of the result set. Regarding the die example, the box consists of the numbers 1, 2, 3, 4, 5, and 6. The event, here exemplarily labeled A, is represented by the circle and contains those numbers that are of interest for the event A. Possible further events B and C can be represented in a similar way. Consequently, the blue shaded area contains all numbers that are not contained in A but are still in the result set. This, therefore, represents the complementary event Ā.
Figure 20: Complementary Event to Event A in Dice Game
We are optimistic and want to pass the exam in any case. So, the event A is as follows:

A = {passing statistics}

The complementary event to A is the event of not passing the statistics exam:

Ā = {failing statistics}
Now, we bring the two events together and see what operations are possible between them. The first possibility is the union set A ∪ B (read as “A union B”) between the two events A and B. This operation unites all the outcomes that are in A or B. It is also said that at least one of the two events occurs (Bortz & Schuster, 2010, p. 50). If the two events A and B have something in common (i.e., they overlap), this commonality is represented in a Venn diagram by the middle section.
Union set: The union set summarizes all results which are contained in at least one of the events.
Figure 22: Venn Diagram for the Union Set of Players A and B
There are two circles for the two events. They overlap because there is a common feature with the number 4. The two numbers that are only interesting to Player A (1 and 2) are in the part of Circle A that has nothing to do with Circle B. Likewise, the two numbers that are decisive for Player B (3 and 5) are only in the part of Circle B that has no common area with Circle A. The union set now describes the situation in which at least one of the two players wins.
But what does “at least one of the players wins” mean? It means that either only A (with 1 or 2), only B (with 3 or 5) wins, or both win together (with 4). Thus, the union set is A ∪ B = {1, 2, 3, 4, 5}. So, if one of these five numbers is rolled, at least one player will win in every case. For the remaining constellations, the union set is determined similarly:

• A ∪ C = {1, 2, 4, 5, 6}, since A = {1, 2, 4} and C = {5, 6}
• B ∪ C = {3, 4, 5, 6}, since B = {3, 4, 5} and C = {5, 6}
Next, we will look at the situation using a Venn diagram. The entire left circle, A,
describes passing the statistics exam. The part of A without the overlap means
passing only statistics. The right circle, B, represents passing the math exam,
and the part without the overlap again describes passing this exam only. The
overlapping area means that both statistics and math are passed. Everything
around the two circles signals the failure of both exams.
Figure 23: Union Set for Passing at Least One Examination
We now want to determine the union of these two events, which means passing at least one exam: statistics only, math only, or both.
Using the intersection set A ∩ B (read as “A intersected with B”), we can determine the commonality of the two events (Bortz & Schuster, 2010, p. 51). Accordingly, both events under consideration occur simultaneously.
Intersection set: The intersection set contains the commonality of the events.
Figure 24: Venn Diagram for the Intersection Set of A and B
In a Venn diagram, the intersection is exactly where the two events overlap.
RUNNING EXAMPLE: DICE GAME WITH THREE PEOPLE
Consider the two players A (A = {1, 2, 4}) and B (B = {3, 4, 5}) again. For this situation, the Venn diagram with the intersection highlighted in blue looks like this.
The intersection A ∩ B here describes the situation in which both players win the game together. These two players can only win simultaneously if the number 4 is rolled. Consequently, A ∩ B = {4}. For the other two possible constellations, the following intersections result:

• B ∩ C = {5}, since B = {3, 4, 5} and C = {5, 6}
• A ∩ C = { } = ∅ (empty set), since A = {1, 2, 4} and C = {5, 6}

We should take a closer look at the intersection A ∩ C. Since the two players do not have a common number that leads to their victory, the intersection is marked as “empty.” We can mark this either by having no result in the two curly brackets or by using the ∅ sign for the empty set.
Events that have no commonality and, thus, no intersection are called disjoint or incompatible events (Handl & Kuhlenkasper, 2018, p. 181).
Disjoint: Two events are disjoint if they have no commonality.
The last set operation is the difference, which describes the sole occurrence of an event. Considering the two events A and B, if we want to determine the difference A\B (read as “A without B”), we should find out when A but not B occurs (Handl & Kuhlenkasper, 2018, p. 183). In the Venn diagram, this is the part of Circle A that does not include the intersection.
Difference: The difference describes the sole occurrence of an event.
The difference becomes interesting only if the two events under consideration have something in common. If their intersection is empty, then A\B will simply correspond to A and B\A to B. We can see this in the following example, among others.
Basically, Player A can win if 1, 2, or 4 is rolled. But if they want to be the only winner, then the number 4 must not fall because in this case Player B will also win. So, we can state that the difference “A without B” is characterized by A\B = {1, 2}. If 1 or 2 is rolled, Player A will win but not Player B. For all further pairings, the differences can be formed in the same way:

• B\A = {3, 5}, since A = {1, 2, 4} and B = {3, 4, 5}
• A\C = {1, 2, 4} and C\A = {5, 6}, since A = {1, 2, 4} and C = {5, 6} have nothing in common
• B\C = {3, 4} and C\B = {6}, since B = {3, 4, 5} and C = {5, 6}

For A\C as well as for C\A, we can observe that the first mentioned event always occurs completely because both events have no similarities and are, therefore, disjoint.
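The set operations just described can be tried out directly with Python's built-in set type. The sketch below simply re-creates the dice example from this section.

# Sketch of the set operations for the dice game
omega = {1, 2, 3, 4, 5, 6}        # result set of a single die roll
A = {1, 2, 4}                     # Player A wins
B = {3, 4, 5}                     # Player B wins
C = {5, 6}                        # Player C wins

print(omega - A)                  # complementary event to A: {3, 5, 6}
print(A | B)                      # union set A ∪ B: {1, 2, 3, 4, 5}
print(A & B)                      # intersection set A ∩ B: {4}
print(A & C)                      # empty set: A and C are disjoint
print(A - B)                      # difference A\B: {1, 2}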
Determination of Probabilities
The prerequisite for determining probabilities is that the result set of a random process is known and finite. Thus, one must know how many results can occur in a random process. Only because we know a die has six sides can we determine that each side has a 1 in 6 chance of being rolled. If we assume that there are n equally possible outcomes of a random experiment, then the probability of each elementary event is 1/n. Therefore, for a die, each side has an equal probability of 1/6 of being rolled. In general, the probability P of an event A is defined by

P(A) = (number of outcomes in A) / (number of outcomes in Ω)

Probability: A probability describes the chance of occurrence of an event.
Thus, one divides the number of favorable outcomes (contained in A) by the number of possible outcomes (contained in Ω). Alternatively, to abbreviate the probability of the event A, one can write

P(A) = |A| / |Ω|,

where |A| denotes the number of outcomes contained in A. For the dice game, this results in the following winning probabilities:
• A = {1, 2, 4}, |A| = 3, P(A) = 3/6 = 0.5
• B = {3, 4, 5}, |B| = 3, P(B) = 3/6 = 0.5
• C = {5, 6}, |C| = 2, P(C) = 2/6 = 0.33
We see that the calculation of the probability depends only on how many favor-
able and possible outcomes exist. Even if A and B need different numbers to
win, they have the same probability of 50% to win the game. For Player C , only
two numbers lead to victory, which is why their probability of winning is 33%,
lower than that of the other two players.
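Because classical probabilities only count favorable and possible outcomes, the winning chances above can be reproduced with a very small sketch (the events and the result set are taken from the dice example).

# Sketch: classical probability P(event) = |event| / |Ω| for equally likely outcomes
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    return Fraction(len(event), len(omega))

print(prob({1, 2, 4}))   # P(A) = 1/2
print(prob({3, 4, 5}))   # P(B) = 1/2
print(prob({5, 6}))      # P(C) = 1/3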
For the exam example, the following probabilities are given:

P(A) = 0.8 (passing the statistics exam)
P(B) = 0.7 (passing the math exam)
P(A ∩ B) = 0.5 (passing both exams)
Rules for Calculating Probabilities
Different rules apply to probabilities, including the following (Handl & Kuhlenkasper, 2018,
p. 183):
These regularities result in the following calculation rules, which relate to the operations
between events described above. While we have only examined so far, for example, when
a player does not win or when Player A and Player B win together, we now want to calcu-
late the associated probabilities.
Referring to the complementary event described above, we now want to determine the corresponding probability for such an event. The probability of a complementary event Ā and, thus, the counter probability to an event A is calculated by

P(Ā) = 1 − P(A).

If we know the probability for event A, we only have to subtract it from 1 (100%) to get the probability for the complementary event. If, for example, we know that the probability of rain tomorrow is 30%, it is also given that it will not rain with a probability of 70%. Based on the Venn diagram, one can imagine it as follows:
Figure 28: Venn Diagram for Counter Probability
The entire box contains a probability of 1 or 100%. The circle contains the probability P(A) determined for an event A. Everything around it describes the probability for the complementary event P(Ā) = 1 − P(A).
Since both A and B have a 50% probability of winning, the probability of losing
is also 50% in each case. For C , the probability of losing is higher at 67%, since
the probability of winning is only 33%.
RUNNING EXAMPLE: EXAM RESULTS
For passing the exams, we know that the probabilities are P(A) = 0.8 for statistics and P(B) = 0.7 for math. This means that we have the following information:

• probability of not passing the statistics exam: P(Ā) = 1 − 0.8 = 0.2
• probability of not passing the math exam: P(B̄) = 1 − 0.7 = 0.3
Now, we will consider the probability of a union set. We will assume that the two events A and B are not disjoint and, thus, have an intersection. The probability that at least one of the two events occurs and, therefore, the probability of the union set A ∪ B is as follows:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Accordingly, the two probabilities of the individual events A and B are first added. Next, the probability for the intersection is subtracted once.
Figure 29: Venn Diagram for the Probability of the Union Set
We now want to find out the probability that at least one of the two players will win. Remember, this can mean that only A wins (on 1 or 2), only B wins (on 3 or 5), or both win (on 4). If we calculate the probability using the above formula, the following happens:

P(A ∪ B) = 3/6 + 3/6 − 1/6 = 5/6
We start with the probability that Player A will win. This is 3/6, which we determined based on the favorable results 1, 2, and 4. The probability of victory for Player B is then added to it. This is also 3/6, since the numbers 3, 4, and 5 lead to victory for this player. As we see, the number 4 has been considered twice so far, meaning it is contained in the probability P(A) as well as P(B). Since this is always the case for two non-disjoint events, the probability for the intersection (in this case, the probability for 4) must always be subtracted once. In the end, there is a probability of 5/6 = 0.83 or 83% that at least one of the two players will win. This is not that surprising because, with the numbers 1, 2, 3, 4, and 5 (see the Venn diagram), five out of the six numbers lead to this situation.
P(B ∪ C) = 3/6 + 2/6 − 1/6 = 4/6

At this point, the commonality of the two events is the number 5, which is why the probability for the intersection, in the amount of 1/6, must also be subtracted once.

P(A ∪ C) = 3/6 + 2/6 − 0 = 5/6

Because A and C cannot win at the same time, “at least one wins” means that either A or C wins. For this reason, nothing more is subtracted from the sum of the two individual probabilities.
In general, the probability for the union of any two disjoint events A and B is always as follows:

P(A ∪ B) = P(A) + P(B)
P(A ∪ B) = 0.8 + 0.7 − 0.5 = 1

So, we are pleased to see that at least one of the two subjects is passed with a probability of 100%.
The probability for the difference A\B (A shall occur but not B) of two events A and B is calculated by the difference of the probability for the event A minus the probability of the intersection A ∩ B:

P(A\B) = P(A) − P(A ∩ B).

Because we want to find out the probability that only A occurs, we must subtract its possible commonality with the other event, since this commonality would otherwise be counted as well.
Figure 30: Venn Diagram for the Probability of the Difference A\B
For this reason, the probability for Player A to win alone is as follows:

P(A\B) = 3/6 − 1/6 = 2/6

• P(A\C) = 3/6 − 0 = 3/6, since A = {1, 2, 4} and C = {5, 6}, so A always wins alone.
• P(B\C) = 3/6 − 1/6 = 2/6, since B = {3, 4, 5} and C = {5, 6}.
• P(B\A) = 3/6 − 1/6 = 2/6, since A = {1, 2, 4} and B = {3, 4, 5}.
• P(C\A) = 2/6 − 0 = 2/6, since A = {1, 2, 4} and C = {5, 6}, so C always wins alone.
• P(C\B) = 2/6 − 1/6 = 1/6, since B = {3, 4, 5} and C = {5, 6}.
Finally, if two events A and B are independent of each other, the probability of their intersection (i.e., of their joint occurrence) is calculated by

P(A ∩ B) = P(A) · P(B)

In this case, one simply multiplies the two individual probabilities together to obtain the probability of the joint occurrence (Bortz & Schuster, 2010, p. 54). In the die example with two successive rolls, A = {1 on the first roll} and B = {4 on the second roll}, this means that

P(A ∩ B) = 1/6 · 1/6 = 1/36.

In this example, we can deduce that the two events must be independent of each other, since the die always has the same initial state before each roll. In our exam case, it is not obvious whether passing the statistics exam is independent of passing the math exam. This information must be known before we can calculate the probability of the intersection set as just described.
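The calculation rules of this section can be checked with plain arithmetic. The sketch below uses the given exam probabilities (0.8, 0.7, and 0.5 for the intersection) and, separately, two independent die rolls for the multiplication rule.

# Sketch: rules for calculating probabilities
p_stat = 0.8                      # P(A): passing statistics
p_math = 0.7                      # P(B): passing math
p_both = 0.5                      # P(A ∩ B): passing both (given; A and B need not be independent)

p_not_stat = 1 - p_stat                       # complement rule: 0.2
p_at_least_one = p_stat + p_math - p_both     # addition rule: 1.0
p_only_stat = p_stat - p_both                 # difference rule P(A\B): 0.3

# Multiplication rule for independent events (two successive die rolls)
p_one_then_four = (1 / 6) * (1 / 6)           # 1/36

print(p_not_stat, p_at_least_one, p_only_stat, p_one_then_four)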
5.3 Random Variables and Their
Distributions
In the context of descriptive univariate statistics, we dealt with variables or characteristics for which we had concrete data. We, therefore, evaluated individual variables based on a sample by determining, for example, how often the individual values of the variable occur in the sample. In the case of gender, for instance, we can determine based on a sample how many people belong to the female, male, or diverse group. However, we are often not in the fortunate situation of having such a sample available. Nevertheless, we would like to be able to make statements about certain variables. Since we do not always have concrete data, we must make use of probability calculations. With their help, we can determine with which probability the individual expressions of the variables can occur. Variables whose outcomes (values) depend on chance are called random variables (Nachtigall & Wirtz, 2013, p. 56). As in descriptive statistics, a distinction is made between discrete and continuous random variables.
Random variable: A random variable depends on chance for its outcomes.
Discrete Random Variables
They are characterized by the fact that they have a countable set of different values. The
starting point of a discrete random variable is always a random experiment or a random
process, the result set of which must be recorded in the first step.
Ω = {HH, HT, TH, TT}

The coin can land on heads both the first and the second time (HH). Likewise, it can land on tails both times (TT). In addition, it can land on heads the first time and tails the second time (HT) and vice versa (TH).
By considering a random variable X, we are now interested in a very specific variable that ensures that the results of the above result set are converted into real numbers (Handl & Kuhlenkasper, 2018, p. 224):

X: Ω → ℝ

Depending on the definition of the random variable, the four different results of the result set above are changed to numbers. These concrete numbers are marked with the small letter x. The totality of all expressions of a random variable X is referred to as the carrier T(X) of X.
Carrier: The carrier contains all possible expressions of a random variable.

RUNNING EXAMPLE: VALUES OF THE RANDOM VARIABLE WHEN A COIN IS TOSSED TWICE

Focusing on our random variable X, we are now interested in the “number of heads H when a coin is tossed twice.” If, for example, heads is tossed twice in succession, the random variable takes the value “2.” If the coin lands on heads first and then tails or vice versa, the random variable has the value “1.” If the coin lands on tails twice in succession, this means a value of “0” for the random variable:

HH → 2
HT → 1
TH → 1
TT → 0
If we were in a game of chance, it would certainly be of great interest to know the probability of the individual outcomes of the random variable. If we know all possible results of the random process and their probabilities, we can also determine the probabilities for the characteristics of the random variable. This is done by the probability function f_X(x) (Handl & Kuhlenkasper, 2018, p. 225):

f_X(x) = P(X = x) for all x ∈ ℝ

According to the function, we go through each individual expression x ∈ {0, 1, 2} and determine the probability for it. The function is assigned the letter f because it represents the counterpart to the relative frequencies in descriptive statistics. The capital X in the index indicates that it is a random variable, and the small x in the parentheses says that we determine the probability for each expression. The following two conditions must apply to the probability function: (1) There must be no negative probabilities, and (2) the sum of all probabilities must add up to 1.
Probability function: The probability function describes the probability for each individual expression of a random variable.
P(HH) = 0.25
P(HT) = 0.25
P(TH) = 0.25
P(TT) = 0.25

From this, the probabilities for the characteristics of the random variable can be derived:

P(X = 0) = P(TT) = 0.25
P(X = 1) = P(TH) + P(HT) = 0.25 + 0.25 = 0.5
P(X = 2) = P(HH) = 0.25
The probability that the random variable takes the value 0 (no heads) is equivalent to the coin landing on tails twice in succession. This is one of the four outcomes of the result set, which is why we note the probability of 0.25. The probability of getting heads exactly once is higher because this can occur through two possible outcomes of the random process (TH and HT). Consequently, the probability of 0.25 must be considered twice, which is why we arrive at the probability of 0.5. For the value 2 of the random variable, the probability is the same as for the value 0. These probabilities are now combined into a joint probability function:

f_X(x) = P(X = x) =
  0.25 for x = 0
  0.5  for x = 1
  0.25 for x = 2
  0    else

Each line describes one characteristic of the random variable and its probability. It always starts with the smallest value. The last line “0 else” signals that there are no further values. This remark is always included in the last line of the probability function.
Alternatively, the probability function can also be represented in the manner of a frequency table (with which we are already familiar).

x    f_X(x)
0    0.25
1    0.5
2    0.25
Σ    1
What in descriptive statistics is the expression a_j of a variable is now the expression x of a random variable. Likewise, the relative frequency f_j is now replaced by the probability f_X(x).
Graphically, the probability function can be represented with the help of a bar chart. This
is done in a similar way to the representation of relative frequencies in descriptive univari-
ate analysis. For our example, the bar chart looks as follows:
With the help of the probability function, all probabilities can now be calculated. For example, the probability of getting heads more than once is

P(X > 1) = f_X(2) = 0.25

What is the probability that heads comes up at least once but not more than twice? This is

P(1 ≤ X ≤ 2) = f_X(1) + f_X(2) = 0.5 + 0.25 = 0.75
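A discrete probability function can be represented as a simple dictionary, which makes such probability queries straightforward; a short sketch for the coin example:

# Probability function of X = number of heads in two coin tosses
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

p_more_than_once = sum(p for x, p in pmf.items() if x > 1)     # P(X > 1) = 0.25
p_one_to_two = sum(p for x, p in pmf.items() if 1 <= x <= 2)   # P(1 ≤ X ≤ 2) = 0.75
print(p_more_than_once, p_one_to_two)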
Such a random variable can be described by various measures, three of which we will explain. The first measure is the expected value, written as E(X) or the Greek letter μ. It is the counterpart to the mean value of descriptive statistics and can also be calculated in this way (Nachtigall & Wirtz, 2013, p. 60):

$\mu = E(X) = \sum_{x \in T(X)} x \cdot f_X(x)$

Expected value: The expected value describes the expected expression of a random variable.

So, if we have the probability function, we multiply the expression in each row by its probability and finally add up all the products. Please note that we speak here of an expected value and not of a mean value since, due to the uncertainty of the occurrence of certain events, results can only be “expected.” Mean values are based on concrete observations.
For the coin-toss example with the probability function

f_X(x) = P(X = x) =
  0.25 for x = 0
  0.5  for x = 1
  0.25 for x = 2
  0    else

the expected value is

μ = E(X) = 0 · 0.25 + 1 · 0.5 + 2 · 0.25 = 1

It states that, as expected, heads will fall once if you flip a coin twice in a row.
Whether the random variable is strongly dispersed or not is in turn described by the variance and then by the standard deviation. While in the context of descriptive statistics we speak of a sample variance (there is a concrete sample), here only the term “variance” is used. The abbreviation of the variance is either Var(X) or the Greek letter σ². The variance is calculated using the following formula (Handl & Kuhlenkasper, 2018, p. 244):
Variance: The variance describes the dispersion of a random variable.
$\sigma^2 = Var(X) = E(X^2) - [E(X)]^2$

Standard deviation: The standard deviation stands for the expected deviation from the expected value.

The calculation should look familiar to us. We need two expected values. For the back part of the formula, the simple expected value E(X) is relevant, which finally must be squared. For the front part, the expected value of the squared expressions E(X²) is used; here, only the individual expressions are squared in the calculation. If we get a result for the variance in the form of a number, we face the problem that, because of the squaring, it cannot be interpreted. If we take the square root, we get the standard deviation σ:

$\sigma = \sqrt{\sigma^2}$
For the coin-toss example, we again start from the probability function

f_X(x) = P(X = x) =
  0.25 for x = 0
  0.5  for x = 1
  0.25 for x = 2
  0    else

Now, we determine the two expected values, the variance, and the standard deviation:

E(X) = 1 (as calculated above)
E(X²) = 0² · 0.25 + 1² · 0.5 + 2² · 0.25 = 1.5
σ² = 1.5 − 1² = 0.5
σ = √0.5 = 0.707

We can state that, as expected, heads is shown 1 ± 0.707 times when a coin is tossed twice.
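Expected value, variance, and standard deviation follow directly from the probability function; a short sketch:

# Sketch: distribution parameters of the coin-toss random variable
import math

pmf = {0: 0.25, 1: 0.5, 2: 0.25}   # number of heads in two coin tosses

mu = sum(x * p for x, p in pmf.items())          # E(X) = 1.0
e_x2 = sum(x ** 2 * p for x, p in pmf.items())   # E(X²) = 1.5
var = e_x2 - mu ** 2                             # Var(X) = 0.5
sigma = math.sqrt(var)                           # σ ≈ 0.707
print(mu, var, sigma)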
Continuous random variables must be handled in a completely different way than discrete
ones. We are dealing here with random variables that can take on many different forms.
Since calculations by integral calculus are necessary at this point, we will only explain the
absolute basic features of a continuous random variable.
Take a moment to recall what we know from descriptive statistics. For continuous characteristics, we graphically depict the empirical densities (relative frequencies divided by the class width) in a histogram. The area within a class reflects the relative frequency in that class, and the total area reflects the total relative frequency of 1. In this context, too, the probabilities are not directly represented graphically; rather, the corresponding densities are. This is because a continuous random variable can take on so many different expressions that a consideration of individual expressions is pointless. The starting point here is, therefore, a density function f_X(x) (Handl & Kuhlenkasper, 2018, p. 231).
Density function: The density function contains the characteristics and the densities of a continuous random variable.
EXAMPLE: WAITING TIME
As an example, we consider a random variable X that measures the waiting time
in line at the bakery. In the best case, one does not have to wait at all. Thus, the
waiting time is 0 minutes. Since time is limited for most people, the maximum
waiting time that can be tolerated would be 10 minutes. Because any waiting
time (for instance, 3.52 minutes) can occur between 0 and 10 minutes, this ran-
dom variable is called a continuous one. The density function of this random
variable is now given by the following:
f_X(x) =
  0.1 for 0 ≤ x ≤ 10
  0   else
It states in the first line that the random variable assumes a density of 0.1
between 0 and 10 minutes, inclusive. Waiting times smaller than 0 minutes or
larger than 10 minutes are not possible. These are referenced in the second line:
“else 0.” Graphically, the density function takes the following form.
Figure 32: Density Function for a Continuous Random Variable
We can observe the waiting times on the x-axis and the densities on the y-axis.
In the interval from 0 to 10 minutes, a density of 0.1 is constantly entered by a
horizontal line.
This distribution is also called a uniform distribution or rectangular distribution since it has one and the same density for the entire valid range of values and consequently assumes the shape of a rectangle. For ranges smaller than 0 or larger than 10, we find indicated horizontal lines at the height 0.
Uniform distribution: A uniform distribution contains the same density over the entire range of values.
It is important to mention that these densities do not offer any interpretation of the con-
tent. The decisive values are located in the area below this density because the probabili-
ties are to be found there. The total area must always result in a probability of 1 or 100%
since it is obvious that the waiting time with a probability of 100% lies somewhere from 0
to 10 minutes. But how do we arrive at this result mathematically? We see that we have a
rectangle at the top. The area of a rectangle is always calculated by height · width. Since
we have here a height of 0.1 and a width of 10, we get the necessary area:

P(0 ≤ X ≤ 10) = 0.1 · 10 = 1

So, the probability that the waiting time is between 0 and 10 minutes, inclusive, is 1 or 100%. According to this principle, all other areas and, thus, probabilities can be calculated.
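For the rectangular density, such areas are simple products of height and width. The sketch below computes them, assuming the waiting-time density of 0.1 on the interval from 0 to 10 minutes described above.

# Sketch: interval probabilities for the uniform waiting-time distribution on [0, 10]
DENSITY = 0.1

def p_interval(lower, upper):
    """Probability that the waiting time falls into [lower, upper] (clipped to [0, 10])."""
    lower, upper = max(lower, 0), min(upper, 10)
    return max(upper - lower, 0) * DENSITY

print(p_interval(0, 10))   # 1.0 -> total area
print(p_interval(0, 3))    # 0.3 -> at most three minutes
print(p_interval(4, 6))    # 0.2 -> four to six minutes
print(p_interval(6, 10))   # 0.4 -> at least six minutes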
RUNNING EXAMPLE: WAITING TIME
For example, what is the probability that the maximum waiting time is three minutes? The surface we are looking for is now the one outlined in red. If we calculate the area according to the same principle, we still need a height of 0.1 but only a width of 3:

P(X ≤ 3) = 0.1 · 3 = 0.3

Thus, the probability of waiting at most three minutes is 30%. In the same way, the probability of a waiting time of four to six minutes corresponds to a rectangle of height 0.1 and width 2, i.e., 0.1 · 2 = 0.2 or 20%.

Figure 34: Density Function with a Waiting Time of Four to Six Minutes

Finally, we want to know the probability of having to wait at least 6 minutes. So, the total area between 6 and 10 is needed.
Figure 35: Density Function with a Waiting Time of At Least Six Minutes

The area sought has a height of 0.1 and a width of 4, so P(X ≥ 6) = 0.1 · 4 = 0.4 or 40%.
Please note that probabilities at a single point play a subordinate role for continuous random variables because they are equal to 0. Therefore, only probabilities for intervals of the random variable are pertinent. Furthermore, the density function discussed here in the form of a uniform distribution represents only one of many examples of a density function. There are many other forms of density functions, but they generally require integral calculus for the area calculation. The calculation of the expected value and variance also requires integral calculus for continuous random variables, which is why we did not cover this in this unit.
SUMMARY
Many processes in everyday life depend on chance. For unknown outcomes of a process, probabilities are determined to make them more tangible. Building on the result set of a random experiment, we looked at events, the set operations between them, and the rules for calculating their probabilities. Finally, we discussed certain variables that depend on chance, the discrete and continuous random variables. We then calculated probabilities for them and, in the case of discrete random variables, determined the corresponding distribution parameters such as expected value, variance, and standard deviation.
UNIT 6
SPECIAL PROBABILITY DISTRIBUTIONS
STUDY GOALS
Introduction
For certain types of frequently used random variables, there are ready-made distribution
models that simplify working with them considerably. The advantage of these models is
that, for example, probability functions, expected values, and variances can be deter-
mined much more quickly. While there are numerous such models, we will focus on
explaining two discrete and two continuous models in this unit.
A Bernoulli process is about counting something. Two types of random variables can be of
interest (Handl & Kuhlenkasper, 2018, p. 276):
1. The number of successes in n Bernoulli trials
2. The number of failures until the first success
The first variant describes the situation in which there are, for instance, 10 multiple-choice
questions and we are interested in knowing how many of them we will guess correctly.
The second variant of a random variable is about how many multiple-choice questions we
answer incorrectly until we guess one correctly the first time. Basically, we count the failed
attempts before the first success.
The first type of random variable is described by the binomial distribution, and the second
one by the geometric distribution. We will explain these two distributions in the following
sections. The advantage of both is that there is a predefined probability function that can
be used to calculate every probability. Also, the expected value and the variance of the
random variables can be determined with quite simple formulas without much effort.
Binomial Distribution
With the binomial distribution, we are dealing with a random variable that counts the successes in n Bernoulli trials. Thus, for instance, we count how many of the 10 multiple-choice questions we will guess correctly. To work with a binomially distributed random variable, two parameters must always be determined: (1) the number of Bernoulli trials n, and (2) the probability of success p. Individual probabilities as well as the expected value and variance can only be calculated if these two values are known.
Binomially distributed random variable: A binomially distributed random variable counts the number of successes.
Let’s start with the probabilities. We want to be able to calculate any probability. So, if we
have recognized that the random variable X under consideration is a binomially distrib-
uted one, then the big advantage is that there is a function with which we can determine
all probabilities. This probability function takes the following form (Handl & Kuhlenkasper,
2018, p. 276):
$P(X = x) = \dbinom{n}{x} \cdot p^x \cdot (1 - p)^{n - x}$ for x ∈ T(X)
Let’s explain this function now. We want to determine the probability of x successes. To do this, we first determine with the binomial coefficient $\binom{n}{x}$ all the possibilities of achieving x successes in n Bernoulli trials. How we calculate this binomial coefficient will be explained using the example. Next, consider $p^x$: here, the probability of success p is multiplied by itself x times, since we assume exactly x successes. If we have x successes, then we must have n − x failures in n Bernoulli trials. The probability of failure (1 − p) is, thus, multiplied by itself a total of n − x times: $(1 - p)^{n - x}$. As shown in the above equation, all three terms discussed thus far are multiplied together due to the independence of the individual Bernoulli trials. Finally, x ∈ T(X) stands for the fact that any value of the carrier of X can be substituted for x. But what is in the carrier of X? If we consider n Bernoulli trials, there can be at most n successes. In the worst case, there is no success. Consequently, all values between 0 and n, inclusive, can be inserted for x into the probability function above.
Binomial coefficient: The binomial coefficient counts the possibilities of achieving x successes in n trials.
RUNNING EXAMPLE: MULTIPLE-CHOICE EXAM
We consider the above example once more. We are taking an exam with 10 mul-
tiple-choice questions. We have not studied and want to answer each question
by guessing. Each question has four possible answers, of which only one is cor-
rect. All questions can be answered independently. With the random variable X,
we are interested in the number of correctly guessed answers. This is all the
information we have.
Knowing this, we can now set up the probability function, with the help of which
we can calculate any probability:
$P(X = x) = \binom{n}{x} \cdot p^x \cdot (1 - p)^{n - x} = \binom{10}{x} \cdot 0.25^x \cdot (1 - 0.25)^{10 - x} = \binom{10}{x} \cdot 0.25^x \cdot 0.75^{10 - x}$
For example, what is the probability of guessing exactly two questions correctly? We should replace the x in the above function at the three relevant places by “2” for this problem definition:

$P(X = 2) = \binom{10}{2} \cdot 0.25^2 \cdot 0.75^{10 - 2} = 45 \cdot 0.25^2 \cdot 0.75^8 = 0.282$

The binomial coefficient $\binom{10}{2}$ = 45 counts the possibilities of distributing two correct answers over the 10 questions. The probability of success 0.25 is multiplied by itself twice, since we want two correct answers. Finally, with two successes and 10 questions, there must be eight failures, so the probability of failure 0.75 is multiplied by itself eight times. If we now multiply the three terms together, we get a probability of 28.2% for guessing two out of 10 multiple-choice questions correctly.
We proceed according to the same principle; we only replace the x at this point by “7”:

$P(X = 7) = \binom{10}{7} \cdot 0.25^7 \cdot 0.75^{10 - 7}$
Here, it is important to realize that “fewer than two” means that either no question (0) or one question (1) is answered correctly. We, therefore, must add up the probabilities for no correct answers and one correct answer:

$P(X < 2) = P(X = 0) + P(X = 1) = \binom{10}{0} \cdot 0.25^0 \cdot 0.75^{10} + \binom{10}{1} \cdot 0.25^1 \cdot 0.75^{9} = 0.056 + 0.188 = 0.244$
Finally, we would like to be able to determine the expected value and variance of a bino-
mially distributed random variable. Thus, if we are interested in determining the expected
number of successes, we calculate the expected value:
E(X) = n · p
We first determine the dispersion around the expected value by the variance:

Var(X) = n · p · (1 − p)

This gives us the interpretable standard deviation σ by taking the square root (Handl & Kuhlenkasper, 2018, pp. 276–277).
E(X) = n · p = 10 · 0.25 = 2.5

For 10 multiple-choice questions, we can expect 2.5 correct answers. The dispersion around this expected value is first calculated with the variance, Var(X) = 10 · 0.25 · 0.75 = 1.875, and then expressed by the standard deviation:

σ = √1.875 = 1.369
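Because only n and p are needed, the binomial results of this example can be reproduced with a few lines of standard-library Python:

# Sketch: binomial probabilities and parameters for n = 10, p = 0.25
from math import comb, sqrt

n, p = 10, 0.25

def binom_pmf(x):
    """P(X = x) for the binomial distribution with parameters n and p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binom_pmf(2), 3))                           # ≈ 0.282
print(round(binom_pmf(7), 4))                           # P(X = 7)
print(round(binom_pmf(0) + binom_pmf(1), 3))            # P(X < 2) ≈ 0.244
print(n * p, n * p * (1 - p), sqrt(n * p * (1 - p)))    # E(X) = 2.5, Var(X) = 1.875, σ ≈ 1.369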
Geometric Distribution
While the binomial distribution counts how many successes occur within a certain number of random processes, a geometrically distributed random variable X counts the failures until the first success can be recorded. For example, we check the questions one by one and count the incorrectly answered ones until a question is guessed correctly for the first time. Again, the event A is the one that describes the success. In this case, we do not count the successes; rather, success is what we are waiting for (i.e., a correctly guessed question). The probability for the event A is again P(A) = p, which is why the probability for the counter-event Ā (an incorrectly answered multiple-choice question) is again P(Ā) = 1 − p.
Geometrically distributed random variable: A geometrically distributed random variable counts the failures until the first success.
The only parameter that is important for the geometric distribution is the probability of success p. The number of Bernoulli trials is not known here; rather, that is what we want to find out. Thus, the probability function of the geometrically distributed random variable takes a simple form (Handl & Kuhlenkasper, 2018, p. 278):

$P(X = x) = p \cdot (1 - p)^x$ for x ∈ T(X)
The carrier T(X) can take any value from 0 to infinity because, in the best case, we get a success directly on the first trial (so that there are 0 failures), or it can take infinitely long until we get a success for the first time. The probability function above has this form for the following reasons: If we want to determine the probability of x failures, we must multiply the probability of failure (1 − p) by itself x times. For this reason, we write $(1 - p)^x$. We only need to consider the probability of success p once, since we are only waiting for a single success. Because all the trials are independent, the probabilities are combined by multiplication into $p \cdot (1 - p)^x$.

For the multiple-choice example with p = 0.25, the probability function is

$P(X = x) = 0.25 \cdot (1 - 0.25)^x = 0.25 \cdot 0.75^x$
With this, countless probabilities can now be determined. Let us look at two example questions.

Question 1: What is the probability that the third question is the first correctly solved one?

This is equivalent to answering the first two questions incorrectly, so we insert “2” for x:

P(X = 2) = 0.25 · 0.75² = 0.1406

Thus, the probability is 14.06% that the first two multiple-choice questions are solved incorrectly (0.75²) and the third one correctly (0.25).
Question 2: What is the probability that, at the latest, the second question is the first correctly solved one?

This means that either the first question is guessed correctly (so there are 0 misses) or the second one is guessed correctly (so there is 1 miss):

P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.25 · 0.75⁰ + 0.25 · 0.75¹ = 0.25 + 0.1875 = 0.4375
Finally, the expected value and the variance of a geometrically distributed random variable can be easily calculated using two given formulas. For the expected value, we use the following:

$E(X) = \dfrac{1 - p}{p}$

This tells us how many failures to expect before the first success. For the variance, we use this:

$Var(X) = \dfrac{1 - p}{p^2}$

This gives a result that is, once again, not directly interpretable. However, if we take the square root of the variance, we obtain the standard deviation σ and, thus, the average deviation from the expected value (Handl & Kuhlenkasper, 2018, p. 279).
$E(X) = \dfrac{1 - p}{p} = \dfrac{1 - 0.25}{0.25} = 3$

We can, therefore, expect that three questions are answered incorrectly and the fourth question is the first correctly answered one. If we also want to determine the dispersion around this expected value, then we first calculate the variance:

$Var(X) = \dfrac{1 - p}{p^2} = \dfrac{1 - 0.25}{0.25^2} = 12$

This cannot be interpreted directly, as mentioned several times already, but it becomes an interpretable result by taking the square root of 12:

$\sigma = \sqrt{12} = 3.46$

Finally, we can state that, as expected, we will answer 3 ± 3.46 questions incorrectly until we guess the first one correctly.
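The same can be done for the geometric distribution. The sketch below implements the probability function for the number of failures before the first success with p = 0.25.

# Sketch: geometric distribution (failures before the first success), p = 0.25
from math import sqrt

p = 0.25

def geom_pmf(x):
    """P(X = x): x incorrect guesses before the first correct one."""
    return p * (1 - p) ** x

print(round(geom_pmf(2), 4))                                   # ≈ 0.1406 (first success on question 3)
print(round(geom_pmf(0) + geom_pmf(1), 4))                     # ≈ 0.4375 (success within the first two questions)
print((1 - p) / p, (1 - p) / p ** 2, sqrt((1 - p) / p ** 2))   # E(X) = 3, Var(X) = 12, σ ≈ 3.46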
Normal Distribution
Many variables of everyday life are normally distributed, such as IQ scores or exam scores. It is, therefore, important to know what characterizes a normally distributed random variable. Normally distributed random variables are characterized by the fact that a certain value range of the variable occurs with a high probability while other value ranges occur with a rather low probability, i.e., the variable is distributed symmetrically around an expected value.
Normally distributed random variable: A normally distributed random variable is distributed symmetrically around an expected value.
Consider, for example, IQ scores. It is very likely that a person has an IQ between 90 and
110, inclusive. An IQ of less than 90 or more than 110 is very unlikely. The same is often
true about exams. The scores obtained in exams are often in a central range with a high
probability. Let's say that in an exam one can achieve 100 points. The evaluation of the scores, however, often shows that the probability of achieving between 60 and 70 points, inclusive, is very high, while the probability of getting less than 60 or more than 70 points is again very low. If such distributions of random variables are present, they can often be
described by a normal distribution. Basically, we must always know or be given whether a
variable is normally distributed or not. In addition, we must know two parameters:
1. The expected value of the normally distributed random variable, which is denoted by
the Greek letter μ
2. The dispersion of the variable under consideration
This can be either in the form of the variance σ² or the standard deviation σ. The fact that a random variable X is normally distributed with expected value μ and variance σ² is also written as X ~ N(μ, σ²). For the calculations that follow, it is important that we work with the standard deviation σ. If only the variance is given, it is advisable to take the square root at the beginning to obtain σ.
On the x-axis, we find the possible travel times to work. We see here a range from 30 to 50 minutes, since obviously all other times (less than 30 or more than 50 minutes) are improbable. On the y-axis, we find the densities. The actual probabilities are located in the area below the bell. Where we find the expected value at 40 minutes on the x-axis, the density takes its highest point. It then drops evenly to both sides. There is, therefore, a perfectly symmetrical distribution around the expected value. The variance of 4 cannot be read directly from the graph. It is only noticeable in how steeply or flatly the density function falls from its highest point. If you have a small variance and, thus, a small dispersion, the density function falls off very steeply. This is equivalent to the fact that travel times around 40 minutes are very likely.
Figure 36: Representation of the Density Function for the Travel Time
If the variance and, thus, the dispersion were very large, then the density func-
tion would decay much more slowly and give room for much shorter or longer
travel times.
This is the initial situation. We know that the travel time is normally distributed.
We also know that the expected value is 40 minutes, the variance is 4, and,
therefore, the standard deviation is 4 = 2. The goal is now to determine proba-
bilities for certain ranges of random variables. For example, we might be interes-
ted in the probability of taking a maximum of 36 minutes to get to work, or we
might want to determine the probability of taking from 38 to 39 minutes. We
must be aware that this means that we must determine areas below the density
function. The total area below the density function is 1 or 100% because the
probability of needing some travel time to get to work is 100%. It should also be clear that the probability of needing a maximum of 40 minutes is 50%.
Likewise, the probability of needing 40 minutes or more is 50%. Since 40 is
exactly in the middle of a symmetric distribution, there is 50% probability mass
both below and above the expected value of 40. Thus, the expected value of a
normally distributed random variable corresponds exactly to the median.
If we now want to determine such probabilities, we must always follow a very specific path: It would be quite complicated to calculate areas and, hence, probabilities for a density function like the one above. For this reason, we use the standard normal distribution. This is a special form of the normal distribution which always has an expected value of 0 (μ = 0) and a variance and, thus, standard deviation of 1 (σ² = σ = 1). A standard normally distributed random variable is always denoted by Z (Bortz & Schuster, 2010, p. 71).

Standard normal distribution
The standard normal distribution is a special case of the normal distribution.
The great advantage of these standard normally distributed random variables is that there
is a table with all cumulative probabilities for them, from which we can also read out the
relevant probability for a normally distributed random variable. But what do we have to
do for this? If we are looking for the probability that a normally distributed random varia-
ble X takes on at most the value x, then a standardization must be made so that the table
can be used afterward. So, we look for
P(X ≤ x) = FX(x).
We write FX(x) since we are looking for a cumulative probability up to the value x. Now the value x must be standardized so that a value z of the standard normal distribution results:

P(X ≤ x) = FX(x) = Φ((x − μ)/σ) = Φ(z)
Standardization is done by subtracting the expected value μ from the relevant value x and
dividing this difference by the standard deviation σ. The Greek letter Phi Φ is used as a
sign for the distribution function of the standard normal distribution. What is finally deter-
mined in the parenthesis by standardizing is the letter z. The value we get for z must now
be read in the table of the standard normal distribution. This table looks like the following
figure.
z-value 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.00 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.10 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.20 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.30 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.40 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.50 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.60 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.70 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.80 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.90 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.00 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.10 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.20 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.30 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.40 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.50 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.60 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.70 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.80 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.90 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.00 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.10 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.20 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.30 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.40 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.50 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.60 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.70 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.80 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.90 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.00 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
Suppose we obtain Φ(2.53) after standardization. We now find these three relevant digits in the leftmost column and in the top row. The 2 before the decimal point and the 5 as the first number after the decimal point can be found in the row for "2.50." The 3 as the second number after the decimal point can be read in the uppermost row at the position "0.03." If we now go from the row for "2.50" to the column for "0.03," we find the value 0.9943. This is the probability we are looking for. If, for example, we get a 1 for z with Φ(1), then it is important to realize that this is nothing but 1.00.
Accordingly, we go to the row for “1.00” and the column for “0.00” and read the corre-
sponding probability 0.8413. With this knowledge, we can now go through all possible
questions that are relevant regarding calculating probabilities for a normally distributed
random variable.
Question 1: What is the probability that it takes an employee a maximum of 42 minutes to get to work?

The following figure shows which area below the density function is to be determined. It is by no means necessary to make such a sketch to solve the question; we do this merely for illustration. The marked area up to the value 42 is the cumulative probability which must be calculated. We can already estimate that the probability must be greater than 50% because 50% of the probability mass lies below the value 40.
P(X ≤ 42) = FX(42)
= Φ((42 − 40)/2)
= Φ(1)
= 0.8413

Thus, the probability of taking a maximum of 42 minutes to get to work is 84.13%.
Question 2: What is the probability that it takes less than 42 minutes for an
employee to get to work?
Even though the value 42 itself is not included here ("less than"), this makes no difference for continuous random variables. Less than 42 minutes means that one may need 41 minutes, 59 seconds, and some number of milliseconds to get to work. Since the difference is minimal, it is neglected, and the same procedure is used as for Question 1:
P(X < 42) = FX(42)
= Φ((42 − 40)/2)
= Φ(1)
= 0.8413
The probability of taking less than 42 minutes to get to work is also 84.13%.
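Instead of reading Φ(z) from the table, the cumulative probability can be computed directly. The following Python sketch is our own illustration, not part of the course material; it assumes the parameters of the running example (μ = 40, σ = 2) and uses the NormalDist class from Python's standard library:

from statistics import NormalDist

travel_time = NormalDist(mu=40, sigma=2)   # X ~ N(40, 4)

# Probability of needing at most (or, equivalently, less than) 42 minutes
print(round(travel_time.cdf(42), 4))       # 0.8413, matching Phi(1) from the table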
In some cases, it happens that a negative z-value is calculated while standardizing. We will
not find this in the table of the standard normal distribution. There are only positive val-
ues. However, since the standard normal distribution is a random variable symmetrically
distributed around 0, we can convert the negative z-value into a positive one as follows:
Φ(−z) = 1 − Φ(z)
So, if we get a negative z-value, we read the probability in the table at the corresponding
positive place and subtract this probability from 1.
RUNNING EXAMPLE: TRAVEL TIME
Question 3: What is the probability that it takes an employee less than 36
minutes to get to work?
The following figure shows the relevant area that is to be calculated. It is located
to the left of the expected value and is a relatively small area, which is why the
probability must also be correspondingly low, significantly less than 50%.
Figure 38: Density Function for a Travel Time of Less Than 36 Minutes
P(X < 36) = FX(36)
= Φ((36 − 40)/2)
= Φ(−2)
= 1 − Φ(2)
= 1 − 0.9772
= 0.0228
Up to the third line, the calculation is the same as in the previous examples. However, when we obtain −2 as the standardized value, something new happens. We convert Φ(−2) into 1 − Φ(2) to be able to determine a corresponding probability. We then read the probability in the table of the standard normal distribution at the position +2.00 (0.9772) and finally subtract it from 1. We thus obtain a very low probability of 2.28% that less than 36 minutes are needed to get to work.
If we want to determine the probability that a normally distributed random variable X
takes on at least or a greater value than x, then we calculate this via
P(X ≥ x) = 1 − P(X < x) = 1 − FX(x).
The relevant area to be calculated is, therefore, the one to the right of 42 minutes. Here, we can assume that the probability is less than 50% because only the area to the right of 40 minutes corresponds exactly to a probability of 50%.
With the knowledge from Question 1, we do not need to bother ourselves with a
calculation. Because we already know that the probability for a maximum of 42
minutes is 84.13%, we know the probability for the white area below the bell.
The probability for the red area must therefore be 100% − 84.13% = 15.87%.
Nevertheless, we want to prove this result by the calculation described above:
P(X > 42) = 1 − P(X ≤ 42)
= 1 − FX(42)
= 1 − Φ((42 − 40)/2)
= 1 − Φ(1)
= 1 − 0.8413
= 0.1587
So, according to this, we can also conclude that the probability for a travel time
of at least 42 minutes is 15.87%.
Finally, we want to be able to determine the probability that a normally distributed ran-
dom variable X takes on a value between two values x1 and x2:
P(x1 ≤ X ≤ x2) = FX(x2) − FX(x1)
Accordingly, one always first calculates the total probability up to the value x2 (the larger value) and then subtracts the probability up to the value x1 (the smaller value). Afterward, the probability mass that lies between these two values remains.
We are now looking for an area that neither starts at the beginning of the distribution nor ends at its end. Everything that lies between 39 and 42 minutes, inclusive, is now the area, or probability, we are looking for.
Figure 40: Density Function for a Travel Time From 39 to 42 Minutes
P(39 ≤ X ≤ 42) = FX(42) − FX(39)
= Φ((42 − 40)/2) − Φ((39 − 40)/2)
= Φ(1) − Φ(−0.5)
= Φ(1) − (1 − Φ(0.5))
= 0.8413 − (1 − 0.6915)
= 0.5328
The first line describes that the cumulative probability up to 39 must be subtracted from the cumulative probability up to 42. The second line carries out the respective standardization. From the third to the fourth line, it must be considered that with −0.5, a negative z-value is again obtained. This is converted in Line 4 via 1 − Φ(0.5) into a positive one. It is important to put parentheses around 1 − Φ(0.5) so that it is subtracted completely. Finally, we get a probability of 53.28% that it takes an employee between 39 and 42 minutes, inclusive, to get to work.
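The remaining travel-time probabilities from this section can be reproduced in the same way. This is again only an illustrative sketch with the assumed parameters μ = 40 and σ = 2:

from statistics import NormalDist

travel_time = NormalDist(mu=40, sigma=2)

p_less_36 = travel_time.cdf(36)                         # Phi(-2)    -> about 0.0228
p_more_42 = 1 - travel_time.cdf(42)                     # 1 - Phi(1) -> about 0.1587
p_39_to_42 = travel_time.cdf(42) - travel_time.cdf(39)  # about 0.5328

print(round(p_less_36, 4), round(p_more_42, 4), round(p_39_to_42, 4))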
If we let the two limits x1 and x2 move closer and closer together, the resulting probability tends toward 0. As the distance between the two limits decreases, no probability remains for a specific single value. The reason is that a continuous random variable can take on so many different values that each individual value has a probability of practically zero.
Up to now, we have calculated probabilities without exception. A travel time was speci-
fied, and we determined, for example, the probability that a maximum of this travel time
would be required. In the same way, travel times can be calculated for given probabilities.
For instance, if we want to determine a travel time that will not be exceeded with a proba-
bility of p percent, we determine the quantile
xp = μ + zp · σ
where zp is the corresponding quantile of the standard normal distribution (Handl & Kuh-
lenkasper, 2018, p. 296). We recall that p always describes exactly the percentage that lies
below the searched value. If, for example, we are looking for the quantile that is exceeded
with a probability of 30%, then it is equivalent to saying that a probability of 70% is below
the value we are looking for. Thus, x0.7 must be determined.
This is equivalent to saying that we are looking for a travel time that has a prob-
ability of 95% below it. If we want to mark the probability of 95% below the den-
sity function of the normal distribution, the end of the marked area just
describes the quantile we are looking for. The following figure shows the
searched quantile x0.95.
Figure 41: Density Function for a Travel Time That is Not Exceeded on
95% of Days
We calculate the quantile we are looking for as follows:
x0.95 = μ + z0.95 · σ
= 40 + z0.95 · 2
= 40 + 1.6449 · 2
= 43.29

Thus, from the second to the third line, z0.95 is replaced by 1.6449. Overall, the result means that, with a probability of 95%, the travel time does not exceed 43.29 minutes.
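The same quantile can be obtained in code via the inverse distribution function; the following sketch is our own illustration with the assumed parameters μ = 40 and σ = 2:

from statistics import NormalDist

travel_time = NormalDist(mu=40, sigma=2)

# 95% quantile: the travel time that is not exceeded on 95% of days
x_95 = travel_time.inv_cdf(0.95)   # equivalent to mu + z_0.95 * sigma
print(round(x_95, 2))              # about 43.29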
If you are looking for a quantile that is to the left of the expected value (e.g., x0.2 or x0.4), then the following relationship helps you to read off the required quantile of the standard normal distribution:

zp = −z1−p

This is because quantiles for probabilities below 0.5 are usually not tabulated; the table only contains the quantiles that lie to the right of the expected value.
The search is, therefore, for the travel time x0.1, which is undercut with a proba-
bility of 10%. The following figure shows that the searched quantile must be
below 40.
Figure 42: Density Function for the Travel Time That is Exceeded on
90% of the Days
Thus, it makes sense that, when determining the quantile, something is subtrac-
ted from the 40 and not added to it. While the approach with
x0.1 = μ + z0.1 · σ
is initially identical, we notice that we do not find the quantile z0.1 in the table of
quantiles of the standard normal distribution. Now, however, the symmetry
around 0 benefits us again:
zp = −z1−p
z0.1 = −z1−0.1 = −z0.9

We can state that z0.1 is exactly as far away from the center (0) of the standard normal distribution as z0.9, only with a negative sign. The table gives z0.9 = 1.2816, so z0.1 = −1.2816. This means that the sought-after quantile can be calculated via the following:
x0.1 = μ + z0.1 · σ
= 40 − z0.9 · σ
= 40 − 1.2816 · 2
= 37.44
The travel time, which is exceeded with a probability of 90% or fallen short of with a probability of 10%, is 37.44 minutes.
Finally, following the previous determination of quantiles, we want to determine central fluctuation intervals. These are the intervals that are formed centrally around the expected value and contain a certain probability mass. Both limits of the interval must be equally far away from the expected value. We will look at this using an example.

Central fluctuation intervals
These central fluctuation intervals contain a certain probability mass between two limits.
This can be best illustrated using the density function, as can be seen in the fol-
lowing figure. If 90% is within the boundaries of the central interval, then 10%
remains. These are divided equally between the left and right halves of the dis-
tribution outside the interval. This means that the interval must start at the
quantile x0.05 (since 5% are below this limit) and consequently end at x0.95
(since 5% are above this limit). These two quantiles must be determined now, as
we have already learned.
Figure 43: Density Function for the Central Variation Interval with 90%
of All Travel Times
It is always a good idea to start with the upper limit of the interval, here with
x0.95:
x0.95 = μ + z0.95 · σ
= 40 + z0.95 · 2
= 40 + 1.6449 · 2
= 43.29
Because if we determine the associated quantile of the standard normal distri-
bution for the upper interval boundary first, then we only have to give this a
negative sign for the lower boundary:
x0.05 = μ + z0.05 · σ
= 40 − z0.95 · 2
= 40 − 1.6449 · 2
= 36.71
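Both limits of the central 90% fluctuation interval are simply the 5% and 95% quantiles. A short illustrative sketch (our own, with the assumed parameters of the running example) confirms this:

from statistics import NormalDist

travel_time = NormalDist(mu=40, sigma=2)

lower = travel_time.inv_cdf(0.05)   # mu - z_0.95 * sigma -> about 36.71
upper = travel_time.inv_cdf(0.95)   # mu + z_0.95 * sigma -> about 43.29
print(round(lower, 2), round(upper, 2))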
t-Distribution
With the t-distribution, we consider another continuous distribution that is very similar to the normal distribution or standard normal distribution. It becomes important in particular if one deals with statistical test procedures. The t-distribution, just like the standard normal distribution, is distributed symmetrically around the value 0 (μ = 0); its dispersion, however, depends on the sample size and is somewhat larger than 1 for small samples. The t-distribution is, therefore, especially suitable for the case when the samples are particularly small. This is because – unlike the standard normal distribution – it can take into account how heavily populated a sample is. With increasing sample size, the t-distribution becomes very similar to the standard normal distribution until, at some point, the two become congruent. This can be seen in the following figure.

t-distribution
The t-distribution describes a distribution for small samples that is close to the standard normal distribution.
Figure 44: t-Distributions and Standard Normal Distribution
The first figure shows a t-distribution with only four observations. Especially in compari-
son to the second figure in which we have 40 observations, the first one runs somewhat
flatter and consequently with a larger dispersion. This is generally the case with small
samples. Let us imagine a small sample in which there is an outlier. We have already
learned that such outliers can increase a dispersion a lot. Then, the larger the sample size
becomes (see the second figure), the less noticeable outliers become. We see at n = 40
that the distribution is almost identical to that of the standard normal distribution. So, we
can state that as the sample size increases, the t-distribution approaches the standard
normal distribution.
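This convergence can also be seen in the quantiles of the t-distribution. The following sketch is our own illustration and assumes that the third-party SciPy library is installed; it compares the 97.5% quantile of the t-distribution for growing degrees of freedom with the corresponding quantile of the standard normal distribution (1.96):

from scipy import stats

for df in (3, 10, 30, 100, 1000):
    # 97.5% quantile of the t-distribution; approaches 1.96 as df grows
    print(df, round(stats.t.ppf(0.975, df), 3))

print("z:", round(stats.norm.ppf(0.975), 3))   # 1.96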
It might be surprising why we are suddenly talking about a sample here when we have been discussing random variables without any sample at all. This becomes important when we deal with inferential statistics and want to find out whether results obtained on the basis of samples can be transferred to the population. Here, we should just note that the t-distribution is very similar to the standard normal distribution and is particularly suitable for small sample sizes.
SUMMARY
For certain types of random variables, ready-made models exist with
which probabilities, expected values, and variances can be calculated
more easily and quickly than the conventional treatment of random var-
iables. Besides, there are some discrete distribution models, of which
only the binomial distribution and the geometric distribution have been
discussed here. Both are based on the Bernoulli process, and their ran-
dom variables have the task of counting something. While a binomially
distributed random variable counts the successes within a certain num-
ber of Bernoulli trials, the geometrically distributed random variable
counts the failures up to the first success.
Note that of the numerous continuous distribution models, only the nor-
mal distribution and the t-distribution have been discussed here. From
the starting point of a density function, it becomes clear that the sought
probabilities are always in the area below the density function. While
with discrete distributions, probabilities for individual values of the random variable are of interest, with continuous random variables, only ranges of values and their probabilities play a role, since the probability of any single specific value of a continuous random variable approaches zero.
UNIT 7
STATISTICAL ESTIMATION METHODS
STUDY GOALS
Introduction
Imagine that we want to conduct an employee survey on satisfaction with the company
that all of us work for. We already know that all employees of the company form the basic
population. However, if we do not reach all employees during our new survey, we will only
have a sample of that population. Therefore, we will have no choice but to estimate statements about all employees based on the sample. In this unit, we will use point and interval
estimates to show how to estimate results for the population based on a sample.
Let’s assume that we are now interested in the average length of service for all
employees. The dispersion is also important to us.
In order to create a point estimation, we should first determine a concrete value for the mean and the dispersion, which would then be valid for the population. Computationally, there will be no new challenges for us. We only have to be able to calculate a mean value, a sample variance, and a standard deviation.

Point estimation
This refers to the use of a concrete value to estimate the true parameter of the population based on a sample.
If we now want to estimate the average service length of all employees in the company
(i.e., the population), the mean value comes into play as a relevant measure. It is impor-
tant to note that the mean value is assigned a different letter depending on the initial sit-
uation. If we are at the population level, the mean, also referred to as the “expected
value," is abbreviated with the Greek letter μ. At the sample level, we use the familiar abbreviation x̄. Now, if we only have a sample, we must estimate the mean value of the population. We always add a hat (^) above the relevant estimate. Therefore, μ̂ refers to the estimated mean of the population.
At this point, we have only one sample. Consequently, we estimate the mean of the popu-
lation on the basis of this sample by calculating the mean value of the sample:
μ̂ = x̄ = (1/n) · ∑ xi,  summing over i = 1, …, n
This raises an important question: Is the mean value of the sample x̄ really well suited to represent the mean value of the population μ? For this, two quality criteria should be met. On the one hand, the estimator should be unbiased (Handl & Kuhlenkasper, 2018, p. 345). This means that the mean value of the sample corresponds, on average, to the expected value in the population. In principle, the sample mean is considered to be an "expectation-true estimator." Hence, this quality criterion is fulfilled. As well, the estimator should be consistent. A consistent estimator becomes more accurate as we increase the sample size and approaches the expected value of the population. This criterion is also met by our use of the sample mean as an estimator for the expected value of the population (Handl & Kuhlenkasper, 2018, p. 350).

Unbiased
An estimator is unbiased if the calculated parameter of the sample matches that of the population.

Consistent
An estimator is consistent if it matches the true value of the population better and better as the sample size increases.
μ̂ = x̄ = (12 + 8 + 16 + 10 + 6 + 10 + 14 + 12)/8 = 11
Now, we want to obtain an estimate of the dispersion in the form of the variance and
standard deviation of the population. Here, a distinction must also be made between the
level of the population and that of the sample. If we are at the population level, the var-
iance and standard deviation are represented using the Greek letter σ by σ2 and σ, respec-
tively. Here, it is important that we are talking about a variance and not a sample variance.
In the context of a sample, we use the representation that we are familiar with – namely,
s² and s – for the sample variance and standard deviation, respectively. The corresponding estimates of the variance and standard deviation of the population are denoted by σ̂² and σ̂, respectively. If we have only one sample, we estimate these two measures as follows:
σ̂² = s² = n/(n − 1) · (mean of the squared observations − x̄²)

σ̂ = s = √s²
x̄ = (12 + 8 + 16 + 10 + 6 + 10 + 14 + 12)/8 = 11

mean of the squared observations = (12² + 8² + 16² + 10² + 6² + 10² + 14² + 12²)/8 = 130

σ̂² = s² = 8/(8 − 1) · (130 − 11²) = 10.29

σ̂ = s = √10.29 ≈ 3.21
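These point estimates can be reproduced with Python's statistics module, whose variance() and stdev() functions use the divisor n − 1 by default. The sketch below is only an illustration of the calculation above:

import statistics

service_years = [12, 8, 16, 10, 6, 10, 14, 12]

mean_hat = statistics.mean(service_years)      # estimated expected value: 11
var_hat = statistics.variance(service_years)   # sample variance (divisor n - 1): about 10.29
sd_hat = statistics.stdev(service_years)       # estimated standard deviation: about 3.21

print(mean_hat, round(var_hat, 2), round(sd_hat, 2))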
The major disadvantage of point estimators is that you commit yourself to a specific value
for the mean, variance, or standard deviation. Moreover, there is a chance that the true
value of the population may deviate slightly from our value. For this reason, it is common
to fall back on intervals with limits that have a certain probability of containing the true
value of the population.
Intervals consist of a lower limit and an upper limit. This gives us a more reliable estimate of the expected value, since the interval between the two limits allows for many possible values. The "confidence" in "confidence interval" refers to the fact that we can trust that the interval covers or contains the true value in the population with a certain probability. This probability is also called the confidence probability or confidence level. The error probability is the opposite of the confidence probability, and it is written as α. It captures the probability that the true value does not lie within the established confidence interval. Therefore, 1 − α represents the probability that the true value will be in the interval (i.e., the confidence probability).

Confidence interval
A confidence interval specifies a range as an estimator for a parameter of the population.

Confidence probability
The confidence probability, which is also known as the confidence level, indicates the probability that the true parameter lies in the calculated interval.
In the context of this lesson, we will only deal with the establishment of confidence inter-
vals for the expected values of the population. We will leave out intervals for variances or
standard deviations.
So, how do you set up such a confidence interval? We will go through the relevant steps:
1. Determine the confidence probability 1 − α: The most common values for the confidence probability are 0.9 (90%), 0.95 (95%), and, more rarely, 0.99 (99%). Very large confidence probabilities, such as 99%, should be treated with caution: although the resulting intervals include the true expected value with a very high probability, they are very wide and, thus, not very informative.
2. Collect a sample and calculate its mean: At this stage, we need a sample to help cre-
ate our confidence interval. Because we do not want to deal with that, we will take a
given sample. Next, we calculate the mean value of this sample.
3. Mark the area around the mean value: The confidence interval should be set up
around the calculated mean value. Since the distributions for mean values usually
approach a normal distribution, this can also be assumed here.
With these steps in mind, let’s take a look at the following figure.
Figure 45: Confidence Interval for an Expected Value
If we have estimated the mean as μ̂ = x̄ based on the sample of the population, then we can observe that we have now constructed a symmetric interval around this mean. The two limits are marked by two vertical lines; between them, the true value lies with the confidence probability of 1 − α. Finally, if there is a 1 − α probability that the true value lies within the interval, there is also the remaining error probability α that it actually lies outside the interval. Since the interval is symmetric around the mean, α is divided in half (α/2 = 0.5α) between the area below the lower limit of the interval and the area above its upper limit.
Our goal is to calculate the limits of our confidence interval. To do so, we must distinguish
between two cases:
1. Sometimes, we know information about the population. For example, we may know
the variance or standard deviation of the relevant variables at the population level.
2. We may have no known information about the population.
Depending on whether the measures of dispersion in the population are known or not, we
will set up the confidence interval according to a certain pattern (Bortz & Schuster, 2010,
p. 119). Both variants are outlined in the following sections.
Let us assume that we know the variance σ2 or the standard deviation σ of the population.
We can now calculate the confidence interval as follows (Handl & Kuhlenkasper, 2018,
p. 369):
[x̄ − z1−0.5·α · σ/√n ; x̄ + z1−0.5·α · σ/√n]
The calculation to the left of the semicolon gives the lower limit of the interval. Accordingly, the right side of the semicolon gives the upper limit. To calculate either limit, you start from the mean value of the sample x̄ as the estimator of the expected value for the population. For the lower limit, you then subtract something (−), and for the upper limit, you add something (+). Note that what is subtracted and what is added is identical on both sides. This must be the case, given that both limits should lie equally far away from the mean value.
Because we know the variance or standard deviation of the population in this case, we can
next use the relevant quantile of the standard normal distribution. If the variance or stand-
ard deviation is unknown, then we must use a different distribution, which will be dis-
cussed later. Now, let’s have a look at the density function of the standard normal distribu-
tion.
Figure 46: Density Function of the Standard Normal Distribution
We have already seen from the previous figure how the probabilities are distributed over the individual ranges. Thus, we also know that a probability mass of 1 − α/2 = 1 − 0.5α lies below the upper limit of the interval. This is because a mass of α/2 = 0.5α lies to the right of the upper limit. Consequently, we must determine the quantile z1−α/2 = z1−0.5α, which we can use for both the upper limit (with a positive sign) and the lower limit (with a negative sign) due to the symmetry of the standard normal distribution around 0.
Finally, this quantile is multiplied by the quotient of the standard deviation of the population σ and the square root of the sample size n. This quotient σ/√n is also called the standard error. In terms of content, it tells us how much the estimated mean value x̄ deviates from the true value μ.

Standard error
The standard error indicates the deviation of the estimated mean from the true value.
12; 8; 16; 10; 6; 10; 14; 12
Now, we would like to determine the confidence interval within which the true expected value of the average service length lies with a probability of 95%.

• Mean value of the sample x̄. We have already calculated this for the point estimator:

x̄ = (12 + 8 + 16 + 10 + 6 + 10 + 14 + 12)/8 = 11
• Standard deviation σ. We obtain this simply by taking the square root of the known population variance of σ² = 9:

σ = √9 = 3
• Sample size n. Finally, the sample size is given by the number of interviewed
workers in the sample, with n = 8.
Now, we have all the information needed to set up our confidence interval:
[x̄ − z1−0.5·α · σ/√n ; x̄ + z1−0.5·α · σ/√n]
= [11 − 1.96 · 3/√8 ; 11 + 1.96 · 3/√8]
= [8.92 ; 13.08]
With a probability of 95%, we can, therefore, conclude that the true average
service length is between 8.92 and 13.08 years.
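A short illustrative Python sketch (our own, based on the same figures) reproduces this confidence interval; the quantile z0.975 = 1.96 is taken from the standard normal distribution:

from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 11, 3, 8                 # sample mean, known population sd, sample size
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{1 - 0.5*alpha}, about 1.96
margin = z * sigma / sqrt(n)

print(round(x_bar - margin, 2), round(x_bar + margin, 2))   # about 8.92 and 13.08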
The construction of a confidence interval is a little different if neither the variance nor the standard deviation of the population is known to us. The basic principle remains the same. There are only two changes: instead of the known standard deviation σ of the population, we use the standard deviation s calculated from the sample, and instead of the quantile of the standard normal distribution, we use the corresponding quantile of the t-distribution.
This results in the following general formula for setting up the confidence interval when
the variance in the population is unknown (Handl & Kuhlenkasper, 2018, p. 371):
[x̄ − tn−1; 1−0.5·α · s/√n ; x̄ + tn−1; 1−0.5·α · s/√n]
Recall that x̄, s, and n should be known to us. However, we should take a closer look at the quantile tn−1; 1−0.5·α. As mentioned elsewhere, the t-distribution is a distribution very similar to the standard normal distribution. It is also distributed around the value 0 but has a dispersion that depends on the sample size. For this reason, when determining the t-value, the sample size is taken into account in addition to the familiar probability 1 − 0.5·α. This is done via the number of degrees of freedom: n − 1 (Nachtigall & Wirtz, 2013, p. 119). We will not go into detail on the meaning of the degrees of freedom here. However, it is important to know that they represent the sample size.
Now, we would like to determine the confidence interval in which the true
expected value of the average service length lies with a probability of 95%.
Area (cumulative probability)
df 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 0.975 0.990 0.995 0.9995
1 0.158 0.325 0.510 0.727 1.000 1.376 1.963 3.078 6.314 12.706 31.821 63.657 636.619
2 0.142 0.289 0.445 0.617 0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 31.598
3 0.137 0.277 0.424 0.584 0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 12.941
4 0.134 0.271 0.414 0.569 0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 8.610
5 0.132 0.267 0.408 0.559 0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 6.859
6 0.131 0.265 0.404 0.553 0.718 0.906 1.134 1.440 1.943 2.447 3.143 3.707 5.959
7 0.130 0.263 0.402 0.549 0.711 0.896 1.119 1.415 1.895 2.365 2.998 3.499 5.405
8 0.130 0.262 0.399 0.546 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 5.041
9 0.129 0.261 0.398 0.543 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.781
10 0.129 0.260 0.397 0.542 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.587
11 0.129 0.260 0.396 0.540 0.697 0.876 1.088 1.363 1.796 2.201 2.718 3.106 4.437
12 0.128 0.259 0.395 0.539 0.695 0.873 1.083 1.356 1.782 2.179 2.681 3.055 4.318
13 0.128 0.259 0.394 0.538 0.694 0.870 1.079 1.350 1.771 2.160 2.650 3.012 4.221
14 0.128 0.258 0.393 0.537 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 4.140
15 0.128 0.258 0.393 0.536 0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 4.073
16 0.128 0.258 0.392 0.535 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 4.015
17 0.128 0.257 0.392 0.534 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.965
18 0.127 0.257 0.392 0.534 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.922
19 0.127 0.257 0.391 0.533 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.883
20 0.127 0.257 0.391 0.533 0.687 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.850
21 0.127 0.257 0.391 0.532 0.686 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.819
22 0.127 0.256 0.390 0.532 0.686 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.792
23 0.127 0.256 0.390 0.532 0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.767
24 0.127 0.256 0.390 0.531 0.685 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.745
25 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.725
26 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.707
27 0.127 0.256 0.389 0.531 0.684 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.690
28 0.127 0.256 0.389 0.530 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.674
29 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.659
30 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.646
40 0.126 0.255 0.388 0.529 0.681 0.851 1.050 1.303 1.684 2.021 2.423 2.704 3.551
60 0.126 0.254 0.387 0.527 0.679 0.848 1.046 1.296 1.671 2.000 2.390 2.660 3.460
120 0.126 0.254 0.386 0.526 0.677 0.845 1.041 1.289 1.658 1.980 2.358 2.617 3.373
z 0.126 0.253 0.385 0.524 0.674 0.842 1.036 1.282 1.645 1.960 2.326 2.576 3.291
Source: Bortz & Schuster (2010, p. 590).
• Mean value of the sample x̄. We have already calculated this for the point estimator as well as the previous confidence interval:

x̄ = (12 + 8 + 16 + 10 + 6 + 10 + 14 + 12)/8 = 11

• Standard deviation of the sample s. This is the square root of the sample variance calculated for the point estimator: s = √10.29 ≈ 3.21.
• Sample size n. Finally, the sample size is given by the number of interviewed workers in the sample, so n = 8.
• Quantile tn−1; 1−0.5·α. With n − 1 = 7 degrees of freedom and a probability of 1 − 0.5 · 0.05 = 0.975, the table gives t7; 0.975 = 2.365.

With this, we have all the information to be able to set up our confidence interval:
[x̄ − tn−1; 1−0.5·α · s/√n ; x̄ + tn−1; 1−0.5·α · s/√n]
= [11 − 2.365 · 3.21/√8 ; 11 + 2.365 · 3.21/√8]
= [8.32 ; 13.68]
We can, therefore, conclude with a probability of 95% that the true average length of service with the company is between 8.32 and 13.68 years, inclusive.
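This interval can be reproduced in code as well. Because the quantile now comes from the t-distribution, the following illustrative sketch assumes that the third-party SciPy library is available:

from math import sqrt
import statistics
from scipy import stats

service_years = [12, 8, 16, 10, 6, 10, 14, 12]
n = len(service_years)
x_bar = statistics.mean(service_years)    # 11
s = statistics.stdev(service_years)       # about 3.21
alpha = 0.05

t = stats.t.ppf(1 - alpha / 2, n - 1)     # t_{7; 0.975}, about 2.365
margin = t * s / sqrt(n)

print(round(x_bar - margin, 2), round(x_bar + margin, 2))   # about 8.32 and 13.68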
Suppose that we want to compare our two confidence intervals. First, we note the follow-
ing with regard to the calculation:
• known variance or standard deviation in the population: use the quantile of the stand-
ard normal distribution (z)
• unknown variance or standard deviation in the population: use the quantile of the t-dis-
tribution and calculate sample variance and standard deviation based on the sample
Note that we have come to different results based on the same sample: with known variance, the interval is [8.92; 13.08]; with unknown variance, it is [8.32; 13.68].
Accordingly, we can state that the interval is wider when we have an unknown variance
than when we have a known variance. This is very typical for two reasons. First, the quan-
tiles of the t-distribution are larger than those of the standard normal distribution, espe-
cially for small samples. Thus, the distance between the two interval boundaries and the
mean becomes larger. Second, the dispersion in the form of s is also often larger when we
have an unknown variance in the population than when we have a known variance. Both
factors ensure that more is subtracted from the mean on one side of the interval and more
is added on the other side. The interval boundaries are, therefore, always further apart
when we’re dealing with unknown variance rather than known variance.
The width of the confidence interval can be influenced in a positive sense. Basically, we
would like to have an interval that is as narrow as possible and based on as high a proba-
bility as possible, since this would mean that we have a very high probability of hitting the
true value within these very narrow limits.
How can this be achieved? One way is to increase the sample size n. The more observa-
tions our sample contains, the more precisely we can calculate the mean and estimate the
interval around it. In other words, the interval becomes narrower. Another way is to
reduce the confidence probability 1 − α. If, for example, we chose a confidence probabil-
ity of 90% instead of 95%, the boundaries of the confidence interval would automatically
move closer together. However, remember that this also reduces the probability that the true value of the population is covered by the interval. This is because if the confidence probability decreases, the error probability must increase.
SUMMARY
In most cases of statistical investigations, we only have samples to work
with. Nevertheless, we still have ways of making statements about pop-
ulations. One way to make such a statement is to estimate the results for
the population based on a sample of it. This can be done using point
estimators, which involves making a concrete estimate for the expected
value, the variance, or the standard deviation.
However, if you do not want to rely on a value for your estimation, then
you can choose a confidence interval instead. A confidence interval
specifies the probability with which the true but unknown parameter is
covered by the interval. Confidence intervals for expected values must
be differentiated into those with known variance in the population and
those with unknown variance in the population. If the variance is known,
a quantile of the standard normal distribution is used to establish the
confidence interval. If it is unknown, the corresponding quantile of the t-
distribution is used instead. It should be noted that a large sample size,
in particular, makes a confidence interval narrower and, thus, more pre-
cise.
UNIT 8
HYPOTHESIS TESTING
STUDY GOALS
Introduction
Suppose that we only have samples to work with but still want to use our results to make
generally valid statements. Recall that we can make use of tools such as point estimators
and confidence intervals for this purpose. However, another option is for us to use hypoth-
esis testing to see if our sample results can be generalized. This will be discussed in the
following sections.
8.1 Methods
All over the world, there are many cases of wage gaps between men and women in the same professions, meaning that men typically earn more money than their female counterparts. This might be one of the reasons why – as is certainly the case in many professions – there are different satisfaction levels among some workers. Suppose that we want to examine this issue in the nursing profession. So, we might come up with the following research questions:

• Do male and female nurses earn different amounts of money?
• Do male nurses earn more money than female nurses?
To further explore our research questions, it would certainly be desirable to ask all nurses – in short, the basic population – about their salaries. However, we are, once again, faced with the common problem that we have to work with a random sample. Often, such a sample is also relatively small. This raises the important question of whether the results obtained from such a sample are generally valid. Inferential statistics is concerned with exactly this question. Namely, is it possible to generalize the sample results to the population, or do the sample results happen to represent a particularity of the sample instead? For example, suppose that we obtain a sample of 50 nurses in which the female nurses earn on average $500 more per month than the male nurses. Crucially, this salary disparity may only be observed in this particular sample and not in the general population. Following Schäfer (2011, pp. 9–14), we must contend with the following questions.

Inferential statistics
You use inferential statistics to test sample results for generality.
Question 1: How can it be feasible to draw conclusions from a sample and apply them
to a population?
For this purpose, the sample must be “representative” of the population. This means that
the ratios in the population, such as the gender distribution, must be reflected as well as
possible in the sample. For instance, if we know that 70% of the nurses in the population
are female, then this should also be the case in the sample. We achieve this by drawing
random samples, and the resulting sample should be as large as possible.
Critically, there is one thing we must be aware of: A sample is always only a section of the
population. In other words, there is a certain probability that the sample does not exactly
reflect the population. If this occurs, the sample results do not actually apply to the gen-
eral population. This leads us to our final two questions.
Question 2: What is the quality of these conclusions? How well can the results of the
sample be generalized to the population?
There are two ways to answer these questions. Our first option is to conduct our analysis
on several samples rather than just one. Therefore, we would need several samples availa-
ble from which a “sampling distribution” can be derived (Sedlmeier & Renkewitz, 2018,
pp. 330–334). This option is rarely applied in practice due to the time required as well as
other financial and scientific reasons. Returning to our example, we would need to con-
duct several studies on “salaries in the nursing sector.” Then, we could use the individual
results to derive a global result. However, for the aforementioned reasons, this is hardly
practicable.
So, let’s move on to the second option: If we have the results of only one sample available,
then we can indicate the probability of our results being wrong, which is common prac-
tice. If, for example, we conclude that men in nursing professions earn on average $200
more per month than women, we should also determine a probability that reflects how
sure we are of this result.
This common approach is put into practice with the help of statistical significance tests, or hypothesis tests. We perform such tests to find out whether sample results can be transferred to the general population and, if so, how well they can be transferred.
If we want to answer a question such as "Do male and female nurses earn different amounts of money?" or "Do male nurses earn more money than female nurses?" and only have a random sample available to us, then we should – as we have just learned – use a statistical test procedure. At the beginning of such a test, the question is first transformed into two completely contrary hypotheses (Bortz & Schuster, 2010, pp. 97–99). The first hypothesis is called the null hypothesis, and it is represented by H0. Applied to our example, if the null hypothesis is true, then whether a nurse is male or female does not cause a difference in their income. Suppose instead that we wanted to test for a correlation between gender and smoking behavior, or whether age has an influence on income. Under the null hypothesis, we would again assume that no such effect exists.

Null hypothesis
The null hypothesis is conservative and proposes that there is no effect. In other words, it always describes the state in which there is no effect.
The opposite situation is described by the alternative hypothesis, which is represented by H1. In the context of this hypothesis, it is always assumed that such an effect exists. Accordingly, we would assume that whether a nurse is male or female has an effect on their income. Alternatively, we would assume that there is a correlation between gender and smoking behavior, or an influence of age on income.

Alternative hypothesis
The alternative hypothesis assumes that there is an effect.
Let’s return to our very first research question: “Do male and female nurses earn different
amounts of money?” With what we just learned in mind, we can create the following pair
of hypotheses:
• H0: Whether a nurse is male or female does not cause differences in income.
• H1: Whether a nurse is male or female causes differences in income.
Note that this is only one example of how to formulate these hypotheses. Under the null
hypothesis, we could also assume that male and female nurses earn the same amount of
money. In such a case, the alternative hypothesis would be that they earn different
amounts.
In most research projects, the alternative hypothesis represents what you want to research or prove. For example, we would probably not ask the research question of whether male and female nurses earn different amounts of money if we did not also assume this to be the case. The next step would be to conduct such a study on as large a sample as possible in order to uncover the effect and, subsequently, be able to counteract it.
However, there are also research questions that aim to prove the null hypothesis. For
example, researchers in the tobacco industry might want to prove the null hypothesis that
smokers and non-smokers have the same health status versus the alternative hypothesis
that smokers may have worse health than non-smokers.
Since there can be many different questions, there can be just as many different types of
hypothesis pairs as a result.
There are some questions that require a test for location parameters. If, for example, we assume that nurses accumulate an average of 10 hours of overtime per week, we would then use a selected sample to test for the average value of 10 hours of overtime as a location parameter. The corresponding hypothesis pair would be as follows:

• H0: Nurses work an average of 10 hours of overtime per week.
• H1: Nurses work an average of more or less than 10 hours of overtime per week.

Test for location parameters
A test for location parameters checks, for example, the presence of a mean value in the population.
Correlation hypothesis

In some situations, we might need to use a correlation hypothesis.

Correlation hypothesis
A correlation hypothesis assumes a relationship exists between two or more variables.

For example, suppose that we want to examine whether there is a relationship between the number of hours worked in the nursing profession and cigarette consumption. Our question might be as follows: "Does the number of hours worked in the nursing profession have an effect on cigarette consumption?" Our hypothesis pair can differ depending on our research question. One formulation is as follows:
• H0: There is no correlation between hours worked and cigarette consumption.
• H1: There is a correlation between hours worked and cigarette consumption.
Another possible formulation is the following:

• H0: The number of hours worked in the nursing profession does not affect cigarette consumption.
• H1: The number of hours worked in the nursing profession affects cigarette consumption.
The word “influence” also falls under the umbrella of terms used in correlation hypothe-
ses. Thus, if we asked ourselves, “Does the number of hours worked have an influence on
cigarette consumption?”, we, just like before, are faced with a correlation hypothesis and
should create the corresponding hypothesis pair.
Difference hypothesis
Another type of hypothesis is a difference hypothesis. A difference hypothesis is useful if you want to investigate the possible differences between two or more groups with respect to a variable. If we consider our original research question, "Do male and female nurses earn different amounts of money?", this leads us to the following pair of difference hypotheses:

• H0: Male and female nurses earn the same amount on average.
• H1: Male and female nurses earn different amounts on average.

Difference hypothesis
A difference hypothesis assumes a difference exists between two or more groups with respect to a variable.
Change hypothesis
There are situations in which a difference hypothesis can also be a change hypothesis. Such hypotheses are often formulated in medical or psychological contexts.

Change hypothesis
A change hypothesis is a difference hypothesis that additionally tests for a change.

Suppose that you form a group of people who all have a certain condition and intend to expose them to a new form of therapy. In such a case, their health status is usually examined at two different points in time: (1) before the therapy and (2) after the therapy. Usually, the research question is then whether their health status has changed as a result of the therapy, which results in the following pair of hypotheses:
• H0: The health status after the therapy is identical to that before the therapy.
• H1: The health status is different after the therapy than before the therapy.
This pair of hypotheses can be used to examine the difference in health status from one
point in time to the next. Specifically, it tests whether this changes for the same group of
individuals. Accordingly, change hypotheses are strongly characterized by the fact that a
single group of objects is observed at several points in time or in different situations.
Directed Versus Undirected Hypotheses
The types of hypotheses just described are all formulated in such a way that the alternative hypothesis does not assume that a specific effect exists in a specific direction. In such a case, we are talking about "undirected" or "two-sided" hypotheses. If we return to our example about nurses working overtime, the number of overtime hours can be below or above 10. Similarly, there can be a positive or negative relationship between hours worked and cigarette consumption. The same is true for the remaining pairs of hypotheses that we discussed. Undirected hypotheses are formulated when previous research does not yet allow for testing in a specific direction.

Undirected hypotheses
Undirected hypotheses do not assume a specific effect direction.

If we have an a priori assumption about the direction of an effect, we can focus on "directed" or "one-sided" hypotheses (Bortz & Schuster, 2010, p. 98).

Directed hypotheses
A directed hypothesis assumes an effect has a specific direction.

So, let's go through the above examples and assume that the effect in each situation has a very specific direction. This allows us to formulate directed hypotheses:

1. Do nurses work "more" than 10 hours of overtime per week on average?
• H0: On average, nurses work “no more” than 10 hours of overtime per week.
• H1: The nurses work on average “more” than 10 hours of overtime per week.
2. Do the hours worked have a "negative" effect on cigarette consumption?
• H0: Hours worked "do not have a negative" effect on cigarette consumption.
• H1: Hours worked have a "negative" effect on cigarette consumption.
3. Do male nurses earn "more" than female nurses on average?
• H0: Male nurses earn on average “at most” as much as female nurses.
• H1: Male nurses earn “more” than female nurses on average.
4. Is the state of health "better" after the therapy than before the therapy?
• H0: The state of health after therapy is “at most” as good as before therapy.
• H1: The state of health is “better” after therapy than before therapy.
With such directed hypotheses, it is important that in the alternative hypothesis, the con-
crete assumption is always formulated in a very specific direction. Accordingly, there will
always be formulations that use terms such as “more than,” “less than,” “positive,” and
“negative” in the alternative hypothesis. In addition to the opposite direction, the null
hypothesis always accounts for equality.
Deciding on a Hypothesis
Once you have formulated a pair of hypotheses as described, the next step is to decide on
one of these two hypotheses. In principle, the decision is always made with respect to the
null hypothesis. One possible result is that the null hypothesis is rejected. This means that
you assume the results obtained by the sample fit the assumption formulated in the alter-
native hypothesis. Accordingly, there seems to be an effect. If the null hypothesis is not
rejected, it can be assumed that the results obtained by the sample do not support the
existence of an effect.
But how do we decide on one of the two hypotheses? Such test procedures can be descri-
bed as “very conservative” (Schäfer, 2011, p. 57). They first assume that (1) the null
hypothesis is the correct hypothesis and (2) the sample data must first convince us other-
wise in order to decide against using the null hypothesis. Accordingly, you must examine
whether the data contained in the sample still justify the null hypothesis or whether it is
better to decide against it. If we determine that the nurses in a sample worked an average
of 11 hours of overtime per week, the question arises as to whether this still fits the null hypothesis, with its expected average of 10 hours of overtime, or whether the deviation of a single hour of overtime is already large enough to decide against the null hypothesis. A
result is “significant” when we manage to reject the null hypothesis and, thus, support the
existence of an effect. The following section explains how this decision occurs during a
statistical test procedure.
No matter which test procedure is carried out and which pair of hypotheses must be tes-
ted, the associated test always consists of five steps. For a better understanding, let’s look
at a concrete example: We would like to find out whether nurses in a specific city work an
average of 10 hours of overtime per week. For this purpose, we asked eight nurses how much overtime they worked in a given week. Our next steps are as follows:
As we have learned in the previous sections, a hypothesis pair is always formulated from
the initial question. In the present example, the null hypothesis that an average of 10
hours of overtime are worked per week will be tested against the alternative hypothesis
that an average of more or less than 10 hours of overtime are worked per week.
When deciding on one of the two hypotheses, we can always make a mistake. As already
discussed, we only have one sample to work with. In this step, the “probability of error” is
determined. This is also referred to as the significance level, or probability of error, which is represented by α (Bortz & Schuster, 2010, pp. 100–101). The significance level describes the probability with which we allow ourselves to make a mistake if we decide against the null hypothesis and, thus, in favor of the alternative hypothesis. The most common significance level is 5%,
which means that there is a 5% probability of committing an error if we decide against the
null hypothesis. At the same time, it means that there is a 95% probability that we have
made the correct decision. A stricter significance level is 1%. Thus, we can be 99% sure
that we have made the right decision to reject the null hypothesis. In comparison, a
weaker significance level is 10%. Such a level is usually chosen when the null hypothesis is
the research hypothesis. In general, careful consideration is required when determining
the significance level. It is crucial to determine this level in this second step and never
change it in order to achieve your desired research results. In our example, we will use the
classic significance level of 5%.
Every statistical test procedure – whether it tests for a location parameter, difference, cor-
relation, or change – involves the calculation of a test statistic. In this step, all relevant information from the sample under consideration is combined into one value (Bortz & Schuster, 2010, p. 101); the exact form of the test statistic depends on the test procedure. This value forms one of two bases for making a decision regarding the two established hypotheses. Since we are testing for a mean value within the framework of a single example, we should apply the z-test or t-test for a sample here. These
tests will be described in more detail later on. In this step, we should calculate the z-statis-
tic or t-statistic based on the available data provided by the eight nurses. Let us assume
that these nurses actually worked an average of 11.5 hours of overtime, resulting in a t
value of 2.2.
The second basis for our decision-making process is either the critical value or the p-value. A critical value is taken from a specific distribution, whereas the p-value is output by statistical programs. Accordingly, there are two possibilities:

1. Using the critical value. Most statistical test procedures are based on certain distributions. For example, when testing for a location parameter such as the mean, we assume that the corresponding variable is normally distributed. Therefore, by calcu-
lating the test variable, we transform the relevant sample data into a standard normal
distribution, which is distributed around the value 0. Here, this value represents the
case where the hypothetical mean value corresponds to the sample mean value.
Therefore, the further the test variable moves away from 0, the less plausible the null
hypothesis seems. The critical value is a cut-off value that marks when the null
hypothesis no longer seems plausible to us. Let’s assume that the critical value in this
case is 1.96. Later on, we’ll clarify how this is determined.
2. Using the p-value. The p-value describes the exceedance probability, meaning that it
indicates the probability that the found sample result – or even a more extreme one –
is valid under the null hypothesis (Benesch, 2013, p. 165). As mentioned previously,
we have assumed that the actual average overtime is 11.5 hours per week. Assuming
that the null hypothesis is initially the correct hypothesis, the p-value indicates the
probability of obtaining such an overtime value or an even larger one. For our exam-
ple, let’s use a p-value of 0.03.
The last step of a statistical test is always to make a decision regarding the null hypothesis.
We can do this using the critical value as well as the p-value:
1. Using the critical value. As a rule, it is necessary to check whether the test variable
exceeds the critical value, and note that there are exceptions depending on the test
procedure and use of directional or non-directional testing. In order to be able to
reject the null hypothesis, the test variable must be greater than the critical value in terms of its magnitude (Bortz & Schuster, 2010, pp. 101–102). Let’s apply our test statistic and critical value, resulting in 2.2 > 1.96. This means that we can reject the null hypothesis.
2. Using the p-value. The second way to make a decision is to combine the significance
level and the p-value. The aim here is to achieve a lower p-value than the significance
level in order to be able to reject the null hypothesis (Schäfer, 2011, p. 59):
Let’s return our example and apply our p-value and α: 0.03 < 0.05. Just like before, we are
able to reject the null hypothesis. Accordingly, we can state that the average of 11.5 hours
obtained from the sample is far “away” enough from the hypothetical average of 10 hours
that we can reject the null hypothesis. Thus, we conclude that nurses work significantly
more overtime hours on average than our hypothesis of 10 hours on average. In this unit,
we will always make such a decision based on the test variable as well as the critical value.
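To make the two decision rules tangible, here is a minimal Python sketch using only the illustrative numbers quoted above for the nurses example (test statistic 2.2, critical value 1.96, p-value 0.03, α = 0.05); the variable names are chosen purely for this illustration.

```python
# Minimal sketch of the two equivalent decision rules from the nurses example.
# The numbers (test statistic 2.2, critical value 1.96, p-value 0.03) are the
# illustrative values quoted in the text, not values computed from real data.

alpha = 0.05           # significance level chosen in the second step
test_statistic = 2.2   # t value reported for the sample of eight nurses
critical_value = 1.96  # cut-off value taken from the reference distribution
p_value = 0.03         # exceedance probability reported for the sample result

# Decision rule 1: compare the (absolute) test statistic with the critical value.
reject_by_critical_value = abs(test_statistic) > critical_value

# Decision rule 2: compare the p-value with the significance level.
reject_by_p_value = p_value < alpha

print(reject_by_critical_value, reject_by_p_value)  # both print True: reject H0
```

Both rules lead to the same decision; they are simply two ways of expressing the same comparison between the sample result and the significance level.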
There are three main factors that can influence a test decision.
Figure 47: Rejection Area During Undirected Testing
We see here – as it is the basis for many tests – the standard normal distribution (Bortz &
Schuster, 2010, pp. 70–74). Under the null hypothesis, we assume that, following our
example, the hypothetical mean of 10 does not differ from the sample mean; the standardized test statistic is then centered around 0. If a deviation from 0 is possible both in an
“upward” (i.e., nurses work more than 10 hours of overtime on average) and “downward”
(i.e., nurses work less than 10 hours of overtime on average) manner due to the undirected
hypotheses, then the rejection range in the form of the significance level is divided equally
between the two sides.
If we use a significance level of 5%, this is split between both sides, with a value of 2.5%
each. In order to reject the null hypothesis, we would have to observe that either the right
or left limit – indicated here with a vertical line – is “exceeded” or “undershot,” respec-
tively. However, if we are testing in a particular direction and want to test, for example,
whether nurses work “more” than 10 hours of overtime on average, then we confine the
rejection area to only one side of the distribution.
Figure 48: Rejection Area During Directed Testing
In this case, the rejection area is on the right side of the distribution, since we assume
there are “more than” 10 hours of overtime. If we now compare the two figures, we can
see which of the two tests makes it easier to reject the null hypothesis. Namely, the path
beyond the critical limit is simply shorter for the directed test than for the two-sided test.
This also makes sense from a content perspective: If we already have a reasonable suspi-
cion in a certain direction, we might have an easier time confirming this direction as well.
The directed test, thus, allows for a faster rejection of the null hypothesis (Schäfer, 2011,
pp. 61–63).
The sample size is another aspect that can either support or hinder the rejection of the
null hypothesis. This is because rejecting the null hypothesis becomes easier as the sam-
ple size increases. Let’s consider our example again and assume that there is an average of
11.5 hours of overtime among the nurses in our sample per week. If we obtained this value from only, say, 20 individuals, we would have to be cautious with our test, since this sample is a very small one. This means that potential outliers can have a large effect and, consequently, distort the result. However, if we have obtained our average of 11.5 overtime hours from a sample of 1,000 nurses, we can be more confident in our test’s calculated mean, as outli-
ers can hardly have an influence on it (Sedlmeier & Renkewitz, 2018, pp. 382–383).
Finally, the choice of the significance level α plays a very crucial role: The higher the α value,
the easier it is to reject the null hypothesis. Let’s look at the previous figure on directional
testing. Imagine that the significance level is not 5% but 10%. This would increase the area
of the rejection region, bringing it closer to the center of 0. Therefore, the path to exceed
the critical limit becomes shorter as well.
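The effect of the significance level on the critical limit can also be checked numerically. The following short Python sketch (assuming SciPy is available) prints the right-sided critical value of the standard normal distribution for three common significance levels; the resulting values match the z row of the quantile table used later in this unit.

```python
# Small sketch: a larger significance level moves the one-sided critical limit
# closer to 0 and, thus, enlarges the rejection region.
from scipy.stats import norm

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}  ->  right-sided critical value = {norm.ppf(1 - alpha):.3f}")
# alpha = 0.01 -> 2.326, alpha = 0.05 -> 1.645, alpha = 0.10 -> 1.282
```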
Types of Errors in Hypothesis Testing
As we have already discussed, we can never be 100% sure that we have made the right
decision with regard to one of the two hypotheses. If we decide to accept the null hypoth-
esis and it turns out that the null hypothesis is also valid in the population, then we have
made the correct decision. Likewise, if we reject the null hypothesis in order to support
the alternative hypothesis and it turns out that this also applies in the population, then we
have done everything correctly. We find these two cases in the diagonal cells (from the
bottom to the top) in the following figure.
(Figure: the four possible outcomes of a test decision, depending on whether H0 or H1 is actually true in the population.)
On the diagonal in the other direction, we find cells outlining the two situations in which
an error occurs. If we decide on the basis of a sample to reject the null hypothesis but the
null hypothesis is actually true in the population, we have just committed what is known
as a Type I error or an α error, i.e., the erroneous rejection of the null hypothesis (Schäfer, 2011, p. 65). Suppose that we use a sample to conclude that major restructuring measures within a company increase the efficiency of its employees even though this is not the case in reality at all. This is an example of a Type
I error, and it can have serious consequences at one point or another. For instance, if these
restructuring measures necessitate large investments, then the enterprise might make
expenditures that do not actually support its success.
If we make a decision in favor of the null hypothesis based on a sample but the alternative
hypothesis is actually the correct one in the population, then we have committed a Type II
error, also known as a β error, i.e., the erroneous retention of the null hypothesis (Schäfer, 2011, pp. 65–66). Let’s imagine that we are testing an anti-cancer drug as part of a large-scale study. In this context, we test the null hypothesis that the anti-cancer drug has no effect against the alternative hypothesis that it does
have an effect. Now, based on the data from some participants, we conclude that the drug
does not work. Thus, we do not reject the null hypothesis. However, suppose that in the
generality of all cancer patients, the drug does show an effect. Hence, the alternative hypothesis is actually true, and we have committed a Type II error. Namely, we did not attribute an effect to the
drug although it actually has one.
We can see from the two examples that the two errors are of different importance depend-
ing on our question (Sedlmeier & Renkewitz, 2018, p. 384). We must decide case by case
which error weighs more heavily. If it is important for us to keep the Type I error as low as
possible, we should thoughtfully choose α in the described test procedure and set it to, for
example, 1%. Even if we cannot influence the β error directly, we can control it via α, because the two errors are inversely related: the smaller we set α, the larger β becomes, and vice versa. Hence, if we want a low β error, we should choose a comparatively large α.
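The interplay between α, the Type I error, and the Type II error can also be illustrated with a small simulation. The following Python sketch is purely illustrative: it assumes a right-sided one-sample z-test with a hypothetical mean of 10, a known standard deviation of 4, a sample size of 50, and a true mean of 11 under H1; all of these parameter values are chosen freely for demonstration and are not taken from the examples above.

```python
# Illustrative simulation of Type I and Type II error rates for a right-sided
# one-sample z-test (hypothetical mean 10, known standard deviation 4, n = 50).
# All parameter choices here are made up for demonstration purposes.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)
mu0, sigma, n, runs = 10.0, 4.0, 50, 20_000

def rejection_rate(true_mean: float, alpha: float) -> float:
    """Share of simulated samples in which H0: mu <= mu0 is rejected."""
    crit = norm.ppf(1 - alpha)                        # right-sided critical value
    samples = rng.normal(true_mean, sigma, size=(runs, n))
    z = np.sqrt(n) * (samples.mean(axis=1) - mu0) / sigma
    return float(np.mean(z > crit))

for alpha in (0.01, 0.05, 0.10):
    type1 = rejection_rate(true_mean=10.0, alpha=alpha)  # H0 is actually true
    power = rejection_rate(true_mean=11.0, alpha=alpha)  # H1 is actually true
    print(f"alpha={alpha:.2f}  Type I rate ~ {type1:.3f}  Type II rate ~ {1 - power:.3f}")
```

The simulated Type I error rate stays close to the chosen α, while the Type II error rate shrinks as α grows, which is exactly the trade-off described above.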
We have already learned in the context of confidence intervals that the variance or stand-
ard deviation of the population plays a decisive role in the way that confidence intervals
are set up. We continue this idea here:
• If we know the variance or standard deviation of the population, we can use the z-test
as well as a quantile of the standard normal distribution for the critical value.
• If we do not know the variance or standard deviation of the population, we must first
determine this ourselves based on a sample and use the t-test with the corresponding t-
distribution for the critical value.
For each individual test, assumptions specifically tailored to the test must always apply.
We first start with the test for an expected value: the z-test, which is used to test for an expected value when the variance in the population is known. Before applying it, we need to make several assumptions:

1. The variable under investigation must be cardinally scaled. This makes sense because
if we want to test for a mean or expected value, then we have to deal with a variable
that consists of a number.
2. The variable under investigation should be normally distributed in the population.
Importantly, this assumption ensures that extreme outliers cannot distort the results.
In the following example, we assume that this assumption is fulfilled. While there are
special test procedures that you can use to test for the presence of the normal distri-
bution, we do not cover these procedures in this section.
3. The data chosen for the test originate from a simple random sample. This assumption
must be fulfilled for any hypothesis test. It is critical that all individuals in the popula-
tion had an equal chance of being included in the sample. Accordingly, the sample
should not contain specifically selected objects.
Process of the z-Test
Suppose that our boss claims that the employees in our company work an average of exactly 7 hours of overtime per week. To check this claim, we ask five employees how many hours of overtime they worked in a given week and receive the following answers:

8; 10; 7; 5; 10
We also know that the variance of overtime hours per week in the population is
4 and overtime is basically normally distributed. Now, the question arises
whether our boss is right about the number of hours. We can formulate our
research question in three different ways:
In any case, since we have a known variance of 4 for the population, we can con-
clude that we must use a z-test.
Two-sided z-test
Suppose that we want to address the following question: “Is the average number of over-
time hours worked by employees per week actually 7, or is it either above or below 7?”
The previous section described the test procedure in general. Now, we will go through the
five steps of the present two-sided test.
The first step is always to set up both the null and alternative hypotheses. This is usually
done in mathematical notation. As we have already learned elsewhere, the Greek letter μ
is used for the expected value. The abbreviation μ0 is, in turn, used for the hypothetical
mean, and it is replaced by a concrete number for each test. In general, for a two-sided
test, we can assume that the expected value is equal to the hypothetical mean μ0 under
the null hypothesis and not equal to μ0 under the alternative hypothesis (i.e., it is either
larger or smaller):
H0 : μ = μ0 versus H1 : μ ≠ μ0
We can also express this differently: The null hypothesis assumes that we are in a popula-
tion in which the expected value is equal to the hypothetical mean value, whereas the
alternative hypothesis assumes a generality in which the expected value is greater or
smaller than the hypothetical mean value.
H0 : μ = 7 versus H1 : μ ≠ 7
The task now is to find out which hypothesis – and, thus, which population – the data from our five examined employees fit best.
RUNNING EXAMPLE: OVERTIME HOURS
In our example, we select a commonly used significance level of α = 0.05. In
terms of content, this means that if the null hypothesis is rejected, we have a 5%
probability of being wrong.
In the case of a z-test, the test variable is z, and it is calculated as follows (Bortz & Schus-
ter, 2010, p. 103):
z = √n · (x̄ − μ0) / σ

n stands for the sample size, x̄ is the mean of the sample, which we have to calculate our-
selves, μ0 is the hypothetical mean, which is given, and σ (sigma) is the standard deviation
known from the population.
8; 10; 7; 5; 10
So, we can state that n = 5. In addition, we know from the case description and
hypotheses that μ0 = 7. Also, we are given the variance of the population as
σ² = 4, which is why the standard deviation can be calculated simply as σ = √4 = 2. Only the mean value must be calculated:

x̄ = (8 + 10 + 7 + 5 + 10) / 5 = 8

z = √n · (x̄ − μ0) / σ = √5 · (8 − 7) / 2 = 1.118
The fact that the result is positive is due to the fact that the mean value of the
sample, which is 8, is greater than the hypothetical mean value, which is 7. If it
were the other way around, we would have a negative result.
Step 4: Defining the critical value
In order to be able to decide whether the test statistic that we just calculated still supports
the null hypothesis or not, we need a reference value. In the case of the z-test, we always
use a quantile of the standard normal distribution. Since testing is done in both directions,
we will end up with both a positive critical value and a negative critical value. We deter-
mine a critical value as follows:
±z1 − 0.5 · α
With the help of α, we can determine the cumulative probability and then find the rele-
vant value in a table of the quantiles of the standard normal distribution.
In our example, with α = 0.05, this gives z1 − 0.5 · 0.05 = z0.975 = 1.96. An excerpt from the table of quantiles of the standard normal distribution:

Cumulative probability p: 0.90 | 0.95 | 0.975 | 0.99 | 0.995
Quantile zp: 1.282 | 1.645 | 1.960 | 2.326 | 2.576
To the right of the central 0 is +1.96, and to the left is −1.96. The null hypothesis
is valid up to these critical limits.
Now, if the test statistic that we calculated in the third step exceeds the critical limit, the
null hypothesis is rejected. The two-sided rule for rejecting the null hypothesis can be
written by placing magnitude bars around the test variable and using the positive critical
value:
reject H0 if |z| > z1 − 0.5 · α
The positive value of the calculated test variable – even if it is actually negative – must,
therefore, be greater than the positive critical value from the table in order for us to be
able to reject the null hypothesis.
In our example, |z| = 1.118 is not greater than 1.96. This means that we cannot reject the null hypothesis. Accordingly, the average number of overtime hours worked per week does not significantly deviate from 7.
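For readers who want to reproduce the calculation, the following minimal Python sketch (assuming SciPy is available) runs through the steps of this two-sided z-test with the example data; it is only a computational check of the numbers above.

```python
# Minimal sketch of the two-sided z-test from the running example:
# data for five employees, hypothetical mean 7, known population variance 4.
import math
from scipy.stats import norm

data = [8, 10, 7, 5, 10]
mu0 = 7.0             # hypothetical mean under H0
sigma = math.sqrt(4)  # known population standard deviation
alpha = 0.05

n = len(data)
x_bar = sum(data) / n

# Step 3: test statistic  z = sqrt(n) * (x_bar - mu0) / sigma
z = math.sqrt(n) * (x_bar - mu0) / sigma

# Step 4: two-sided critical value z_{1 - alpha/2}
z_crit = norm.ppf(1 - alpha / 2)

# Step 5: reject H0 if |z| exceeds the critical value
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {abs(z) > z_crit}")
# Expected: z is about 1.118, the critical value about 1.960, so H0 is not rejected.
```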
Now, let’s address the following question: Is the average number of overtime hours
worked by employees per week more than 7? We, thus, test specifically with the possibility
that there could be more than 7 hours of overtime in mind. This is a directional test and
also, more specifically, a right-sided test. Compared to the previous two-sided test, not
much changes in terms of the steps that we take. We only have to formulate the hypothe-
ses differently and use a different critical value. The rest is analogous to the previous test
procedure. Let’s go through the steps once.
Since we are now testing in a specific direction, this must be accounted for in the hypothe-
ses. If we are performing a right-sided test, we must always assume that under the alterna-
tive hypothesis, the mean value is greater than (>) the hypothetical mean value μ0. Consequently, under the null hypothesis, we assume that the mean value is less than or equal to (≤) the hypothetical mean value μ0:
H0 : μ ≤ μ0 versus H1 : μ > μ0
H0 : μ ≤ 7 versus H1 : μ > 7
Step 2: Determining the significance level
Although we are now using a one-sided test, nothing changes in this step.
The test statistic also remains unchanged:

z = √n · (x̄ − μ0) / σ = 1.118
When setting the critical value, there is now a new feature to keep in mind. Since we are
now only testing in one direction, the rejection region is just on one side of the distribu-
tion – in this case, the right side. Consequently, the rejection range no longer needs to be
halved. Hence, the quantile of the standard normal distribution that we are looking for is
determined as follows:
z1 − α
Once we have arrived at z0.95, we, again, consult a table of the quantiles of the
standard normal distribution. If we look for p = 0.95, the corresponding quan-
tile is 1.6449.
We see that this critical limit is lower than the one that we used for two-sided testing. Con-
sequently, it is easier to reject the null hypothesis. Whether this is successful is addressed
in the next step.
The decision regarding the null hypothesis is almost the same as before. However, we do
not need to put the test variable z in magnitude bars; in a right-sided test, the test variable must be positive (and sufficiently large) in order to reject the null hypothesis:
reject H0 if z > z1 − α
Again, the test variable must be greater than the critical value in order to reject the null
hypothesis.
Since z = 1.118 is not greater than 1.6449, we cannot reject the null hypothesis in this case either. Accordingly, we cannot assume that there are significantly more than 7 overtime hours worked on average by these employees per week.
We also want to test in the other direction: Is the average number of overtime hours
worked by employees less than 7 overtime hours per week? To investigate this, we will test
on the left-hand side, since we hypothesize there are less than 7 hours of overtime on
average per week. Again, we will only need to differ from the general procedure when
making the hypotheses, selecting the critical value, and making our decision.
When setting up the alternative hypothesis, we must now assume that the mean value is smaller than (<) the hypothetical one, μ0. Consequently, under the null hypothesis, the expected mean value is greater than or equal to (≥) the hypothetical mean value μ0:
H0 : μ ≥ μ0 versus H1 : μ < μ0
RUNNING EXAMPLE: OVERTIME HOURS
In the example, the hypothetical value is 7, which is why the hypotheses are as
follows:
H0 : μ ≥ 7 versus H1 : μ < 7
Again, nothing changes in these steps compared with the two-sided and right-sided tests. The test statistic remains

z = √n · (x̄ − μ0) / σ = 1.118.
When setting the critical value, we need to consider the following: According to the alternative hypothesis, we expect to obtain a smaller mean than the hypothesized one. Accordingly, if we look at the test variable above, we can expect that x̄ − μ0 will result in something negative and, thus, the test variable will be negative overall. For this reason, the critical value in left-sided testing is always negative. While it is identical in number to that for right-sided testing, it is given a negative sign:
−z1 − α
RUNNING EXAMPLE: OVERTIME HOURS
With α = 0.05, our critical value results in −z1 − α = −z0.95. If we search for p = 0.95 in a table of the quantiles of the standard normal distribution, we discover that the corresponding quantile is 1.6449. Finally, we assign a negative sign to this value, so the critical value is −1.6449.
When making the decision, we now have two options. We can stay in the negative range
and see if the test variable is smaller than the negative critical value to reject the null
hypothesis:
reject H0 if z < − z1 − α
The alternative here is to take advantage of the symmetry of the standard normal distribu-
tion around 0. We can, therefore, place both the test variable and the critical value in mag-
nitude bars and make the decision in the positive range:
reject H0 if |z| > |−z1 − α|
The test statistic is, therefore, not smaller than the negative critical value, which
is why the null hypothesis cannot be rejected. Recall that the second possible
decision rule uses magnitude bars, which means that we come to the following result:

|z| = 1.118 ≯ 1.6449 = |−z0.95|
The test variable, which was already positive anyway, is not larger than the posi-
tive critical value. The null hypothesis can, therefore, not be rejected, and we
conclude that there are not significantly less than 7 overtime hours worked on
average per week.
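The two directed z-tests can also be checked numerically. The following short Python sketch (again assuming SciPy is available) reuses the test statistic z = 1.118 calculated above and compares it with the one-sided critical value in both directions.

```python
# Minimal sketch of the right- and left-sided z-test decisions for the same
# overtime example (z = 1.118 from the calculation above, alpha = 0.05).
from scipy.stats import norm

z = 1.118
alpha = 0.05
z_crit = norm.ppf(1 - alpha)   # one-sided critical value, about 1.645

reject_right = z > z_crit      # H1: mu > 7
reject_left = z < -z_crit      # H1: mu < 7

print(f"critical value = {z_crit:.4f}")
print(f"right-sided test rejects H0: {reject_right}")  # False
print(f"left-sided test rejects H0: {reject_left}")    # False
```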
If the variance in the population is unknown, we use the t-test instead of the z-test. Compared to the z-test, only three things change:

• the independent calculation of the sample variance and standard deviation based on
the sample,
• the calculation of the test statistic, called the t-statistic, and
• the use of a critical value from the t-distribution.
The rest of the procedure remains completely identical to that of the z-test. Also, all the
assumptions mentioned in the z-test still apply here. We want to use the same example
again, but, this time, suppose that we do not know the variance in the population.
8; 10; 7; 5; 10
The boss, just like before, thinks that there is an average of exactly 7 overtime
hours worked per week.
We will now go through the three possible questions and, thus, get to know the two-sided
as well as the left- and right-sided test procedures. If something should already be known
to us, we will only briefly cover it in this section.
Two-Tailed t-Test
We now want to address the following question: Is the average number of overtime hours
worked by employees per week actually 7, or is it above or below 7?
Step 1: Setting up the hypotheses

As in the two-sided z-test, we test H0 : μ = 7 against H1 : μ ≠ 7, and we again use the significance level α = 0.05. Since we are dealing with a t-test, we will now use a t-statistic as the test variable. Except for the denominator used in its calculation, the t-statistic is exactly the same as the z-statistic (Bortz & Schuster, 2010, p. 118):

t = √n · (x̄ − μ0) / s
To obtain the test statistic, we have to calculate the standard deviation s based on the
sample.
• n = 5
• μ0 = 7
• x̄ = (8 + 10 + 7 + 5 + 10) / 5 = 8

For the sample variance, we also need the mean of the squared observations:

(8² + 10² + 7² + 5² + 10²) / 5 = 338 / 5 = 67.6

s² = 5 / (5 − 1) · (67.6 − 8²) = 4.5

s = √4.5 ≈ 2.12
With this, we have all the information needed to calculate the test statistic:
t = √n · (x̄ − μ0) / s = √5 · (8 − 7) / 2.12 ≈ 1.05
This step is analogous to the one for the z-test, except that we now use the t-distribution.
From what we’ve learned about confidence intervals, we know that the t-distribution
depends on the degrees of freedom (that is, df = n − 1) in addition to the significance
level.
With n = 5 and α = 0.05, the critical value is the quantile t4;0.975 of the t-distribution, which we can read from the following table: t4;0.975 = 2.776.
Table 37: Quantiles of the t-Distribution (2)
Area*
df 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 0.975 0.990 0.995 0.9995
1 0.158 0.325 0.510 0.727 1.000 1.376 1.963 3.078 6.314 12.706 31.821 63.657 636.619
2 0.142 0.289 0.445 0.617 0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 31.598
3 0.137 0.277 0.424 0.584 0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 12.941
4 0.134 0.271 0.414 0.569 0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 8.610
5 0.132 0.267 0.408 0.559 0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 6.859
6 0.131 0.265 0.404 0.553 0.718 0.906 1.134 1.440 1.943 2.447 3.143 3.707 5.959
7 0.130 0.263 0.402 0.549 0.711 0.896 1.119 1.415 1.895 2.365 2.998 3.499 5.405
8 0.130 0.262 0.399 0.546 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 5.041
9 0.129 0.261 0.398 0.543 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.781
10 0.129 0.260 0.397 0.542 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.587
11 0.129 0.260 0.396 0.540 0.697 0.876 1.088 1.363 1.796 2.201 2.718 3.106 4.437
12 0.128 0.259 0.395 0.539 0.695 0.873 1.083 1.356 1.782 2.179 2.681 3.055 4.318
13 0.128 0.259 0.394 0.538 0.694 0.870 1.079 1.350 1.771 2.160 2.650 3.012 4.221
14 0.128 0.258 0.393 0.537 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 4.140
15 0.128 0.258 0.393 0.536 0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 4.073
16 0.128 0.258 0.392 0.535 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 4.015
17 0.128 0.257 0.392 0.534 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.965
18 0.127 0.257 0.392 0.534 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.922
19 0.127 0.257 0.391 0.533 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.883
20 0.127 0.257 0.391 0.533 0.687 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.850
21 0.127 0.257 0.391 0.532 0.686 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.819
22 0.127 0.256 0.390 0.532 0.686 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.792
23 0.127 0.256 0.390 0.532 0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.767
24 0.127 0.256 0.390 0.531 0.685 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.745
25 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.725
26 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.707
27 0.127 0.256 0.389 0.531 0.684 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.690
28 0.127 0.256 0.389 0.530 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.674
29 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.659
30 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.646
40 0.126 0.255 0.388 0.529 0.681 0.851 1.050 1.303 1.684 2.021 2.423 2.704 3.551
60 0.126 0.254 0.387 0.527 0.679 0.848 1.046 1.296 1.671 2.000 2.390 2.660 3.460
120 0.126 0.254 0.386 0.526 0.677 0.845 1.041 1.289 1.658 1.980 2.358 2.617 3.373
z 0.126 0.253 0.385 0.524 0.674 0.842 1.036 1.282 1.645 1.960 2.326 2.576 3.291
Again, if the test variable exceeds the critical limit, the null hypothesis is rejected. We can
write the rule for rejecting the null hypothesis for both sides by placing magnitude bars
around the test variable:
reject H0 if |t| > tn − 1;1 − 0.5 · α
The positive value of the calculated test statistic (even if it is actually negative) must there-
fore be greater than the positive critical value from the table in order to be able to reject
the null hypothesis.
The test value |t| = 1.05 is, therefore, not greater than the critical value of 2.776, which is why the null hypothesis cannot be rejected. Even after this test, it appears that the average number of overtime hours per week does not deviate significantly from 7.
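As a computational cross-check, the following Python sketch (assuming SciPy is available) recalculates the sample standard deviation, the t-statistic, and the critical value for this two-sided t-test, and compares the result with SciPy’s built-in one-sample t-test; the built-in function additionally returns a p-value, which the example above does not use.

```python
# Minimal sketch of the two-sided t-test for the same data, now with the
# population variance treated as unknown (sample standard deviation instead).
import math
from scipy import stats

data = [8, 10, 7, 5, 10]
mu0, alpha = 7.0, 0.05
n = len(data)
x_bar = sum(data) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))  # sample std, about 2.12

t_stat = math.sqrt(n) * (x_bar - mu0) / s        # about 1.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # quantile t_{4; 0.975}, about 2.776
print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, reject H0: {abs(t_stat) > t_crit}")

# Cross-check with SciPy's built-in one-sample t-test (same t value, plus a p-value):
res = stats.ttest_1samp(data, popmean=mu0)
print(f"scipy t = {res.statistic:.3f}, p-value = {res.pvalue:.3f}")
```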
With the knowledge we have gained so far, let’s just go through the directed tests very
briefly. If we want to investigate “more than 7 hours overtime on average per week,” we
perform a right-sided test. If we also don’t know the variance in the population, we must
conduct a t-test. So, we go through the following steps:
1. The hypotheses are formulated as for the right-sided z-test (H0 : μ ≤ 7 vs. H1 : μ > 7).
2. The significance level will be given with a value (α = 0.05).
3. The test statistic is calculated just like it is for a two-sided t-test (t = 1.05).
4. Following the z-test, the critical limit is defined here with the corresponding degrees
of freedom as tn − 1;1 − α. Note that the rejection region α is now only on the right side
of the distribution: t4;0.95 = 2.132
5. The null hypothesis is rejected exactly when t > tn − 1;1 − α. We then observe the fol-
lowing: t = 1.05 ≯ 2.132 = t4;0.95. So, the null hypothesis cannot be rejected.
If we test for “less than 7 hours of overtime on average per week,” again we end up with a
left-sided t-test, since the variance in the population is unknown. Here, too, we would like
to briefly summarize the essential steps:
1. The hypotheses are formulated as for the left-sided z-test (H0 : μ ≥ 7 vs. H1 : μ < 7).
2. The significance level will be given with a value (α = 0.05).
3. The test statistic is calculated just like it is for a two-sided t-test (t = 1.05).
4. Following the z-test, the critical limit is defined here with the corresponding degrees
of freedom as −tn − 1;1 − α. Note that the rejection region α is now only on the left
side of the distribution: −t4;0.95 = − 2.132
5. The null hypothesis is rejected exactly when either t < −tn − 1;1 − α or |t| > |−tn − 1;1 − α| holds. In this case, we note that t = 1.05 ≮ −2.132 = −t4;0.95 and |t| = 1.05 ≯ 2.132 = |−t4;0.95|. This means that the null hypothesis cannot be rejected.
We have now become acquainted with two different test procedures. Both test for an
expected value. We should be able to distinguish between them and decide whether to
use a two-sided, left-sided, or right-sided test.
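Both directed t-tests can also be reproduced with SciPy’s one-sample t-test, this time using the p-value as the decision criterion. The following sketch assumes a reasonably recent SciPy version (the alternative argument was added in version 1.6) and is only meant as a cross-check of the decisions derived above.

```python
# Minimal sketch of the directed t-tests via SciPy's one-sample t-test.
# The `alternative` argument requires SciPy 1.6 or newer.
from scipy import stats

data = [8, 10, 7, 5, 10]
mu0, alpha = 7.0, 0.05

for alternative in ("greater", "less"):           # right-sided, then left-sided
    res = stats.ttest_1samp(data, popmean=mu0, alternative=alternative)
    print(f"H1: mu {alternative} {mu0}: t = {res.statistic:.3f}, "
          f"p = {res.pvalue:.3f}, reject H0: {res.pvalue < alpha}")
# With t of about 1.05 and t4;0.95 = 2.132, neither directed test rejects H0.
```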
SUMMARY
Suppose that we only have a sample but we would like to make gener-
ally valid statements. A hypothesis test or significance test can be used
to check whether the sample results can be generalized to the population. The starting point of each test is a hypothesis pair, with a null and an alternative hypothesis. These hypotheses describe two completely contrasting populations. The purpose of the test is to find out which of these populations the given sample fits best.
BACKMATTER
LIST OF REFERENCES
Bamberg, G., Baur, F., & Krapp, M. (2022). Statistik [Statistics] (19th ed.). De Gruyter Olden-
bourg Verlag.
Bortz, J., & Schuster, C. (2010). Statistik für Human- und Sozialwissenschaftler [Statistics for
human and social scientists] (7th ed.). Springer Verlag.
Fahrmeir, L., Heumann, C., Künstler, R., Pigeot, I., & Tutz, G. (2016). Statistik: Der Weg zur
Datenanalyse [Statistics: The path to data analysis] (8th ed.). Springer Spektrum.
Handl, A., & Kuhlenkasper, T. (2018). Einführung in die Statistik – Theorie und Praxis mit R [Introduction to statistics – Theory and practice with R]. Springer Spektrum.
Schäfer, T. (2011). Statistik II: Inferenzstatistik [Statistics II: Inferential statistics]. VS Verlag.
Sedlmeier, P., & Renkewitz, F. (2018). Forschungsmethoden und Statistik für Psychologen
und Sozialwissenschaftler [Research methods and statistics for psychologists and
social scientists] (3rd ed.). Pearson Verlag.
LIST OF TABLES AND
FIGURES
Figure 1: Variable Classification by Scales of Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Table 14: Frequency Table for Age (2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Table 20: Contingency Table With Absolute Frequencies for the Variables of Gender and
Smoking Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Table 21: Contingency Table With Only Marginal Frequencies for the Variables of Gender
and Smoking Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Table 23: Initial Data on the Relationship Between Satisfaction With Care Robots and Satis-
faction With Health Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Table 24: Ranks of the Relationship Between Satisfaction With Care Robots and Satisfac-
tion With Health Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Table 25: Auxiliary Table for the Calculation of the Rank Correlation Coefficient (1) . . . . . 71
Table 26: Auxiliary Table for the Calculation of the Rank Correlation Coefficient (2) . . . . . 71
Table 27: Initial Data for the Relationship Between the Age of the Mothers and Fathers of
Young Patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Figure 9: Scatter Plot for the Correlation Between the Age of the Mothers and Fathers of
Young Patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Figure 11: Scatter Plot for the Correlation Between the Age of the Mothers and Fathers of
Young Patients Divided Into Quadrants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Table 28: Auxiliary Table for the Calculation of the Correlation Coefficient . . . . . . . . . . . . . 78
Figure 13: Scatter Plot With a Correlation Coefficient of -1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Figure 16: Scatter Plot for the Initial Data of the Simple Linear Regression . . . . . . . . . . . . . 90
Figure 18: Scatter Plot for the Initial Data Including Regression Line of the Simple Linear
Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Figure 21: Venn Diagram for the Union Set of A and B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Figure 22: Venn Diagram for the Union Set of Players A and B . . . . . . . . . . . . . . . . . . . . . . 109
Figure 23: Union Set for Passing at Least One Examination . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Figure 24: Venn Diagram for the Intersection Set ofA and B . . . . . . . . . . . . . . . . . . . . . . . . . 111
Figure 29: Venn Diagram for the Probability of the Union Set . . . . . . . . . . . . . . . . . . . . . . . . 118
Figure 30: Venn Diagram for the Probability of the Difference A\B . . . . . . . . . . . . . . . . . . 120
Figure 33: Density Function with a Waiting Time of Maximum Three Minutes . . . . . . . . . 130
Figure 34: Density Function with a Waiting Time of Four to Six Minutes . . . . . . . . . . . . . . . 131
Figure 35: Density Function with a Waiting Time of At Least Six Minutes . . . . . . . . . . . . . . 132
Figure 36: Representation of the Density Function for the Travel Time . . . . . . . . . . . . . . . 144
Figure 37: Density Function for a Travel Time of at Most 42 Minutes . . . . . . . . . . . . . . . . . . 147
Figure 38: Density Function for a Travel Time of Less Than 36 Minutes . . . . . . . . . . . . . . . 149
Figure 39: Density Function for a Travel Time of at Least 42 Minutes . . . . . . . . . . . . . . . . . 150
Figure 40: Density Function for a Travel Time From 39 to 42 Minutes . . . . . . . . . . . . . . . . . 152
Figure 41: Density Function for a Travel Time That is Not Exceeded on 95% of Days . . . . 153
Figure 42: Density Function for the Travel Time That is Exceeded on 90% of the Days . . 155
Figure 43: Density Function for the Central Variation Interval with 90% of All Travel Times
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
IU Internationale Hochschule GmbH
IU International University of Applied Sciences
Juri-Gagarin-Ring 152
D-99084 Erfurt
Mailing Address
Albert-Proeller-Straße 15-19
D-86675 Buchdorf
[email protected]
www.iu.org