Statistics For Decision-Making 2024
2
The Economist article. . .
3
DATA are everywhere these days; the problem is
making sense of them. That is the role of
statistics, the university course that so many
people dodge or forget. Charles Wheelan, a
professor at Dartmouth College (and a former
Chicago correspondent for The Economist), does
something unique here: he makes statistics
interesting and fun. His book strips the subject of
its complexity to expose the sexy stuff underneath.
4
But. . .
“I keep saying the sexy job in the next ten years
will be statisticians. People think I’m joking, but
who would’ve guessed that computer engineers
would’ve been the sexy job of the 1990s?” - Hal
Varian, Google’s Chief Economist in
The McKinsey Quarterly, January 2009.
5
6
In print. . .
• In 2014, LinkedIn reported the skill of
“statistical analysis” as the number one
hottest skill that resulted in a job hire.
7
In print. . .
9
So what is Statistics?
• Statistics is the science of learning from data, and
of measuring, controlling and communicating
uncertainty.
• Statistics is the science of drawing conclusions
from data with the aid of the mathematics of
probability.
• Statistics is the explanation of variation in the
context of what remains unexplained.
• Statistics is a collection of procedures and
principles for gaining information in order to
make decisions when faced with uncertainty.
10
Four Key Elements Of Statistics
• These definitions highlight four key elements
of statistics.
• Data – the raw material
• Information – the goal
• Uncertainty – the context
• Probability – the tool
11
A maze or toolbox?
12
13
14
Statistical Decision-making
• A defining business trend in the Digital Age has been
the growth in the volume and the use of quantitative
data.
• Increasingly, decisions once based on management
intuition and experience now rely on empirical
evidence drawn from statistical data.
• As the volume of data sets grows larger, the term "big
data" has become a buzzword.
• Statistical evidence can inform business leaders about
how their companies perform, the effectiveness of
their business operations and information about their
customers.
15
Quote
• In God we trust. The rest should bring data.
16
Data
• Data refers to the individual pieces of information we
collect about an entity, e.g. firm size, annual revenues,
share prices, etc.
• Data is the raw material of knowledge.
• We process data to obtain information and, ultimately, knowledge.
17
For example. . .
18
Where does data come from?
• Data comes from the entities of interest to us
as researchers, managers, observers.
• We might be interested in the entirety of the
observations available to us.
• For reasons of cost and time, we can only
collect data from a subset of the observations
available.
• This requires that we select carefully from these
observations to avoid skewing the results.
19
Classification of Data for Analysis
• Three basic data classifications exist for analysis.
• These are cross-sectional data, time series data, and panel
data.
• Cross-sectional data is taken at a point in time, e.g. how
does income affect consumption? What determines your
wage?
• Time series data is taken over time and can be regular or
irregular, e.g. what is the performance of UNL’s stock price
over the last 5 years?
• Panel data is cross-sectional data over time. It reveals the
changes in a set of related variables over time, e.g. the GLSS.
20
Some Terminology
• Population: All individuals, objects, firms or
measurements whose properties are being
studied.
• Sample: A subset of the population studied.
• Representative Sample: A subset of the
population that has the same characteristics
as the population.
• Variable: A characteristic of interest for each
person or object in a population.
21
Some Terminology
• Numerical Variable: Variables that take on values that are
indicated by numbers.
• Categorical Variable: Variables that take on values that are
names or labels.
• Parameter: A number that is used to represent a population
characteristic and that generally cannot be determined easily.
• Statistic: A numerical characteristic of the sample; a statistic
estimates the corresponding population parameter.
• Proportion: The number of successes divided by the total
number in the sample.
• Probability: A number between zero and one, inclusive, that
gives the likelihood that a specific event will occur.
22
Back to Data. . .
• A statistical analysis starts with a set of data. We
construct a set of data by first deciding what cases or
units we want to study.
• For each case, we record information about
characteristics that we call variables.
• Data is a set of observations (a set of possible outcomes).
• Data can be put into two groups: qualitative (an
attribute whose value is indicated by a label) or
quantitative (an attribute whose value is indicated by a
number).
• Quantitative data can be separated into two subgroups:
discrete and continuous.
23
Back to Data
• Data is discrete if it is the result of counting
(such as the number of students of a given
ethnic group in a class or the number of books
on a shelf).
• Data is continuous if it is the result of
measuring (such as distance traveled or
weight of luggage).
24
Sources of Data
• Anecdotal data come from stories or reports about cases
that do not necessarily represent a larger group of cases.
• Available data are data that were produced for some
other purpose but that may help answer a question of
interest.
• A sample survey collects data from a sample of cases that
represent some larger population of cases.
• A census collects data from all cases in the population of
interest.
• In an experiment, a treatment is imposed and the
responses are recorded.
25
Types of Data
• There are three types of data for our statistical
work.
• Cross-Sectional:
– A set of data values observed at a fixed point in
time (e.g. bank data about its loan customers)
– The wage equation
• Time-Series:
– a set of consecutive data values observed at
successive points in time (e.g. stock price on a
daily basis for a year)
26
Types of Data
• Panel/Longitudinal Data
• When we collect cross-sectional data across
time, we form panel (longitudinal) data.

Sales (in $1000s)
             2009   2010   2011   2012
Accra         435    460    475    490
Ho            320    345    375    395
Cape Coast    405    390    410    395
Koforidua     260    270    285    280

(Each row, a city followed over the years, is a time series;
each column, all cities in a single year, is a cross section.)
27
Levels of Measurement of Data
• Statisticians use different types of variables to
describe the characteristics of a population.
• Usually a more detailed distinction called the
levels of measurement is used when examining
the information that is collected for a variable.
• Nominal
• Ordinal
• Interval
• Ratio
28
Levels of Measurement of Data
• A nominal measurement is one in which the values of the variable
are names.
• Nominal data are considered the lowest or weakest type of data,
since numerical identification is chosen strictly for convenience and
does not imply ranking of responses.
• The values of nominal variables are words that describe the
categories or classes of responses.
• The values of the gender variable are male and female; the values of
“Do you own a car?” are yes and no.
• We arbitrarily assign a code or number to each response. However,
this number has no meaning other than for categorizing.
• For example, we could code gender responses as 1 = Male, 2 = Female,
and yes/no responses as 1 = Yes, 2 = No.
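The coding step above can be sketched in pandas; the column names and responses here are invented for illustration, not data from the text.

```python
import pandas as pd

# Hypothetical survey responses (illustrative only)
responses = pd.DataFrame({
    "gender":   ["Male", "Female", "Female", "Male"],
    "owns_car": ["Yes", "No", "Yes", "Yes"],
})

# Assign arbitrary numeric codes; the numbers only label categories
# and carry no order or magnitude.
responses["gender_code"] = responses["gender"].map({"Male": 1, "Female": 2})
responses["owns_car_code"] = responses["owns_car"].map({"Yes": 1, "No": 2})
print(responses)
```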
29
Levels of Measurement of Data
• Ordinal data indicate the rank ordering of items, and similar to nominal
data the values are words that describe responses.
• Some examples of ordinal data and possible codes are as follows:
• 1. Product quality rating (1: poor; 2: average; 3: good)
• 2. Satisfaction rating with your current Internet provider (1: very
dissatisfied; 2: moderately dissatisfied; 3: no opinion; 4: moderately
satisfied; 5: very satisfied)
• 3. Consumer preference among three different types of soft drink (1:
most preferred; 2: second choice; 3: third choice)
• In these examples the responses are ordinal, or put into a rank order, but
there is no measurable meaning to the “difference” between responses.
• That is, the difference between your first and second choices may not be
the same as the difference between your second and third choices.
30
Levels of Measurement of Data
• Interval and ratio levels of measurement refer to data obtained
from numerical variables, and meaning is given to the difference
between measurements.
• An interval scale indicates rank and distance from an arbitrary zero
measured in unit intervals.
• That is, data are provided relative to an arbitrarily determined
benchmark.
• Temperature is a classic example of this level of measurement,
with arbitrarily determined benchmarks generally based on either
Fahrenheit or Celsius degrees.
• Suppose that it is 50°C in Ouagadougou and only 25°C in Koforidua.
We can conclude that the difference in temperature is 25°C, but
we cannot say that it is twice as warm in Ouagadougou as it is
in Koforidua.
31
Levels of Measurement of Data
• Ratio data indicate both rank and distance
from a natural zero, with ratios of two
measures having meaning.
• A person who weighs 200 pounds is twice the
weight of a person who weighs 100 pounds; a
person who is 40 years old is twice the age of
someone who is 20 years old.
32
Schematically
33
Exercise
• For each of the following samples, state what type of data has been
collected (i.e. nominal, ordinal, interval, ratio, discrete and/or
continuous):
1. The gender of students in a class.
2. The height in millimeters of students in a class.
3. The number of siblings (i.e. brothers and/or sisters) for each individual in a
class.
4. The birth order (i.e. first born, second born) of each individual in a class.
5. The distance that each individual in a class travels to get to college.
6. The type of degree (e.g. BSc, BEng, BA) that each individual in a class is
studying
34
Statistics. . .
35
Sampling
• A sample should have the same characteristics
as the population it is representing.
• Statisticians use various methods of random
sampling in an attempt to achieve this goal.
• There are several different methods of random
sampling.
• In each form of random sampling, each
member of a population initially has an equal
chance of being selected for the sample.
36
Random Sampling Methods
• Simple random sample
• Stratified sample
• Cluster sample
• Systematic sample
• Convenience sample
• Voluntary Response Sample
37
Simple random sample
• It is the easiest method to describe.
• Any group of n individuals is as likely to be
chosen as any other group of n individuals
when simple random sampling is used. In
other words, each sample of the same
size has an equal chance of being selected.
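A minimal sketch of simple random sampling with Python's standard library; the population of 500 numbered records is a made-up illustration.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 500 numbered records
population = list(range(1, 501))

# Every subset of size 20 is equally likely to be drawn
sample = random.sample(population, k=20)
print(sorted(sample))
```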
38
Stratified Sampling
• To choose a stratified sample, we divide the
population into groups called strata and then
take a proportionate number from each
stratum.
39
Cluster sample
• To choose a cluster sample, divide the population into
clusters (groups) and then randomly select some of the
clusters.
• All the members from these clusters are in the cluster
sample.
• For example, divide your college faculty by department;
the departments are the clusters. Number each department,
and then choose four different numbers using simple random
sampling. All members of the four departments with those
numbers form the cluster sample.
40
Systematic sample
• To choose a systematic sample, randomly select
a starting point and take every nth piece of data
from a listing of the population. For example,
suppose you have to do a phone survey. Your
phone book contains 20,000 residence listings
and you must choose 400 names for the sample.
Number the population 1–20,000 and use a
simple random sample to pick a number between
1 and 50 that represents the first name in the
sample; then take every 50th name thereafter
(since 20,000 ÷ 400 = 50).
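The phone-book example can be sketched as follows (the interval is 20,000 ÷ 400 = 50, so after a random start we take every 50th listing):

```python
import random

random.seed(1)

N = 20_000  # residence listings in the phone book
n = 400     # required sample size
k = N // n  # sampling interval: every 50th listing

start = random.randint(1, k)               # random start within the first interval
sample_ids = list(range(start, N + 1, k))  # then every kth listing
print(len(sample_ids))  # 400
```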
41
Convenience sampling
• A type of sampling that is non-random is convenience
sampling.
• Convenience sampling involves using results that are
readily available.
• For example, a computer software store conducts a
marketing study by interviewing potential customers who
happen to be in the store browsing through the available
software.
• The results of convenience sampling may be very good in
some cases and highly biased (favor certain outcomes) in
others.
42
Voluntary Response Sample
• A voluntary response sample consists of
people who choose themselves by responding
to a general appeal.
• Voluntary response samples are biased
because people with strong opinions,
especially negative opinions, are most likely to
respond.
43
44
Careful!
• Sampling data should be done very carefully.
• Collecting data carelessly can have devastating
results.
• Surveys sent to households and then returned
may be very biased (they may favor a certain
group).
• It is better for the person conducting the
survey to select the sample respondents.
45
What can go wrong?
• Errors can occur during the sampling process.
• Sampling error can include both systematic
sampling error and random sampling error.
• Systematic sampling error is the fault of the
investigator, but random sampling error is not.
• When errors are systematic, they bias the sample in
one direction.
• Under these circumstances, the sample does not
truly represent the population of interest.
• Systematic error occurs when the sample is not
drawn properly.
46
What can go wrong?
• Systematic error can also occur if names are dropped
from the sample list because some individuals were
difficult to locate or uncooperative.
• Random sampling error, as contrasted to systematic
sampling error, is often referred to as chance error.
• Purely by chance, samples drawn from the same
population will rarely provide identical estimates of
the population parameter of interest.
• These estimates will vary from sample to sample.
47
Problems
• Sampling error can affect inferences based on
sampling in two important situations.
• In one situation, we may wish to generalize from
the sample to a particular population.
• With a small sampling error, we can feel more
confident that our sample is representative of the
population.
• We can therefore feel reasonably comfortable
about generalizing from the sample to the
population. Survey research is most concerned
about this kind of sampling error.
48
Things to consider. . .
• The second situation in which sampling error plays a
role is when we wish to determine whether two or
more samples were drawn from the same or different
populations.
• In this case, we are asking if two or more samples are
sufficiently different to rule out factors due to chance.
• An example of this situation is when we ask the
question “Did the group that received the experimental
treatment really differ from the group that did not
receive the treatment, other than on the basis of
chance?”
49
Descriptive Statistics
50
• Descriptive statistics are very important because,
if we simply presented our raw data, it would be hard to
visualize what the data were showing, especially if there
were a lot of them.
• Descriptive statistics therefore enable us to present
the data in a more meaningful way, which allows
simpler interpretation of the data.
• For example, if we had the results of 100 pieces of
students' coursework, we may be interested in the
overall performance of those students.
• We would also be interested in the distribution or
spread of the marks.
• Descriptive statistics allow us to do this.
51
• We often do not have access to the whole population
for investigation.
• Usually, we have only a limited number of data
instead.
• For example, you might be interested in the exam
marks of all students in Ghana.
• It is not feasible to measure the exam marks of all
students in the whole country, so you measure a
smaller sample of students (e.g., 100 students),
which is used to represent the larger population of
all Ghanaian students.
• Properties of samples, such as the mean or standard
deviation, are not called parameters, but statistics.
52
Descriptive Statistics
• Typically, there are two general types of statistic
that are used to describe data.
• Measures of central tendency: these are ways of
describing the central position of a frequency
distribution for a group of data.
• In this case, the frequency distribution is simply
the distribution and pattern of marks scored by the
100 students from the lowest to the highest.
• We can describe this central position using a
number of statistics, including the mode, median,
and mean.
53
Measures of Spread/Dispersion
• These are ways of summarizing a group of data by
describing how spread out the scores are.
• For example, the mean score of our 100 students may be
65 out of 100. However, not all students will have scored
65 marks.
• Rather, their scores will be spread out.
• Some will be lower and others higher.
• Measures of spread help us to summarize how spread
out these scores are.
• To describe this spread, a number of statistics are
available to us, including the range, quartiles, absolute
deviation, variance and standard deviation.
54
• For example, if you were only interested in the
exam marks of 100 students, the 100 students
would represent your population.
• Descriptive statistics are applied to
populations, and the properties of
populations, like the mean or standard
deviation, are called parameters as they
represent the whole population (i.e.,
everybody you are interested in).
55
Inferential Statistics
56
Inferential Statistics Defined
• Inferential statistics are techniques that allow
us to use samples to make generalizations
about the populations from which the
samples were drawn.
• It is, therefore, important that the sample
accurately represents the population.
• The process of getting the sample is called
sampling.
57
• Inferential statistics arise out of the fact that
sampling naturally incurs sampling error and
thus a sample is not expected to perfectly
represent the population.
• The methods of inferential statistics are
• (1) the estimation of parameter(s) and
• (2) testing of statistical hypotheses.
58
Measures Of Central Tendency
• We move on from visualization techniques to numerical
measures that can be used to quantitatively summarize data.
• We will first describe the three measures and then discuss the
circumstances in which each should be used.
59
Arithmetic Mean
• The arithmetic mean (or simply mean) of a set of data is
the sum of the data values divided by the number of
observations.
• If the data set is the entire population of data, then the
population mean, μ, is a parameter given by
  μ = (x₁ + x₂ + … + x_N) / N = (Σ xᵢ) / N
• For a sample of n observations, the sample mean is
  x̄ = (Σ xᵢ) / n
62
Example 1
• Suppose we record the weights of a sample of
tourists:
• S = {96, 103, 121, 114, 98, 111, 107, 289, 115, 101,
114, 100}, where S denotes the sample.
• Compute the following in Python:
• mean
• median
• mode
• Is there an unusual value in the dataset?
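A sketch of the requested computations, using Python's standard statistics module:

```python
import statistics as st

# Tourist weights from the example
S = [96, 103, 121, 114, 98, 111, 107, 289, 115, 101, 114, 100]

print(st.mean(S))    # about 122.42; pulled upward by one unusual value
print(st.median(S))  # 109.0
print(st.mode(S))    # 114
print(max(S))        # 289, far from the rest of the data: a likely outlier
```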
63
Measure of Central Tendency to Use?
• For continuous or discrete data, the mode is rarely used.
• The use of the mean or median depends upon whether our data
distribution is symmetric or skewed, and whether there are any
outliers.
• The mean, median and mode will all have approximately the same value
if the data are symmetrically distributed.
• If the skew is negative (i.e. the left tail of the distribution is longer than
the right tail), then the mode will be larger than the median, which in
turn will be larger than the mean.
• The converse is true for positively skewed distributions.
• Outliers are data points that are very different from the others in the data
set being analyzed.
• It is important to detect them as they may be due to errors in data
gathering (e.g. a height entered in meters rather than centimeters).
• Outliers should not be removed without there being a good reason to do
so.
64
Outliers
• An outlier is an extreme point that doesn’t really ‘lie’ with the rest
of the data.
• Consider a sample of the litres of water drunk by ten friends in a
month:
• 9, 37, 39, 39, 43, 44, 48, 48, 48, 89
• It’s clear that there seem to be two outliers: the person who
drank only 9 litres and the person who drank a whopping 89
litres.
• However, if someone asked you why these points are outliers,
how would you respond? “Because they are really big or really
small?”
• As in any field of analysis, it’s important to quantify exactly how
we make these decisions.
• We will use two methods: the z-score and Box-Plot methods later.
65
Exercise 1: Demand for Bottled Water
• The demand for bottled water increases
during the harmattan season in Ghana.
• The number of 1-gallon bottles of water sold
for a random sample of n = 12 hours in one
store during the harmattan season is:
60 84 65 67 75 72
80 85 63 82 70 75
• Describe the central tendency of the data.
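One way to answer the exercise in Python, again using the standard statistics module:

```python
import statistics as st

# 1-gallon bottles sold per hour, n = 12
sales = [60, 84, 65, 67, 75, 72, 80, 85, 63, 82, 70, 75]

print(st.mean(sales))    # about 73.17
print(st.median(sales))  # 73.5
print(st.mode(sales))    # 75
# Mean and median are close, suggesting a roughly symmetric distribution.
```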
66
Measures of Dispersion/Variation
• Measures of central tendency only summarize the
typical or average value of the data, and they provide
no information on its spread, variation or dispersion.
• There are two main measures to summarize the
spread of data, which are the standard deviation and
the interquartile range (IQR).
67
Standard Deviation
• The sample standard deviation s is defined by
  s = √[ Σ (xᵢ − x̄)² / (n − 1) ]
• where, as before, n is the sample size, xᵢ are the
individual sample values, and x̄ is the sample mean.
68
• Note the following points about the standard
deviation:
• It has the same units as the data, for example,
calculating s for our height data would result in a
value in centimeters.
• It is always positive.
• It requires calculation of the mean of the data, x̄.
• The division is by (n − 1), not n. This makes the
value of s a better estimate of the population
standard deviation.
• The variance is the standard deviation squared,
that is, s².
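The points above can be checked in Python: statistics.stdev divides by (n − 1), while statistics.pstdev divides by n, so the sample value is always slightly larger.

```python
import statistics as st

# Litres of water drunk by ten friends (data from the outlier example)
data = [9, 37, 39, 39, 43, 44, 48, 48, 48, 89]

s = st.stdev(data)       # sample standard deviation: divides by (n - 1)
sigma = st.pstdev(data)  # population standard deviation: divides by n
print(s, sigma)          # s is slightly larger than sigma

print(st.variance(data))  # the variance is s squared
```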
69
Interquartile Range
• To calculate the interquartile range, we need to calculate
the values of the upper and lower quartiles of our data.
• The concept of a quartile is related to the concept of the
median, as explained below:
• The median is the data value that has 50% of the values
above it and 50% of values below.
• The upper quartile is the data value that has 25% of values
above it and 75% of values below.
• The lower quartile is the data value that has 75% of values
above it and 25% of values below.
• The interquartile range (IQR) is then calculated as
IQR = upper quartile−lower quartile.
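A sketch with NumPy (note that software packages use slightly different conventions for computing quartiles; numpy's default is linear interpolation). The 1.5 × IQR fence shown is the box-plot rule for flagging outliers mentioned earlier:

```python
import numpy as np

# Litres of water drunk by ten friends (from the outlier example)
data = [9, 37, 39, 39, 43, 44, 48, 48, 48, 89]

q1, q3 = np.percentile(data, [25, 75])  # lower and upper quartiles
iqr = q3 - q1
print(q1, q3, iqr)  # 39.0 48.0 9.0

# Box-plot rule: flag points beyond 1.5 * IQR from the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [9, 89]
```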
70
Quartiles
71
Illustration
72
Range
• As well as the IQR, the overall range of the
data is also regularly reported as a measure of
variation.
• The range is simply
• range = maximum value − minimum value.
73
Which Measure of Variation to Use?
• The answer to this question is similar to the
answer to the question of when to use the
mean or median:
• Use the mean and standard deviation if your
data distribution is symmetric with no outliers.
• Use the median and IQR if your data
distribution is skewed or has outliers.
74
Skewness
• Skewness is basically a measure of asymmetry, and
the easiest way to explain it is by drawing some
pictures.
• If the data tend to have a lot of extremely small values
(i.e., the lower tail is “longer” than the upper tail) and
not so many extremely large values, then
we say that the data are negatively skewed.
• On the other hand, if there are more extremely large
values than extremely small ones, we say
that the data are positively skewed.
• That’s the qualitative idea behind skewness.
75
76
Symmetrical = No skew
77
Left skew
78
Right skew
79
80
81
• The skewness of a data set can be computed as
  skewness = [ (1/n) Σ (xᵢ − x̄)³ ] / s³
  where s is the standard deviation computed with an n
  (rather than n − 1) divisor; this is the default
  definition used by scipy’s skew function.
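The formula can be checked against scipy's implementation; with its default settings, scipy.stats.skew computes exactly this ratio of moments:

```python
import numpy as np
from scipy.stats import skew

data = np.array([9, 37, 39, 39, 43, 44, 48, 48, 48, 89])

# Manual computation: third central moment over the 1.5 power of the second
n = len(data)
xbar = data.mean()
m2 = ((data - xbar) ** 2).mean()
m3 = ((data - xbar) ** 3).mean()
g1 = m3 / m2 ** 1.5

print(g1, skew(data))  # the two values agree; positive, as 89 stretches the right tail
```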
82
Kurtosis
• Kurtosis is a measure of the “tailedness”, or
outlier character, of the data.
• In other words, kurtosis is a statistical measure
that defines how heavily the tails of a
distribution differ from the tails of a normal
distribution.
• The normal distribution is taken as the
standard.
• It has a kurtosis measure of 3. Anything above
or below it is said to deviate from normality.
83
• The value for kurtosis of a normal distribution
is 3, and the shape is referred to as
mesokurtic. The excess kurtosis (kurtosis
minus 3) of a normal distribution is therefore zero.
84
• For kurtosis < 3, the distribution is said to be
platykurtic.
• So platykurtic describes a distribution where
the center of the curve is lower than that of a
normal distribution, and the tails are lighter, with
fewer values in the tails.
85
• Leptokurtic describes a distribution where the
value for kurtosis is greater than 3, i.e. the
excess kurtosis is greater than zero, so the tails
are heavier than those of a normal distribution.
• So for a leptokurtic distribution, kurtosis > 3, or
equivalently excess kurtosis > 0.
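The 3-versus-0 distinction corresponds to scipy's fisher argument: kurtosis reports excess kurtosis (normal ≈ 0) by default, and the raw measure (normal ≈ 3) with fisher=False. A quick check on simulated normal data:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
sample = rng.normal(size=100_000)  # large sample from a normal distribution

print(kurtosis(sample))                # excess kurtosis: close to 0
print(kurtosis(sample, fisher=False))  # raw kurtosis: close to 3
```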
86
Formulae
• kurtosis = [ (1/n) Σ (xᵢ − x̄)⁴ ] / s⁴, with s again
computed using an n divisor
• excess kurtosis = kurtosis − 3 (this is what scipy’s
kurtosis function reports by default)
87
Mesokurtic
88
Platykurtic
89
Leptokurtic
90
Python Exercise
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline   # Jupyter magic: display plots inline
from scipy.stats import skew, kurtosis
91
urlfile = 'https://raw.githubusercontent.com/burakaydin/materyaller/gh-pages/ARPASS/dataWBT.csv'
data = pd.read_csv(urlfile)
data.head()
data.columns
data['wage'].head()
wage = data['wage']
92
wage.dropna(inplace=True)
mean = np.mean(wage)
mean
Out[39]: 12703.65552623864
median = np.median(wage)
median
Out[41]: 10800.0
plt.hist(wage)
93
plt.xlabel('Wages($)')
plt.ylabel('Frequency')
94
95
skew(wage)
Out[48]: 4.901206069790337
kurtosis(wage)
Out[49]: 64.0760982749893
96
• https://sievo.com/resources/procurement-analytics-demystified
• https://rfp360.com/procurement-analytics/
97
Association in Statistics
• Statisticians, data scientists, business people, etc. are
interested in the relationships between variables.
• For example, if I want to set up a factory to
manufacture luxury cars in Ghana, I have to know
the income of Ghanaians.
• If the price of such cars amounts to a large multiple
of typical incomes, chances are that these vehicles
will not sell.
• A social scientist will be concerned about the
relationship between a family’s income and their
consumption. . .
98
• Covariance is a term used to describe the degree to
which two random variables are related to each other.
• There are three ways two variables can relate to
each other.
• Positive: in a positive relation, when one variable
increases, the other also increases, e.g. when income
increases, consumption also increases on average.
• Negative: here, when one variable increases, the
other decreases, e.g. the price of a normal good and
the quantity demanded.
• Zero: here there is no discernible relationship between
the variables, e.g. the number of babies born in Accra
and the shoe sizes of their parents!
99
Definition - Covariance
• Covariance gives a measure of the direction
with which two variables vary together.
• A positive value means that there is a direct
relationship: that is, they move in the same
direction.
• A negative value means there is an inverse
relationship, or they move in opposite
direction.
• A zero value means there is no discernible
relationship between the variables.
100
Example - Covariance
• Let’s consider two variables: how many hours you
spend studying in school and your GPA.
• Hopefully, these have a positive covariance.
• That would mean that the two vary together: the more
you study, the higher the GPA tends to be (and vice
versa).
• An opposite example would be imagining the
covariance between hours exercised and heartbeats
per minute.
• This would likely be negative, since usually people are
in better shape when they exercise more and thus their
hearts tend to pump less per minute.
101
Formulae - Covariance
• The covariance between two variables x and y is given by:
  cov(x, y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
• In pandas, assuming a DataFrame df with Income and
Consumption columns:
  Income = df['Income']
  Consumption = df['Consumption']
  Income.cov(Consumption)   # sample covariance
103
• Another important reminder for covariance is
that it gives association, NOT causation.
• In the example above, we can’t say that studying
harder caused a higher GPA (notice how we use
the word “tends”), only that when you study a
lot you also seem to get a better GPA.
• Without actually performing an experiment, we
can only say that these things tend to vary
together, but we can never say if one actually
causes the other without performing a
controlled experiment.
104
Correlation
• Correlation is the more interesting and perhaps the
more important of the two measures.
• Instead of only indicating direction, correlation
measures direction and strength of a linear
relationship.
• The sign gives direction, the magnitude strength.
• Correlation is always between −1 and 1.
• A correlation of 0 means there is no linear relationship
between two variables.
• As the correlation moves from 0 to −1 or 1, the
relationship gets stronger and stronger, culminating in
a perfect relationship at either endpoint (−1 or 1).
105
Formulae - Correlation
• The correlation between a pair of variables, x
and y, is given by
  r = cov(x, y) / (sₓ s_y)
  where sₓ and s_y are the sample standard
  deviations of x and y.
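A small numerical sketch; the study-hours and GPA figures below are invented for illustration. It also verifies that correlation is just covariance rescaled by the two standard deviations:

```python
import numpy as np

# Hypothetical data: weekly study hours and GPA for eight students
hours = np.array([2, 5, 8, 10, 12, 15, 18, 20])
gpa   = np.array([2.1, 2.4, 2.8, 3.0, 3.1, 3.4, 3.6, 3.9])

cov_xy = np.cov(hours, gpa)[0, 1]    # sample covariance (n - 1 divisor)
r = np.corrcoef(hours, gpa)[0, 1]    # correlation coefficient
print(cov_xy, r)

# r equals the covariance divided by the product of the standard deviations
r_manual = cov_xy / (hours.std(ddof=1) * gpa.std(ddof=1))
print(np.isclose(r, r_manual))  # True
```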
106
Symmetrical?
• How can we decide if our data distribution is
skewed or symmetric?
• Often it is clear from looking at a histogram, but
a numerical measure, such as the skewness statistic
defined earlier, can be useful in making this assessment.
109
How Do you Do online Research?
• A study of 552 first-year
college students asked
about their preferences
for online resources.
• One question asked
them to pick their
favorite.
• Here are the results:
110
Solutions
• Since the variable is categorical,
we can convert the counts to
percentages and show them
as a pie chart.
• We can also take the
raw counts and plot them as a
bar graph.
111
Solution
112
Pareto Chart for Cost Analysis
• A bar graph whose categories are ordered from
most frequent to least frequent is called a Pareto
chart.
• Pareto charts are frequently used in quality
control settings.
• There, the purpose is often to identify common
types of defects in a manufactured product.
• Deciding upon strategies for corrective action can
then be based on what would be most effective.
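A minimal Pareto-chart sketch in matplotlib; the defect categories and counts are invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe without a display
import matplotlib.pyplot as plt

# Hypothetical defect counts from a quality-control inspection
defects = {"Scratches": 7, "Dents": 3, "Misalignment": 12, "Paint": 5, "Cracks": 2}

# Pareto chart: re-order the categories from most to least frequent
ordered = sorted(defects.items(), key=lambda kv: kv[1], reverse=True)
labels = [name for name, _ in ordered]
counts = [count for _, count in ordered]

plt.bar(labels, counts)
plt.ylabel("Number of defects")
plt.title("Pareto chart of defect types")
plt.savefig("pareto.png")
```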
113
Example
114
Solution
115
Quantitative variables: Histograms
• Quantitative variables often take many values.
• A graph of the distribution is clearer if nearby
values are grouped together.
• The most common graph of the distribution of
a single quantitative variable is a histogram.
116
Calls to a Customer Service Center
• Many businesses operate call centers to serve
customers who want to place an order or
make an inquiry.
• Customers want their requests handled
thoroughly. Businesses want to treat
customers well, but they also want to avoid
wasted time on the phone. They, therefore,
monitor the length of calls and encourage
their representatives to keep calls short.
117
Calls to a Customer Service Center
118
Histogram of Call Duration
119
Shape of a Distribution
• We can describe graphically the shape of the
distribution by a histogram.
• That is, we can visually determine whether data
are evenly spread from its middle or center.
• Sometimes the center of the data divides a graph
of the distribution into two “mirror images,” so
that the portion on one side of the middle is
nearly identical to the portion on the other side.
• Graphs that have this shape are symmetric; those
without this shape are asymmetric, or skewed.
120
Shape of a Distribution
• Symmetry: The shape of a distribution is said to be
symmetric if the observations are balanced, or
approximately evenly distributed, about its center.
• Skewness: A distribution is skewed, or asymmetric, if
the observations are not symmetrically distributed on
either side of the center.
• A skewed-right distribution (sometimes called
positively skewed) has a tail that extends farther to the
right.
• A skewed-left distribution (sometimes called negatively
skewed) has a tail that extends farther to the left.
121
122
Quantitative variables: Stem-and-leaf plots
125
Scatter Plot
• We can prepare a scatter plot by locating one point
for each pair of two variables that represent an
observation in the data set.
• The scatter plot provides a picture of the data,
including the following:
1. The range of each variable
2. The pattern of values over the range
3. A suggestion as to a possible relationship between
the two variables
4. An indication of outliers (extreme points)
126
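A scatter plot of this kind can be sketched with matplotlib; the entrance scores and GPAs below are hypothetical values for illustration, not data from the slides.

```python
import matplotlib
matplotlib.use("Agg")            # draw off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical entrance scores and college GPAs, one pair per observation
scores = [52, 60, 65, 70, 74, 80, 85, 90]
gpas = [2.0, 2.4, 2.5, 2.8, 3.0, 3.2, 3.5, 3.8]

fig, ax = plt.subplots()
ax.scatter(scores, gpas)         # one point per (score, GPA) pair
ax.set_xlabel("Entrance score")
ax.set_ylabel("College GPA")
fig.savefig("scatter.png")
```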
Entrance Scores and College GPA
127
Scatter Plot
128
Numerical Summaries of Data
• Earlier on we described data graphically, noting
that different graphs are used for categorical and
numerical variables.
• Going forward, we describe data numerically and
observe that different numerical measures are
used for categorical and numerical data.
• In addition, we will look at measures for grouped
data and measures of the direction and strength
of relationships between two variables.
129
Position of the Median
• With n ordered observations, the median is the value in position (n + 1)/2.
130
Some points to note
• The decision as to whether the mean, median, or mode is the
appropriate measure to describe the central tendency of data is
context specific.
• One factor that influences our choice is the type of data, categorical or
numerical.
• Categorical data are best described by the median or the mode, not
the mean.
• If one person strongly agrees (coded 5) with a particular statement and
another person strongly disagrees (coded 1), is the mean “no opinion”?
• An obvious use of median and mode is by clothing retailers considering
inventory of shoes, shirts, and other such items that are available in
various sizes.
• The size of items sold most often, the mode, is then the one in heaviest
demand.
131
Some points to note
• Numerical data are usually best described by the mean.
• However, we have to consider the presence of
outliers—that is, observations that are unusually
large or unusually small in comparison to the rest of the data.
• The median is not affected by outliers, but the mean is.
• Whenever there are outliers in the data, we first need to
look for possible causes.
• One cause could be simply an error in data entry.
• The mean will be greater if unusually large outliers are
present, and the mean will be less when the data contain
outliers that are unusually small compared to the rest of
the data.
132
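A quick sketch (with made-up salary figures) shows how a single outlier drags the mean while barely moving the median:

```python
import statistics

salaries = [40, 42, 45, 47, 50]                 # in thousands; illustrative only
print(statistics.mean(salaries))                # 44.8
print(statistics.median(salaries))              # 45

salaries_with_outlier = salaries + [400]        # one unusually large value
print(statistics.mean(salaries_with_outlier))   # jumps to 104.0
print(statistics.median(salaries_with_outlier)) # barely moves: 46.0
```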
Five-Number Summary
• The five-number summary refers to the five
descriptive measures:
• minimum, first quartile, median, third quartile,
and maximum.
133
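A minimal sketch with NumPy (the nine data values are made up for illustration):

```python
import numpy as np

data = np.array([2, 4, 4, 5, 7, 8, 9, 11, 12])   # illustrative observations

five_number = {
    "min": int(data.min()),
    "Q1": float(np.percentile(data, 25)),
    "median": float(np.median(data)),
    "Q3": float(np.percentile(data, 75)),
    "max": int(data.max()),
}
print(five_number)
```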
Covariance & Correlation
• Is there a relationship between your income
and your expenditure?
• What about your education and the amount
of money your employer pays you?
• Still, what is the relationship between the
sizes of shoes people wear in Cape Coast and
the amount of rainfall?
• What of the amount of goods a household
demand and the price of goods?
134
• In our examples we see that when our incomes
increase, we tend to increase our consumption.
• In the same way, if incomes decrease, we also
decrease our consumption.
• The more educated we are, the more pay (at
least in principle) we expect.
• The cheaper the goods, the more we buy and
vice versa.
• But we can say that there is no relationship
between shoes size and the amount of rainfall
in Cape Coast.
135
• These are relationships.
• We can describe them using the positive,
negative signs and zero.
• So. . .
• Income and consumption (+)
• Education and wage (+)
• Price and good demanded (-)
• Shoe sizes and amount of rainfall (0)
• Positive and negative relations are illustrated
in the diagrams on next slide.
136
137
138
Typical Example of Negative Covariance
139
Definitions
• Covariance is therefore the relationship
between a pair of variables.
• When one variable X increases and at the same
time Y increases, we say they have a positive
covariance.
• When one variable X increases and at the same
time Y decreases, we say they have a negative
covariance.
• If there is no obvious relationship between the
variables X and Y, we say they have zero
covariance.
140
• Statistically, we use the following formulae to
compute covariance:
• Population covariance: Cov(X, Y) = Σ(xi − μx)(yi − μy) / N
• Sample covariance: sxy = Σ(xi − x̄)(yi − ȳ) / (n − 1)
141
Example: Covariance
• Compute the covariance in the returns (%)
between the returns of Crane Analytics and
Heron Computing as shown below.
Year Crane Analytics Heron Computing
2008 1 3
2009 -2 2
2010 3 4
2011 0 6
2012 3 0
144
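We can check the arithmetic with NumPy; note that `np.cov` uses the sample formula, dividing by n − 1.

```python
import numpy as np

crane = np.array([1, -2, 3, 0, 3])   # Crane Analytics returns (%)
heron = np.array([3, 2, 4, 6, 0])    # Heron Computing returns (%)

# Sample covariance: sum of (x - xbar)(y - ybar), divided by n - 1
cov_xy = np.cov(crane, heron)[0, 1]
print(cov_xy)   # -1.0: the two return series move in opposite directions
```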
Statistical Inference and Estimation
145
Statistical Sampling
• Sampling is the foundation of statistical analysis.
• We use sample data in business analytics applications for
many purposes.
• For example, we might wish to estimate the mean,
variance, or proportion of a very large or unknown
population; provide values for inputs in decision models;
understand customer satisfaction; reach a conclusion as to
which of several sales strategies is more effective; or
understand if a change in a process resulted in an
improvement.
• We discussed sampling methods used to estimate
population parameters, and how we can assess the error
inherent in sampling.
146
Estimating Population Parameters
• Sample data provide the basis for many useful
analyses to support decision making.
• Estimation involves assessing the value of an
unknown population parameter—such as a
population mean, population proportion, or
population variance—using sample data.
147
Estimating Population Parameters
• Estimators are the measures used to estimate
population parameters.
• For example, we use the sample mean x̄ to
estimate a population mean µ.
• The sample variance s² estimates a population
variance σ², and the sample proportion p
estimates a population proportion π.
• A point estimate is a single number derived
from sample data that is used to estimate the
value of a population parameter.
148
Unbiased Estimators
• Statisticians develop many types of estimators,
from a theoretical as well as a practical
perspective.
• It is important that they “truly estimate” the
population parameters they are supposed to
estimate.
• Suppose we perform an experiment in which we
repeatedly sampled from a population and
computed a point estimate for a population
parameter.
149
Unbiased Estimators
• Each individual point estimate will vary from the
population parameter.
• However, we would hope that the long-term
average (expected value) of all possible point
estimates would equal the population parameter.
• If the expected value of an estimator equals the
population parameter it is intended to estimate,
the estimator is said to be unbiased.
• If this is not true, the estimator is called biased
and will not provide correct results.
150
For example. . .
• The population variance is computed by σ² = Σ(xi − µ)² / N.
• The unbiased sample variance divides the sum of squared deviations about x̄ by n − 1 rather than n: s² = Σ(xi − x̄)² / (n − 1).
151
Errors in Point Estimation
• One of the drawbacks of using point estimates
is that they do not provide any indication of
the magnitude of the potential error in the
estimate.
152
Look at this story. . .
• A national newspaper in Ghana reported that,
based on a FWSC survey, university teachers
are the highest-paid workers in Ghana,
with an average salary of GHC150,004.
• Actual averages for two local universities were
less than GHC70,000. What happened?
153
Well. . .
• As reported in a follow-up story, the sample
size was very small and included a large
number of highly paid medical school faculty;
as a result, there was a significant error in the
point estimate that was used.
• When we sample, the estimators we use—
such as a sample mean, sample proportion, or
sample variance — are actually random
variables that are characterized by some
distribution.
154
• By knowing what this distribution is, we can
use probability theory to quantify the
uncertainty associated with the estimator.
• To understand this, we first need to discuss
sampling error and sampling distributions
(again).
155
Sampling Error
• Different samples from the same population
have different characteristics—for example,
variations in the mean, standard deviation,
frequency distribution, and so on.
• Sampling (statistical) error occurs because
samples are only a subset of the total
population.
• Sampling error is inherent in any sampling
process, and although it can be minimized, it
cannot be totally avoided.
156
• Another type of error, called non-sampling
error, occurs when the sample does not
represent the target population adequately.
• This is generally a result of poor sample
design, such as using a convenience sample
when a simple random sample would have
been more appropriate or choosing the wrong
population frame.
• It may also result from inadequate data
reliability.
157
Note carefully. . .
158
Sampling Distribution
• Let’s take a population
and sample it.
• The sample is drawn by
simple random sampling
(why SRS?).
• Suppose what we are
looking for is the average
height.
• Again, suppose sampling is
done with replacement.
• We can then draw multiple
samples, can't we?
159
Sampling Distribution of the Mean
• So if we replace the
sample and take
another sample and
find the mean. . .
• We can go on and on,
finding several samples
and their means.
• Suppose these means
are x̄1, x̄2, x̄3, . . ., x̄k.
160
Sampling Distribution of the Proportion
• We can do the same
thing for other sample
statistics like the
proportion as shown on
the right.
• We are going to have a
sample of proportions
p̂1, p̂2, p̂3, . . ., p̂k.
161
Statistics Have a Distribution
• Each of these statistics
from the sample (mean,
proportion, standard
deviation, etc.) varies
with the samples.
• And because they vary,
they have a distribution.
162
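This idea can be simulated directly; the sketch below (with a made-up population of heights) draws many samples and records each sample mean, showing that the means themselves form a distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(170, 10, 100_000)   # heights in cm, simulated

# Repeatedly draw a simple random sample (with replacement) of size 30
sample_means = [rng.choice(population, size=30).mean() for _ in range(2000)]

# The sample means cluster near the population mean, with spread close
# to sigma / sqrt(n) = 10 / sqrt(30), roughly 1.83
print(round(float(np.mean(sample_means)), 1))
print(round(float(np.std(sample_means)), 2))
```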
163
Probability
164
• Definition
– outcome
• A result of a random experiment that cannot be further
decomposed.
166
• Probability
• A probability is a number between 0 and 1 that we
attach to each element of the sample space.
• Informally, that number simply describes the
chance of that event happening.
• A probability of 1 means that the event will
happen for sure.
• A probability of 0 means that we are talking about
an impossible event.
• Numbers in between represent various degrees
of certainty about the occurrence of the event.
167
Definition of a ‘Distribution’
• A statistical distribution or probability
distribution is a mathematical function that
provides the probabilities of the occurrence of
various possible outcomes in an experiment.
• In plain English. . .
– A distribution is simply a collection of data, or
scores, on a variable. Usually, these scores are
arranged in order from smallest to largest and
then they can be presented graphically (Statistics
in Plain English, Third Edition, 2010)
168
Distribution of Heights
169
When we plot the means. . .
170
• This is called the sampling
distribution of the mean.
• Formally, we define the
sampling distribution of a
statistic as distribution of the
statistic for all possible
samples from the same
population of a given size.
• Like all distributions, this
one will also have its own
properties, like its own mean
and standard deviation.
171
Properties of the Sampling Distribution
• The overall shape of the distribution is
symmetric and approximately normal.
• There are no outliers or other important
deviations from the overall pattern.
• The center of the distribution is very close to
the true population mean.
172
Mathematical Properties
• The mean of the sample
means equals the
population mean, i.e. µx̄ = µ.
• The standard deviation
of the sample mean,
called the standard
error, is s/√n, where s is the
sample standard
deviation and n is the
sample size.
173
Statistical Inference
• Once we have gathered our sample data, we
can try to learn something about the larger
population.
• A statistic is a summary measure of the
sample data used to infer something about
the larger population.
• Prior to sampling, the statistic is called an
estimator and is merely a formula.
174
• For example, the sample mean is given by the formula x̄ = (x1 + x2 + . . . + xn)/n.
176
Towards CLT
• Suppose you are
interested in some
population mean μ.
• You might be interested
in the average income,
average hours of sleep,
or the average number
of children of all
Ghanaians.
177
• We draw a random sample from the
population, collecting the sample
observations X1, X2, . . ., Xn.
178
Discussion
• First, the standard deviation of the sampling
distribution of the mean, called the standard error
of the mean, is computed as:
• Standard error of the mean = σ/√n
• where σ is the standard deviation of the population
from which the individual observations are drawn
and n is the sample size.
• We use the sample standard deviation s if σ is not
known.
• From this formula, we see that as n increases, the
standard error decreases.
179
• This suggests that the estimates of the mean
that we obtain from larger sample sizes
provide greater accuracy in estimating the
true population mean. In other words, larger
sample sizes have less sampling error.
180
181
Example
• Suppose the variance of a population is 8.33.
• Compute the standard error of the mean for
each of the sample sizes:
• 10, 25, 100, 500.
182
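A sketch of the computation, with σ = √8.33 ≈ 2.886:

```python
import math

variance = 8.33
sigma = math.sqrt(variance)          # population standard deviation, ~2.886

for n in (10, 25, 100, 500):
    se = sigma / math.sqrt(n)        # standard error shrinks as n grows
    print(n, round(se, 3))
```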
Second Result
• The second result is called the central limit
theorem, one of the most important practical
results in statistics that makes systematic
inference possible.
• The central limit theorem states that if the
sample size is large enough, the sampling
distribution of the mean is approximately
normally distributed, regardless of the
distribution of the population and that the mean
of the sampling distribution will be the same as
that of the population.
183
184
Normal Distribution
185
• Whenever the variates
eg. heights of students,
income of farmers,
number of years people
have worked with us,
etc. follow the nicely
drawn curve on the
right, we say it has a
normal or Gaussian
distribution.
• The normal distribution
is symmetrical about the
line which goes through
the middle.
186
• The points of
inflection from the
mean give you the
standard deviation.
• The second point
of inflection give
the second
standard deviation
from the mean and
so on.
187
• All distributions denote
probabilities.
• The maximum
probability under any
distribution is one.
• In the case of the
normal distribution too,
the area under the curve
(pdf) is one.
188
Formally. . .
• Characteristics of a Normal Curve are. . .
• 1. All normal curves are bell-shaped with
points of inflection at µ − σ and µ + σ.
• 2. All normal curves are symmetric about the
mean µ.
189
• The area under an entire normal curve is 1.
• All normal curves are positive for all x. That is,
f(x) > 0 for all x.
• The height of any normal curve is maximized
at x = µ.
• The shape of any normal curve depends on its
mean µ and standard deviation σ.
190
Finding the Area Under the Curve
• The equation of the normal distribution curve,
also known as the probability density function, is
• f(x) = (1/(σ√(2π))) e^(−(x − µ)²/(2σ²))
• for −∞ < x < ∞, −∞ < µ < ∞, and σ > 0.
• The mean of X is µ and the variance is σ².
• This is written simply as X ~ N(µ, σ²).
191
Exercise
• Let X denote the mark of students on the
Statistics exam. It has long been known that
the marks follow a normal distribution with
mean 68 and standard deviation of 16. That is,
X ~ N(68, 16²). Draw a picture of the normal
curve, that is, the distribution, of X.
192
Finding Normal Probabilities
• Let X equal the IQ of a randomly selected
Ghanaian. Assume X is normally distributed.
What is the probability that a randomly
selected Ghanaian has an IQ below 90?
193
Partial Solution
• As is the case with all continuous distributions,
finding the probability involves finding the
area under the curve and to the left of the line
x = 90.
194
• That is, P(X < 90) = ∫ from −∞ to 90 of (1/(σ√(2π))) e^(−(x − µ)²/(2σ²)) dx.
• That is a mouthful!
• The integration is simply hard to do.
• We can bypass this by the use of the normal table.
• All we need to do is transform our distribution
to a Z (standard normal) distribution and then use
the cumulative probability table for the Z
distribution to calculate our desired probability.
195
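The slide's distribution parameters did not survive conversion; assuming, purely for illustration, that IQ ~ N(100, 16²), the table lookup corresponds to `scipy.stats.norm`:

```python
from scipy.stats import norm

mu, sigma = 100, 16                      # assumed parameters, illustration only

p_direct = norm.cdf(90, loc=mu, scale=sigma)   # P(X < 90), no integration needed
z = (90 - mu) / sigma                          # transform to Z: z = -0.625
p_via_z = norm.cdf(z)                          # standard normal table value

print(round(p_direct, 4))   # same answer either way
```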
Proof
196
Normal versus Standard Normal
Distribution
197
Solved Examples
• Suppose that the starting salary of UCC HR
graduates is normally distributed with a mean
of 54,400GHS and a standard deviation of
11,000GHS. If we randomly select 25 college
graduates, what is the probability that the
average salary of these graduates is between
56,000GHS and 58,000GHS?
198
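A sketch of the solution: the mean of n = 25 salaries has standard error σ/√n = 11,000/5 = 2,200, so we want the area between the two values under N(54,400, 2,200²).

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 54_400, 11_000, 25
se = sigma / sqrt(n)          # standard error of the mean = 2200

# P(56,000 < xbar < 58,000) for xbar ~ N(mu, se^2)
p = norm.cdf(58_000, mu, se) - norm.cdf(56_000, mu, se)
print(round(p, 3))            # roughly 0.183
```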
Solved Examples
• Suppose that GRE scores are normally
distributed with a mean of 500 and a standard
deviation of 100. If we randomly select 10
Ghanaian university students, what is the
probability that their mean GRE score is
greater than 550?
199
Solved Examples
• Suppose that the annual return on
the stock market is normally
distributed with a mean of 12% and a
standard deviation of 22%. What is
the probability that the average rate
of return over an entire decade will
be over 20% a year?
200
The Student t-Distribution
• In the previous exercises, we used the
population standard deviation σ.
203
Properties of the Student's t-Distribution
• The graph for the Student's t-distribution is similar to the standard normal
curve.
• The mean for the Student's t-distribution is zero and the distribution is
symmetric about zero.
• The Student's t-distribution has more probability in its tails than the standard
normal distribution because the spread of the t-distribution is greater than the
spread of the standard normal. So the Student's t-distribution is thicker in the
tails and shorter in the center than the standard normal distribution.
204
205
The Normal Distribution
• Recall that the normal distribution was written as X ~ N(µ, σ²), standardized as Z = (X − µ)/σ ~ N(0, 1).
206
The Student-t Distribution
• The Student-t statistic is t = (x̄ − µ)/(s/√n).
• It is written as t ~ t(n − 1),
• with n − 1 degrees of freedom.
207
Interval Estimates
• An interval estimate provides a range for a
population characteristic based on a sample.
• Intervals are quite useful in statistics because
they provide more information than a point
estimate.
• Intervals specify a range of plausible values for
the characteristic of interest and a way of
assessing “how plausible” they are.
208
• In general, a 100(1 – α)% probability interval
is any interval [A, B] such that the probability
of the quantity falling between A and B is 1 – α.
• Probability intervals are often centered on the
mean or median.
209
Confidence Intervals
• Confidence interval estimates provide a way of
assessing the accuracy of a point estimate.
• A confidence interval is a range of values
between which the value of the population
parameter is believed to be, along with a
probability that the interval correctly estimates
the true (unknown) population parameter.
• This probability is called the level of
confidence, denoted by 1 – α, where α is a
number between 0 and 1.
210
211
• The level of confidence is usually expressed as a
percent; common values are 90%, 95%, or 99%.
• Note that if the level of confidence is 90%, then
α = 0.1.
• The margin of error depends on the level of
confidence and the sample size.
• Many different types of confidence intervals may be
developed.
• The formulas used depend on the population
parameter we are trying to estimate and possibly
other characteristics or assumptions about the
population.
212
Example 1
• Suppose you do a study of acupuncture to
determine how effective it is in relieving pain.
You measure sensory rates for 15 subjects
with the results given:
• 8.6; 9.4; 7.9; 6.8; 8.3; 7.3; 9.2; 9.6; 8.7; 11.4;
10.3; 5.4; 8.1; 5.5; 6.9
• Use the sample data to construct a 95%
confidence interval for the mean sensory rate
for the population from which you took the
data.
213
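A sketch of the calculation using a t-interval (appropriate here because the population standard deviation is unknown):

```python
import numpy as np
from scipy import stats

rates = [8.6, 9.4, 7.9, 6.8, 8.3, 7.3, 9.2, 9.6, 8.7,
         11.4, 10.3, 5.4, 8.1, 5.5, 6.9]

xbar = np.mean(rates)
se = stats.sem(rates)                 # s / sqrt(n), using the n - 1 divisor
lo, hi = stats.t.interval(0.95, df=len(rates) - 1, loc=xbar, scale=se)
print(round(lo, 2), round(hi, 2))     # roughly (7.30, 9.15)
```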
Example II
• You do a study of hypnotherapy to determine
how effective it is in increasing the number of
hours of sleep subjects get each night. You
measure hours of sleep for 12 subjects with the
following results.
• 8.2; 9.1; 7.7; 8.6; 6.9; 11.2; 10.1; 9.9; 8.9; 9.2;
7.5; 10.5
• Construct a 95% confidence interval for the
mean number of hours slept for the population
(assumed normal) from which you took the data.
214
Example III
• A random sample of statistics students were asked to
estimate the total number of hours they spend watching
television in an average week. The responses are recorded
in Table below. Use this sample data to construct a 98%
confidence interval for the mean number of hours statistics
students will spend watching television in one week.
215
Confidence Interval: Mean with Known
Population Standard Deviation
217
• A 100(1 – α)% confidence interval for the
population mean, based on a sample of size n
with a sample mean x̄ and a known population
standard deviation σ, is given by x̄ ± zα/2 · σ/√n.
219
In short. . .
• Let X1, X2, . . ., Xn be a random sample from a normal
population with a mean µ and variance σ². Then the
interval x̄ ± zα/2 · σ/√n contains µ with confidence 1 – α.
220
Example
• In a production process for filling bottles of
liquid detergent, historical data have shown
that the variance in the volume is constant;
however, clogs in the filling machine often
affect the average volume. The historical
standard deviation is 15 milliliters. In filling
800-milliliter bottles, a sample of 25 found an
average volume of 796 milliliters.
• Find the 95% confidence interval for the
population mean.
221
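A sketch of the computation: with σ known, the interval is x̄ ± z0.025 · σ/√n.

```python
from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 796, 15, 25

z = norm.ppf(0.975)               # about 1.96 for 95% confidence
margin = z * sigma / sqrt(n)      # about 1.96 * 15 / 5 = 5.88

print(round(xbar - margin, 2), round(xbar + margin, 2))   # (790.12, 801.88)
```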
Example 2
• A random sample of 126 police officers subjected to
constant inhalation of automobile exhaust fumes in
Accra, Ghana had an average blood lead level
concentration of 29.2 μg/dl. Assume X, the blood lead
level of a randomly selected policeman, is normally
distributed with a standard deviation of σ = 7.5 μg/dl.
Historically, it is known that the average blood lead level
concentration of humans with no exposure to
automobile exhaust is 18.2 μg/dl. Is there convincing
evidence that policemen exposed to constant auto
exhaust have elevated blood lead level concentrations?
222
Solution
• Let's try to answer the question by calculating
a 95% confidence interval for the population
mean. For a 95% confidence interval, 1−α =
0.95, so that α = 0.05 and α/2 = 0.025.
Therefore, as the following diagram illustrates
the situation, z0.025 = 1.96:
223
• Now, substituting in what we know (x̄ = 29.2, n
= 126, σ = 7.5, and z0.025 = 1.96) into the
formula for a Z-interval for a mean, we get:
• [29.2 − 1.96(7.5/√126), 29.2 + 1.96(7.5/√126)]
• Simplifying, we get a 95% confidence interval
for the mean blood lead level concentration of
all policemen exposed to constant auto
exhaust: [27.89,30.51]
224
• That is, we can be 95% confident that the mean
blood lead level concentration of all policemen
exposed to constant auto exhaust is between
27.9 μg/dl and 30.5 μg/dl.
• Note that the interval does not contain the value
18.2, the average blood lead level concentration of
humans with no exposure to automobile exhaust.
• In fact, all of the values in the confidence interval
are much greater than 18.2. Therefore, there is
convincing evidence that policemen exposed to
constant auto exhaust have elevated blood lead
level concentrations.
225
Hypothesis Testing
226
Example 1
• Cowbell, a producer of powdered milk, claims
that, on average, its powdered sachets weigh
at least 16 grams, and thus do not weigh less
than 16 grams.
• The company can test this claim by collecting a
random sample of powdered sachets,
determining the weight of each one, and
computing the sample mean sachet weight
from the data.
227
Example 2
• Accra Brewery is a company that has brewed crisp,
good-tasting beer in Ghana since 1931. It claims
that, on average, the volume of its fill is 625ml.
• It wishes to monitor its brewing process to ensure
that the volume of its fill meets this requirement for
regulation and reputational purposes.
• It could obtain random samples every 2 hours from
the production line and use them to determine if
standards are being maintained.
228
• These examples are a standard industrial
procedure.
• We state a hypothesis about some population
parameter and then collect sample data to
test the validity of our hypothesis.
229
Concepts of Hypothesis Testing
• Earlier on we developed statistical methods of
estimation, primarily in the form of confidence
intervals, for answering the question "what is
the value of a population parameter?"
• In this lecture, we'll seek to answer questions
like "is the value of the parameter θ equal to
a given value?"
230
• For example, rather than attempting to
estimate μ, the mean body temperature of
adults, we might be interested in testing
whether μ, the mean body temperature of
adults, is really 37 degrees Celsius.
• We'll attempt to answer such questions using
a statistical method known as hypothesis
testing.
231
• We'll look at hypothesis tests for the following
population parameters, including:
– a population proportion p, the difference in two
population proportions, p1−p2
– a population mean μ
– the difference in two population means, μ1−μ2,
– a population variance σ2
– the ratio of two population variances, σ1²/σ2²,
– three (or more!) means, μ1, μ2, and μ3.
• regression coefficient β of a least squares regression
line through a set of (x,y) data points as well as the
corresponding population correlation coefficient ρ.
232
*Tests About One Mean
• There are basically three tests related to the mean of
the population.
1. Hypothesis test based on the normal distribution for
the mean μ for the completely unrealistic situation
that the population variance σ2 is known
2. Hypothesis test based on the t-distribution for the
mean μ for the (much more) realistic situation that
the population variance σ2 is unknown.
3. Hypothesis test based on the t-distribution for μD,
the mean difference in the responses of two
dependent populations
233
Hypothesis-Testing Procedure
Conducting a hypothesis test involves several steps:
1. Identifying the population parameter of interest and
formulating the hypotheses to test
2. Selecting a level of significance, which defines the risk of
drawing an incorrect conclusion when the assumed
hypothesis is actually true
3. Determining a decision rule on which to base a
conclusion
4. Collecting data and calculating a test statistic
5. Applying the decision rule to the test statistic and drawing
a conclusion.
234
Significance Level and P-Value
• Before any hypothesis testing, we define a
significance level.
• The significance level is the probability of rejecting the
null hypothesis when it is actually true.
• You can look at the significance level as a boundary between
'rejecting' and 'failing to reject' our null hypothesis.
Reject          Fail to reject
0 . . . significance level . . . 1
235
• In the diagram, the ends of the line are marked
0 and 1 (why?).
• The red vertical line represents the dividing line
that marks the boundary between ‘Reject’ and
‘Fail to reject’.
• This significance level is defined a
priori. The conventional values are 0.01 (1%),
0.05 (5%) and 0.1 (10%).
• The p-value is calculated for our NULL
hypothesis. We then compare the p-value with
our significance level and decide whether to
reject or fail to reject the NULL.
236
Decisions
• Remember
237
• The following diagram will help make the points on the
previous diagrams clear.
238
Summary of Hypothesis Testing
• Every time we perform a hypothesis test, this is the
basic procedure that we will follow:
(1) We'll make an initial assumption about the
population parameter.
(2) We'll collect evidence or else use somebody else's
evidence (in either case, our evidence will come in
the form of data).
(3) We specify the level of significance.
(4) Based on the available evidence (data), we'll
decide whether to "reject" or "not reject" our initial
assumption.
239
One-Sample Hypothesis Tests
• We may conduct three types of one-sample
hypothesis tests:
• H0: population parameter ≥ constant vs. H1:
population parameter < constant
• H0: population parameter ≤ constant vs. H1:
population parameter > constant
• H0: population parameter = constant vs. H1:
population parameter ≠ constant
240
Tests About Proportions
• We perform hypothesis test for a single proportion.
• Recall the hypothesis testing procedure:
(1) State the null hypothesis H0 and the alternative
hypothesis HA.
(2) Calculate the test statistic: Z = (p̂ − p0) / √(p0(1 − p0)/n)
243
• Because we're interested in seeing if the
advertising campaign was successful, that is,
that a greater proportion of people wear seat
belts, the alternative hypothesis is:
• HA: p > 0.14
244
• If we use a significance level of α = 0.01, then the critical
region is Z ≥ z0.01 = 2.326.
• That is, we reject the null hypothesis if the test statistic Z >
2.326. Because the test statistic falls in the critical region,
that is, because Z = 2.52 > 2.326, we can reject the null
hypothesis in favor of the alternative hypothesis. There is
sufficient evidence at the α = 0.01 level to conclude the
campaign was successful (p > 0.14).
245
Example
• Among patients with lung cancer, usually 90%
or more die within three years. As a result of
new forms of treatment, it is felt that this rate
has been reduced. In a recent study of n = 150
lung cancer patients, y = 128 died within three
years. Is there sufficient evidence at the α =
0.05 level, say, to conclude that the death rate
due to lung cancer has been reduced?
246
Solution
• The sample proportion is:
• p̂ = 128/150 ≈ 0.853
• The null and alternative hypotheses are:
• H0: p = 0.90 and HA: p < 0.90
• The test statistic is, therefore:
• Z = (0.853 − 0.90) / √(0.90 × 0.10/150) ≈ −1.92
247
• And, the rejection region is Z ≤ −z0.05 = −1.645. Because
Z = −1.92 falls in the rejection region, we reject H0 and
conclude that the death rate due to lung cancer has been reduced.
250
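The whole test can be sketched in a few lines; note that the slide's −1.92 comes from rounding p̂ to 0.853 before computing, while the unrounded value gives about −1.91.

```python
from math import sqrt

p_hat = 128 / 150        # about 0.8533
p0 = 0.90                # hypothesized death rate

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / 150)
print(round(z, 2))       # about -1.91

# One-sided test at alpha = 0.05: reject H0 if z < -1.645
print(z < -1.645)        # True: the death rate appears reduced
```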
Example
251
Is there sufficient evidence at the α = 0.05 level, say, to conclude
that the two populations — smokers and non-smokers — differ
significantly with respect to their opinions?
252
Solution
• If p1 = the proportion of the non-smoker
population who reply "yes" and p2 = the
proportion of the smoker population who reply
"yes," then we are interested in testing the null
hypothesis:
• H0: p1 = p2
• against the alternative hypothesis:
• HA: p1 ≠ p2
• Before conducting the hypothesis test, we'll have
to derive the appropriate test statistic.
253
• The test statistic for testing the difference in
two population proportions, that is, for testing
the null hypothesis H0: p1 = p2, is
• Z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) )
• where p̂ = (y1 + y2)/(n1 + n2)
• is the proportion of ‘successes’ in the two
samples combined.
254
So. . .
• The overall sample proportion is:
255
• Since this is a two-tail test, we put half the
probability in each tail.
257
When Population Variance is Known
• First, it is completely unrealistic to think that
we'd find ourselves in the situation of knowing
the population variance, but not the
population mean.
• Think about it. . . but we have to start from
somewhere. . .
258
Example 1
• Boys of a certain age are known to have a
mean weight of μ = 85kg. A complaint is made
that the boys living in a municipal children's
home are underfed. As one bit of evidence, n
= 25 boys (of the same age) are weighed and
found to have a mean weight of x̄ = 80.94 kg. It
is known that the population standard
deviation σ is 11.6 kg. With a significance level
of α = 0.05, what should be concluded
concerning the complaint?
259
Solution 1
• We formulate the H0 and H1.
• The null hypothesis is H0: μ = 85, and the
alternative hypothesis is H1: μ < 85.
• In general, we know that if the weights are
normally distributed, then Z = (x̄ − μ)/(σ/√n) ~ N(0, 1).
261
• The critical region approach tells us to reject the
null hypothesis at the α = 0.05 level if Z <
−1.645.
• Therefore, we reject the null hypothesis
because Z = −1.75 < −1.645, and therefore falls
in the rejection region:
262
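The computation above can be sketched as:

```python
from math import sqrt

mu0, sigma, n, xbar = 85, 11.6, 25, 80.94

z = (xbar - mu0) / (sigma / sqrt(n))   # = -4.06 / 2.32
print(round(z, 2))                     # -1.75

# One-sided test at alpha = 0.05: reject H0 if z < -1.645
print(z < -1.645)                      # True: evidence the boys are underfed
```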
When Population Variance is Unknown
• Let's look at the realistic situation in which
population variance is unknown.
263
Example 2
• It is assumed that the mean systolic blood
pressure of a population of vegetarians is μ =
120 mm Hg. In a sample study of 100 people,
the average systolic blood pressure was found
to be 130.1 mmHg, with a standard
deviation of 21.21 mmHg. Assuming a 95%
confidence level, is the group significantly
different from the regular population?
264
Solution 2
• The null hypothesis is H0: μ = 120, and because
there is no specific direction implied, the
alternative hypothesis is HA: μ ≠ 120.
• In general, we know that if the data are normally
distributed, then t = (x̄ − μ)/(s/√n) follows a
t-distribution with n − 1 degrees of freedom.
266
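A sketch of the test; since σ is unknown, the sample standard deviation and the t-distribution are used:

```python
from math import sqrt
from scipy.stats import t

mu0, xbar, s, n = 120, 130.1, 21.21, 100

t_stat = (xbar - mu0) / (s / sqrt(n))   # about 10.1 / 2.121, i.e. 4.76
t_crit = t.ppf(0.975, df=n - 1)         # two-sided cutoff at alpha = 0.05

# |t| far exceeds the cutoff: the group differs from the regular population
print(round(t_stat, 2), round(t_crit, 2))
```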
Python Code Hypothesis Testing
• from pandas_datareader import data as pdr
• import yfinance as yf
• import numpy as np
• yf.pdr_override()
• msft = pdr.get_data_yahoo('MSFT')  # download the MSFT price history
• msft['logReturn'] = np.log(msft['Close'] / msft['Close'].shift(1))
• msft.dropna(inplace = True)
• msft.head()
268
• Let’s visualize the logReturn.
• import matplotlib.pyplot as plt
• msft['logReturn'].plot(figsize = (10, 8))
• The figure size is measured in inches.
• plt.ylabel('Returns')
269
We see that the mean return appears to be close to 0
270
Step 1: Set Hypothesis
• We want to test whether indeed the
mean return is 0.
• H0: μ = 0 versus H1: μ ≠ 0
271
Step 2: Calculate test statistic
• sample_mean = msft['logReturn'].mean()
• In calculating the stdev for the sample, remember we
lose 1 degree of freedom.
• sample_std = msft['logReturn'].std(ddof = 1)
• n = msft['logReturn'].shape[0]
• tTest = sample_mean / (sample_std / n ** 0.5)  # t-statistic under H0: μ = 0
• tTest
• Out[207]: 2.160519860261913
272
Step 3: Set decision criteria
• Remember we are using a sample from the
population.
• We will therefore prefer the Student-t tables.
• import scipy as scs
• import scipy.stats  # needed so that scs.stats.t is available
• alpha = 0.05
273
• tLeft = scs.stats.t.ppf(alpha/2, n-1)
• tLeft
• Out[212]: -1.961674579682696
• tRight = scs.stats.t.ppf(1 - alpha/2, n-1)
• tRight
• Out[214]: 1.9616745796826955
274
Step 4: Make decision - reject H0?
• print('At significance level of {}, shall we reject
H0: {}'.format(alpha, tTest > tRight or tTest <
tLeft))
• At significance level of 0.05, shall we reject H0:
True
275
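scipy can also run the whole test in one call via `ttest_1samp`. A sketch with simulated returns standing in for `msft['logReturn']` (the seed and return parameters below are invented for illustration):

```python
import numpy as np
from scipy import stats

# Simulated daily log returns stand in for msft['logReturn'] here
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.0005, scale=0.02, size=1000)

# Two-sided one-sample t-test of H0: mean return = 0
t_stat, p_value = stats.ttest_1samp(returns, popmean=0)

# The statistic agrees with the manual formula from Step 2
manual = returns.mean() / (returns.std(ddof=1) / np.sqrt(len(returns)))
```

This reproduces Steps 2-4 in one line: compare `p_value` with alpha instead of comparing `t_stat` with the tLeft/tRight cutoffs.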
Exercise
• Test the hypothesis
• and
276
*Paired T-Test
• In the two-sample t-test, we compare the means
of two independent populations, but there may
be occasions in which we are interested in
comparing the means of
two dependent populations.
• For example, suppose a researcher is
interested in determining whether the mean
IQ of the population of first-born twins differs
from the mean IQ of the population of
second-born twins.
277
• The researcher identifies a random sample
of n pairs of twins, and measures X, the IQ of
the first-born twin, and Y, the IQ of the
second-born twin.
• In that case, she's interested in determining
whether:
• μX = μY
• or equivalently if:
• μX − μY = 0.
278
• Now, the population of first-born twins is not
independent of the population of second-born
twins.
• Since all of our distributional theory requires
the independence of measurements, we're
rather stuck.
• There's a way out though... we can "remove"
the dependence between X and Y by
subtracting the two measurements Xi and Yi
for each pair of twins i, that is, by considering
the independent measurements Di = Xi − Yi.
279
• Then, our null hypothesis involves just a single mean,
which we'll denote μD, the mean of the differences:
• H0: μD = 0.
280
• We then compare the test statistic
• t = d̄ / (sD/√n)
• to a t-distribution with n−1 degrees of
freedom.
281
Example
• Blood samples from n = 10 people were sent
to each of two laboratories (Lab 1 and Lab 2)
for cholesterol determinations.
• The resulting data are summarized here:
282
Is there a statistically significant difference at the α =
0.01 level, say, in the (population) mean cholesterol
levels reported by Lab 1 and Lab 2?
283
Solution
• The null hypothesis is H0: μD = 0, and the
alternative hypothesis is HA: μD ≠ 0.
• The value of the test statistic is:
285
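In code, the paired test is scipy's `ttest_rel`. The slide's cholesterol table is not reproduced here, so the numbers below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical cholesterol readings for the same 10 samples at two labs
lab1 = np.array([296., 268, 244, 272, 240, 244, 282, 254, 244, 262])
lab2 = np.array([318., 287, 260, 279, 245, 249, 294, 271, 262, 285])

# Paired t-test of H0: mu_D = 0
t_stat, p_value = stats.ttest_rel(lab1, lab2)

# Equivalent to a one-sample t-test on the differences d = lab1 - lab2
d = lab1 - lab2
t_check, p_check = stats.ttest_1samp(d, popmean=0)
```

The cross-check makes the slide's point concrete: pairing reduces the problem to a single-mean test on the differences.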
Kid stuff. . .
• The size of a contingency table is defined by the
number of rows times the number of columns
associated with the levels of the two categorical
variables.
• The size is notated r * c, where r is the number of
rows of the table and c is the number of columns.
• A cell displays the count for the intersection of a
row and column. Thus the size of a contingency
table also gives the number of cells for that table.
For example, if we have a 2*2 table, then we have 4
cells.
286
Example: Attitude to Campus Food
• A random sample of 500 students were
surveyed on their attitude toward food sold on
campus. The results of this survey are
summarized in the following contingency
table:
Bachelor Masters PhD Total
Dislike 64 67 84 215
287
• The size of this table is 2*3 and NOT 3*4. There
are only two rows of observed data for attitude to
campus food and three columns of observed data
for their level of education.
• We define the level of education as the explanatory
variable and attitude as the response because it is
more natural to analyze how one's attitude is
shaped by their level than the other way around.
• From here, we would want to determine if an
association (relationship) exists between attitude
and level. That is, are the two variables dependent
or independent?
288
Chi-Square Test of Independence
• This test is performed by using a Chi-square
test of independence of two categorical
variables.
• As with all prior statistical tests we need to
define null and alternative hypotheses.
• Also, as we have learned, the null hypothesis
is what is assumed to be true until we have
evidence to go against it.
289
Hypothesis
• Null Hypothesis: The two categorical variables
are independent.
• Alternative Hypothesis: The two categorical
variables are dependent.
• As usual we need a test statistic. It is called the
Chi-Square Test Statistic:
• χ² = Σ (O − E)² / E
• where O represents the observed frequency.
290
• E is the expected frequency under the null
hypothesis, computed as:
• E = (row total × column total) / sample size
291
Procedure
• Once we have gathered our data, we
summarize the data in the two-way
contingency table.
• This table represents the observed counts and
is called the Observed Counts Table or simply
the Observed Table.
• Then from the Observed Table, we compute
our Expected Table.
292
Observed vs Expected Values
Observed
Group 1 A B A+B
Group 2 C D C+D
Expected (each cell = row total × column total / n, where n = A+B+C+D)
Group 1 (A+B)(A+C)/n (A+B)(B+D)/n
Group 2 (C+D)(A+C)/n (C+D)(B+D)/n
294
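The whole procedure is available as `scipy.stats.chi2_contingency`; a sketch with an illustrative 2×2 table (the counts are invented):

```python
import numpy as np
from scipy import stats

# Illustrative 2x2 observed table [[A, B], [C, D]]
observed = np.array([[30, 70],
                     [60, 40]])

chi2, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)

# Expected counts: row total * column total / grand total
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
manual_expected = row_tot * col_tot / observed.sum()
```

For an r × c table, `dof` is (r − 1)(c − 1); here it is 1, and `expected` matches the hand formula cell by cell.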
• Are gender and education level dependent at
5% level of significance? In other words, given
the data collected above, is there a relationship
between the gender of an individual and the
level of education that they have obtained?
295
Solution
• We build the Expected Table
High School Bachelors Masters PhD Total
296
• Now the critical value at the 5% level with
(2 − 1)(4 − 1) = 3 degrees of freedom is χ² = 7.815.
• Since 8.006 > 7.815, we reject the
null hypothesis and conclude that the
education level depends on gender at a 5%
level of significance.
297
Question continued
• Apply the Chi-square Test of Independence to
our example where we have a random sample
of 500 students who are questioned regarding
their attitude to campus food. Assume a
significance level of 5%. What conclusion will you
reach?
298
Exercise 1
• The operations manager of a company that
manufactures tires wants to determine whether there
are any differences in the quality of work among the
three daily shifts. She randomly selects 496 tires and
carefully inspects them. Each tire is either classified as
perfect, satisfactory, or defective, and the shift that
produced it is also recorded. The two categorical
variables of interest are shift and condition of the tire
produced. Does the data provide sufficient evidence at
the 5% significance level to infer that there are
differences in quality among the three shifts?
299
300
Exercise 2
• A food services manager for a campus joint
wants to know if there is a relationship
between gender (male or female) and the
preferred condiment on a hot dog. The
following table summarizes the results. Test
the hypothesis with a significance level of
10%.
301
302
Statistics for Business Decision
Making
Covariance and Correlation
303
Covariance and Correlation
• How far is it true that when one’s income goes up,
one’s consumption also goes up?
• What about the wage one earns and the ‘amount’ of
education one has?
• And is it true that on a beach on a sunny day, the amount
of ice-cream sold can predict the number of people
drowning in the water?
• Do sales in a shop have anything to do with the amount spent
on advertising?
• How about the price of a stock and the volume of stocks
traded?
• Again, is there a relationship between your height and
weight?
304
Background
• The above and many more such
relationships between variables are quantified
in statistics.
• For some pairs of variables, there is a direct
relationship; for others, there is an
inverse relationship.
• So we express direct relationships as
positive and inverse relationships as
negative.
305
Diagram for Covariance
306
Definition
• We formally define covariance as a measure
of the relationship between two random
variables.
• It is a statistical measure of how much – to
what extent – the variables change together.
• In other words, it is essentially a measure of
the variance between two variables.
• However, the metric does not assess the
dependency between variables.
307
• Positive covariance is an indication that the
two variables tend to move in the same
direction.
• Negative covariance implies that two
variables tend to move in opposite or inverse
directions.
• Can we have a situation where the variables
are unrelated? Yes, we can. How will the
scatter plot of variables X and Y which have no
relationship look like? Next slide. . .
308
• Here there is no obvious pattern. So the
relationship between the variables on the
vertical and horizontal axes is zero.
309
Statistically. . .
• We define the population covariance between two
variables x and y as
• Cov(x, y) = Σ (x_i − μ_x)(y_i − μ_y) / N
• where N is the size of the data.
• However, we work with samples instead of the
population. We define the sample covariance
as
• Cov(x, y) = Σ (x_i − x̄)(y_i − ȳ) / (n − 1)
• where x̄ and ȳ are the sample means of x and y
respectively.
310
Example
• The Table below is the income and
consumption expenditure of a household for
10 years. Is there a relationship between
income and consumption for this household?
Year Income Consumption
2000 8559.4 6830.4
2001 8883.3 7148.8
2002 9060.1 7439.2
2003 9378.1 7804
2004 9937.2 8285.1
2005 10485.9 8819
2006 11268.1 9322.7
2007 11894.1 9826.4
2008 12238.8 10129.9
2009 12030.3 10088.5
311
Solution
• We set up as follows:
• x̄ = 10373.53, ȳ = 8569.40, N = 10
• Σ (x_i − x̄)(y_i − ȳ) = 15683018.9
• Cov(x,y) = 15683018.9 / (10 − 1) = 1742557.66
• Since the covariance is not zero, we conclude that
there is a relationship between income and
consumption which is positive.
• There is a general tendency for consumption to
increase whenever income increases. See attached
Excel sheet.
312
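The calculation can be checked in Python with the data from the table (a sketch; note that numpy's `np.cov` divides by n − 1 by default):

```python
import numpy as np

# Income and consumption, 2000-2009, from the table
income = np.array([8559.4, 8883.3, 9060.1, 9378.1, 9937.2,
                   10485.9, 11268.1, 11894.1, 12238.8, 12030.3])
consumption = np.array([6830.4, 7148.8, 7439.2, 7804, 8285.1,
                        8819, 9322.7, 9826.4, 10129.9, 10088.5])

# Sum of cross-deviations, as on the slide
cross = ((income - income.mean()) * (consumption - consumption.mean())).sum()

# Sample covariance (divide by n - 1); np.cov agrees
cov_sample = cross / (len(income) - 1)
cov_np = np.cov(income, consumption)[0, 1]
```

`cross` reproduces the slide's 15,683,018.9, and the covariance is positive, confirming the direct relationship between income and consumption.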
• We do have negative covariance between
variables.
• Think about it!
• There is a negative relationship between
inflation rate and the strength of the currency
of a country eg. GHC/USD.
• Whenever inflation is high in a country, that
country’s currency depreciates against the
major currencies!
• Can you give examples of two variables with
negative covariance?
313
Some Properties of Covariance
• We state briefly the properties of covariance.
• Cov(X,Y) = Cov(Y,X)
• Cov(X,c) = 0 where c is a constant. A constant
does not vary (Remember?)
• Cov(X,Y+Z) = Cov(X,Y) + Cov(X,Z)
• Cov(X+Y,X+Y) = Cov(X,X) + Cov(Y,Y) + 2Cov(X,Y)
• Cov(X, X) = Var(X)
314
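These properties can be verified numerically; a sketch with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1000)
Y = rng.normal(size=1000)
Z = rng.normal(size=1000)

def cov(a, b):
    # Sample covariance with n - 1 in the denominator
    return ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)

sym = np.isclose(cov(X, Y), cov(Y, X))                      # Cov(X,Y) = Cov(Y,X)
const = np.isclose(cov(X, np.full(1000, 3.0)), 0.0)         # Cov(X,c) = 0
additive = np.isclose(cov(X, Y + Z), cov(X, Y) + cov(X, Z)) # bilinearity
variance = np.isclose(cov(X, X), X.var(ddof=1))             # Cov(X,X) = Var(X)
```

Each identity holds exactly in algebra; the numerical check only tolerates floating-point rounding.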
General Comments
• Some covariance can be spurious!
• Is there a relationship between the shoe sizes of
students in UCC and the amount of rainfall in
Cape Coast?
• Hardly! But if we gather data on these two
variables, it may show a relationship.
• If we cannot find any reason for the
relationship, we say that it is spurious.
• Be on the look out for such spurious
relationships.
315
Correlation
• When we computed the covariance between
Income and Consumption, we got 1742557.66.
• Can we say there is a strong relationship
between Income and Consumption? How
strong is the relationship?
• On the face of it, we cannot say because we
have no reference points.
• That is the work of correlation.
• Correlation expresses the direction and
strength of the relationship between variables.
316
Example of Correlation
317
• Correlation builds on covariance by providing
reference values which we use to make
decisions about the strength of the
relationship.
• Correlation spans -1 to +1.
• If the correlation between X and Y is +1, the
relationship is said to be positively perfectly
correlated.
• If the correlation between X and Y is -1, the
relationship is said to be negatively perfectly
correlated.
318
• What about the situation where there is no linear
relationship between the variables?
• Just as we saw with covariance, the
correlation is zero.
• So in general,
• Corr(x, y) = Cov(x, y) / (SD(x) × SD(y)), with
−1 ≤ Corr(x, y) ≤ +1.
320
Examples
321
Scatter plot of Salaries against Educ
322
323
Example
• Continuing from our Income and
Consumption example, calculate the strength
of the relationship between the variables.
• We set up as follows:
• x̄ = 10373.53, ȳ = 8569.40, N = 10
• Σ (x_i − x̄)(y_i − ȳ) = 15683018.9
• Cov(x,y) = 15683018.9 / (10 − 1) = 1742557.66
• SD(x) = 1405.0445, SD(y) = 1244.493478
• Corr(x,y) = 1742557.66 / (1405.0445 × 1244.493478) = 0.9966
324
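The same number can be reproduced with numpy (a sketch; `np.corrcoef` computes the correlation directly):

```python
import numpy as np

income = np.array([8559.4, 8883.3, 9060.1, 9378.1, 9937.2,
                   10485.9, 11268.1, 11894.1, 12238.8, 12030.3])
consumption = np.array([6830.4, 7148.8, 7439.2, 7804, 8285.1,
                        8819, 9322.7, 9826.4, 10129.9, 10088.5])

# Corr = sample covariance / (SD(x) * SD(y))
cov_xy = np.cov(income, consumption)[0, 1]
corr = cov_xy / (income.std(ddof=1) * consumption.std(ddof=1))

# np.corrcoef gives the same answer directly
corr_np = np.corrcoef(income, consumption)[0, 1]
```

Both give about 0.9966, the strong positive correlation reported on the slide.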
• We conclude therefore that with a correlation
of 0.9966, there is a strong positive correlation
between Income and Consumption. The
diagram shows the strong positive relationship.
[Scatter plot of Consumption against Income showing a strong positive linear relationship]
325
Some Observations on Correlation
• Correlation does not imply causation, they say.
• Just because there is correlation between two variables
does not mean one is causing the other!
• For example, there is a correlation between the number
of beach resort drownings each month and the number
of ice-creams sold in the same period.
• It will seem like ice-creams cause drownings. No.
• People eat more ice-creams on hot days when they are
also more likely to go swimming. So the two variables
(ice-cream sales and drownings) are correlated, but one is
not causing the other.
• They are both caused by a third variable (temperature).
326
Coming up
• Next week, we are going to look at the ideas
of linear regression which will look at the
relationships in terms of how one or more
variables ‘cause’ another variable.
• Till then, keep safe and sound.
327
Linear Regression I
• As business people, we rely on a lot of variables to
make decisions.
• Very often, we can see that one or more variables
explain another variable.
• For example, employers determine an employee’s
salary based on the employee’s education,
experience and in some countries gender.
• In effect we say that ‘salary is a function of
education, experience and gender’.
• This is written as salary = f(educ, expr, gender)
328
• But hold on. . . are these the only factors that
determine your salary? What about your
productivity? And ‘who you know’?
• Indeed there are other variables that
determine your salary, but maybe we don’t
want to consider them now, or they are
subjective, ie difficult to measure objectively.
• We have to find a way of accounting for all the
variables that are excluded from our relationships.
• This is done by calling them ‘errors’. So we have
• salary = f(educ, expr, gender) + errors
329
• What other relationship can you think about?
• In microeconomics, you were told that one’s
consumptions depends on one’s income.
• So consumption = f(income).
• But we know that consumption is not just a
function of income. It will also depend on
taste, age, etc. Since we don’t want to
measure that for now, we include errors in the
above relation as
• consumption = f(income) + errors
• Think about other relationships.
330
Terminology
• We name the terms in the functions for ease
of communication.
• In the relation
• salary = f(educ, expr, gender) + error, educ,
expr and gender are called exogenous, input
or independent variables. Another name for
them is regressors.
• salary is called the output, dependent,
endogenous or response variable. Another name
for it is the regressand.
331
• Identify the names in the relations
• consumption = f(income) + errors
• sales = f(radio, TV, newspaper) + error
332
Simple and Multiple Linear Regression
• Linear regression is basically divided into two
and named simple or multiple depending on
the number of regressors.
• If we have only one regressor, then we call it a
simple linear regression eg.
• Consumption = f(Income) + error.
• For multiple regressors, we name it . . . .
• Sales = f(radio, TV, newspaper) + errors
333
So formally. . .
• A simple linear regression is a mathematical
approach for predicting a quantitative
response Y on the basis of a single predictor
variable X.
• It is written as Y = β0 + β1X + ε.
• Compare with Y = f(X) + error.
338
• Remind yourself what we said about the linear
regression.
339
Find the βs
• The whole idea of regression is to find the βs in
the model Y = β0 + β1X + ε.
341
• In the diagram, the ovals are the actual points
depicting sales given a particular temperature.
• We have the line of best fit approximating the
points.
• Roughly, there are the same number of points
above as below the line.
• The distance from a point to the line is the ‘error’.
• Taking points above the line as positive and points
below as negative, summing them will give us
zero.
• That is not what we want. That is why we rather
take the sum of squared errors.
342
Ordinary Least Squares
• The method we have been describing is called
the ordinary least squares method of finding the
βs.
• Before we use the method, let’s rewrite the
linear regression model more generally as
• y_i = β0 + β1 x_i + ε_i. This will make things easy for us.
• From this, we have
• ε_i = y_i − β0 − β1 x_i
• which is the expression for the errors.
• The sum of squared errors is
• S = Σ ε_i² = Σ (y_i − β0 − β1 x_i)². Think about this.
343
• This is all that we are saying.
344
Minimising Sum of Squared Errors
• Minimising in math involves the use of basic
differential calculus. . . Do you remember?
• We take the derivative of S with respect to each β
and equate the result to zero, then solve for the βs.
• But we are not interested in the intermediate
math except to say that we will get
• β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²
• β̂0 = ȳ − β̂1 x̄
345
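These closed-form solutions can be checked against numpy's own least-squares fit; a sketch with made-up temperature/sales numbers:

```python
import numpy as np

# Made-up temperature (x) and sales (y) data
x = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1])
y = np.array([215., 325, 185, 332, 406, 522, 412, 614, 544, 421])

# Closed-form OLS estimates for simple linear regression
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

# np.polyfit(deg=1) minimises the same sum of squared errors
slope, intercept = np.polyfit(x, y, 1)
```

Both routes minimise S, so the two pairs of estimates agree to floating-point precision.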
Example
• We want to model the relationship between the sales
as output and the expenditure on Radio, TV and
Newspaper as inputs.
• The equation is
• Sales = β0 + β1·Radio + β2·TV + β3·Newspaper + ε
347
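The slide's Advertising dataset is not included here, so a sketch with synthetic data shows the mechanics of the multiple-regression fit via `np.linalg.lstsq` (the true coefficients below are invented for the simulation):

```python
import numpy as np

# Synthetic advertising data; the generating coefficients are invented
rng = np.random.default_rng(1)
n = 200
radio = rng.uniform(0, 50, n)
tv = rng.uniform(0, 300, n)
newspaper = rng.uniform(0, 100, n)
sales = 4.6 + 0.11 * radio + 0.055 * tv + rng.normal(0, 1.0, n)  # newspaper has no effect

# Design matrix with an intercept column; solve min ||Xb - y||^2
X = np.column_stack([np.ones(n), radio, tv, newspaper])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
```

The recovered `beta` is close to (4.6, 0.11, 0.055, 0), mirroring the near-zero Newspaper coefficient seen in Output II.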
Output I
OLS Regression Results
==================================================
Dep. Variable: Sales R-squared: 0.903
Model: OLS Adj. R-squared: 0.901
Method: Least Squares F-statistic: 605.4
Date: Fri, 22 May 2020 Prob (F-statistic): 8.13e-99
Time: 10:35:15 Log-Likelihood: -383.34
No. Observations: 200 AIC: 774.7
Df Residuals: 196 BIC: 787.9
Df Model: 3
348
Output II
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------
const 4.6251 0.308 15.041 0.000 4.019 5.232
Radio 0.1070 0.008 12.604 0.000 0.090 0.124
TV 0.0544 0.001 39.592 0.000 0.052 0.057
Newspaper 0.0003 0.006 0.058 0.954 -0.011 0.012
349
Checking Model Assumptions
• When you build a model, the first thing is to
check the model assumptions.
• Do you remember the model assumptions?
• Let’s look at Output I.
• Look at the F-statistic. The value is 605.4
• By itself, we cannot say anything. We look at its
probability which is Prob (F-statistic): 8.13e-
99.
• This value is used in hypothesis testing. Do you
remember hypothesis testing?
350
Hypothesis For the Model
• For the model as a whole, we can state our hypothesis
as:
• H0: β1 = β2 = β3 = 0 vs
• HA: at least one βi ≠ 0
353
Distribution of the Errors
354
Homoscedasticity of Errors
355
Comment
• From the histogram, we can see that but for a
few values to the left, the errors would have
been normally distributed.
• A maxim in Statistics says ‘All models are wrong
but some are useful’. So we accept that the
assumption of normality with mean zero holds.
• The same thing applies to the homoscedasticity
of errors. Since the graph is not showing any
pattern, we say that the errors have constant
variance.
356
Autocorrelation of Errors
• By autocorrelation, we mean how the errors are correlated with
their own past values, ie Corr(ε_t, ε_(t−1)).
• We require that value to be zero.
• There is a test known as the Durbin-Watson test which we use to
make this decision.
• It says that D = 2(1 - r), where r is the correlation coefficient.
Recall that correlation coefficient span -1 through 0 to +1.
• For zero correlation, r = 0. When you put r = 0 into that
equation, we get D = 2.
• If our model should have zero auto correlation, D should be
around 2.
• Our model’s Durbin-Watson = 2.251. We accept that
the autocorrelation is approximately zero.
357
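The statistic itself is simple to compute from residuals; a sketch with simulated errors showing both ends of the scale:

```python
import numpy as np

def durbin_watson(resid):
    # D = sum of squared successive differences / sum of squared residuals
    return (np.diff(resid) ** 2).sum() / (resid ** 2).sum()

rng = np.random.default_rng(7)
e = rng.normal(size=5000)

# Independent errors: r is near 0, so D = 2(1 - r) is near 2
d_independent = durbin_watson(e)

# Strongly positively autocorrelated errors (r near 0.9) push D towards 0
ar = np.empty(5000)
ar[0] = e[0]
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + e[t]
d_autocorr = durbin_watson(ar)
```

With independent errors D lands near 2 (like the model's 2.251), while the autocorrelated series gives a D well below 2.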
Interpretation of Model
• Bring back Output II
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------------------------
const 4.6251 0.308 15.041 0.000 4.019 5.232
Radio 0.1070 0.008 12.604 0.000 0.090 0.124
TV 0.0544 0.001 39.592 0.000 0.052 0.057
Newspaper 0.0003 0.006 0.058 0.954 -0.011 0.012
359
• We can also use the p-values written as P >|t|
or the 95% confidence intervals to make decisions
about the significance of the coefficients.
• At the 5% significance level, all the coefficients,
except Newspaper, are significant [Recall what
we said about the p-values].
• On the confidence intervals, look at the values for
Newspaper ie. [-0.011, 0.012].
• In moving from the lower limit -0.011 to the upper
limit 0.012, we cross 0. That is what makes the
regression coefficient insignificant.
360
Conclusion
• Linear regression models are a must have in a
manager’s toolkit.
• We have taken a look at a multiple linear
regression and indeed we regressed Sales on
Radio, TV and Newspaper expenditure and we
saw that Newspaper is not significant.
• In the next lesson, we are going to look at how
good our model is ie. the coefficient of
determination and the related ANOVA issues.
361
Analysis of Variance
• Let’s suppose we have this question: Is there a
relationship between Sales and the
expenditure on TV adverts?
362
[Scatter plot of Sales against TV advertising expenditure]
363
• We can investigate by writing the relationship
as a regression:
• Sales = β0 + β1·TV + ε
• If indeed there is no relationship, then β1 = 0, so
we test H0: β1 = 0 against HA: β1 ≠ 0.
364
• Call:
• lm(formula = Sales ~ TV, data = ad)
• Coefficients:
• Estimate Std. Error t value Pr(>|t|)
• (Intercept) 6.974821 0.322553 21.62 <2e-16 ***
• TV 0.055465 0.001896 29.26 <2e-16 ***
• Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
• Residual standard error: 2.296 on 198 degrees of freedom
• Multiple R-squared: 0.8122, Adjusted R-squared:
0.8112
• F-statistic: 856.2 on 1 and 198 DF, p-value: < 2.2e-16
365
• Indeed both the t-statistic and the p-value
show, at the 5% significance level, that we
reject the null in favour of the alternative
hypothesis.
• So we conclude that our finding that there is a
relationship between Sales and TV adverts is
significant.
• From the relationship, we see that for a unit
increase in TV adverts, Sales increase by
0.055465 units.
366
Analysis of Variance (ANOVA)
• There is an alternative method for answering
the same question, which uses the analysis of
variance based on the F-test.
• Let's first define the term "analysis of
variance“.
• Analysis of Variance (ANOVA) consists of
calculations that provide information about
levels of variability within a regression model.
• It forms a basis for tests of significance.
367
• The basic regression line viewed another way
can be written as:
• DATA = FIT + RESIDUAL
• Let’s put this in a diagram
368
369
• Based on the diagram, we see that
• SST = SSE + SSR, that is
• Σ (y_i − ȳ)² = Σ (y_i − ŷ_i)² + Σ (ŷ_i − ȳ)²
372
ANOVA Table
Source      df          SS    MS                     F
Regression  p           SSR   MSR = SSR/p            MSR/MSE
Error       n − p − 1   SSE   MSE = SSE/(n − p − 1)
Total       n − 1       SST
373
• In fact,
• MSE = SSE / (n − p − 1), ie the ‘mean squared error’.
• MSR = SSR / p, ie the ‘regression mean square’.
• So the total variability, ie SST, is shared between the
regression model and the errors.
• Finally, the ratio of MSR to MSE is called the F-
statistic.
• The F-statistic is used to assess whether the model
as a whole is significant.
374
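The decomposition and the F-statistic can be verified on a small made-up dataset:

```python
import numpy as np

# Made-up data: fit y on x by OLS, then decompose the variation
x = np.array([10., 20, 30, 40, 50, 60, 70, 80])
y = np.array([8., 11, 15, 14, 19, 22, 24, 25])

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()      # total sum of squares
ssr = ((y_hat - y.mean()) ** 2).sum()  # regression (FIT) sum of squares
sse = ((y - y_hat) ** 2).sum()         # error (RESIDUAL) sum of squares

# One regressor: MSR = SSR / 1, MSE = SSE / (n - 2)
n, p = len(x), 1
f_stat = (ssr / p) / (sse / (n - p - 1))
```

For an OLS fit the identity SST = SSR + SSE holds exactly, which is what the slide's diagram (DATA = FIT + RESIDUAL) expresses.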
Example
• Let’s run Sales on TV to see
• options(digits = 8)
• anova.model <- aov(Sales ~ TV, data = ad)
• summary(anova.model)
• > summary(anova.model)
• Df Sum Sq Mean Sq F value Pr(>F)
• TV 1 4512.43517 4512.43517 856.17671 < 2.22e-16 ***
Residuals 198 1043.54878 5.27045
• Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’
0.1 ‘ ’ 1
375