Business Statistics: Prof. Lancelot JAMES

This document provides an outline for a business statistics course taught by Professor Lancelot James at Hong Kong University of Science and Technology. It discusses prerequisites, grading, textbooks, and an introduction to descriptive statistics including populations and samples, types of variables, and methods for presenting data in tables and charts. Key topics covered are descriptive statistics, inferential statistics, and how statistics can be used in business contexts such as decision making.


Business Statistics

Prof. Lancelot JAMES

Hong Kong University of Science and Technology,
Information and Systems Management

ISMT-551, Fall 2006

Outline of the Course

Prerequisites: Good STAMINA
Class participation is encouraged
Grading: Homework/Projects and Final Exam
Textbook: Bowerman, O'Connell, Orris (2004), Essentials of Business Statistics, McGraw-Hill.
Use the online tutorials
(http://highered.mcgraw-hill.com/sites/0072827823/student_view0/electronic_tutorials.html)

Salutations

Prof/Dr. James

Introduction

Descriptive Statistics

What is statistics?

The formal definition is simply the study or analysis of data.

Statistics is a tool for studying a characteristic or a behavior in the real world based on a sample from the entire population.

DATA IS EVERYWHERE

How might statistics be used in a business context?

To know how to present and describe information

To know how to draw conclusions about large populations based only on information obtained in samples

To know how to improve processes

To know how to obtain forecasts

Making DECISIONS


Populations and samples

Parameter: A summary measure that describes a characteristic of an entire population. For example, the average height of people in the US.

Sample: A portion of the population that is selected for analysis.

Statistic: A summary measure computed from sample data that is used to describe or estimate a characteristic of the entire population.

Key Definitions

A Population is a set of existing units (people, objects, events, ...).
A Variable is any characteristic of a Population.
All the population measurements may be collected in a Census.
A Sample is a subset of the units in the population.
A sample of measurements can be:

Described (Descriptive Statistics)

Used to make generalizations about important aspects of the Population (Statistical Inference)

[Figure: Population vs. Sample]

[Figure: Types of Data]

Population: All items or subjects under consideration.

Some examples of a population:

The stars in the galaxy

Red cars; cars produced by Toyota in a given year

People in ISMT551, HKUST, Hong Kong, Asia, the world

Potential voters in an election

The Moon

Fish in the ocean; the abalone population

Players in the World Cup

Question: Why do we need to perform statistical analysis on samples? Why not just look at the whole population?

Some reasons: Many populations are too big, for example the stars in the galaxy, populations of people, or fish in the ocean.

That is, in many cases gathering information from an entire population is practically impossible.

Other populations are costly to sample in terms of time or money, or it may simply be foolish to test everything.

Example: Destructive Sampling. Car manufacturers often need to test the safety of their cars. The methods to do so rely on the destruction of a small set of vehicles. The manufacturer then uses the information gathered from these tests to decide whether the rest of the cars manufactured are suitable for use.

Example: Rare Samples. Some objects are rarely found or difficult to obtain, such as moon rock or giant squid.


Describing sets of Data

Terms: Variable, Experimental Units, Datasets

Variable: Stores one particular kind of information contained in an item or a subject (either from a sample or from a population). A characteristic which changes over individuals or time.

Examples: Company type, company size, company sales, hot dog brand name, product type.

Experimental units: The individuals or objects on which the variable(s) is (are) measured. If a sample of size n is taken, then there are n experimental units.

Suppose that the variable of interest is the number of goals scored by a professional soccer player. Then, for instance, the soccer player David Beckham would be considered a possible experimental unit, where the data collected would consist of the number of goals he scores. In mathematical shorthand we might write:
Let $X$ denote the variable corresponding to goals scored by an individual. Then $X_{\text{Beckham}}$ denotes the number of goals scored by Beckham. Note that if Beckham has not played the season yet, the variable $X_{\text{Beckham}}$ is random, as it may take values from 0 to, say, a number less than 100.

Dataset: Includes information based on one or more variables. For instance, let X be the number of goals scored, let Y be the height of a player, and let Z be the weight of a player. Then each experimental unit, say i, is associated with the vector
$(X_i, Y_i, Z_i)$ = (goals scored by i, height of i, weight of i).
This can be put into a table format:


Types of Variables: Qualitative, Quantitative

Qualitative (Categorical)

Categories. Examples: {Red, Blue, Green}, {Male, Female}, {Yes, No}.

Ordered or ranked data. Examples: {High, Medium, Low}, {Very good, Good, Bad}.

Quantitative (Numerical)

Values are numerical and fall essentially into two types:

Discrete (finite or countable values), e.g. {0, 1, 2, 3} or {1, 2, 3, ...}.

Examples: Number of goals scored, number of classes, number of phone calls in one hour, number of tosses of a fair coin before a head appears, number of customers in a restaurant.

Continuous: All values on an interval or on the real line (uncountable quantities). An example is time.

Questions
1. For each of the following random variables, determine whether the variable is categorical or numerical (quantitative).
a) Number of telephones per household
b) Type of telephone primarily used
c) Number of long-distance calls made per month
d) Length (in minutes) of longest long-distance call made per month
e) Color of telephone primarily used
f) Monthly charge (in dollars and cents) for long-distance calls made
g) Ownership of a cellular phone

2. Which of the following is NOT a reason for sampling?

a) It is usually too costly to study the whole population.
b) It is usually too time consuming to look at the whole population.
c) It is sometimes destructive to observe the entire population.
d) It is always more informative to investigate a sample than the entire population.

3. To monitor campus security, the campus police office is taking a survey of the number of students in a parking lot every 30 minutes over a 24-hour period, with the goal of determining when patrols of the lot would serve the most students. If X is the number of students in the lot in each period of time, then X is an example of
a) a categorical random variable
b) a discrete random variable
c) a continuous random variable
d) a statistic

4. The Chancellor of a major university was concerned about alcohol abuse on campus and wanted to find out the proportion (percentage) of students at her university who visited campus bars every weekend. Her advisor took a random sample of 250 students and computed the proportion of students in the sample who visited campus bars every weekend. Consider the following possibilities and answer the questions below:
(i) The total number of students in the sample who visited campus bars every weekend is an example of (which of the following)
(ii) The proportion of students at her university who visited campus bars every weekend is an example of (which of the following)
(iii) The proportion of students in the sample who visited campus bars every weekend is an example of (which of the following)
a) a categorical random variable
b) a discrete random variable
c) a parameter
d) a statistic

Presenting Data in Tables and Charts

Tables and Graphs for Numerical Data

This section discusses how to take possibly large amounts of information and present them in such a way that they can be easily interpreted by visual means. Topics to be discussed:

Basic methods to organize data: Ordered Array, Stem and Leaf Plots

How to use and construct Tables (Frequency distributions, Cumulative distributions) and Graphs (Histogram, Polygon, Ogive)

Bivariate Numerical Data: Scatter Diagram

Tables and Graphs for Categorical Data

Summary Table, Bar Chart, Pie Chart, Pareto Diagram

Bivariate Categorical Data: Contingency Table, Side-by-side chart

Ordered Array

An ordered array is created by taking a set of data and displaying the items in ranked fashion from lowest to highest.
Example: 3-year percentage data for high-risk funds, n = 47
-22.82 -12.57 -10.55 -5.32 -2.89 -0.33 -0.14 4.00 ... 49.02 49.67 54.43 58.71 63.79 68.58 86.13

Stem and Leaf Displays

A stem and leaf plot separates data into

Stems: leading digits

Leaves: trailing digits

Note that numbers are often rounded off, and there can be many different stem and leaf plots for the same data. Stem and leaf plots display how values are clustered or grouped together.

Example: Consider the data from the previous ordered array. The numbers range from -22% to 86% (after roundoff).
Stems: -2, -1, -0, 0, 1, 2, 3, 4, 5, 6, 7, 8 form the categories.
Consider the first 4 values: -22.82, -12.57, -10.55, -5.32. Note: write -5.32 = -05.32. These would be displayed as

-2 | 2
-1 | 20
-0 | 5

For the full data set one has

-2 | 2
-1 | 20
-0 | 5320
 0 | 011146688
 1 | 3357
 2 | 23346889999
 3 | 056789
 4 | 235799
 5 | 48
 6 | 38
 7 |
 8 | 6

Stem-and-leaf display
Building a Stem-and-leaf display

Data in raw form: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Order the data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Choose the stem unit and leaf unit: 10s digit for the stem, 1s digit for the leaf
For each measurement: list the leaves of each stem.
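The construction steps above can be sketched in a few lines of Python. This is only an illustration for non-negative whole numbers; the function name `stem_and_leaf` and the `stem_unit` parameter are assumptions of this sketch, not part of the course material.

```python
from collections import defaultdict


def stem_and_leaf(values, stem_unit=10):
    """Group non-negative integers into stems (tens digit) and leaves (ones digit)."""
    stems = defaultdict(list)
    for value in sorted(values):          # step 2: order the data
        stem, leaf = divmod(value, stem_unit)  # step 3: split into stem and leaf
        stems[stem].append(leaf)          # step 4: list the leaf under its stem
    return dict(stems)


raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(raw)
for stem in sorted(display):
    print(stem, "|", "".join(str(leaf) for leaf in display[stem]))
```

For the raw data on this slide, the loop prints one row per stem (2, 3, and 4), with the leaves in increasing order.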

Question: Interpreting a Stem and leaf plot

A survey was conducted to determine how people rated the quality of programming available on television. Respondents were asked to rate the overall quality from 0 (no quality at all) to 100 (extremely good quality). The stem and leaf display is shown below.

3 | 24
4 | 03478999
5 | 0112345
6 | 12566
7 | 01
8 |
9 | 2

Q1 What percentage of the respondents rated overall television quality with a rating of 80 or above?
a) 0.00 b) 0.04 c) 0.96 d) 1.00

Q2 What percentage of the respondents rated overall television quality with a rating of 50 or below?
a) 0.11 b) 0.40 c) 0.44 d) 0.56

Tables and Graphs for Numerical Data

Frequency Distribution

A frequency distribution is a summary table in which the data are arranged into conveniently established, numerically ordered class groupings or categories. Classes usually consist of intervals of values.

How to determine the number of classes and class intervals

(1) Determine the number of classes based on the total number of observations n. One simple rule is:

$n < 25$: 5 categories
$25 \le n \le 400$: $\sqrt{n}$ categories
$n > 400$: 20 categories

(2) Determine the class width by

$\text{Class width} = \frac{\text{range}}{\text{number of classes}}$

Note: Be sure that class boundaries are well differentiated. That is, the class intervals should not overlap. The class midpoint is the point halfway between the boundaries of each class and is representative of the data within that class.

Frequency or class frequency is the number of observations in each class. Some notation: let $f_i$ denote the frequency of class $i$ for $i = 1, \dots, k$ classes. That is,
$f_i = \#$ of observations in class $i$

Relative frequency/percentage distribution: The relative frequency of a particular class is defined as
$\text{relative frequency of class } i := \frac{f_i}{n}$

A percentage distribution is formed by multiplying each relative frequency by 100%.

Cumulative percentage distribution: Describes the percentage of values which are in or below a class interval. Thus the cumulative percentage value associated with class 3 is calculated as
$\frac{f_1 + f_2 + f_3}{n} \times 100\%$


Example: Data for Utility Charges

96, 171, 202, 178, 147, 102, 153, 197, 127, 82, ....158

1. Form a frequency distribution that has 5, 6, or 7 class intervals.

First determine the minimum and maximum values: the minimum is 82 and the maximum is 213. The class width formula then gives the class interval size, rounded up to a convenient value:

5 classes: $\text{Class width} = \frac{213 - 82}{5} = 26.2 := 30$

6 classes: $\text{Class width} = \frac{213 - 82}{6} = 21.83 := 25$

7 classes: $\text{Class width} = \frac{213 - 82}{7} = 18.71 := 20$
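As a rough check of the class-width arithmetic, here is a minimal Python sketch. The function names are hypothetical, and rounding $\sqrt{n}$ to the nearest whole number is an assumption of this sketch, since the rule of thumb does not say how to round. The final rounding up to a convenient width (30, 25, 20) is a judgment call and is left out.

```python
import math


def suggested_number_of_classes(n):
    """Rule of thumb: 5 classes for n < 25, about sqrt(n) up to 400, then 20."""
    if n < 25:
        return 5
    if n <= 400:
        return round(math.sqrt(n))  # rounding choice is an assumption
    return 20


def raw_class_width(minimum, maximum, number_of_classes):
    """Range divided by the number of classes, before rounding to a convenient value."""
    return (maximum - minimum) / number_of_classes


# Utility charges: minimum 82, maximum 213
widths = {k: raw_class_width(82, 213, k) for k in (5, 6, 7)}
for k, width in widths.items():
    print(f"{k} classes: raw width {width:.2f}")
```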

A chart for 7 intervals

EC (class)    Midpoint   Freq   Per
80 to <100    90         4      8%
100 to <120   110        7      14%
120 to <140   130        9      18%
140 to <160   150        13     26%
160 to <180   170        9      18%
180 to <200   190        5      10%
200 to <220   210        3      6%
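The percentage column, and the cumulative percentages built from it, follow directly from the class frequencies. A small Python sketch, using only the frequencies in the 7-interval chart (the variable names are illustrative):

```python
from itertools import accumulate

frequencies = [4, 7, 9, 13, 9, 5, 3]   # class frequencies f_1, ..., f_7
n = sum(frequencies)                    # total number of observations

# Percentage distribution: (f_i / n) * 100%
percentages = [100 * f / n for f in frequencies]

# Cumulative percentage distribution: (f_1 + ... + f_i) / n * 100%
cumulative = [100 * c / n for c in accumulate(frequencies)]

for f, p, c in zip(frequencies, percentages, cumulative):
    print(f"freq={f:2d}  percent={p:5.1f}%  cumulative={c:5.1f}%")
```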

Histograms/Polygon graphs

Histogram: A chart in which rectangular bars are constructed at the boundaries of each class. When plotting a histogram, the variable of interest is displayed along the horizontal (X) axis, differentiated by the class boundaries. The vertical (Y) axis represents the number, proportion, or percentage of observations per class interval.

Polygon: A percentage polygon is a line chart, which is useful for comparing two or more groups. The advantage over a histogram is that it is easier to see visually the difference between two groups. The percentage polygon is constructed by connecting lines between the midpoints of each interval at their respective class percentages.

Cumulative percentage polygons or Ogives: A cumulative percentage polygon, otherwise known as an Ogive, is a line graph of the cumulative percentage distribution. Similar to the polygon line chart, it is constructed by connecting lines between the midpoints of each interval at their respective cumulative percentages.
For example, in the case of 7 classes one would construct a plot based on the pairs (70, 0), (90, 8), (110, 22), (130, 40), (150, 66), (170, 84), (190, 94), (210, 100), where the first value represents the class midpoint and the second value is the cumulative percentage up to and including that class.

Scatter Diagram

The scatter diagram is a graphical method used to examine the possible relationship between two variables of interest. Variable 1 is plotted on the X-axis and variable 2 on the Y-axis. If there are n associated pairs of data, then these can be represented as $(X_1, Y_1), \dots, (X_n, Y_n)$. A question to ask is whether the graph visually suggests a strong relationship between the two variables. A common use of scatter diagrams is to determine if there is a linear relationship between the X variable and the Y variable.

[Figure: Scatter Diagram, Old Faithful data]


Tables and Charts for Categorical Data:

Tables and charts are often used for categorical data. There are many similarities between the methods for numerical data and categorical data. One main distinction is that the classes or class intervals, which are based on a range of numerical values, are replaced by types of objects or categories.
The idea of frequencies or percentages is then taken with respect to these categories.

Summary Table

The Summary Table is quite similar to the frequency distribution table for numerical values, except that now frequencies and percentages are calculated with respect to types or categories of objects.

Suppose that there are 4 categories labeled {A, B, C, D}. Then one would calculate the number of objects of type A, B, C, D to obtain the frequencies and organize this in a chart.
That is, the frequency of A can be represented as
$f_A = \#$ of observations in class A

Bar Chart

A Bar Chart is very similar to a histogram. A frequency bar chart is constructed by representing each category as a bar, where the length or height of the bar represents the frequency or percentage of observations falling in that category.

Example: One plots $(A, f_A), (B, f_B), (C, f_C)$, etc., where A, B, C play a similar role to the class intervals in a histogram. Note that, unlike histograms, there is no real concept of a midpoint. However, because the types are represented by equally wide bars, one still has a visual midpoint.

Pie Chart

This is a popular graphical device which simply represents the percentage in each category as pieces of a pie. Hence the category with the largest piece of pie represents the category with the largest percentage, and so on. To calculate the angle of each slice one uses the formula
$360^\circ \times \text{percentage in category}$,
with the percentage expressed as a fraction (for example, 25% = 0.25 gives a 90° slice).

Pareto Diagram

In many respects a Pareto Diagram is quite similar to an Ogive for numerical data combined with a bar chart. To construct one, display a bar chart ranked in decreasing order. That is, the bar chart starts with the category with the highest frequency or percentage, then the next highest, etc. A cumulative frequency polygon or Ogive can then be constructed using the visual midpoints of the bar chart. The ranked bar chart combined with the overlaid Ogive defines the Pareto Diagram.
The Pareto Diagram is preferred to the bar chart and pie chart when there are many categories.

Tabulating Bivariate Categorical Data

Contingency table: A contingency table or cross-classification table is used when a comparison is necessary between two categorical variables. The table takes a matrix (table) form as follows. Suppose that variable 1 consists of the categories $(V_1, V_2)$ and variable 2 has possible categories (A, B, C):

              var 2
var 1    A    B    C    Total
V1
V2
Total

The elements of the rows and columns can contain counts, or percentages relative to row totals, column totals, or overall totals.

[Figure: Dot plot, example for bivariate data]


Numerical Descriptive Measures

Measures of central tendency

Measures of central tendency or location of the data are used to identify a typical value that can be used to describe the entire set. Three common measurements are the arithmetic mean, the median, and the mode. Respectively, these measure the average value, the middlemost value, and the most frequently occurring value in a dataset.

The Arithmetic Mean

Certainly the most commonly recognized and used measure of central tendency is the arithmetic mean, or the (common) average. If a dataset consists of n observations, $X_1, X_2, \dots, X_n$, then the arithmetic mean of the sample is written as
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$
The sum or total can be expressed in shorthand as
$\sum_{i=1}^{n} X_i := X_1 + X_2 + \dots + X_n$

The mean is often a fine measure of central tendency. However, its main drawback is that it is greatly affected by extreme values in the data, that is, values which are much smaller or much larger than most of the values in the data set.

Example: Imagine that one wants to find the average income in Hong Kong. Average in this sense means one wants to be able to pinpoint what salary per month the average person makes. In order to do this a survey is taken, say of 10 people chosen at random. The people report monthly incomes of
4000, 10,000, 10,000, 15,000, 20,000, 25,000, 30,000, 60,000, 60,000, 5,000,000 (a BIG TYCOON)
The total is 5,234,000 and the average income is
$\bar{X} = 523{,}400$.

Naturally, as n increases this number will most likely decrease, but this example illustrates how sensitive mean calculations can be.

Median
The Median is the middle value in an ordered array of data. The median is not affected by extreme values and may be preferable to the mean in this situation.
There are two methods of computing the median of a set of data, depending on whether the sample size is even or odd. First one needs to remember to order the data from the minimum to the maximum value.

When n is odd:
$\text{Median} = \frac{n+1}{2}$-th ranked observation

Example: Consider the data 12, 7, 7, 9, 0, 7, 3. Here n = 7 is odd and the ordered values are
0, 3, 7, 7, 7, 9, 12.
The median is then 7, the 4th value in the ordered sample.

When n is even, the median is defined to be the average of the two middlemost values.
Example: Recall the income example: 4000, 10,000, 10,000, 15,000, 20,000, 25,000, 30,000, 60,000, 60,000, 5,000,000.
The two middlemost values are 20,000 and 25,000, which yields a median of
$\frac{20{,}000 + 25{,}000}{2} = 22{,}500$
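To see how differently the mean and the median react to the single extreme income, the two can be computed side by side. This sketch uses Python's standard `statistics` module on the ten reported incomes; the variable names are illustrative.

```python
import statistics

incomes = [4000, 10000, 10000, 15000, 20000, 25000,
           30000, 60000, 60000, 5000000]

mean_income = statistics.mean(incomes)      # pulled up by the 5,000,000 value
median_income = statistics.median(incomes)  # average of the two middle values

# Dropping the extreme value barely moves the median but collapses the mean
without_tycoon = incomes[:-1]
print("mean:", mean_income, "median:", median_income)
print("mean without tycoon:", statistics.mean(without_tycoon))
print("median without tycoon:", statistics.median(without_tycoon))
```

The median stays at 22,500 for the full sample and remains in the same range once the largest value is removed, while the mean falls from over 500,000 to 26,000.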

The Mode: The mode corresponds to the value in the data set which occurs most often.

Geometric Mean: Investments
The Geometric Mean and the Geometric Rate of Return are used to measure the status of an investment over time. They measure the rate of change of a variable over time.

The formula for the geometric mean of variables $X_1, \dots, X_n$ is
$\bar{X}_G = (X_1 \times X_2 \times \dots \times X_n)^{1/n}$

The formula for the geometric mean rate of return is
$\bar{R}_G = [(1 + R_1) \times \dots \times (1 + R_n)]^{1/n} - 1$
where $R_i$ is the rate of return in time period i. The rate of return is defined to be the loss or gain in period i divided by the starting value in the period and then multiplied by 100%.

Example: To illustrate this, let's look at the example on p.104 in the text. An initial investment of 100,000 is made. At the end of year one the fund has declined to 50,000, and it then rebounds to its original 100,000 value at the end of year two. Hence

$R_1 = \left(\frac{50{,}000 - 100{,}000}{100{,}000}\right) \times 100\% := -50\%$

$R_2 = \left(\frac{100{,}000 - 50{,}000}{50{,}000}\right) \times 100\% := 100\%$

The average return is calculated to be 25,000:

$[R_1 + R_2]/2 \times 100{,}000 = 0.25 \times 100{,}000$

while the geometric mean rate of return is calculated to be

$(0.5 \times 2)^{1/2} - 1 = 0$,
which more accurately reflects the fact that at the end of the 2-year period there was no gain or loss.
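The contrast between the two averages can be reproduced with a short Python sketch. Returns are written as fractions (-0.50 for -50%, 1.00 for +100%); the function names are illustrative and `math.prod` requires Python 3.8 or later.

```python
import math


def arithmetic_mean_return(returns):
    """Simple average of the per-period rates of return."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """[(1 + R_1) * ... * (1 + R_n)]^(1/n) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


returns = [-0.50, 1.00]  # year one: -50%, year two: +100%
print("arithmetic mean return:", arithmetic_mean_return(returns))
print("geometric mean return:", geometric_mean_return(returns))
```

The arithmetic average suggests a 25% gain per year, while the geometric mean rate of return is zero, matching the fact that the fund ends where it started.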


Measures of noncentral tendency

Quartiles
The Quartiles divide the ranked data into four quarters.

The value of the data where 25% of the data is below and 75% is above it is called the 1st quartile, denoted as $Q_1$. A formula for $Q_1$ is given as
$Q_1 = \frac{n+1}{4}$-th ordered observation

The value of the data where 75% of the data is below and 25% is above it is called the 3rd quartile, denoted as $Q_3$. A formula for $Q_3$ is given as
$Q_3 = \frac{3(n+1)}{4}$-th ordered observation

Example: For the dataset 4000, 10,000, 10,000, 15,000, 20,000, 25,000, 30,000, 60,000, 60,000, 5,000,000, the formulas give that $Q_1$ corresponds to the 2.75th ordered value and $Q_3$ corresponds to the 8.25th ordered value. It follows that rounding 2.75 up to 3 yields
$Q_1 = 10{,}000$
and rounding 8.25 down to 8 yields
$Q_3 = 60{,}000$
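The positional rule can be written as a small Python function. This is a sketch of the rounding convention used in this example only: positions are rounded to the nearest rank, and treating a position ending in .5 as rounding up is an assumption of the sketch. Statistical software often interpolates instead, so library quartiles may differ.

```python
import math


def ranked_observation(ordered, position):
    """Return the observation at the rank nearest to `position` (1-based)."""
    rank = math.floor(position + 0.5)          # nearest rank, halves round up
    rank = min(max(rank, 1), len(ordered))     # keep the rank inside the data
    return ordered[rank - 1]


def quartiles(values):
    """Q1 and Q3 from the (n+1)/4 and 3(n+1)/4 ordered positions."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = ranked_observation(ordered, (n + 1) / 4)
    q3 = ranked_observation(ordered, 3 * (n + 1) / 4)
    return q1, q3


incomes = [4000, 10000, 10000, 15000, 20000, 25000,
           30000, 60000, 60000, 5000000]
q1, q3 = quartiles(incomes)
print("Q1 =", q1, "Q3 =", q3, "interquartile range =", q3 - q1)
```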


Measures of Variation

In addition to measurements of central tendency, it is important to identify the amount of variability or spread in data. Three such measurements are the Range, the Interquartile Range, and the Variance.
Example: As a simple motivating example, consider the case of two data sets {2, 2, 2, 2, 2} and {0, 1, 2, 3, 4}.

The arithmetic mean of both sets is 2 and the median of both sets is also 2.

However, the data sets are quite different. For the first data set all measures of variation would yield the value 0, while this will not be the case for the second data set.

Range

The Range is simply the difference between the maximum value and the minimum value of the data.

That is, the formula is

$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$

It is a measure of the total spread in the data.

One drawback is that it does not take into account the other data points besides the minimum and the maximum.

Another problem is that it is highly sensitive to extreme values.

For the simple example above, the Range of the second data set {0, 1, 2, 3, 4} is
$4 - 0 = 4$
as compared to $2 - 2 = 0$ for the first set {2, 2, 2, 2, 2}.

Interquartile Range

The interquartile range considers the spread in the middle 50% of the data and is therefore not influenced by extreme values.

The formula is
$\text{Interquartile range} = Q_3 - Q_1$

Example: Calculate the Interquartile range for the data sets {2, 2, 2, 2, 2} and {0, 1, 2, 3, 4}.

Solution: First note that n = 5 and compute the positions for $Q_1$ and $Q_3$.

Note that
$\frac{n+1}{4} = \frac{6}{4} = 1.5$

It follows that $Q_1$ corresponds to the second number and $Q_3$ corresponds to the $3(1.5) := 4.5$th, or the 5th, number.

Hence for {2, 2, 2, 2, 2}, $Q_1 = Q_3 = 2$ and the Interquartile Range is 0.

For {0, 1, 2, 3, 4}, $Q_3 = 4$ and $Q_1 = 1$, thus the Interquartile Range is $4 - 1 = 3$.

Variance and Standard Deviation

The Sample Variance of a data set is defined as

$S^2 = \frac{(X_1 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$

Example: Calculate the variance of the data sets {2, 2, 2, 2, 2} and {0, 1, 2, 3, 4}.

Solution: First calculate the arithmetic mean $\bar{X}$, where n = 5.

$\frac{2 + 2 + 2 + 2 + 2}{5} = 2$ and $\frac{0 + 1 + 2 + 3 + 4}{5} = 2$

Now calculate the squared differences from the arithmetic mean. For the data set {2, 2, 2, 2, 2} we see that all the differences are 0. For {0, 1, 2, 3, 4}, we have $(0 - 2)^2 = 4$, $(1 - 2)^2 = 1$, $(2 - 2)^2 = 0$, $(3 - 2)^2 = 1$, $(4 - 2)^2 = 4$.

The variance of {2, 2, 2, 2, 2} is 0, and for {0, 1, 2, 3, 4},

$S^2 = \frac{4 + 1 + 0 + 1 + 4}{4} = 2.5$
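The same calculation can be written out step by step and cross-checked against Python's standard library, whose `statistics.variance` also divides by n - 1. The helper name `sample_variance` is illustrative.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


constant = [2, 2, 2, 2, 2]
spread = [0, 1, 2, 3, 4]

print("variance of", constant, "=", sample_variance(constant))
print("variance of", spread, "=", sample_variance(spread))
print("library check:", statistics.variance(spread))
```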

Computationally easier formula:

$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}$

The Variance measures the average squared distance of the individual observations from the mean.

A low variance corresponds to a small spread in the data. In other words, this suggests that most values are quite near the mean.

A high variance translates into a dataset which has values that are more widely spread out.

Standard Deviation

A small drawback of the Sample Variance is that it is measured in squared units relative to the measurements for X.

That is, if $X_1$ is expressed in terms of dollars, the variance is expressed in terms of dollars squared.

For this reason the Sample Standard Deviation is often preferred.

The Sample Standard Deviation is defined to be the positive square root of the sample variance and is denoted as S.

For instance, the standard deviation is reported in dollars, not squared dollars.

The standard deviation of the data set {0, 1, 2, 3, 4} is $\sqrt{2.5} \approx 1.58$.

Understanding Variation in Data

The more spread out, or dispersed, the data are, the larger will be the range, the interquartile range, the variance, and the standard deviation.

The more concentrated, or similar, the data are, the smaller will be the range, the interquartile range, the variance, and the standard deviation.

If the observations are all the same, the range, the interquartile range, the variance, and the standard deviation are all zero.

None of the measures of variation considered here can be negative.

Introduction

Descriptive Statistics

Measures of Variation

Coefficient of Variation
The Coefficient of Variation measures the scatter in the data relative to the mean.

It is expressed as a percentage rather than in the units of the data and is calculated as

\[ CV = \frac{S}{\bar{X}} \cdot 100\%. \]

An advantage of the coefficient of variation is that one can compare the relative variability of two or more variables even when the variables are based on different units of measurement.
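To illustrate the comparison across units, here is a small Python sketch. The two data sets (prices in dollars and delivery times in days) are hypothetical and chosen only so that both have the same standard deviation; the function name is also an assumption.

```python
import statistics


def coefficient_of_variation(xs):
    """Return the sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(xs) / statistics.mean(xs) * 100


# Hypothetical data, chosen only for illustration.
prices_dollars = [10, 12, 14]
delivery_days = [2, 4, 6]

print(round(coefficient_of_variation(prices_dollars), 1))  # 16.7
print(round(coefficient_of_variation(delivery_days), 1))   # 50.0
```

Both variables have S = 2 in their own units, yet the delivery times vary far more relative to their mean.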

Shape of a data set

Shape of Data

The third property of a data set is related to the way the data
are distributed. All descriptions of shape are taken relative to
how symmetric the data set is. A data set that is not
symmetric is said to be asymmetrical, or skewed.

Symmetrical data set

A data set is considered symmetric if the mean and median are equal, that is, $\bar{X} = \text{Median}$.

A data set is right-skewed if the mean is greater than the median, $\bar{X} > \text{Median}$. In other words, there are some extremely large values in the data.

A data set is left-skewed if the mean is less than the median, $\bar{X} < \text{Median}$. That is, there are some extremely small values in the data.
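The mean-versus-median comparison can be sketched in Python as follows. The function name and the three small data sets are hypothetical; with real data one would usually allow some tolerance instead of testing for exact equality.

```python
import statistics


def describe_shape(xs):
    """Classify a data set by comparing its mean with its median."""
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "symmetric"


# Hypothetical data sets, chosen only for illustration.
print(describe_shape([1, 2, 3, 4, 5]))    # symmetric
print(describe_shape([1, 2, 3, 4, 50]))   # right-skewed
print(describe_shape([-40, 2, 3, 4, 5]))  # left-skewed
```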

Question 1

Which of the following is sensitive to extreme values?

1. The median
2. The interquartile range
3. The arithmetic mean
4. The 1st quartile, Q1

Question 2
A sociologist recently conducted a survey of citizens over 60
years of age whose net worth is too high to qualify for
subsidized medical care and who have no private health insurance.
A summary of the ages of the 25 uninsured senior citizens is as
follows:
The average age is 74.04, the median age is 73, the first
quartile is 65, and the third quartile is 81.
Identify which of the statements is correct.
1. One fourth of the senior citizens sampled are below 64
years of age
2. The middle 50% of the senior citizens sampled are
between 65 and 73 years of age
3. 25% of the senior citizens sampled are older than 81 years
of age
4. All of the above are correct

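Example: Sample: {5, 7, 1, 2, 4}

        X_i    (X_i - \bar{X})    (X_i - \bar{X})^2    X_i^2
          5          1.2                1.44             25
          7          3.2               10.24             49
          1         -2.8                7.84              1
          2         -1.8                3.24              4
          4          0.2                0.04             16
sum      19          0                 22.8              95

\[ \bar{X} = \frac{19}{5} = 3.8 \]

\[ S^2 = \frac{22.8}{4} = 5.7 \quad \text{and} \quad S = 2.387 \]

\[ CV = \frac{S}{\bar{X}} \cdot 100\% = 62.8\% \]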
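The column totals and summary measures of this example can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; the variable names are illustrative.

```python
import math

sample = [5, 7, 1, 2, 4]
n = len(sample)

mean = sum(sample) / n
deviations = [x - mean for x in sample]
sum_sq_dev = sum(d ** 2 for d in deviations)  # total of the squared deviations
sum_sq = sum(x ** 2 for x in sample)          # total of the squared observations

s2 = sum_sq_dev / (n - 1)                         # definitional formula
s2_shortcut = (sum_sq - n * mean ** 2) / (n - 1)  # computational formula
s = math.sqrt(s2)
cv = s / mean * 100

print(round(mean, 1))                       # 3.8
print(round(sum_sq_dev, 2))                 # 22.8
print(sum_sq)                               # 95
print(round(s2, 2), round(s2_shortcut, 2))  # 5.7 5.7
print(round(s, 3))                          # 2.387
print(round(cv, 1))                         # 62.8
```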

Describing Central Tendency

Some measures of central tendency for numerical data

Population (N objects) vs. Sample (n objects) // Parameter vs. Statistic

The sample mean $\bar{x}$ is a point estimate of the population mean $\mu$:

\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

The Median $M_d$ is the middlemost measurement in the ordering:
if n = 2p + 1 is odd, it is the (p + 1)th measurement;
if n = 2p is even, it is the average of the pth and (p + 1)th measurements.

The Mode $M_0$ is the measurement that occurs most frequently.
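These three measures are available in Python's standard `statistics` module. The sketch below uses the sample {5, 7, 1, 2, 4} from the worked example, plus one extra hypothetical value (a second 7) to obtain an even sample size and a repeated measurement.

```python
import statistics

odd_sample = [5, 7, 1, 2, 4]      # n = 5 = 2p + 1 with p = 2
even_sample = [5, 7, 1, 2, 4, 7]  # n = 6 = 2p with p = 3 (extra value is hypothetical)

print(statistics.mean(odd_sample))     # 3.8
print(statistics.median(odd_sample))   # 4, the 3rd value in the ordering 1, 2, 4, 5, 7
print(statistics.median(even_sample))  # 4.5, the average of the 3rd and 4th values
print(statistics.mode(even_sample))    # 7, the only value that occurs twice
```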

Some measures of variation for numerical data

The Range is the largest measurement minus the smallest measurement.
The sample Variance
Standard Deviation

Percentiles, Quartiles, and Box-and-Whiskers Displays

Summarizing Data/Exploratory data Analysis

The 5-number summary provides a way of determining the shape of the data based on the quantities

Xsmallest, Q1, median, Q3, Xlargest

Box-and-whisker plots use the 5-number summary to graphically represent the data.
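A 5-number summary can be sketched in Python as below. Quartile conventions differ between textbooks and software, so the choice of the "inclusive" method of `statistics.quantiles` is an assumption and may not match the rule used in the course; the function name is illustrative.

```python
import statistics


def five_number_summary(xs):
    """Return (smallest, Q1, median, Q3, largest).

    The quartiles use the 'inclusive' method of statistics.quantiles,
    which is an assumption; other conventions can give different values.
    """
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, median, q3, max(xs)


print(five_number_summary([5, 7, 1, 2, 4]))  # (1, 2.0, 4.0, 5.0, 7)
```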

[Figure: box-and-whisker plots for right-skewed, symmetrical, and left-skewed data, each marked with Xsmallest, Q1, median, Q3, and Xlargest.]

Skewness
[Figure: Symmetry/Skewness, showing distributions that are skewed to the right, symmetrical, and skewed to the left.]

Descriptive measures of the Population

Suppose that $X_1, \dots, X_N$ now represent all possible values from a population of size N.

1. Note that in some cases the population size N may be unknown or infinite. An example of a finite population is the collection of students at UST.

2. Similar to the arithmetic mean of the sample, we can calculate the true mean of the population, denoted as $\mu$:

\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \]

3. The population variance $\sigma^2$ and the population standard deviation $\sigma$ of the data $X_1, X_2, \dots, X_N$ are calculated as

\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2 \quad \text{and} \quad \sigma = \sqrt{\sigma^2} \]
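The difference between dividing by N (population) and by n - 1 (sample) can be seen in a short Python sketch. Treating the five values {0, 1, 2, 3, 4} as an entire population is an assumption made purely for illustration.

```python
import math

# Treat these five values as an entire (tiny) population; illustrative only.
population = [0, 1, 2, 3, 4]
N = len(population)

mu = sum(population) / N
sum_sq_dev = sum((x - mu) ** 2 for x in population)

sigma2 = sum_sq_dev / N    # population variance, divides by N
sigma = math.sqrt(sigma2)  # population standard deviation
s2 = sum_sq_dev / (N - 1)  # sample variance of the same values, for comparison

print(mu)               # 2.0
print(sigma2)           # 2.0
print(round(sigma, 4))  # 1.4142
print(s2)               # 2.5
```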

The phrase "within k standard deviations from the mean" refers to data which lie in the interval

\[ [\mu - k\sigma,\ \mu + k\sigma] \]

Based on a known standard deviation of the population, there are at least two rules which can tell us more about the clustering and distribution of the data.

(1) The Empirical Rule requires that the data histogram be symmetrical and bell-shaped.

k     % of data within k SD each way from the mean
1     68%
2     95%
3     99.7%
4     almost all
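The percentages in the Empirical Rule are approximations to the exact normal distribution. Assuming the data follow an exactly normal curve, the fractions can be computed from the error function in Python's `math` module; the function name below is illustrative.

```python
import math


def normal_fraction_within(k):
    """Fraction of an exactly normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 4):
    print(k, round(100 * normal_fraction_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
# 4 99.99
```

Real data that are only roughly bell-shaped will match these figures only approximately.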

(2) The Chebyshev Rule states that at least $1 - \frac{1}{k^2}$ of the data lie within k standard deviations of their mean (regardless of how skewed the data are).

k     1 - 1/k^2     (in %)
1     0             NA (the bound gives no information)
2     3/4           75%
3     8/9           89%
4     15/16         94%
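The Chebyshev bound is simple enough to tabulate directly. The following Python sketch uses exact fractions; the function name is illustrative and k is assumed to be a positive integer.

```python
from fractions import Fraction


def chebyshev_bound(k):
    """Lower bound 1 - 1/k^2 on the fraction of data within k standard deviations."""
    return 1 - Fraction(1, k * k)


for k in (2, 3, 4):
    bound = chebyshev_bound(k)
    print(k, bound, f"{float(bound):.0%}")
# 2 3/4 75%
# 3 8/9 89%
# 4 15/16 94%
```

For k = 1 the bound evaluates to 0, which is why the rule says nothing useful about the interval within one standard deviation.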

Example
The mean is $\mu = 28.2$ and the standard deviation is $\sigma = 6.75$.
a. 1 standard deviation: between 21.45 and 34.95. Ans: Empirical Rule, 68%
b. 2 standard deviations: between 14.7 and 41.7. Ans: Empirical Rule, 95%
c. Between 21.45 and 34.95 using the Chebyshev Rule. Ans: NA
d. Between 14.7 and 41.7 using the Chebyshev Rule. Ans: at least 75%
e. Between 7.95 and 48.45 using the Chebyshev Rule. Ans: at least 89%
f. At least 94% should have values within 4 standard deviations of the mean according to the Chebyshev Rule, that is, between 1.2 and 55.2.
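The interval endpoints in this example follow from the mean plus or minus k standard deviations. A minimal Python sketch, with an illustrative function name and rounding to two decimals to hide floating-point noise:

```python
def k_sigma_interval(mu, sigma, k):
    """Return the interval [mu - k*sigma, mu + k*sigma], rounded to 2 decimals."""
    return round(mu - k * sigma, 2), round(mu + k * sigma, 2)


mu, sigma = 28.2, 6.75
for k in (1, 2, 3, 4):
    print(k, k_sigma_interval(mu, sigma, k))
# 1 (21.45, 34.95)
# 2 (14.7, 41.7)
# 3 (7.95, 48.45)
# 4 (1.2, 55.2)
```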

Coefficient of Correlation
The coefficient of correlation $\rho$ measures the strength of the linear relationship between two variables X and Y.

The correlation coefficient always satisfies

\[ -1 \le \rho \le 1 \]

If $\rho = 1$ then there is a perfect positive linear relationship. That is,
Y = a + bX
where b is a positive number.
If $\rho = -1$ then there is a perfect negative linear relationship. That is,
Y = a - bX
where b is a positive number.
If $\rho = 0$ then there is no correlation (no linear relationship) between X and Y.

Sample Coefficient of Correlation

The quantity $\rho$ measures the correlation for the entire population.
One can use a scatter diagram.
In a sample one should not expect to see perfect correlations. In place of $\rho$ we can calculate the sample coefficient of correlation, denoted as r.
It measures the linear relationship between two numerical variables X and Y of a dataset of size n:

\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}, \]

where $-1 \le r \le 1$.
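The formula for r translates directly into code. The Python sketch below uses an illustrative function name and hypothetical data lying exactly on a line, so that the extreme values of r can be seen.

```python
import math


def sample_correlation(xs, ys):
    """Sample coefficient of correlation r for paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data lying exactly on a line, chosen only for illustration.
xs = [1, 2, 3, 4]
print(round(sample_correlation(xs, [3 + 2 * x for x in xs]), 6))  # 1.0
print(round(sample_correlation(xs, [3 - 2 * x for x in xs]), 6))  # -1.0
```

The function assumes that neither variable is constant; otherwise the denominator would be zero.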

Example. X represents Energy cost and Y represents Price of refrigerator.

        X_i     Y_i    (X_i - \bar{X})   (Y_i - \bar{Y})   (X_i - \bar{X})(Y_i - \bar{Y})
         48     850         -19                50                  -950
         54     760         -13               -40                   520
         58     900          -9               100                  -900
         66     870          -1                70                   -70
         77    1100          10               300                  3000
         66     800          -1                 0                     0
         70     650           3              -150                  -450
         81     750          14               -50                  -700
         72     750           5               -50                  -250
         78     570          11              -230                 -2530
sum     670    8000           0                 0                 -2330

\[ \bar{X} = 67 \quad \text{and} \quad \bar{Y} = 800 \]

\[ \sum_{i=1}^{10} (X_i - \bar{X})^2 = 1064 \]

\[ \sum_{i=1}^{10} (Y_i - \bar{Y})^2 = 189{,}400 \]

\[ r = \frac{-2330}{\sqrt{1064 \times 189{,}400}} = -0.1641 \]

The value of r indicates a very weak negative linear relationship between price and energy cost.
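The sums and the value of r can be recomputed directly from the listed X and Y values. The Python sketch below is only an arithmetic check; the variable names are illustrative.

```python
import math

energy_cost = [48, 54, 58, 66, 77, 66, 70, 81, 72, 78]       # X
price = [850, 760, 900, 870, 1100, 800, 650, 750, 750, 570]  # Y
n = len(energy_cost)

x_bar = sum(energy_cost) / n
y_bar = sum(price) / n

# Sums of cross-products and of squared deviations about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(energy_cost, price))
s_xx = sum((x - x_bar) ** 2 for x in energy_cost)
s_yy = sum((y - y_bar) ** 2 for y in price)
r = s_xy / math.sqrt(s_xx * s_yy)

print(x_bar, y_bar)      # 67.0 800.0
print(s_xy, s_xx, s_yy)  # -2330.0 1064.0 189400.0
print(round(r, 4))       # -0.1641
```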
