
Data Analytics with Python

Lecture 1: Introduction to data analytics

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT
IIT ROORKEE

1
Objective of the course
• The principal focus of this course is to build conceptual understanding
through simple, practical examples rather than a repetitive point-and-click
mentality
• This course should make you comfortable using analytics in your career
and your life
• You will learn how to work with real data. You may already know many
different methodologies, but choosing the right one is what matters

2
Objective of the course Contd…

• The danger in using quantitative methods does not generally lie in the
inability to perform the calculation
• The real threat is a lack of fundamental understanding of:
– Why to use a particular technique or procedure
– How to use it correctly, and
– How to correctly interpret the result

3
Learning objectives
1. Define data and its importance
2. Define data analytics and its types
3. Explain why analytics is important in today’s business environment
4. Explain how statistics, analytics and data science are interrelated
5. Why python?
6. Explain the four different levels of Data:
– Nominal
– Ordinal
– Interval and
– Ratio

4
1. Define Data and its importance

• Variable, Measurement and Data

• What is generating so much data?

• How data add value to the business?

• Why data is important?

5
1.1 Variable, Measurement and Data

• Variable – a characteristic of any entity being studied that is capable of
taking on different values

• Measurement – a standard process used to assign numbers to particular
attributes or characteristics of a variable

• Data – recorded measurements

6
1.2 What is generating so much data?

• Data can be generated by
– Humans,
– Machines, or
– Human-machine combinations
• It can be generated anywhere information is produced and stored, in
structured or unstructured formats

7
1.3 How data add value to business?

A data warehouse feeds two routes to business value:
• Development of data products – algorithmic solutions in production,
marketing, sales, etc. (e.g. recommendation engines)
• Discovery of data insight – quantitative data analysis to help steer
strategic business decisions

Source: https://datajobs.com/

8
Data Products

9
1.4 Why Data is important?

• Data helps one make better decisions
• Data helps one solve problems by finding the reasons for
underperformance
• Data helps one evaluate performance
• Data helps one improve processes
• Data helps one understand consumers and the market

10
2. Define data analytics and its types
• Define data analytics

• Why analytics is important?

• Data analysis

• Data analytics vs. Data analysis

• Types of Data analytics

11
2.1. Define data analytics

• Analytics is defined as “the scientific process of transforming data into
insights for making better decisions”
• Analytics is the use of data, information technology, statistical analysis,
quantitative methods, and mathematical or computer-based models to
help managers gain improved insight about their business operations and
make better, fact-based decisions – James Evans
• Analysis = Analytics?

12
2.2 Why analytics is important?

• Opportunities abound for the use of analytics and big data, such as:
1. Determining credit risk
2. Developing new medicines
3. Finding more efficient ways to deliver products and services
4. Preventing fraud
5. Uncovering cyber threats
6. Retaining the most valuable customers

13
2.3 Data analysis
• Data analysis is the process of examining, transforming, and
arranging raw data in a specific way to generate useful
information from it
• Data analysis allows for the evaluation of data through
analytical and logical reasoning to lead to some sort of
outcome or conclusion in some context
• Data analysis is a multi-faceted process that involves a
number of steps, approaches, and diverse techniques

14
2.4 Data analytics vs. Data analysis

Analysis
• Looks at the past
• Explains how and why something happened
• Qualitative analysis: explains how and why the story ended the way it did
• Quantitative analysis: uses data to show, for example, how sales decreased last summer

Analytics
• Looks at the future
• Explores potential future events
• Qualitative analytics = intuition + analysis
• Quantitative analytics = formulas + algorithms

Analysis ≠ Analytics
Data analysis ≠ Data analytics
Business analysis ≠ Business analytics

19
2.5 Classification of Data analytics

Based on the phase of workflow and the kind of analysis required, there are
four major types of data analytics.

• Descriptive analytics

• Diagnostic analytics

• Predictive analytics

• Prescriptive analytics

20
Classification of Data analytics

Source: https://www.governanceanalytics.org/knowledge-base/Main_Tools/Data_classification_and_analysis

21
Descriptive Analytics
• Descriptive analytics is the conventional form of business intelligence and
data analysis
• It seeks to provide a depiction or “summary view” of facts and figures in
an understandable format
• This either informs decisions directly or prepares the data for further analysis
• Descriptive analysis or statistics can summarize raw data and convert it
into a form that can be easily understood by humans
• It can describe in detail an event that has occurred in the past

22
Example
Common examples of descriptive analytics are company reports that simply
provide a historical review, such as:
• Data queries
• Reports
• Descriptive statistics
• Data visualization
• Data dashboards

Source: https://www.linkedin.com/learning/478e9692-d13d-338f-907e-d76f0724d773

23
Diagnostic analytics

• Diagnostic analytics is a form of advanced analytics that examines data
or content to answer the question “Why did it happen?”

• Diagnostic analytical tools help an analyst dig deeper into an issue so
that they can arrive at the source of a problem

• In a structured business environment, tools for descriptive and
diagnostic analytics go hand in hand

24
Example

• It uses techniques such as:

1. Data Discovery

2. Data Mining

3. Correlations

25
Predictive analytics

• Predictive analytics helps forecast trends based on current events

• Predictive analytical models can estimate the probability of an event
happening in the future, or the time at which it is likely to happen

• Many different but co-dependent variables are analysed to predict a trend
in this type of analysis

26
Source: https://www.logianalytics.com/wp-content/uploads/2017/11/predictive-1.png

27
Example

• A set of techniques that use a model constructed from past data to predict
the future or ascertain the impact of one variable on another:
1. Linear regression
2. Time series analysis and forecasting
3. Data mining

Source: https://bigdata-madesimple.com/5-examples-predictive-analytics-travel-industry/

28
Prescriptive analytics

• A set of techniques to indicate the best course of action
• It tells what decision to make to optimize the outcome
• The goal of prescriptive analytics is to enable:
1. Quality improvements
2. Service enhancements
3. Cost reductions, and
4. Productivity increases

29
Prescriptive analytics: Example

• Optimization Model
• Simulation
• Decision Analysis

30
3. Explain why analytics is important

• Demand for Data Analytics


• Element of data Analytics

31
3. Explain why analytics is important

Data Scientist
Search Trends
Statistician, Operations Researcher

32
Source: https://timesofindia.indiatimes.com/india/Data-scientists-earning-more-than-CAs-engineers/articleshow/52171064.cms

33
3.1 Demand for Data Analytics

Source: http://timesofindia.indiatimes.com/articleshow/52171064.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst

34
3.2 Element of data Analytics

35
4. Data analyst and Data scientist

• The requisite skill set

• Difference between Data analyst and Data Scientist

36
4.1 The requisite skill set

Data science sits at the intersection of three skill areas:
• Mathematics expertise
• Technology; hacking skills
• Business and strategy acumen

37

39
4.2 Difference between Data analyst and Data Scientist

Roles range from business administration to data product engineering:
• Analyst – domain-specific responsibility (for example marketing analyst,
financial analyst, etc.); data exploration, analysis, and insight
• Data scientist – advanced algorithms and machine learning; data product
engineering

Source: https://datajobs.com/

40
5. Why python?

Features
• Simple and easy to learn
• Freeware and Open source
• Interpreted
• Dynamically Typed
• Extensible
• Embedded
• Extensive library

41
5. Why python?

Usability
• Desktop and web applications
• Database applications
• Networking applications
• Data analysis (Data Science)
• Machine learning
• IoT and AI applications
• Games

42
Companies using Python

43
Why Jupyter NoteBook?

Why?
• Client – Server Application
• Edit code on web browser
• Easy in documentation
• Easy in demonstration
• User- friendly Interface

44
6. Explain the four different levels of Data
• Types of Variables
• Levels of Data Measurement
• Compare the four different levels of Data:
Nominal
Ordinal
Interval and
Ratio
• Usage Potential of Various Levels of Data
• Data Level, Operations, and Statistical Methods

45
6.1 Types of Variables

Data
• Categorical (defined categories)
– Examples: marital status, political party, eye color
• Numerical
– Discrete (counted items). Examples: number of children, defects per hour
– Continuous (measured characteristics). Examples: weight, voltage

6.2 Levels of Data Measurement

• Nominal: lowest level of measurement
• Ordinal
• Interval
• Ratio: highest level of measurement

47
6.3.1 Nominal

• A nominal scale classifies data into distinct categories in which no ranking
is implied
• Examples: gender, marital status

48
6.3.2 Ordinal scale

• An ordinal scale classifies data into distinct categories in which ranking is
implied
• Examples:
– Product satisfaction: Satisfied, Neutral, Unsatisfied
– Faculty rank: Professor, Associate Professor, Assistant Professor
– Student grades: A, B, C, D, F

49
6.3.3. Interval scale

• An interval scale is an ordered scale in which the difference between
measurements is a meaningful quantity but the measurements do not have a
true zero point.
• Examples:
– Temperature in Fahrenheit and Celsius
– Year

50
6.3.4 Ratio scale

• A ratio scale is an ordered scale in which the difference between the
measurements is a meaningful quantity and the measurements have a true
zero point.
• Examples:
– Weight
– Age
– Salary

51
6.4 Usage Potential of Various Levels of Data

[Figure: the four levels ordered by usage potential, from highest to lowest: Ratio, Interval, Ordinal, Nominal]

52
6.5 Impact of choice of measurement scale

Data Level   Meaningful Operations                                Statistical Methods
Nominal      Classifying and counting                             Nonparametric
Ordinal      All of the above plus ranking                        Nonparametric
Interval     All of the above plus addition and subtraction       Parametric
Ratio        All of the above plus multiplication and division    Parametric

53
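The levels of measurement above can be mirrored in pandas. The sketch below is only an illustration: the sample values are made up, and using `pd.Categorical` with `ordered=True` for ordinal data is one possible modelling choice, not something the slides prescribe.

```python
import pandas as pd

# Nominal: categories with no implied order (made-up sample values)
eye_color = pd.Series(pd.Categorical(["brown", "blue", "brown", "green"]))

# Ordinal: categories with an explicit ranking
satisfaction = pd.Series(pd.Categorical(
    ["Neutral", "Satisfied", "Unsatisfied", "Satisfied"],
    categories=["Unsatisfied", "Neutral", "Satisfied"],
    ordered=True,
))

# Classifying and counting is meaningful at every level
color_counts = eye_color.value_counts()

# Ranking (min, max, comparisons) becomes meaningful from the ordinal level up
lowest = satisfaction.min()
at_least_neutral = int((satisfaction >= "Neutral").sum())

# Ratio data has a true zero, so ratios are meaningful: 80 kg is twice 40 kg
weights_kg = pd.Series([40.0, 80.0])
ratio = weights_kg[1] / weights_kg[0]
```

Comparing two nominal values with `<` or `>` raises an error in pandas, which matches the table: ranking is not a meaningful operation for nominal data.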
Thank You

54
Data Analytics with Python
Lecture 2: Python – Fundamentals

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT
IIT ROORKEE

1
Learning objectives
1. Installing Python
2. Fundamentals of Python
3. Data Visualisation

2
Python Installation Process

Installation Process –

Step 1: Type https://www.anaconda.com in the address bar of a web browser
Step 2: Click the download button
Step 3: Download the Python 3.7 version for Windows OS
Step 4: Double-click the downloaded file to run the installer
Step 5: Follow the instructions until the installation process is complete

3

16
Why Jupyter NoteBook?

Why?
• Edit code on web browser
• Easy in documentation
• Easy in demonstration
• User- friendly Interface

17
Python and Jupyter

Python (programming language) + Jupyter (application)

The software package contains both Python and the Jupyter application

18
19
About Jupyter NoteBook

Cell -> Access using Enter Key

20
About Jupyter NoteBook

Input Field -> Green color indicates edit mode


Blue color indicates command mode

21
About Jupyter NoteBook

-> It contains documentation


-> Text not executed as code

22
About Jupyter Notebook

• Command mode allows you to edit the notebook as a whole
• To leave edit mode, press the Escape key
• Execution (three ways)
o Ctrl + Enter (runs the cell and keeps it selected)
o Shift + Enter (runs the cell and moves to the next cell)
o Run button on the Jupyter interface

• A comment line starts with the # symbol

23
About Jupyter Notebook

• Important shortcut keys

o A -> Create a cell above
o B -> Create a cell below
o D, D -> Delete the cell (press D twice)
o M -> Convert to a markdown cell
o Y -> Convert to a code cell

24
Fundamentals of Python

• Loading a simple delimited data file
• Counting how many rows and columns were loaded
• Determining which type of data was loaded
• Looking at different parts of the data by subsetting rows
and columns

25
26
Loading a simple delimited data file

Data Source: www.github.com/jennybc/gapminder.

27
28
• head method shows us only the first 5 rows

29
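The code shown on these slides did not survive extraction. A minimal sketch of loading a tab-delimited file and previewing it with `head` might look like this. The rows below are made-up stand-ins shaped like the gapminder file, not the real data, and the `pop` and `gdpPercap` column names are assumed from the public gapminder dataset.

```python
import io

import pandas as pd

# Toy stand-in for the tab-separated gapminder file; rows and values are
# made up for illustration and are not the real data.
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Aland\tAsia\t1952\t30.0\t1000\t500.0\n"
    "Aland\tAsia\t1957\t32.0\t1100\t520.0\n"
    "Bland\tEurope\t1952\t60.0\t2000\t3000.0\n"
    "Bland\tEurope\t1957\t62.0\t2100\t3300.0\n"
    "Cland\tAfrica\t1952\t40.0\t1500\t800.0\n"
    "Cland\tAfrica\t1957\t42.0\t1600\t850.0\n"
)

# sep="\t" marks the file as tab-delimited; with a real file on disk,
# pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

first_rows = df.head()  # first 5 rows by default
print(first_rows)
```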
Get the number of rows and columns

30
get column names

31
get the dtype of each column

32
Pandas Types Versus Python Types

33
get more information about data

34
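The four inspection steps named on the preceding slides (rows and columns, column names, dtypes, more information) could be sketched as follows. The small frame is a made-up example, not the gapminder data.

```python
import pandas as pd

# Made-up frame with a text, an integer and a floating-point column
df = pd.DataFrame({
    "country": ["Aland", "Aland", "Bland"],
    "year": [1952, 1957, 1952],
    "lifeExp": [30.0, 32.5, 60.0],
})

n_rows, n_cols = df.shape        # shape is an attribute, not a method
column_names = list(df.columns)  # column labels
column_types = df.dtypes         # one dtype per column
df.info()                        # prints dtypes, non-null counts, memory use
```

Text columns are stored with a pandas string/object dtype, while `int64` and `float64` correspond to Python's `int` and `float`.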
Looking at Columns, Rows, and Cells

• # get the country column and save it to its own variable

35
# show the first 5 observations

36
# show the last 5 observations

37
# Looking at country, continent, and year

38
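The comments on these slides describe column subsetting. A sketch under the assumption of a small made-up frame with gapminder-style column names:

```python
import pandas as pd

# Made-up frame; only the column names follow the slides
df = pd.DataFrame({
    "country": ["Aland", "Bland", "Cland", "Dland", "Eland", "Fland", "Gland"],
    "continent": ["Asia", "Europe", "Africa", "Asia", "Europe", "Africa", "Asia"],
    "year": [1952, 1952, 1952, 1957, 1957, 1957, 1962],
})

# get the country column and save it to its own variable (a Series)
country_df = df["country"]

first_five = country_df.head()  # show the first 5 observations
last_five = country_df.tail()   # show the last 5 observations

# a list of names inside the brackets selects several columns (a DataFrame)
subset = df[["country", "continent", "year"]]
```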
39
Data Analytics with Python
Lecture 3: Python – Fundamentals - II

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT
IIT ROORKEE

1
Looking at Columns, Rows, and Cells

• Subset Rows by Index Label: loc

2
get the first row

• Python counts from 0

3
• # get the 100th row
# Python counts from 0

4
• get the last row

5
Subsetting Multiple Rows

• # select the first, 100th, and 1000th rows

6
Subset Rows by Row Number: iloc

• # get the 2nd row

7
• get the 100th row

8
• # using -1 to get the last row

9
With iloc, we can pass -1 to get the last row, something we cannot do with loc.

10
• # get the first, 100th, and 1000th rows

11
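A sketch of the row-subsetting steps above, using a made-up frame with the default 0 to n-1 integer index (1704 rows is chosen only so that a 1000th row exists):

```python
import pandas as pd

# Made-up frame with a default integer index
df = pd.DataFrame({"value": range(1704), "double": range(0, 3408, 2)})

first_by_label = df.loc[0]               # row whose index label is 0
hundredth = df.loc[99]                   # 100th row: Python counts from 0
last_by_label = df.loc[df.shape[0] - 1]  # loc needs the actual last label

some_rows = df.loc[[0, 99, 999]]         # first, 100th and 1000th rows

second_by_pos = df.iloc[1]               # iloc uses integer positions
last_by_pos = df.iloc[-1]                # negative positions count from the end

try:
    df.loc[-1]                           # -1 is not a label in this index
    loc_accepts_minus_one = True
except KeyError:
    loc_accepts_minus_one = False
```

`loc` looks up index labels and `iloc` looks up positions; the two only coincide here because the index labels happen to be 0, 1, 2, and so on.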
Subsetting Columns

• The Python slicing syntax uses a colon, :
• A colon on its own selects everything along that axis.
• So, to get just the first column using the loc or iloc syntax,
we can write something like df.loc[:, [columns]] to subset the column(s).

12
• # subset columns with loc
# note the position of the colon
# it is used to select all rows

13
14
• # subset columns with iloc
• # iloc will allow us to use integers
• # -1 will select the last column

15
Subsetting Columns by Range

• # create a range of integers from 0 to 4 inclusive

16
• # subset the dataframe with the range

17
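The column-subsetting comments above could be sketched like this. The frame is a made-up two-row example with gapminder-style column names.

```python
import pandas as pd

# Made-up frame with six columns
df = pd.DataFrame({
    "country": ["Aland", "Bland"],
    "continent": ["Asia", "Europe"],
    "year": [1952, 1952],
    "lifeExp": [30.0, 60.0],
    "pop": [1000, 2000],
    "gdpPercap": [500.0, 3000.0],
})

# subset columns with loc: the colon before the comma selects all rows
subset_loc = df.loc[:, ["year", "pop"]]

# subset columns with iloc: integer positions, -1 is the last column
subset_iloc = df.iloc[:, [2, 4, -1]]

# create a range of integers from 0 to 4 inclusive, then subset with it
small_range = list(range(5))
subset_range = df.iloc[:, small_range]
```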
Subsetting Rows and Columns

• # using loc

18
• # using iloc

19
Subsetting Multiple Rows and Columns

• #get the 1st, 100th, and 1000th rows


# from the 1st, 4th, and 6th columns

20
• if we use the column names directly,
# it makes the code a bit easier to read
# note now we have to use loc, instead of iloc

21
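Selecting rows and columns together, as described on the slides above, might look like the following sketch. The frame is synthetic; only the column names are assumed from the gapminder dataset.

```python
import pandas as pd

n = 1704  # enough rows for a 1000th row to exist
df = pd.DataFrame({
    "country": [f"country_{i}" for i in range(n)],
    "continent": ["Asia"] * n,
    "year": [1952 + (i % 12) * 5 for i in range(n)],
    "lifeExp": [40.0 + (i % 40) for i in range(n)],
    "pop": [1000 + i for i in range(n)],
    "gdpPercap": [500.0 + i for i in range(n)],
})

# a single cell: row label / column name with loc, positions with iloc
cell_loc = df.loc[42, "country"]
cell_iloc = df.iloc[42, 0]

# the 1st, 100th and 1000th rows from the 1st, 4th and 6th columns
by_position = df.iloc[[0, 99, 999], [0, 3, 5]]

# using column names is easier to read, but requires loc instead of iloc
by_name = df.loc[[0, 99, 999], ["country", "lifeExp", "gdpPercap"]]
```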
22
23
Grouped Means

• # For each year in our data, what was the average life
expectancy?
# To answer this question,
# we need to split our data into parts by year;
# then we get the 'lifeExp' column and calculate the mean

24
25
26
• If you need to “flatten” the dataframe, you can use the
reset_index method.

27
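The split, select, and average steps for grouped means, followed by `reset_index` to flatten the result, can be sketched with a made-up four-row frame:

```python
import pandas as pd

# Made-up data: two observations per year
df = pd.DataFrame({
    "year": [1952, 1952, 1957, 1957],
    "lifeExp": [30.0, 40.0, 32.0, 44.0],
})

# split by year, take the lifeExp column, compute the mean of each group
mean_by_year = df.groupby("year")["lifeExp"].mean()

# the result is indexed by year; reset_index turns it back into columns
flat = mean_by_year.reset_index()
```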
Grouped Frequency Counts

• Use nunique to count the unique values in a pandas Series.

28
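A grouped frequency count with `nunique`, sketched on made-up data (how many distinct countries appear in each continent):

```python
import pandas as pd

# Made-up data: "Aland" appears twice but should be counted once
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe"],
    "country": ["Aland", "Aland", "Bland", "Cland"],
})

countries_per_continent = df.groupby("continent")["country"].nunique()
```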
Basic Plot

29
30
Visual Representation of the Data
• Histogram -- vertical bar chart of frequencies
• Frequency Polygon -- line graph of frequencies
• Ogive -- line graph of cumulative frequencies
• Pie Chart -- proportional representation for categories of a whole
• Stem and Leaf Plot
• Pareto Chart
• Scatter Plot

31
Methods of visual presentation of data

• Table

        1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East    20.4      27.4      90        20.4
West    30.6      38.6      34.6      31.6
North   45.9      46.9      45        43.9

32
Methods of visual presentation of data

• Graphs
[Figure: chart of East, West and North values for 1st Qtr to 4th Qtr; vertical axis from 0 to 90]

33
Methods of visual presentation of data

• Pie chart

[Figure: pie chart with slices for 1st Qtr, 2nd Qtr, 3rd Qtr and 4th Qtr]

34
Methods of visual presentation of data
• Multiple bar chart

[Figure: horizontal multiple bar chart of North, West and East for 1st Qtr to 4th Qtr; horizontal axis from 0 to 100]

35
Methods of visual presentation of data

• Simple pictogram

[Figure: pictogram of North, East and West for 1st Qtr to 4th Qtr; vertical axis from 0 to 100]

36
Frequency distributions

• Frequency tables

Observation Table
Class Interval   Frequency   Cumulative Frequency
< 20             13          13
< 40             18          31
< 60             25          56
< 80             15          71
< 100            9           80

37
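The cumulative frequency column in the table above is a running total of the frequency column. A short sketch using only the standard library:

```python
from itertools import accumulate

class_upper_bounds = [20, 40, 60, 80, 100]
frequencies = [13, 18, 25, 15, 9]

# running total of the frequencies
cumulative = list(accumulate(frequencies))

for bound, f, cf in zip(class_upper_bounds, frequencies, cumulative):
    print(f"< {bound:<4} {f:>3} {cf:>3}")
```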
Frequency diagrams
[Figures: frequency chart (vertical axis 0 to 30) and cumulative frequency chart (vertical axis 0 to 90) for the classes < 20, < 40, < 60, < 80 and < 100]

38
Histogram

Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1

[Figure: histogram of the frequencies above; x-axis: Years (0 to 80), y-axis: Frequency (0 to 20)]

39
Histogram Construction

Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1

[Figure: histogram built from this table; x-axis: Years (0 to 80), y-axis: Frequency (0 to 20)]

40
Frequency Polygon

Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1

[Figure: frequency polygon of this table; x-axis: Years (0 to 80), y-axis: Frequency (0 to 20)]

41
Ogive

Class Interval   Cumulative Frequency
20-under 30      6
30-under 40      24
40-under 50      35
50-under 60      46
60-under 70      49
70-under 80      50

[Figure: ogive of this table; x-axis: Years (0 to 80), y-axis: Cumulative Frequency (0 to 60)]

42
Relative Frequency Ogive

Class Interval   Cumulative Relative Frequency
20-under 30      0.12
30-under 40      0.48
40-under 50      0.70
50-under 60      0.92
60-under 70      0.98
70-under 80      1.00

[Figure: relative frequency ogive of this table; x-axis: Years (0 to 80), y-axis: Cumulative Relative Frequency (0.00 to 1.00)]

43
Pareto Chart
[Figure: Pareto chart of defect causes (Poor Wiring, Short in Coil, Defective Plug, Other); left axis: Frequency (0 to 100), right axis: cumulative percentage (0% to 100%)]

44
Scatter Plot

Registered Vehicles (1000's)   Gasoline Sales (1000's of Gallons)
5                              60
15                             120
9                              90
15                             140
7                              60

[Figure: scatter plot of this table; x-axis: Registered Vehicles (0 to 20), y-axis: Gasoline Sales (0 to 200)]

45
Principles of Excellent Graphs
• The graph should not distort the data
• The graph should not contain unnecessary adornments (sometimes
referred to as chart junk)
• The scale on the vertical axis should begin at zero
• All axes should be properly labeled
• The graph should contain a title
• The simplest possible graph should be used for a given set of data
Graphical Errors: Chart Junk

Bad presentation: decorated chart of the minimum wage (1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80)
Good presentation: plain chart of the minimum wage on a $0 to $4 axis for 1960, 1970, 1980 and 1990
Graphical Errors: Compressing the Vertical Axis

Bad presentation: Quarterly Sales (Q1 to Q4) on a vertical axis running from $0 to $200
Good presentation: Quarterly Sales (Q1 to Q4) on a vertical axis running from $0 to $50
Graphical Errors: No Zero Point on the Vertical Axis

Bad presentation: Monthly Sales (January to June) on a vertical axis that starts at $36
Good presentations: Monthly Sales (January to June) on a vertical axis that includes the zero point (0, 36, 39, 42, 45)

Graphing the first six months of sales


Lecture 4: Central Tendency and Dispersion

Dr. A. Ramesh
Department of Management Studies

1
Lecture objectives

• Central tendency
• Measures of Dispersion

2
Measures of Central Tendency

• Measures of central tendency yield information about “particular places or
locations in a group of numbers.”
• A single number to describe the characteristics of a set of data

3
Summary statistics

• Central tendency or measures of location
– Arithmetic mean
– Weighted mean
– Median
– Percentile
• Dispersion
– Skewness
– Kurtosis
– Range
– Interquartile range
– Variance
– Standard score
– Coefficient of variation

4
Arithmetic Mean
• Commonly called ‘the mean’
• It is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data set and dividing the sum by
the number of values in the data set

5
Population Mean

$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$

6
Sample Mean

$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$$

7
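Both worked examples can be checked with Python's standard `statistics` module. The arithmetic is the same for a population and a sample; only the notation differs.

```python
import statistics

population = [24, 13, 19, 26, 11]
sample = [57, 86, 42, 38, 90, 66]

# sum of the values divided by the number of values
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(population_mean, round(sample_mean, 3))
```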
Mean of Grouped Data
• Weighted average of class midpoints
• Class frequencies are the weights

$$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \dots + f_i M_i}{f_1 + f_2 + f_3 + \dots + f_i}$$

8
Calculation of Grouped Mean
Class Interval   Frequency (f)   Class Midpoint (M)   fM
20-under 30      6               25                   150
30-under 40      18              35                   630
40-under 50      11              45                   495
50-under 60      11              55                   605
60-under 70      3               65                   195
70-under 80      1               75                   75
Total            50                                   2150

$$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$$

9
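The grouped-mean table above can be reproduced in a few lines. This sketch represents each class as a `(lower, upper, frequency)` tuple, which is just one convenient layout.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [
    (20, 30, 6),
    (30, 40, 18),
    (40, 50, 11),
    (50, 60, 11),
    (60, 70, 3),
    (70, 80, 1),
]

total_f = sum(f for _, _, f in classes)

# class midpoint M = (lower + upper) / 2, weighted by the class frequency f
total_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

grouped_mean = total_fm / total_f
```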
Weighted Average

• Sometimes we wish to average numbers, but we want to assign more
importance, or weight, to some of the numbers.

• The average you need is the weighted average.

Formula for Weighted Average

$$\text{Weighted Average} = \frac{\sum xw}{\sum w}$$

where x is a data value and w is the weight assigned to that data
value. The sum is taken over all data values.

Example
Suppose your midterm test score is 83 and your final exam score is 95.
Using weights of 40% for the midterm and 60% for the final exam, compute
the weighted average of your scores. If the minimum average for an A is
90, will you earn an A?

$$\text{Weighted Average} = \frac{83(0.40) + 95(0.60)}{0.40 + 0.60} = \frac{33.2 + 57}{1} = 90.2$$

You will earn an A!
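The same weighted-average example as a short sketch in plain Python:

```python
scores = [83, 95]        # midterm, final exam
weights = [0.40, 0.60]   # 40% and 60%

# sum of (value * weight) divided by the sum of the weights
weighted_average = sum(x * w for x, w in zip(scores, weights)) / sum(weights)

earns_an_a = weighted_average >= 90
print(round(weighted_average, 1), earns_an_a)
```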
Median
• Middle value in an ordered array of numbers

• Applicable for ordinal, interval, and ratio data

• Not applicable for nominal data

• Unaffected by extremely large and extremely small values

13
Median: Computational Procedure
• First Procedure
– Arrange the observations in an ordered array
– If there is an odd number of terms, the median is the middle term of the
ordered array
– If there is an even number of terms, the median is the average of the
middle two terms
• Second Procedure
– The median’s position in an ordered array is given by (n+1)/2.

14
Median: Example with an Odd Number of Terms

Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
• There are 17 terms in the ordered array.
• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, 15.
• If the 22 is replaced by 100, the median is 15.
• If the 3 is replaced by -103, the median is 15.

15
Median: Example with an Even Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21

• There are 16 terms in the ordered array

• Position of median = (n+1)/2 = (16+1)/2 = 8.5

• The median is between the 8th and 9th terms, 14.5

• If the 21 is replaced by 100, the median is 14.5

• If the 3 is replaced by -88, the median is 14.5

16
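The two median examples, and the claim that extreme values do not move the median, can be checked with `statistics.median`:

```python
import statistics

odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = odd_array[:-1]  # drop the 22, leaving 16 terms

median_odd = statistics.median(odd_array)    # the 9th term
median_even = statistics.median(even_array)  # average of the 8th and 9th terms

# replacing the extremes leaves the median unchanged
median_large_outlier = statistics.median(odd_array[:-1] + [100])
median_small_outlier = statistics.median([-103] + odd_array[1:])
```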
Median of Grouped Data

$$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}} \, (W)$$

Where:
L = the lower limit of the median class
cfp = cumulative frequency of class preceding the median class
fmed = frequency of the median class
W = width of the median class
N = total of frequencies

17
Median of Grouped Data -- Example

Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24
40-under 50      11          35
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
N = 50

$$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}} \, (W) = 40 + \frac{\frac{50}{2} - 24}{11} \, (10) = 40.909$$

18
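A sketch of the grouped-median formula as a function. The name `grouped_median` and the tuple layout are illustrative choices, and the median class is taken to be the first class whose cumulative frequency reaches N/2.

```python
def grouped_median(classes):
    """classes: list of (lower, upper, frequency) tuples in ascending order."""
    n = sum(f for _, _, f in classes)
    cumulative_before = 0  # cfp: cumulative frequency preceding the class
    for lower, upper, f in classes:
        if cumulative_before + f >= n / 2:
            width = upper - lower
            return lower + (n / 2 - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("classes must contain at least one observation")


classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]
median = grouped_median(classes)
```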
Mode

• The most frequently occurring value in a data set

• Applicable to all levels of data measurement (nominal, ordinal, interval,


and ratio)

• Bimodal -- Data sets that have two modes

• Multimodal -- Data sets that contain more than two modes

19
Mode -- Example

• The mode is 44
• There are more 44s than any other value

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48

20
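Mode of Grouped Data
• Midpoint of the modal class, or an interpolated value within it
• Modal class has the greatest frequency

Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1

$$\text{Mode} = L_{Mo} + \left(\frac{d_1}{d_1 + d_2}\right) w = 30 + \left(\frac{12}{12 + 7}\right)(10) \approx 36.32$$

Here d1 = 18 - 6 = 12 is the modal class frequency minus the frequency of the
preceding class, d2 = 18 - 11 = 7 is the modal class frequency minus the
frequency of the following class, and w = 10 is the class width.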

21
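The interpolation formula for the mode of grouped data, sketched as a function. Treating a missing neighbouring class as frequency 0 is an assumption made here for the edge case; the slides do not cover it.

```python
def grouped_mode(classes):
    """classes: list of (lower, upper, frequency) tuples in ascending order."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))              # modal class: greatest frequency
    lower, upper, f_modal = classes[m]
    # Assumption: a missing neighbour counts as frequency 0
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = f_modal - f_prev
    d2 = f_modal - f_next
    return lower + d1 / (d1 + d2) * (upper - lower)


classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]
mode = grouped_mode(classes)
```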
22
Percentiles
• Measures of central tendency that divide a group of data into 100 parts

• Example: 90th percentile indicates that at most 90% of the data lie
below it, and at least 10% of the data lie above it

• The median and the 50th percentile have the same value

• Applicable for ordinal, interval, and ratio data

• Not applicable for nominal data

23
Percentiles: Computational Procedure
• Organize the data into an ascending ordered array
• Calculate the pth percentile location: $i = \frac{P}{100}(n)$
• Determine the percentile’s location and its value.

• If i is a whole number, the percentile is the average of the values at the
i and (i+1) positions

• If i is not a whole number, the percentile is at the position given by the
whole number part of (i+1) in the ordered array

24
Percentiles: Example
• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
• Location of 30th percentile: $i = \frac{30}{100}(8) = 2.4$

• The location index, i, is not a whole number; i+1 = 2.4+1 = 3.4;
the whole number portion is 3; the 30th percentile is at the 3rd
location of the array; the 30th percentile is 13.

25
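The computational procedure above, sketched as a function. Note that this follows the rule on the slides; library routines such as NumPy's percentile functions use interpolation by default and can return slightly different values. The function assumes 0 < p < 100.

```python
def percentile(values, p):
    """p-th percentile using the location rule i = (p / 100) * n."""
    ordered = sorted(values)      # ascending ordered array
    n = len(ordered)
    i = p * n / 100               # location index
    if i.is_integer():
        k = int(i)
        # average of the values at positions i and i + 1 (1-based)
        return (ordered[k - 1] + ordered[k]) / 2
    k = int(i) + 1                # whole number part of i + 1
    return ordered[k - 1]


raw_data = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile(raw_data, 30)
p50 = percentile(raw_data, 50)   # same as the median of the data
```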
Dispersion

• Measures of variability describe the spread or the dispersion of a set of


data

• Reliability of measure of central tendency

• To compare dispersion of various samples

26
Variability

No Variability in Cash Flow Mean

Variability in Cash Flow Mean

27
Measures of Variability or dispersion
Common Measures of Variability
• Range
• Inter-quartile range
• Mean Absolute Deviation
• Variance
• Standard Deviation
• Z scores
• Coefficient of Variation

28
Range – ungrouped data

• The difference between the largest and the smallest values in
a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example: Range = Largest – Smallest = 48 - 35 = 13

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48

29
Quartiles
• Measures of central tendency that divide a group of data into four subgroups

• Q1: 25% of the data set is below the first quartile

• Q2: 50% of the data set is below the second quartile

• Q3: 75% of the data set is below the third quartile

• Q1 is equal to the 25th percentile

• Q2 is located at 50th percentile and equals the median

• Q3 is equal to the 75th percentile

• Quartile values are not necessarily members of the data set

30
Quartiles

Q1 Q2 Q3

25% 25% 25% 25%

31
Quartiles: Example
• Ordered array: 106, 109, 114, 116, 121, 122, 125, 129
• Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$

• Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$

• Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$

32
Interquartile Range

• Range of values between the first and third quartiles


• Range of the “middle half”
• Less influenced by extremes

$$\text{Interquartile Range} = Q_3 - Q_1$$

33
Deviation from the Mean

• Data set: 5, 9, 16, 17, 18
• Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
• Deviations from the mean: -8, -4, 3, 4, 5

[Figure: number line from 0 to 20 showing each value's deviation from the mean (-8, -4, +3, +4, +5)]



34
Mean Absolute Deviation

• Average of the absolute deviations from the mean

X       X - μ   |X - μ|
5       -8      8
9       -4      4
16      +3      3
17      +4      4
18      +5      5
Total   0       24

$$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$

35
Population Variance
• Average of the squared deviations from the arithmetic mean

X       X - μ   (X - μ)²
5       -8      64
9       -4      16
16      +3      9
17      +4      16
18      +5      25
Total   0       130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

36
Population Standard Deviation
• Square root of the variance

X       X - μ   (X - μ)²
5       -8      64
9       -4      16
16      +3      9
17      +4      16
18      +5      25
Total   0       130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$$

37
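The deviation, M.A.D., population variance and population standard deviation examples share one data set and can be checked together. `pvariance` and `pstdev` are the population versions in the standard `statistics` module (they divide by N).

```python
import statistics

data = [5, 9, 16, 17, 18]

mu = statistics.mean(data)
deviations = [x - mu for x in data]   # these always sum to zero

mad = sum(abs(d) for d in deviations) / len(data)   # mean absolute deviation
population_variance = statistics.pvariance(data)    # divides by N
population_sd = statistics.pstdev(data)             # square root of the above
```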
Sample Variance
• Sum of the squared deviations from the sample mean, divided by n - 1

X       X - X̄   (X - X̄)²
2,398   625     390,625
1,844   71      5,041
1,539   -234    54,756
1,311   -462    213,444
7,092   0       663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

38
Sample Standard Deviation
• Square root of the sample variance

X       X - X̄   (X - X̄)²
2,398   625     390,625
1,844   71      5,041
1,539   -234    54,756
1,311   -462    213,444
7,092   0       663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

$$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$

39
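The sample versions divide by n - 1 rather than N. In the standard `statistics` module these are `variance` and `stdev`:

```python
import statistics

sample = [2398, 1844, 1539, 1311]

sample_mean = statistics.mean(sample)
sample_variance = statistics.variance(sample)  # divides by n - 1
sample_sd = statistics.stdev(sample)           # square root of the above
```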
Uses of Standard Deviation
• Indicator of financial risk
• Quality Control
– construction of quality control charts
– process capability studies
• Comparing populations
– household incomes in two cities
– employee absenteeism at two plants

40
Standard Deviation as an Indicator of Financial Risk

Annualized Rate of Return

Financial Security   μ     σ
A                    15%   3%
B                    15%   7%

41
Lecture 5: Central Tendency and Dispersion- II

Dr. A. Ramesh
Department of Management Studies

1
The Empirical Rule (if the histogram is bell shaped)

• Approximately 68% of all observations fall
within one standard deviation of the mean.

• Approximately 95% of all observations fall
within two standard deviations of the mean.

• Approximately 99.7% of all observations fall
within three standard deviations of the mean.

2
Empirical Rule

• Data are normally distributed (or approximately normal)

Distance from the Mean   Percentage of Values Falling Within Distance
μ ± 1σ                   68
μ ± 2σ                   95
μ ± 3σ                   99.7
3
Chebysheff’s Theorem (not often used because the interval is very wide)

• A more general interpretation of the standard deviation is derived
from Chebysheff’s Theorem, which applies to all shapes of histograms
(not just bell shaped).
• The proportion of observations in any sample that lie within k
standard deviations of the mean is at least:

$$1 - \frac{1}{k^2} \quad \text{for } k > 1$$

For k = 2 (say), the theorem states that at least
3/4 of all observations lie within 2 standard
deviations of the mean. This is a “lower bound”
compared to the Empirical Rule’s approximation
(95%).

41
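A small sketch comparing the Chebysheff lower bound with the Empirical Rule figures quoted earlier. The function name is illustrative.

```python
def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


# Empirical Rule percentages (bell-shaped data only), as proportions
empirical_rule = {2: 0.95, 3: 0.997}

bounds = {k: chebysheff_lower_bound(k) for k in (2, 3)}
for k in (2, 3):
    print(f"k={k}: at least {bounds[k]:.3f} (any shape), "
          f"about {empirical_rule[k]} (bell shaped)")
```

The Chebysheff figures are guarantees for any distribution, which is why they are lower than the bell-shaped approximations.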
Coefficient of Variation

• Ratio of the standard deviation to the mean, expressed as a percentage

• Measurement of relative dispersion

$$C.V. = \frac{\sigma}{\mu}(100)$$
5
Coefficient of Variation

$$\mu_1 = 29, \quad \sigma_1 = 4.6, \quad C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$$

$$\mu_2 = 84, \quad \sigma_2 = 10, \quad C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$$

6
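The two coefficient-of-variation calculations as a short sketch. The second data set has the larger standard deviation but the smaller relative dispersion.

```python
def coefficient_of_variation(sigma, mu):
    """Standard deviation as a percentage of the mean."""
    return sigma / mu * 100


cv1 = coefficient_of_variation(sigma=4.6, mu=29)
cv2 = coefficient_of_variation(sigma=10, mu=84)
print(round(cv1, 2), round(cv2, 2))
```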
Variance and Standard Deviation
of Grouped Data

Population:

$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

Sample:

$$S^2 = \frac{\sum f (M - \bar{X})^2}{n - 1}, \qquad S = \sqrt{S^2}$$

7
Population Variance and Standard Deviation of
Grouped Data (μ = 43)

Class Interval   f    M    fM     M - μ   (M - μ)²   f(M - μ)²
20-under 30      6    25   150    -18     324        1944
30-under 40      18   35   630    -8      64         1152
40-under 50      11   45   495    2       4          44
50-under 60      11   55   605    12      144        1584
60-under 70      3    65   195    22      484        1452
70-under 80      1    75   75     32      1024       1024
Total            50        2150                      7200

$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N} = \frac{7200}{50} = 144, \qquad \sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$$
8
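The grouped population variance table can be reproduced as follows. The `(lower, upper, frequency)` tuple layout is an illustrative choice.

```python
import math

classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

n = sum(f for _, _, f in classes)
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

# grouped mean: frequency-weighted average of the class midpoints
mu = sum(f * m for f, m in zip(freqs, midpoints)) / n

# grouped population variance: frequency-weighted squared deviations over N
variance = sum(f * (m - mu) ** 2 for f, m in zip(freqs, midpoints)) / n
std_dev = math.sqrt(variance)
```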
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values on one side of a distribution
• Kurtosis
– Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal shape
– Platykurtic: flat and spread out
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness

9
Skewness

[Figure: three distributions: negatively skewed, symmetric (not skewed), positively skewed]

10
Skewness..
The skewness of a distribution is measured by comparing the relative positions
of the mean, median and mode.
• Distribution is symmetrical
• Mean = Median = Mode
• Distribution skewed right
• Median lies between mode and mean, and mode is less than mean
• Distribution skewed left
• Median lies between mode and mean, and mode is greater than
mean

11
Skewness

[Figure: positions of mean, median and mode. Negatively skewed: mean < median < mode. Symmetric (not skewed): mean = median = mode. Positively skewed: mode < median < mean.]

12
Coefficient of Skewness

• Summary measure for skewness

$S = \frac{3(\mu - M_d)}{\sigma}$, where $M_d$ is the median

• If S < 0, the distribution is negatively skewed (skewed to the left)

• If S = 0, the distribution is symmetric (not skewed)

• If S > 0, the distribution is positively skewed (skewed to the right)

13
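Coefficient of Skewness

Case 1: $\mu_1 = 23$, $M_{d1} = 26$, $\sigma_1 = 12.3$
Case 2: $\mu_2 = 26$, $M_{d2} = 26$, $\sigma_2 = 12.3$
Case 3: $\mu_3 = 29$, $M_{d3} = 26$, $\sigma_3 = 12.3$

$S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$
$S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$
$S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$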
14
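The three cases above can be evaluated with one small helper. This is a sketch only; `coefficient_of_skewness` is an illustrative name for the formula $S = 3(\mu - M_d)/\sigma$ from the slide.

```python
def coefficient_of_skewness(mean, median, std_dev):
    """S = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


# The three cases share median 26 and sigma 12.3; only the mean changes
s1, s2, s3 = (coefficient_of_skewness(m, 26, 12.3) for m in (23, 26, 29))

for s in (s1, s2, s3):
    label = "negative" if s < 0 else "positive" if s > 0 else "symmetric"
    print(round(s, 2), label)
```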
Kurtosis
• Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal in shape
– Platykurtic: flat and spread out

[Figure: three curves labelled Leptokurtic, Mesokurtic and Platykurtic]

15
Box and Whisker Plot

• Five specific values are used:

– Median, Q2

– First quartile, Q1

– Third quartile, Q3

– Minimum value in the data set

– Maximum value in the data set

16
Box and Whisker Plot

[Figure: box and whisker plot marking Minimum, Q1, Q2, Q3 and Maximum]

17
Skewness: Box and Whisker Plots, and Coefficient of Skewness

[Figure: box and whisker plots for three cases. Negatively skewed: S < 0. Symmetric (not skewed): S = 0. Positively skewed: S > 0.]

18
THANK YOU

19
Lecture 6: Introduction to Probability

Dr. A. Ramesh
Department of Management Studies

1
Lecture objectives

• Comprehend the different ways of assigning probability


• Understand and apply marginal, union, joint, and conditional probabilities
• Solve problems using the laws of probability including the laws of addition,
multiplication and conditional probability
• Revise probabilities using Bayes’ rule

2
Probability

• Probability is the numerical measure of the likelihood that an event will occur.

• The probability of any event must be between 0 and 1, inclusive

– 0 ≤ P(A) ≤ 1 for any event A.

• The sum of the probabilities of all mutually exclusive and collectively exhaustive events is 1.
– P(A) + P(B) + P(C) = 1
– A, B, and C are mutually exclusive and collectively exhaustive

3
Range of Probability

1 Certain

.5

0 Impossible

4
Methods of Assigning Probabilities

• Classical method of assigning probability (rules and laws)

• Relative frequency of occurrence (cumulated historical data)

• Subjective Probability (personal intuition or reasoning)

5
Classical Probability

• Number of outcomes leading to the event divided by the total number of outcomes possible
• Each outcome is equally likely
• Determined a priori -- before performing the experiment
• Applicable to games of chance
• Objective -- everyone correctly using the method assigns an identical
probability

6
Classical Probability

$P(E) = \frac{n_e}{N}$
Where:
$N$ = total number of outcomes
$n_e$ = number of outcomes in E

7
Relative Frequency Probability

• Based on historical data

• Computed after performing the experiment

• Number of times an event occurred divided by the number of trials

• Objective -- everyone correctly using the method assigns an identical probability

8
Relative Frequency Probability

$P(E) = \frac{n_e}{N}$
Where:
$N$ = total number of trials
$n_e$ = number of outcomes producing E

9
Subjective Probability

• Comes from a person’s intuition or reasoning


• Subjective -- different individuals may (correctly) assign different numeric
probabilities to the same event
• Degree of belief
• Useful for unique (single-trial) experiments
– New product introduction
– Initial public offering of common stock
– Site selection decisions
– Sporting events

10
Probability - Terminology

• Experiment
• Event
• Elementary Events
• Sample Space
• Unions and Intersections
• Mutually Exclusive Events
• Independent Events
• Collectively Exhaustive Events
• Complementary Events

11
Experiment, Trial, Elementary Event, Event
• Experiment: a process that produces outcomes
– More than one possible outcome
– Only one outcome per trial
• Trial: one repetition of the process
• Elementary Event: cannot be decomposed or broken down into other
events
• Event: an outcome of an experiment
– may be an elementary event, or
– may be an aggregate of elementary events
– usually represented by an uppercase letter, e.g., A, E1

12
An Example Experiment
• Experiment: randomly select, without replacement, two families from the residents of Tiny Town
• Elementary Event: the sample includes families A and C
• Event: each family in the sample has children in the household
• Event: the sample families own a total of four automobiles

Tiny Town Population
Family   Children in Household   Number of Automobiles
A        Yes                     3
B        Yes                     2
C        No                      1
D        Yes                     2

13
Sample Space

• The set of all elementary events for an experiment


• Methods for describing a sample space
– roster or listing
– tree diagram
– set builder notation
– Venn diagram

14
Sample Space: Roster Example

• Experiment: randomly select, without replacement, two families from the residents of Tiny Town
• Each ordered pair in the sample space is an elementary event, for example (D,C)

Family   Children in Household   Number of Automobiles
A        Yes                     3
B        Yes                     2
C        No                      1
D        Yes                     2

Listing of Sample Space
(A,B), (A,C), (A,D),
(B,A), (B,C), (B,D),
(C,A), (C,B), (C,D),
(D,A), (D,B), (D,C)

15
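The roster for the Tiny Town experiment can be generated rather than typed out, and the classical method then gives event probabilities by counting. This is a minimal sketch: the dictionary layout and variable names are illustrative, and the probabilities assume every ordered pair is equally likely.

```python
from itertools import permutations

# family -> (children in household, number of automobiles), from the table
families = {"A": (True, 3), "B": (True, 2), "C": (False, 1), "D": (True, 2)}

# Ordered pairs drawn without replacement: the roster of the sample space
sample_space = list(permutations(families, 2))

# Event: each family in the sample has children in the household
all_children = [s for s in sample_space if all(families[f][0] for f in s)]
# Event: the sample families own a total of four automobiles
four_cars = [s for s in sample_space if sum(families[f][1] for f in s) == 4]

# Classical probability: outcomes in the event / total outcomes
p_all_children = len(all_children) / len(sample_space)
p_four_cars = len(four_cars) / len(sample_space)

print(len(sample_space), p_all_children, p_four_cars)
```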
Sample Space: Tree Diagram for Random Sample of Two
Families

16
Sample Space: Set Notation for Random Sample of Two
Families
• S = {(x,y) | x is the family selected on the first draw, and y is the family
selected on the second draw}
• Concise description of large sample spaces

17
Sample Space
• Useful for discussion of general principles and concepts

[Figure: Venn diagram of the sample space]

Listing of Sample Space
(A,B), (A,C), (A,D),
(B,A), (B,C), (B,D),
(C,A), (C,B), (C,D),
(D,A), (D,B), (D,C)

18
Union of Sets

• The union of two sets contains an instance of each element of the two sets.

$X = \{1, 4, 7, 9\}$
$Y = \{2, 3, 4, 5, 6\}$
$X \cup Y = \{1, 2, 3, 4, 5, 6, 7, 9\}$

$C = \{IBM, DEC, Apple\}$
$F = \{Apple, Grape, Lime\}$
$C \cup F = \{IBM, DEC, Apple, Grape, Lime\}$

19
Intersection of Sets

• The intersection of two sets contains only those elements common to the two sets.

$X = \{1, 4, 7, 9\}$
$Y = \{2, 3, 4, 5, 6\}$
$X \cap Y = \{4\}$

$C = \{IBM, DEC, Apple\}$
$F = \{Apple, Grape, Lime\}$
$C \cap F = \{Apple\}$
20
Mutually Exclusive Events
• Events with no common outcomes
• Occurrence of one event precludes the occurrence of the other event

$C = \{IBM, DEC, Apple\}$
$F = \{Grape, Lime\}$
$C \cap F = \emptyset$

$X = \{1, 7, 9\}$
$Y = \{2, 3, 4, 5, 6\}$
$X \cap Y = \emptyset$, so $P(X \cap Y) = 0$
21
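Union, intersection and mutual exclusivity map directly onto Python's built-in `set` type. The sketch below reuses the sets from the slides; the variable names are illustrative.

```python
X = {1, 4, 7, 9}
Y = {2, 3, 4, 5, 6}

union_xy = X | Y            # every element that appears in either set
intersection_xy = X & Y     # only the elements common to both sets

companies = {"IBM", "DEC", "Apple"}
fruit = {"Apple", "Grape", "Lime"}
common = companies & fruit

# Mutually exclusive sets have an empty intersection
exclusive = companies.isdisjoint({"Grape", "Lime"})

print(sorted(union_xy), intersection_xy, common, exclusive)
```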
Independent Events

• Occurrence of one event does not affect the occurrence or nonoccurrence of the other event
• The conditional probability of X given Y is equal to the marginal probability
of X.
• The conditional probability of Y given X is equal to the marginal probability
of Y.

$P(X \mid Y) = P(X)$ and $P(Y \mid X) = P(Y)$

22
Collectively Exhaustive Events

• Contains all elementary events for an experiment

[Figure: sample space partitioned into three collectively exhaustive events E1, E2 and E3]

23
Complementary Events

• All elementary events not in the event ‘A’ are in its complementary event, $\bar{A}$.

$P(\text{Sample Space}) = 1$
$P(\bar{A}) = 1 - P(A)$

24
Counting the Possibilities

• mn Rule
• Sampling from a Population with Replacement
• Combinations: Sampling from a Population without Replacement

25
mn Rule

• If an operation can be done m ways and a second operation can be done n ways, then there are mn ways for the two operations to occur in order.
• This rule is easily extended to k stages, with a number of ways equal to
$n_1 \cdot n_2 \cdot n_3 \cdots n_k$

• Example: Toss two coins. The total number of simple events is 2 x 2 = 4

26
Sampling from a Population with Replacement

• A tray contains 1,000 individual tax returns. If 3 returns are randomly selected with replacement from the tray, how many possible samples are there?
• $N^n = (1{,}000)^3 = 1{,}000{,}000{,}000$

27
Combinations

• A tray contains 1,000 individual tax returns. If 3 returns are randomly selected without replacement from the tray, how many possible samples are there?

$\binom{N}{n} = \frac{N!}{n!(N - n)!} = \frac{1000!}{3!(1000 - 3)!} = 166{,}167{,}000$

28
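The three counting rules can be checked directly with the standard library. This is a small sketch; `math.comb` needs Python 3.8 or later, and the variable names are illustrative.

```python
import math

# mn rule: two coins, each with 2 outcomes
two_coin_events = 2 * 2

# Sampling with replacement: N**n ordered samples
with_replacement = 1000 ** 3

# Combinations: N! / (n! (N - n)!) unordered samples without replacement
without_replacement = math.comb(1000, 3)

print(two_coin_events, with_replacement, without_replacement)
```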
Four Types of Probability
• Marginal: $P(X)$, the probability of X occurring
• Union: $P(X \cup Y)$, the probability of X or Y occurring
• Joint: $P(X \cap Y)$, the probability of X and Y occurring
• Conditional: $P(X \mid Y)$, the probability of X occurring given that Y has occurred

29
General Law of Addition

$P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)$

30
Design for improving productivity?

31
Problem
• A company conducted a survey for the American Society of Interior
Designers in which workers were asked which changes in office design
would increase productivity.
• Respondents were allowed to answer more than one type of design
change.

Reducing noise would increase productivity: 70%
More storage space would increase productivity: 67%

32
Problem

• If one of the survey respondents was randomly selected and asked what
office design changes would increase worker productivity,
– what is the probability that this person would select reducing noise or
more storage space?

33
Solution

• Let N represent the event “reducing noise.”


• Let S represent the event “more storage/ filing space.”
• The probability of a person responding with N or S can be symbolized
statistically as a union probability by using the law of addition.

34
General Law of Addition -- Example
$P(N \cup S) = P(N) + P(S) - P(N \cap S)$

$P(N) = .70$
$P(S) = .67$
$P(N \cap S) = .56$
$P(N \cup S) = .70 + .67 - .56 = 0.81$

35
Office Design Problem
Probability Matrix

                     Increase Storage Space
Noise Reduction      Yes     No      Total
Yes                  .56     .14     .70
No                   .11     .19     .30
Total                .67     .33     1.00

36
Joint Probability Using a Contingency Table

Event    B1              B2              Total
A1       P(A1 and B1)    P(A1 and B2)    P(A1)
A2       P(A2 and B1)    P(A2 and B2)    P(A2)
Total    P(B1)           P(B2)           1

The inner cells are joint probabilities; the row and column totals are marginal (simple) probabilities.


37
Office Design Problem - Probability Matrix

                     Increase Storage Space
Noise Reduction      Yes     No      Total
Yes                  .56     .14     .70
No                   .11     .19     .30
Total                .67     .33     1.00

$P(N \cup S) = P(N) + P(S) - P(N \cap S) = .70 + .67 - .56 = .81$

38
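The office design calculation can be driven entirely from the joint probabilities in the matrix. The sketch below stores the four cells in a dictionary (the key labels are illustrative), recovers the marginals by summing, and applies the general law of addition.

```python
# Joint probabilities from the probability matrix:
# first key = noise reduction answer, second key = storage space answer
joint = {
    ("N", "S"): 0.56, ("N", "not S"): 0.14,
    ("not N", "S"): 0.11, ("not N", "not S"): 0.19,
}

# Marginal probabilities are row and column sums of the joint cells
p_n = sum(p for (n, _), p in joint.items() if n == "N")
p_s = sum(p for (_, s), p in joint.items() if s == "S")

# General law of addition
p_n_or_s = p_n + p_s - joint[("N", "S")]

print(round(p_n, 2), round(p_s, 2), round(p_n_or_s, 2))
```

The same answer follows from the complement: everyone except the "No, No" cell selected at least one of the two changes, so the union is 1 - .19 = .81.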
Law of Conditional Probability

39
Office Design Problem

40
Problem

• A company data reveal that 155 employees worked one of four types of
positions.
• Shown here again is the raw values matrix (also called a contingency table)
with the frequency counts for each category and for subtotals and totals
containing a breakdown of these employees by type of position and by
sex.

41
Contingency Table

42
Solution

• If an employee of the company is selected randomly, what is the


probability that the employee is female or a professional worker?

43
Problem

• Shown here are the raw values matrix and corresponding probability
matrix for the results of a national survey of 200 executives who were
asked to identify the geographic locale of their company and their
company’s industry type.
• The executives were only allowed to select one locale and one industry
type.

44
Lecture 7: Introduction to Probability-II

Dr. A. Ramesh
Department of Management Studies

1
Problem

• A company data reveal that 155 employees worked one of four types of
positions.
• Shown here again is the raw values matrix (also called a contingency table)
with the frequency counts for each category and for subtotals and totals
containing a breakdown of these employees by type of position and by
sex.

2
Contingency Table

3
Solution

• If an employee of the company is selected randomly, what is the


probability that the employee is female or a professional worker?

4
Problem

• Shown here are the raw values matrix and corresponding probability
matrix for the results of a national survey of 200 executives who were
asked to identify the geographic locale of their company and their
company’s industry type.
• The executives were only allowed to select one locale and one industry
type.

5
6
Questions

a. What is the probability that the respondent is from the Midwest (F)?

b. What is the probability that the respondent is from the communications


industry (C) or from the Northeast (D)?

c. What is the probability that the respondent is from the Southeast (E) or
from the finance industry (A)?

7
8
Mutually Exclusive Events

                      Gender
Type of Position      Male    Female    Total
Managerial              8        3        11
Professional           31       13        44
Technical              52       17        69
Clerical                9       22        31
Total                 100       55       155

$P(T \cup C) = P(T) + P(C) = \frac{69}{155} + \frac{31}{155} = .645$
9
Mutually Exclusive Events

                      Gender
Type of Position      Male    Female    Total
Managerial              8        3        11
Professional           31       13        44
Technical              52       17        69
Clerical                9       22        31
Total                 100       55       155

$P(P \cup C) = P(P) + P(C) = \frac{44}{155} + \frac{31}{155} = .484$
10
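Both forms of the addition law can be computed from the raw counts in the contingency table. This is a sketch with illustrative names: position types are mutually exclusive, so their probabilities simply add, while "female or professional" needs the general law because the two events overlap.

```python
# (male, female) counts per type of position, from the contingency table
counts = {
    "Managerial": (8, 3),
    "Professional": (31, 13),
    "Technical": (52, 17),
    "Clerical": (9, 22),
}
total = sum(m + f for m, f in counts.values())


def p_position(name):
    """Marginal probability of a position type."""
    return sum(counts[name]) / total


# Special law of addition (mutually exclusive events)
p_t_or_c = p_position("Technical") + p_position("Clerical")
p_p_or_c = p_position("Professional") + p_position("Clerical")

# General law of addition: female or professional
p_female = sum(f for _, f in counts.values()) / total
p_female_and_prof = counts["Professional"][1] / total
p_female_or_prof = p_female + p_position("Professional") - p_female_and_prof

print(round(p_t_or_c, 3), round(p_p_or_c, 3), round(p_female_or_prof, 3))
```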
Law of Multiplication

$P(X \cap Y) = P(X) \cdot P(Y \mid X) = P(Y) \cdot P(X \mid Y)$

11
Problem

• A company has 140 employees, of which 30 are supervisors.


• Eighty of the employees are married, and 20% of the married employees
are supervisors.
• If a company employee is randomly selected, what is the probability that
the employee is married and is a supervisor?

12
                 Married
Supervisor       Y         N        Subtotal
Y                0.1143             30
N                                   110
Subtotal         80        60       140

13
$P(M) = \frac{80}{140} = 0.5714$
$P(S \mid M) = 0.20$
$P(M \cap S) = P(M) \cdot P(S \mid M) = (0.5714)(0.20) = 0.1143$

14
Law of Multiplication

Probability Matrix of Employees

                 Married
Supervisor       Yes      No       Total
Yes              .1143    .1000    .2143
No               .4571    .3286    .7857
Total            .5714    .4286    1.00

$P(\bar{S}) = 1 - P(S) = 1 - 0.2143 = 0.7857$
$P(\bar{M}) = 1 - P(M) = 1 - 0.5714 = 0.4286$
$P(M \cap \bar{S}) = P(M) - P(M \cap S) = 0.5714 - 0.1143 = 0.4571$
$P(\bar{M} \cap S) = P(S) - P(M \cap S) = 0.2143 - 0.1143 = 0.1000$
$P(\bar{M} \cap \bar{S}) = P(\bar{S}) - P(M \cap \bar{S}) = 0.7857 - 0.4571 = 0.3286$

15
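The whole probability matrix for the married/supervisor problem follows from three inputs and the law of multiplication. A minimal sketch, with illustrative variable names:

```python
n_employees, n_supervisors, n_married = 140, 30, 80
p_s_given_m = 0.20            # 20% of married employees are supervisors

p_m = n_married / n_employees
p_s = n_supervisors / n_employees

# Law of multiplication: P(M and S) = P(M) * P(S | M)
p_m_and_s = p_m * p_s_given_m

# Remaining cells follow by subtraction from the marginals
p_m_and_not_s = p_m - p_m_and_s
p_not_m_and_s = p_s - p_m_and_s
p_not_m_and_not_s = 1 - p_m_and_s - p_m_and_not_s - p_not_m_and_s

for value in (p_m_and_s, p_not_m_and_s, p_m_and_not_s, p_not_m_and_not_s):
    print(round(value, 4))
```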
Special Law of Multiplication for Independent Events

• General Law

$P(X \cap Y) = P(X) \cdot P(Y \mid X) = P(Y) \cdot P(X \mid Y)$

• Special Law
If events X and Y are independent,
$P(X) = P(X \mid Y)$ and $P(Y) = P(Y \mid X)$.
Consequently,
$P(X \cap Y) = P(X) \cdot P(Y)$

16
Law of Conditional Probability

• The conditional probability of X given Y is the joint probability of X and Y divided by the marginal probability of Y.

$P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)} = \frac{P(Y \mid X) \cdot P(X)}{P(Y)}$

17
Conditional Probability
• A conditional probability is the probability of one event, given that
another event has occurred:

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$ is the conditional probability of A given that B has occurred
$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$ is the conditional probability of B given that A has occurred

Where P(A and B) = joint probability of A and B
P(A) = marginal probability of A
P(B) = marginal probability of B
18
Computing Conditional Probability

• Of the cars on a used car lot, 70% have air conditioning (AC)
and 40% have a CD player (CD). 20% of the cars have both.
• What is the probability that a car has a CD player, given that it
has AC ?
• We want to find P(CD | AC).
Computing Conditional Probability

          CD     No CD    Total
AC        0.2    0.5      0.7
No AC     0.2    0.1      0.3
Total     0.4    0.6      1.0

$P(CD \mid AC) = \frac{P(CD \text{ and } AC)}{P(AC)} = \frac{.2}{.7} = .2857$

Given AC, we only consider the top row (70% of the cars). The cars in that row that also
have a CD player make up 20% of all cars, and .2 out of .7 is about 28.57%.
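Computing Conditional Probability: Decision Trees

[Figure: decision tree starting from All Cars and splitting on AC (.7) or no AC (.3). Given AC: branch .2/.7 leads to P(AC and CD) = .2, branch .5/.7 leads to P(AC and no CD) = .5. Given no AC: branch .2/.3 leads to P(no AC and CD) = .2, branch .1/.3 leads to P(no AC and no CD) = .1.]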
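The used-car example reduces to a single division, and the same joint table gives every branch of the decision tree. A short sketch with illustrative names:

```python
# Joint probabilities from the table: (air conditioning, CD player)
joint = {
    ("AC", "CD"): 0.2, ("AC", "no CD"): 0.5,
    ("no AC", "CD"): 0.2, ("no AC", "no CD"): 0.1,
}

# Marginal probability of AC: sum over the AC row
p_ac = sum(p for (ac, _), p in joint.items() if ac == "AC")

# Conditional probability: joint divided by the marginal of the given event
p_cd_given_ac = joint[("AC", "CD")] / p_ac

print(round(p_ac, 1), round(p_cd_given_ac, 4))
```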
Independent Events

• If X and Y are independent events, the occurrence of Y does not affect the
probability of X occurring.
• If X and Y are independent events, the occurrence of X does not affect the
probability of Y occurring.

If X and Y are independent events,
$P(X \mid Y) = P(X)$ and
$P(Y \mid X) = P(Y)$.
Statistical Independence

• Two events are independent if and only if:

$P(A \mid B) = P(A)$

• Events A and B are independent when the probability of one event is not affected by the other event
Independent Events Demonstration

                      Geographic Location
                      Northeast   Southeast   Midwest   West
                      D           E           F         G       Total
Finance A             .12         .05         .04       .07     .28
Manufacturing B       .15         .03         .11       .06     .35
Communications C      .14         .09         .06       .08     .37
Total                 .41         .17         .21       .21     1.00

Test the matrix for the 200 executive responses to determine whether industry type is independent of geographic location.
Independent Events Demonstration Contd…

$P(A \mid G) = \frac{P(A \cap G)}{P(G)} = \frac{0.07}{0.21} = 0.33$, while $P(A) = 0.28$

$P(A \mid G) = 0.33 \neq P(A) = 0.28$, so industry type is not independent of geographic location.
Independent Events

         D     E     Total
A        8     12    20
B        20    30    50
C        6     9     15
Total    34    51    85

$P(A \mid D) = \frac{8}{34} = .2353$
$P(A) = \frac{20}{85} = .2353$
$P(A \mid D) = P(A) = 0.2353$
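The independence check used in the last two demonstrations can be written once and applied to both tables. This is a sketch under the definition P(row | column) = P(row) for every cell; the function name `is_independent` and the tolerance are assumptions made for illustration.

```python
def is_independent(table, tol=1e-9):
    """True if P(row | column) equals P(row) for every cell of the table."""
    total = sum(sum(row.values()) for row in table.values())
    col_totals = {}
    for row in table.values():
        for col, value in row.items():
            col_totals[col] = col_totals.get(col, 0) + value
    for row in table.values():
        p_row = sum(row.values()) / total
        for col, value in row.items():
            if abs(value / col_totals[col] - p_row) > tol:
                return False
    return True


# Raw counts from the D/E table above
counts = {"A": {"D": 8, "E": 12}, "B": {"D": 20, "E": 30}, "C": {"D": 6, "E": 9}}

# Probability matrix for the 200 executives (industry by region)
executives = {
    "A": {"D": 0.12, "E": 0.05, "F": 0.04, "G": 0.07},
    "B": {"D": 0.15, "E": 0.03, "F": 0.11, "G": 0.06},
    "C": {"D": 0.14, "E": 0.09, "F": 0.06, "G": 0.08},
}

print(is_independent(counts), is_independent(executives))
```

The function works on counts or probabilities alike, because each conditional probability is a ratio within one column.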
Revision of Probabilities: Bayes’ Rule

• An extension to the conditional law of probabilities
• Enables revision of original probabilities with new information

$P(X_i \mid Y) = \frac{P(Y \mid X_i) P(X_i)}{P(Y \mid X_1) P(X_1) + P(Y \mid X_2) P(X_2) + \cdots + P(Y \mid X_n) P(X_n)}$
28
29
30
31
Problem
• A particular type of printer ribbon is produced by only
two companies, Alamo Ribbon Company and South
Jersey Products.
• Suppose Alamo produces 65% of the ribbons and
that South Jersey produces 35%.
• Eight percent of the ribbons produced by Alamo are
defective and 12% of the South Jersey ribbons are
defective
• A customer purchases a new ribbon. What is the
probability that Alamo produced the ribbon? What is
the probability that South Jersey produced the
ribbon?
Revision of Probabilities
with Bayes' Rule: Ribbon Problem

$P(Alamo) = 0.65$
$P(SouthJersey) = 0.35$
$P(d \mid Alamo) = 0.08$
$P(d \mid SouthJersey) = 0.12$

$P(Alamo \mid d) = \frac{P(d \mid Alamo) \cdot P(Alamo)}{P(d \mid Alamo) \cdot P(Alamo) + P(d \mid SouthJersey) \cdot P(SouthJersey)} = \frac{(0.08)(0.65)}{(0.08)(0.65) + (0.12)(0.35)} = 0.553$

$P(SouthJersey \mid d) = \frac{P(d \mid SouthJersey) \cdot P(SouthJersey)}{P(d \mid Alamo) \cdot P(Alamo) + P(d \mid SouthJersey) \cdot P(SouthJersey)} = \frac{(0.12)(0.35)}{(0.08)(0.65) + (0.12)(0.35)} = 0.447$
Revision of Probabilities with Bayes’ Rule: Ribbon Problem

[Figure: probability tree. Alamo (0.65): Defective 0.08, joint probability 0.052; Acceptable 0.92. South Jersey (0.35): Defective 0.12, joint probability 0.042; Acceptable 0.88. Total probability of a defective ribbon: 0.052 + 0.042 = 0.094.]
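The ribbon problem can be worked through in the same order as the tree: priors, joint probabilities, their sum, then the revised probabilities. A minimal sketch with illustrative dictionary names:

```python
priors = {"Alamo": 0.65, "South Jersey": 0.35}
p_defective_given = {"Alamo": 0.08, "South Jersey": 0.12}

# Joint probability of each producer with a defective ribbon
joint = {k: priors[k] * p_defective_given[k] for k in priors}

# Total probability of a defective ribbon (denominator of Bayes' rule)
p_defective = sum(joint.values())

# Revised (posterior) probabilities given that the ribbon is defective
posterior = {k: joint[k] / p_defective for k in joint}

for name, value in posterior.items():
    print(name, round(value, 3))
```

Observing a defect shifts probability away from Alamo (0.65 to about 0.553) because Alamo's defect rate is the lower of the two.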
THANK YOU

36
Data Analytics with Python
Lecture 8: Probability Distributions

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT
IIT ROORKEE

1
Lecture Objectives

• Empirical Distribution
• Discrete Distributions
• Continuous Distributions

2
What is a distribution?

• Describes the ‘shape’ of a batch of numbers

• The characteristics of a distribution can sometimes be defined using a


small number of numeric descriptors called ‘parameters’

3
Why distribution?
• Can serve as a basis for standardized comparison of empirical
distributions
• Can help us estimate confidence intervals for inferential statistics
• Form a basis for more advanced statistical methods
– ‘fit’ between observed distributions and certain theoretical
distributions is an assumption of many statistical procedures

4
Random variable
• A variable which contains the outcomes of a chance experiment
• “Quantifying the outcomes”
• Example X= (1 = Head, 0 = Tails)
• A variable that can take on different values in the population
according to some “random” mechanism
• Discrete
– Distinct values, countable
– Year
• Continuous
– Mass

5
Probability Distributions

• The probability distribution function or probability density function (PDF) of a random variable X means the values taken by that random variable and their associated probabilities.

• PDF of a discrete r.v. (also known as PMF):

Example 1: Let the r.v. X be the number of heads obtained in two tosses of a coin.
Sample Space: {HH, HT, TH, TT}

6
PDF of Discrete r.v.

Number of Heads (X):   0     1     2     Sum
PDF (P(X)):            1/4   1/2   1/4   1

[Figure: bar chart "The PDF of the Number of Heads in Two Tosses of a Coin"; x-axis: Number of Heads (0, 1, 2); y-axis: Probability Density, with bars at 0.25, 0.5 and 0.25]

7
Probability Distribution for the Random Variable X
A probability distribution for a discrete random variable X:

x          -8     -3     -1     0      1      4      6
P(X = x)   0.13   0.15   0.17   0.20   0.15   0.11   0.09

Find
a. $P(X \le 0) = 0.65$
b. $P(-3 \le X \le 1) = 0.67$

8
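Interval probabilities for a discrete distribution are sums of the pmf over the values in the interval. The sketch below stores the table as a dictionary; the helper name `prob_between` is illustrative.

```python
pmf = {-8: 0.13, -3: 0.15, -1: 0.17, 0: 0.20, 1: 0.15, 4: 0.11, 6: 0.09}


def prob_between(lo, hi):
    """P(lo <= X <= hi): add the probabilities of every value in the range."""
    return sum(p for x, p in pmf.items() if lo <= x <= hi)


p_a = prob_between(float("-inf"), 0)   # P(X <= 0)
p_b = prob_between(-3, 1)              # P(-3 <= X <= 1)

print(round(p_a, 2), round(p_b, 2), round(sum(pmf.values()), 2))
```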
Discrete Distribution -- Example

Distribution of Daily Crises

Number of Crises    Probability
0                   0.37
1                   0.31
2                   0.18
3                   0.09
4                   0.04
5                   0.01

[Figure: bar chart of Probability against Number of Crises (0 to 5)]

9
Requirements for a Discrete Probability Function
• Probabilities are between 0 and 1, inclusive

• Total of all probabilities equals 1

$0 \le P(X) \le 1$ for all X

$\sum_{\text{over all } x} P(X) = 1$

10
Cumulative Distribution Function

• The CDF of a random variable X (denoted F(X)) is a graph associating all possible values, or the range of possible values, with $P(X \le x)$.
• CDFs always lie between 0 and 1, i.e., $0 \le F(X_i) \le 1$, where $F(X_i)$ is the CDF.

11
The Expected Value of X
Let X be a discrete rv with set of possible values D and pmf p(x). The
expected value or mean value of X, denoted $E(X)$ or $\mu_X$, is

$E(X) = \mu_X = \sum_{x \in D} x \cdot p(x)$

12
Mean and Variance of a Discrete Random Variable

A probability distribution can be viewed as a loading with the mean equal to the balance point. Parts (a) and (b) illustrate
equal means, but Part (a) illustrates a larger variance.
Mean and Variance of a Discrete Random Variable

The probability distributions illustrated in Parts (a) and (b) differ even though they have equal means and equal
variances.
Example – Expected Value
• Use the data below to find out the expected number of credit cards that a
customer to a retail outlet will possess.

x = # credit cards

x    P(X = x)
0    0.08
1    0.28
2    0.38
3    0.16
4    0.06
5    0.03
6    0.01

$E(X) = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n$
$= 0(.08) + 1(.28) + 2(.38) + 3(.16) + 4(.06) + 5(.03) + 6(.01)$
$= 1.97$

About 2 credit cards

15
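The expected number of credit cards is a probability-weighted sum, which is one line of Python once the table is in a dictionary. A short sketch with illustrative names:

```python
# pmf of the number of credit cards, from the table
pmf = {0: 0.08, 1: 0.28, 2: 0.38, 3: 0.16, 4: 0.06, 5: 0.03, 6: 0.01}

# E(X) = sum of x * p(x) over all possible values
expected_cards = sum(x * p for x, p in pmf.items())

print(round(expected_cards, 2))
```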
The Variance and Standard Deviation

Let X have pmf p(x) and expected value $\mu$. Then the variance of X, denoted $V(X)$
(or $\sigma_X^2$ or $\sigma^2$), is

$V(X) = \sum_{D} (x - \mu)^2 \cdot p(x) = E[(X - \mu)^2]$

The standard deviation (SD) of X is
$\sigma_X = \sqrt{\sigma_X^2}$

16
The quiz scores for a particular student are given below:

22, 25, 20, 18, 12, 20, 24, 20, 20, 25, 24, 25, 18
Find the variance and standard deviation.

Value          12    18    20    22    24    25
Frequency       1     2     4     1     2     3
Probability    .08   .15   .31   .08   .15   .23

$\mu = 21$
$V(X) = p_1 (x_1 - \mu)^2 + p_2 (x_2 - \mu)^2 + \cdots + p_n (x_n - \mu)^2$
$\sigma = \sqrt{V(X)}$

$V(X) = .08(12 - 21)^2 + .15(18 - 21)^2 + .31(20 - 21)^2 + .08(22 - 21)^2 + .15(24 - 21)^2 + .23(25 - 21)^2$

$V(X) = 13.25$

$\sigma = \sqrt{V(X)} = \sqrt{13.25} = 3.64$

18
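The quiz-score variance can be reproduced in two ways. The sketch below first uses the rounded probabilities from the table, which gives the slide's 13.25, and then uses the raw scores with `statistics.pvariance`, which gives a slightly smaller value (about 13.08) because no rounding is involved. Variable names are illustrative.

```python
import math
import statistics

scores = [22, 25, 20, 18, 12, 20, 24, 20, 20, 25, 24, 25, 18]
mu = statistics.mean(scores)

# Rounded probabilities as printed in the table
pmf_rounded = {12: 0.08, 18: 0.15, 20: 0.31, 22: 0.08, 24: 0.15, 25: 0.23}
variance_rounded = sum(p * (x - mu) ** 2 for x, p in pmf_rounded.items())
sd_rounded = math.sqrt(variance_rounded)

# Exact population variance from the raw scores
variance_exact = statistics.pvariance(scores)

print(mu, round(variance_rounded, 2), round(sd_rounded, 2),
      round(variance_exact, 2))
```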
Shortcut Formula for Variance

$V(X) = \sigma^2 = \left[ \sum_{D} x^2 \cdot p(x) \right] - \mu^2 = E(X^2) - [E(X)]^2$

19
Mean of a Discrete Distribution

$\mu = E(X) = \sum X \cdot P(X)$

X     P(X)   X·P(X)
-1    .1     -.1
0     .2     .0
1     .4     .4
2     .2     .4
3     .1     .3
Total        1.0

20
Variance and Standard Deviation
of a Discrete Distribution

$\sigma^2 = \sum (X - \mu)^2 \cdot P(X) = 1.2$
$\sigma = \sqrt{\sigma^2} = \sqrt{1.2} = 1.10$

X     P(X)   X - μ   (X - μ)²   (X - μ)²·P(X)
-1    .1     -2      4          .4
0     .2     -1      1          .2
1     .4     0       0          .0
2     .2     1       1          .2
3     .1     2       4          .4
Total                           1.2
21
Mean of the Data Example

$\mu = E(X) = \sum X \cdot P(X) = 1.15$

X    P(X)   X·P(X)
0    .37    .00
1    .31    .31
2    .18    .36
3    .09    .27
4    .04    .16
5    .01    .05
Total       1.15

[Figure: bar chart of Probability against Number of Crises (0 to 5)]

22
Properties of Expected Value

1. $E(b) = b$, where b is a constant.
2. $E(X + Y) = E(X) + E(Y)$.
3. $E(X / Y) \neq E(X) / E(Y)$ in general.
4. $E(XY) \neq E(X) E(Y)$ in general; equality holds when X and Y are independent.
5. $E(aX) = a E(X)$, where a is a constant.
6. $E(aX + b) = a E(X) + b$, where a and b are constants.

23
Properties of Variance
1. Var(constant) = 0
2. If X and Y are two independent random variables, then
Var(X + Y) = Var(X) + Var (Y) and
Var(X - Y) = Var(X) + Var (Y)
3. If b is a constant then Var(b+X) = Var(X)
4. If a is a constant then $Var(aX) = a^2 Var(X)$
5. If a and b are constants then $Var(aX + b) = a^2 Var(X)$
6. If X and Y are two independent random variables and a and b are
constants then $Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)$

24
Covariance

Covariance: For two discrete random variables X and Y with $E(X) = \mu_x$ and $E(Y) = \mu_y$, the covariance between X and Y is defined as
$Cov(X, Y) = \sigma_{xy} = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - \mu_x \mu_y$.

25
Covariance
• In general, the covariance between two random variables can be
positive or negative.
• If two random variables move in the same direction, then the
covariance will be positive, if they move in the opposite direction
the covariance will be negative.
Properties:
1. If X and Y are independent random variables, their covariance is
zero, since E(XY) = E(X)E(Y).
2. Cov(X, X) = Var(X)
3. Cov(Y, Y) = Var(Y)

26
Correlation Coefficient

• The covariance gives the sign of the relationship but not how strongly
the variables are positively or negatively related. The
correlation coefficient provides a measure of how strongly the
variables are related to each other.
• For two random variables X and Y with $E(X) = \mu_x$ and $E(Y) = \mu_y$,
the correlation coefficient is defined as

$\rho_{xy} = \frac{Cov(X, Y)}{\sigma_x \sigma_y} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$

27
28
Thank You

29
Data Analytics with Python
Lecture 9: Probability Distributions-II

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT
IIT ROORKEE

1
Some Special Distributions
• Discrete
– Binomial
– Poisson
– Hypergeometric
• Continuous
– Uniform
– Exponential
– Normal

2
Binomial Distribution

• Let us consider the purchase decisions of the next three customers who
enter a store.

• On the basis of past experience, the store manager estimates the


probability that any one customer will make a purchase is .30.

• What is the probability that two of the next three customers will make a
purchase?

3
Tree diagram for the Martin clothing store problem

4
Trial Outcomes

5
Graphical representation of the probability distribution
for the number of customers making a purchase
x    P(x)
0    0.7 x 0.7 x 0.7 = 0.343
1    0.3 x 0.7 x 0.7 + 0.7 x 0.3 x 0.7 + 0.7 x 0.7 x 0.3 = 0.441
2    0.189
3    0.027

6
Binomial Distribution - Assumptions
• Experiment involves n identical trials
• Each trial has exactly two possible outcomes: success and failure
• Each trial is independent of the previous trials
• p is the probability of a success on any one trial
• q = (1-p) is the probability of a failure on any one trial
• p and q are constant throughout the experiment
• X is the number of successes in the n trials

7
Binomial Distribution

• Probability function
$P(X) = \frac{n!}{X!(n - X)!} p^X q^{n - X}$ for $0 \le X \le n$

• Mean value
$\mu = n \cdot p$

• Variance and standard deviation
$\sigma^2 = n \cdot p \cdot q$
$\sigma = \sqrt{\sigma^2} = \sqrt{n \cdot p \cdot q}$

8
Binomial Table
SELECTED VALUES FROM THE BINOMIAL PROBABILITY TABLE
EXAMPLE: n = 10, x = 3, p = .40; f (3) = .2150

9
Mean and Variance
• Suppose that for the next month the Clothing Store forecasts 1000
customers will enter the store.
• What is the expected number of customers who will make a purchase?
• The answer is μ = np = (1000)(.3) = 300.
• For the next 1000 customers entering the store, the variance and
standard deviation for the number of customers who will make a
purchase are
$\sigma^2 = np(1 - p) = (1000)(.3)(.7) = 210$
$\sigma = \sqrt{210} = 14.49$

10
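The binomial probabilities for the clothing store problem, the table lookup, and the mean and variance for 1000 customers can all be computed from the probability function. This is a sketch with an illustrative helper `binomial_pmf`; `math.comb` requires Python 3.8 or later.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Three customers, each buying with probability .30
purchase_probs = [binomial_pmf(x, 3, 0.30) for x in range(4)]

# Value quoted from the binomial table: n = 10, x = 3, p = .40
table_value = binomial_pmf(3, 10, 0.40)

# Mean, variance and standard deviation for 1000 customers
n, p = 1000, 0.30
mean = n * p
variance = n * p * (1 - p)
sd = sqrt(variance)

print([round(v, 3) for v in purchase_probs], round(table_value, 4),
      round(mean), round(variance), round(sd, 2))
```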
Poisson Distribution

• Describes discrete occurrences over a continuum or interval


• A discrete distribution
• Describes rare events
• Each occurrence is independent of any other occurrence.
• The number of occurrences in each interval can vary from zero to infinity.
• The expected number of occurrences must hold constant throughout the
experiment.

11
Poisson Distribution: Applications
• Arrivals at queuing systems
– airports -- people, airplanes, automobiles, baggage
– banks -- people, automobiles, loan applications
– computer file servers -- read and write operations

• Defects in manufactured goods


– number of defects per 1,000 feet of extruded copper wire
– number of blemishes per square foot of painted surface
– number of errors per typed page

12
Poisson Distribution
• Probability function

$P(X) = \frac{\lambda^X e^{-\lambda}}{X!}$ for $X = 0, 1, 2, 3, \ldots$

where:
$\lambda$ = long-run average
$e = 2.718282\ldots$ (the base of natural logarithms)

Mean value: $\lambda$
Variance: $\lambda$
Standard deviation: $\sqrt{\lambda}$
13
Poisson Distribution: Example

$\lambda = 3.2$ customers/4 minutes
Adjusted $\lambda = 6.4$ customers/8 minutes

Case 1: X = 10 customers/8 minutes
$P(X = 10) = \frac{6.4^{10} e^{-6.4}}{10!} = 0.0528$

Case 2: X = 6 customers/8 minutes
$P(X = 6) = \frac{6.4^{6} e^{-6.4}}{6!} = 0.1586$

14
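The Poisson example can be checked with the probability function written out directly. A minimal sketch; the helper name `poisson_pmf` and the parameter name `lam` are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) when the long-run average per interval is lam."""
    return lam ** x * exp(-lam) / factorial(x)


# The average is given per 4 minutes; rescale it to the 8-minute interval
lam_8_min = 3.2 * (8 / 4)

p_10 = poisson_pmf(10, lam_8_min)
p_6 = poisson_pmf(6, lam_8_min)

print(round(p_10, 4), round(p_6, 4))
```

The interval of the average must match the interval of the question, which is why 3.2 per 4 minutes becomes 6.4 per 8 minutes before the formula is applied.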
Poisson Probability Table
Example: μ = 10, x = 5; f (5) = .0378

15
The Hypergeometric Distribution

• The binomial distribution is applicable when selecting from a finite population with replacement or from an infinite population without replacement.

• The hypergeometric distribution is applicable when selecting from a finite population without replacement.
Hypergeometric Distribution
• Sampling without replacement from a finite population

• The number of objects in the population is denoted N.

• Each trial has exactly two possible outcomes, success and failure.

• Trials are not independent

• X is the number of successes in the n trials

• The binomial is an acceptable approximation if N/10 > n; otherwise it is not.

17
Hypergeometric Distribution
• Probability function

$P(x) = \frac{\binom{A}{x} \binom{N - A}{n - x}}{\binom{N}{n}}$

– N is population size
– n is sample size
– A is number of successes in population
– x is number of successes in sample

• Mean value
$\mu = \frac{A \cdot n}{N}$

• Variance and standard deviation
$\sigma^2 = \frac{A (N - A) n (N - n)}{N^2 (N - 1)}$
$\sigma = \sqrt{\sigma^2}$

18
The Hypergeometric Distribution Example
• Three different computers are checked from 10 in the department. 4 of the 10
computers have illegal software loaded.
• What is the probability that 2 of the 3 selected computers have illegal
software loaded?
• So, N = 10, n = 3, A = 4, X = 2

$P(X = 2) = \frac{\binom{A}{X} \binom{N - A}{n - X}}{\binom{N}{n}} = \frac{\binom{4}{2} \binom{6}{1}}{\binom{10}{3}} = \frac{(6)(6)}{120} = 0.3$

• The probability that 2 of the 3 selected computers have illegal
software loaded is .30, or 30%.
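The computer-audit example can be verified with `math.comb`. This is a sketch; the helper name `hypergeom_pmf` is illustrative, and the mean and variance lines simply apply the formulas from the previous slide.

```python
from math import comb


def hypergeom_pmf(x, N, A, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N that contains A successes."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)


N, A, n = 10, 4, 3
p_two_illegal = hypergeom_pmf(2, N, A, n)

mean = A * n / N
variance = A * (N - A) * n * (N - n) / (N ** 2 * (N - 1))

print(round(p_two_illegal, 2), mean, round(variance, 2))
```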
Continuous Probability Distributions

• A continuous random variable is a variable that can assume any value on a continuum (can assume an uncountable number of values)
– thickness of an item
– time required to complete a task
– temperature of a solution
– height
• These can potentially take on any value, depending only on the ability to
measure precisely and accurately.
Continuous Distributions

• Uniform
• Normal
• Exponential
The Uniform Distribution

• The uniform distribution is a probability distribution that has equal


probabilities for all possible outcomes of the random variable

• Because of its shape it is also called a rectangular distribution


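Uniform Distribution

$f(x) = \begin{cases} \frac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{for all other values} \end{cases}$

[Figure: rectangle of height 1/(b - a) over the interval from a to b; Area = 1]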
Uniform Distribution: Mean and Standard Deviation

Mean
$\mu = \frac{a + b}{2}$

Standard Deviation
$\sigma = \frac{b - a}{\sqrt{12}}$
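The Uniform Distribution

Example: Uniform probability distribution over the range 2 ≤ X ≤ 6:

$f(X) = \frac{1}{6 - 2} = .25$ for $2 \le X \le 6$

$\mu = \frac{a + b}{2} = \frac{2 + 6}{2} = 4$

$\sigma = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(6 - 2)^2}{12}} = 1.1547$

[Figure: rectangle of height .25 over the interval from 2 to 6]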
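Uniform Distribution Example

$f(x) = \begin{cases} \frac{1}{47 - 41} = \frac{1}{6} & \text{for } 41 \le x \le 47 \\ 0 & \text{for all other values} \end{cases}$

[Figure: rectangle of height 1/6 over the interval from 41 to 47; Area = 1]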
Uniform Distribution: Mean and Standard Deviation

Mean
$\mu = \frac{a + b}{2} = \frac{41 + 47}{2} = \frac{88}{2} = 44$

Standard Deviation
$\sigma = \frac{b - a}{\sqrt{12}} = \frac{47 - 41}{\sqrt{12}} = \frac{6}{3.464} = 1.732$
Uniform Distribution Probability

$P(x_1 \le X \le x_2) = \frac{x_2 - x_1}{b - a}$

$P(42 \le X \le 45) = \frac{45 - 42}{47 - 41} = \frac{1}{2}$

[Figure: rectangle over the interval from 41 to 47 with the region from 42 to 45 shaded; Area = 0.5]
Example : Uniform Distribution

• Consider the random variable x representing the flight time of an airplane


traveling from Delhi to Mumbai.

• Suppose the flight time can be any value in the interval from 120 minutes
to 140 minutes.

• Because the random variable x can assume any value in that interval, x is a
continuous rather than a discrete random variable

29
Example : Uniform Distribution contd….

• Let us assume that sufficient actual flight data are available to conclude
that the probability of a flight time within any 1-minute interval is the
same as the probability of a flight time within any other 1-minute interval
contained in the larger interval from 120 to 140 minutes.

• With every 1-minute interval being equally likely, the random variable x is
said to have a uniform probability distribution.

30
Uniform Probability Distribution for Flight time

31
Probability of a flight time between 120 and 130
minutes

32
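The uniform-distribution results above (mean, standard deviation and interval probabilities) can be computed with a small helper. This is a sketch; `uniform_prob` is an illustrative name, and it clips the requested interval to [a, b] because the density is zero elsewhere.

```python
from math import sqrt


def uniform_prob(x1, x2, a, b):
    """P(x1 <= X <= x2) for X uniform on [a, b]."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0) / (b - a)


# Example with a = 41, b = 47
a, b = 41, 47
mean = (a + b) / 2
sd = (b - a) / sqrt(12)
p_42_to_45 = uniform_prob(42, 45, a, b)

# Flight time uniform between 120 and 140 minutes
p_flight_120_to_130 = uniform_prob(120, 130, 120, 140)

print(mean, round(sd, 3), p_42_to_45, p_flight_120_to_130)
```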
Exponential Probability Distribution
• The exponential probability distribution is useful in describing the time it
takes to complete a task.
• The exponential random variables can be used to describe:
– Time between vehicle arrivals at a toll booth
– Time required to complete a questionnaire
– Distance between major defects in a highway
Exponential Probability Distribution

• Density Function

$f(x) = \frac{1}{\mu} e^{-x/\mu}$ for x ≥ 0, μ > 0

where: μ = mean
e = 2.71828
Exponential Probability Distribution

• Suppose that x represents the loading time for a truck at a loading dock and
follows such a distribution.
• If the mean, or average, loading time is 15 minutes (μ = 15), the
appropriate probability density function for x is

$f(x) = \frac{1}{15} e^{-x/15}$

[Figure: exponential distribution for the loading dock example.]
Exponential Probability Distribution
• Cumulative Probabilities

$P(x \le x_0) = 1 - e^{-x_0/\mu}$

where:
$x_0$ = some specific value of x
Example: Exponential Probability Distribution

• The time between arrivals of cars at a petrol pump follows an exponential
probability distribution with a mean time between arrivals of 3 minutes.

• The petrol pump owner would like to know the probability that the time
between two successive arrivals will be 2 minutes or less.
Example: Petrol Pump Problem

$P(x \le 2) = 1 - 2.71828^{-2/3} = 1 - 0.5134 = 0.4866$

[Figure: exponential density f(x) plotted against time between successive arrivals (mins.), with the area to the left of x = 2 shaded.]
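The cumulative probability for the petrol pump example can be reproduced in a few lines. This is a minimal sketch using the standard library; the function name `exponential_cdf` is an illustrative choice, not part of the lecture.

```python
import math

def exponential_cdf(x0, mean):
    """P(x <= x0) for an exponential distribution with the given mean."""
    if x0 < 0:
        return 0.0
    return 1.0 - math.exp(-x0 / mean)

# Mean time between arrivals is 3 minutes; probability of a gap of 2 minutes or less
p_arrival = exponential_cdf(2, 3)
print(round(p_arrival, 4))
```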
Relationship between the Poisson and Exponential
Distributions
• The Poisson distribution provides an appropriate description of the number
of occurrences per interval.

• The exponential distribution provides an appropriate description of the
length of the interval between occurrences.
Mean of Poisson and Mean of Exponential Distributions

• Because the average number of arrivals is 10 cars per hour, the average
time between cars arriving is 1/10 hour = 0.1 hour = 6 minutes.
The Normal Distribution: Properties

• ‘Bell Shaped’
• Symmetrical
• Mean, Median and Mode are equal
• Location is characterized by the mean, μ
• Spread is characterized by the standard deviation, σ
• The random variable has an infinite theoretical range: −∞ to +∞

[Figure: bell-shaped curve f(X) centered at μ with spread σ; Mean = Median = Mode.]
The Normal Distribution: Density Function
The formula for the normal probability density function is

$f(X) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2}$

Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
μ = the population mean
σ = the population standard deviation
X = any value of the continuous variable
The Normal Distribution: Shape

By varying the parameters μ and σ, we obtain different normal distributions.
Data Analytics with Python
Lecture 10: Probability Distributions-III

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT
IIT ROORKEE

The Normal Distribution: Shape

• Changing μ shifts the distribution left or right.
• Changing σ increases or decreases the spread.

[Figure: normal curve f(X) with center μ and spread σ marked.]
The Standardized Normal Distribution

• Any normal distribution (with any mean and standard deviation
combination) can be transformed into the standardized normal
distribution (Z).

• Need to transform X units into Z units.

• The standardized normal distribution has a mean of 0 and a standard
deviation of 1.
The Standardized Normal Distribution

• Translate from X to the standardized normal (the “Z” distribution) by
subtracting the mean of X and dividing by its standard deviation:

$Z = \frac{X - \mu}{\sigma}$
The Standardized Normal Distribution: Density
Function

• The formula for the standardized normal probability density function is

$f(Z) = \frac{1}{\sqrt{2\pi}} \, e^{-Z^2/2}$

Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
Z = any value of the standardized normal distribution
The Standardized Normal Distribution: Shape
• Also known as the “Z” distribution
• Mean is 0
• Standard Deviation is 1

[Figure: standardized normal curve f(Z) centered at Z = 0.]

Values above the mean have positive Z-values; values below the mean have
negative Z-values.
The Standardized Normal Distribution: Example

• If X is distributed normally with mean of 100 and standard deviation of
50, the Z value for X = 200 is

$Z = \frac{X - \mu}{\sigma} = \frac{200 - 100}{50} = 2.0$

• This says that X = 200 is two standard deviations (2 increments of 50
units) above the mean of 100.
The Standardized Normal Distribution: Example

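[Figure: the same normal curve shown on two scales: X = 100 and X = 200 on the X axis (μ = 100, σ = 50) correspond to Z = 0 and Z = 2.0 on the Z axis (μ = 0, σ = 1).]

Note that the distribution is the same, only the scale has changed. We
can express the problem in original units (X) or in standardized units (Z).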
Normal Probabilities

Probability is measured by the area under the curve.

(Note that the probability of any individual value is zero.)

[Figure: normal curve f(X) with the area between a and b shaded, representing P(a ≤ X ≤ b).]
Normal Probabilities

The total area under the curve is 1.0, and the curve is symmetric,
so half is above the mean, half is below.

$P(-\infty < X < \mu) = 0.5$
$P(\mu < X < \infty) = 0.5$
$P(-\infty < X < \infty) = 1.0$

[Figure: normal curve f(X) with area 0.5 on each side of the mean.]
Normal Probability Tables

Example:
P(Z < 2.00) = .9772

[Figure: standard normal curve with the area .9772 to the left of Z = 2.00 shaded.]
Normal Probability Tables

• The row shows the value of Z to the first decimal point.
• The column gives the value of Z to the second decimal point.
• The value within the table gives the probability from Z = −∞ up to the
desired Z value.

Z     0.00   0.01   0.02  …
0.0
0.1
.
.
2.0   .9772

P(Z < 2.00) = .9772
Finding Normal Probability
Procedure

To find P(a < X < b) when X is distributed normally:

• Draw the normal curve for the problem in terms of X.

• Translate X-values to Z-values.

• Use the Standardized Normal Table.


Finding Normal Probability: Example
• Let X represent the time it takes (in seconds) to download an image file
from the internet.
• Suppose X is normal with mean 8.0 and standard deviation 5.0
• Find P(X < 8.6)

Finding Normal Probability: Example

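• Suppose X is normal with mean 8.0 and standard deviation 5.0. Find
P(X < 8.6).

$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8.0}{5.0} = 0.12$

[Figure: X = 8.6 on the original scale (μ = 8, σ = 5) corresponds to Z = 0.12 on the standardized scale (μ = 0, σ = 1).]

P(X < 8.6) = P(Z < 0.12)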
Finding Normal Probability: Example

Standardized Normal Probability Table (Portion)

Z     .00    .01    .02
0.0   .5000  .5040  .5080
0.1   .5398  .5438  .5478
0.2   .5793  .5832  .5871
0.3   .6179  .6217  .6255

P(X < 8.6) = P(Z < 0.12) = .5478
Finding Normal Probability: Example

• Find P(X > 8.6)

P(X > 8.6) = P(Z > 0.12) = 1.0 - P(Z ≤ 0.12)
= 1.0 - .5478 = .4522

[Figure: standard normal curve; the area to the left of Z = 0.12 is .5478 and the area to the right is .4522.]
Finding Normal Probability: Between Two Values

• Suppose X is normal with mean 8.0 and standard deviation 5.0.
Find P(8 < X < 8.6)

Calculate Z-values:

$Z = \frac{X - \mu}{\sigma} = \frac{8 - 8}{5} = 0$

$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8}{5} = 0.12$

P(8 < X < 8.6) = P(0 < Z < 0.12)
Finding Normal Probability
Between Two Values

P(8 < X < 8.6)
= P(0 < Z < 0.12)
= P(Z < 0.12) – P(Z ≤ 0)
= .5478 - .5000 = .0478

Standardized Normal Probability Table (Portion)

Z     .00    .01    .02
0.0   .5000  .5040  .5080
0.1   .5398  .5438  .5478
0.2   .5793  .5832  .5871
0.3   .6179  .6217  .6255

[Figure: standard normal curve with the area .0478 between Z = 0.00 and Z = 0.12 shaded.]
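The three table lookups for the download-time example can be cross-checked in Python. This is a minimal sketch that assumes the standard-library `statistics.NormalDist` class (Python 3.8+) in place of a printed Z table; the variable names are illustrative.

```python
from statistics import NormalDist

# Download time X ~ Normal(mean 8.0, standard deviation 5.0), in seconds
download = NormalDist(mu=8.0, sigma=5.0)

# Standardize X = 8.6 to a Z value
z = (8.6 - download.mean) / download.stdev

p_less = download.cdf(8.6)                         # P(X < 8.6)
p_greater = 1.0 - p_less                           # P(X > 8.6)
p_between = download.cdf(8.6) - download.cdf(8.0)  # P(8 < X < 8.6)

print(round(z, 2), round(p_less, 4), round(p_greater, 4), round(p_between, 4))
```

Because the table rounds Z to two decimals, small differences in the fourth decimal place between table values and computed values are to be expected in general.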
Given Normal Probability: Find the X Value

• Let X represent the time it takes (in seconds) to download an image file
from the internet.
• Suppose X is normal with mean 8.0 and standard deviation 5.0
• Find X such that 20% of download times are less than X.

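[Figure: normal curve with the lower-tail area .2000 shaded to the left of the unknown X value (unknown Z value); the mean is at X = 8.0, Z = 0.]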
Given Normal Probability, Find the X Value

• First, find the Z value that corresponds to the known probability
using the table.

Z      ….  .03    .04    .05
-0.9   ….  .1762  .1736  .1711
-0.8   ….  .2033  .2005  .1977
-0.7   ….  .2327  .2296  .2266

The table entry closest to the lower-tail area .2000 is .2005 (row -0.8, column .04), so Z = -0.84.
Given Normal Probability,
Find the X Value

• Second, convert the Z value to X units using the following formula.

$X = \mu + Z\sigma = 8.0 + (-0.84)(5.0) = 3.80$

So 20% of the download times from the distribution with mean 8.0
and standard deviation 5.0 are less than 3.80 seconds.
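The reverse lookup (from a probability to an X value) can also be sketched in Python. The example below assumes the standard-library `statistics.NormalDist.inv_cdf` method as a stand-in for reading the Z table backwards; it does not round Z to two decimals, so the result differs slightly from the slide's 3.80.

```python
from statistics import NormalDist

mu, sigma = 8.0, 5.0

# Z value with 20% of the standard normal area to its left
z20 = NormalDist().inv_cdf(0.20)

# Convert back to X units: X = mu + Z * sigma
x20 = mu + z20 * sigma

print(round(z20, 2), round(x20, 2))
```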
Assessing Normality
• It is important to evaluate how well the data set is approximated by a normal
distribution.
• Normally distributed data should approximate the theoretical normal
distribution:
– The normal distribution is bell shaped (symmetrical) where the mean is
equal to the median.
– The empirical rule applies to the normal distribution.
– The interquartile range of a normal distribution is 1.33 standard deviations.
Assessing Normality
• Construct charts or graphs
– For small- or moderate-sized data sets, do the stem-and-leaf display
and box-and-whisker plot look symmetric?
– For large data sets, does the histogram or polygon appear bell-
shaped?
• Compute descriptive summary measures
– Do the mean, median and mode have similar values?
– Is the interquartile range approximately 1.33 σ?
– Is the range approximately 6 σ?
Assessing Normality

• Observe the distribution of the data set
– Do approximately 2/3 of the observations lie within mean ± 1 standard
deviation?
– Do approximately 80% of the observations lie within mean ± 1.28
standard deviations?
– Do approximately 95% of the observations lie within mean ± 2 standard
deviations?
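The descriptive checks listed above can be automated. The following is a minimal sketch, assuming a simulated data set (5,000 normal values with an arbitrary mean of 50 and standard deviation of 10, chosen only for illustration); the function name `normality_checks` is hypothetical.

```python
import random
import statistics

def normality_checks(data):
    """Return summary ratios to compare against the normal-distribution benchmarks."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles

    def share_within(k):
        # Fraction of observations within mean +/- k standard deviations
        return sum(abs(x - mean) <= k * sd for x in data) / len(data)

    return {
        "mean_minus_median": mean - statistics.median(data),
        "iqr_over_sd": (q3 - q1) / sd,            # benchmark: about 1.33
        "within_1_sd": share_within(1.0),         # benchmark: about 2/3
        "within_1.28_sd": share_within(1.28),     # benchmark: about 80%
        "within_2_sd": share_within(2.0),         # benchmark: about 95%
    }

rng = random.Random(42)                            # fixed seed, arbitrary choice
sample = [rng.gauss(50, 10) for _ in range(5000)]  # illustrative data only
checks = normality_checks(sample)
for name, value in checks.items():
    print(f"{name}: {value:.3f}")
```

With real data, ratios far from these benchmarks would suggest that a normal model is a poor approximation; the graphical checks remain a useful complement.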
Z Table (entries give the area between 0 and Z, i.e. P(0 ≤ Z ≤ z))
Second Decimal Place in Z
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.00 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.10 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.20 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.30 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.90 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.00 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.10 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.20 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

2.00 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

3.00 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.40 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.50 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998
Table Lookup of a Standard Normal Probability

$P(0 \le Z \le 1) = 0.3413$

Z      0.00    0.01    0.02
0.00   0.0000  0.0040  0.0080
0.10   0.0398  0.0438  0.0478
0.20   0.0793  0.0832  0.0871
1.00   0.3413  0.3438  0.3461
1.10   0.3643  0.3665  0.3686
1.20   0.3849  0.3869  0.3888

[Figure: standard normal curve over Z from -3 to 3 with the area between 0 and 1 shaded.]
Applying the Z Formula

X is normally distributed with μ = 485 and σ = 105.

$P(485 \le X \le 600) = P(0 \le Z \le 1.10) = .3643$

For X = 485, $Z = \frac{X - \mu}{\sigma} = \frac{485 - 485}{105} = 0$

For X = 600, $Z = \frac{X - \mu}{\sigma} = \frac{600 - 485}{105} = 1.10$

Z      0.00    0.01    0.02
0.00   0.0000  0.0040  0.0080
0.10   0.0398  0.0438  0.0478
1.00   0.3413  0.3438  0.3461
1.10   0.3643  0.3665  0.3686
1.20   0.3849  0.3869  0.3888
Thank You

Lecture 11: Python demo for Distribution

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT STUDIES

Agenda
• Different numerical problems are solved for the following Distribution
using Python:
– Discrete
• Binomial
• Poisson
• Hypergeometric
– Continuous
• Uniform
• Exponential
• Normal

THANK YOU
Lecture 11: Sampling and Sampling Distribution
Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT STUDIES
IIT ROORKEE

Lecture Objectives
After completing this lecture, you should be able to:
• Describe a simple random sample and why sampling is important
• Explain the difference between descriptive and inferential statistics
• Define the concept of a sampling distribution
• Determine the mean and standard deviation for the sampling distribution
of the sample mean, $\bar{X}$
Lecture Objectives

• Describe the Central Limit Theorem and its importance
• Determine the mean and standard deviation for the sampling distribution
of the sample proportion, $\hat{p}$
• Describe sampling distributions of sample variances
Descriptive vs Inferential Statistics

• Descriptive statistics
– Collecting, presenting, and describing data
• Inferential statistics
– Drawing conclusions and/or making decisions concerning a population
based only on sample data

Populations and Samples

• A Population is the set of all items or individuals of interest
– Examples:
  All likely voters in the next election
  All parts produced today
  All sales receipts for November

• A Sample is a subset of the population
– Examples:
  1000 voters selected at random for interview
  A few parts selected for destructive testing
  Random receipts selected for audit
Population vs. Sample

[Figure: the population is shown as the full set of letters a through z; the sample is a subset of them (b, c, g, i, n, o, r, u, y).]
Why Sample?
• Less time consuming than a census
• Less costly to administer than a census
• It is possible to obtain statistical results of a sufficiently high precision
based on samples.
• Because the research process is sometimes destructive, the sample can
save product
• If accessing the population is impossible, sampling is the only option
Reasons for Taking a Census

• Eliminate the possibility that a random sample is not representative of the
population

• The person authorizing the study is uncomfortable with sample
information
Random Versus Nonrandom Sampling
• Random sampling
– Every unit of the population has the same probability of being included in the
sample.
– A chance mechanism is used in the selection process.
– Eliminates bias in the selection process
– Also known as probability sampling
• Nonrandom sampling
– Every unit of the population does not have the same probability of being
included in the sample.
– Open to selection bias
– Not an appropriate data collection method for most statistical methods
– Also known as non-probability sampling
Random Sampling Techniques

• Simple Random Sample

• Stratified Random Sample

– Proportionate

– Disproportionate

• Systematic Random Sample

• Cluster (or Area) Sampling


Simple Random Samples

• Every object in the population has an equal chance of being selected
• Objects are selected independently
• Samples can be obtained from a table of random numbers or computer
random number generators
• A simple random sample is the ideal against which other sample methods
are compared
Simple Random Sample: Numbered Population Frame

01 Andhra Pradesh      11 Madhya Pradesh
02 Himachal Pradesh    12 Uttar Pradesh
03 Gujarat             13 Bihar
04 Maharashtra         14 Rajasthan
05 Nagaland            15 J & K
06 Goa                 16 Tamil Nadu
07 West Bengal         17 Karnataka
08 Haryana             18 Kerala
09 Punjab              19 Orissa
10 Delhi               20 Manipur
Simple Random Sampling:
Random Number Table

9 9 4 3 7 8 7 9 6 1 4 5 7 3 7 3 7 5 5 2 9 7 9 6 9 3 9 0 9 4 3 4 4 7 5 3 1 6 1 8
5 0 6 5 6 0 0 1 2 7 6 8 3 6 7 6 6 8 8 2 0 8 1 5 6 8 0 0 1 6 7 8 2 2 4 5 8 3 2 6
8 0 8 8 0 6 3 1 7 1 4 2 8 7 7 6 6 8 3 5 6 0 5 1 5 7 0 2 9 6 5 0 0 2 6 4 5 5 8 7
8 6 4 2 0 4 0 8 5 3 5 3 7 9 8 8 9 4 5 4 6 8 1 3 0 9 1 2 5 3 8 8 1 0 4 7 4 3 1 9
6 0 0 9 7 8 6 4 3 6 0 1 8 6 9 4 7 7 5 8 8 9 5 3 5 9 9 4 0 0 4 8 2 6 8 3 0 6 0 6
5 2 5 8 7 7 1 9 6 5 8 5 4 5 3 4 6 8 3 4 0 0 9 9 1 9 9 7 2 9 7 6 9 4 8 1 5 9 4 1
8 9 1 5 5 9 0 5 5 3 9 0 6 8 9 4 8 6 3 7 0 7 9 5 5 4 7 0 6 2 7 1 1 8 2 6 4 4 9 3
Simple Random Sample: Sample Members

The sample members are drawn from the numbered population frame of 20 states listed above.

• N = 20
• n = 4
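Drawing n = 4 members from the frame of N = 20 can be done with a computer random number generator instead of a random number table. The following is a minimal sketch using the standard-library `random` module; the seed value is arbitrary, and `random.Random.sample` draws without replacement so no state is picked twice.

```python
import random

# Numbered population frame (N = 20), as listed on the slide
states = [
    "Andhra Pradesh", "Himachal Pradesh", "Gujarat", "Maharashtra", "Nagaland",
    "Goa", "West Bengal", "Haryana", "Punjab", "Delhi",
    "Madhya Pradesh", "Uttar Pradesh", "Bihar", "Rajasthan", "J & K",
    "Tamil Nadu", "Karnataka", "Kerala", "Orissa", "Manipur",
]

rng = random.Random(2024)        # arbitrary seed so the draw can be repeated
sample = rng.sample(states, k=4)  # simple random sample of size n = 4

for name in sample:
    # Frame numbers run from 01 to 20
    print(f"{states.index(name) + 1:02d} {name}")
```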
Stratified Random Sample

• Population is divided into non-overlapping subpopulations called strata
• A random sample is selected from each stratum
• Potential for reducing sampling error
• Proportionate -- the percentage of the sample taken from each stratum
is proportionate to the percentage that each stratum is within the
population
• Disproportionate -- proportions of the strata within the sample are
different than the proportions of the strata within the population
Stratified Random Sample: Population of FM Radio Listeners Stratified by Age

• 20 - 30 years old (homogeneous, i.e. alike, within)
• 30 - 40 years old (homogeneous, i.e. alike, within)
• 40 - 50 years old (homogeneous, i.e. alike, within)

The strata are heterogeneous (different) between one another.
Systematic Sampling
• Convenient and relatively easy to administer
• Population elements are an ordered sequence (at least, conceptually).
• The first sample element is selected randomly from the first k population
elements.
• Thereafter, sample elements are selected at a constant interval, k, from the
ordered sequence frame.

$k = \frac{N}{n}$,
where:
n = sample size
N = population size
k = size of selection interval
Systematic Sampling: Example

• Purchase orders for the previous fiscal year are serialized 1 to 10,000 (N =
10,000).
• A sample of fifty (n = 50) purchase orders is needed for an audit.
• k = 10,000/50 = 200
• First sample element randomly selected from the first 200 purchase
orders. Assume the 45th purchase order was selected.
• Subsequent sample elements: 245, 445, 645, . . .
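The purchase-order example can be sketched in code as follows. The function name `systematic_sample` and its parameters are illustrative; the sketch assumes N is a multiple of n, as in the example, and takes the randomly chosen starting element as an argument so that the slide's start of 45 can be reproduced.

```python
import random

def systematic_sample(N, n, start=None, rng=None):
    """Return the serial numbers chosen by systematic sampling from 1..N."""
    k = N // n                        # size of the selection interval
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)     # random start among the first k elements
    return [start + i * k for i in range(n)]

# Purchase orders serialized 1 to 10,000; sample of 50; 45th order drawn first
orders = systematic_sample(10_000, 50, start=45)
print(orders[:4], "...", orders[-1])
```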
Cluster Sampling

• Population is divided into non-overlapping clusters or areas

• Each cluster is a miniature of the population.

• A subset of the clusters is selected randomly for the sample.

• If the number of elements in the subset of clusters is larger than the
desired value of n, these clusters may be subdivided to form a new
set of clusters and subjected to a random selection process.
Cluster Sampling
• Advantages
– More convenient for geographically dispersed populations
– Reduced travel costs to contact sample elements
– Simplified administration of the survey
– Unavailability of sampling frame prohibits using other random
sampling methods
• Disadvantages
– Statistically less efficient when the cluster elements are similar
– Costs and problems of statistical analysis are greater than for simple
random sampling
Nonrandom Sampling
• Convenience Sampling: Sample elements are selected for the convenience
of the researcher

• Judgment Sampling: Sample elements are selected by the judgment of the
researcher

• Quota Sampling: Sample elements are selected until the quota controls are
satisfied

• Snowball Sampling: Survey subjects are selected based on referral from
other survey respondents
Errors
• Data from nonrandom samples are not appropriate for analysis by inferential
statistical methods.
• Sampling error occurs when the sample is not representative of the
population
• Non-sampling errors
– Missing data, recording, data entry, and analysis errors
– Poorly conceived concepts, unclear definitions, and defective questionnaires
– Response errors occur when people do not know, will not say, or overstate in their
answers
Sampling Distribution of $\bar{x}$
Proper analysis and interpretation of a sample statistic
requires knowledge of its distribution.

Process of Inferential Statistics:
1. Start from a population with mean μ (parameter).
2. Select a random sample.
3. Calculate the sample mean $\bar{x}$ (statistic) to estimate μ.
Inferential Statistics

• Making statements about a population by examining sample results

Sample statistics (known) → Inference → Population parameters (unknown, but can be estimated from sample evidence)
Inferential Statistics
Drawing conclusions and/or making decisions concerning a
population based on sample results.

• Estimation
– e.g., Estimate the population mean weight
using the sample mean weight
• Hypothesis Testing
– e.g., Use sample evidence to test the claim
that the population mean weight is 120
pounds

Sampling Distributions

• A sampling distribution is a distribution of all of the possible values of a
statistic for a given size sample selected from a population
Types of sampling distributions

Sampling Distributions
– Sampling Distribution of Sample Mean
– Sampling Distribution of Sample Proportion
– Sampling Distribution of Sample Variance
Sampling Distributions of Sample Means

Developing a Sampling Distribution

• Assume there is a population …
• Population size N = 4
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years), for individuals A, B, C, D
Developing a Sampling Distribution
(continued)

Summary Measures for the Population Distribution:

$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$

$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$

[Figure: uniform distribution, P(x) = .25 at each of x = 18, 20, 22, 24 (individuals A, B, C, D).]
Developing a Sampling Distribution
(continued)
Now consider all possible samples of size n = 2

16 possible samples (sampling with replacement):

1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

16 sample means:

1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
Developing a Sampling Distribution
(continued)

• Sampling Distribution of All Sample Means

16 sample means:

1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24

[Figure: sample means distribution, $P(\bar{X})$ plotted for $\bar{X}$ = 18, 19, …, 24, peaking at 21 (no longer uniform).]
Developing a Sampling Distribution
(continued)

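• Summary Measures of this Sampling Distribution (here N = 16 is the number of possible samples):

$E(\bar{X}) = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 21 + \cdots + 24}{16} = 21 = \mu$

$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu)^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$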
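The sampling distribution for this four-person population is small enough to enumerate exhaustively. The following is a minimal sketch using the standard library; it lists all 16 samples of size 2 drawn with replacement and computes the summary measures directly.

```python
from itertools import product
from statistics import mean, pstdev

ages = [18, 20, 22, 24]              # population values (N = 4)

# All 16 ordered samples of size n = 2, sampling with replacement
samples = list(product(ages, repeat=2))
sample_means = [mean(pair) for pair in samples]

pop_mean = mean(ages)                # population mean
pop_sd = pstdev(ages)                # population standard deviation
mean_of_means = mean(sample_means)   # E(X-bar)
sd_of_means = pstdev(sample_means)   # standard deviation of the sample means

print(len(samples), pop_mean, round(pop_sd, 3), mean_of_means, round(sd_of_means, 2))
```

The standard deviation of the sample means equals the population standard deviation divided by the square root of n = 2, which anticipates the standard error formula.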
Comparing the Population with its Sampling
Distribution
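Population (N = 4): μ = 21, σ = 2.236
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

[Figure: left, the population distribution P(X) over X = 18, 20, 22, 24 (A, B, C, D); right, the sample means distribution $P(\bar{X})$ over $\bar{X}$ = 18, 19, …, 24.]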
[Figure: histogram of 1,800 randomly selected values from an exponential distribution; frequency versus X, for X from 0 to 10.]
[Figure: histogram of the means of 60 samples (n = 2) from an exponential distribution; frequency versus $\bar{x}$.]
[Figure: histogram of the means of 60 samples (n = 5) from an exponential distribution; frequency versus $\bar{x}$.]
[Figure: histogram of the means of 60 samples (n = 30) from an exponential distribution; frequency versus $\bar{x}$.]
[Figure: histogram of 1,800 randomly selected values from a uniform distribution; frequency versus X, for X from 0 to 5.]
[Figure: histogram of the means of 60 samples (n = 2) from a uniform distribution; frequency versus $\bar{x}$.]
[Figure: histogram of the means of 60 samples (n = 5) from a uniform distribution; frequency versus $\bar{x}$.]
[Figure: histogram of the means of 60 samples (n = 30) from a uniform distribution; frequency versus $\bar{x}$.]
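The histograms of sample means can be imitated with a small simulation. This is a rough sketch, not the procedure used to produce the slides: it assumes an exponential population with mean 1 (an arbitrary choice), draws 60 samples for each of n = 2, 5 and 30, and reports how the spread of the sample means shrinks as n grows.

```python
import random
import statistics

def sample_means(draw, n, samples=60):
    """Return the means of `samples` random samples, each of size n."""
    return [statistics.mean([draw() for _ in range(n)]) for _ in range(samples)]

rng = random.Random(0)                        # fixed seed, arbitrary choice
draw_exponential = lambda: rng.expovariate(1.0)  # exponential population, mean 1

# Spread of the 60 sample means for each sample size
spread = {}
for n in (2, 5, 30):
    means = sample_means(draw_exponential, n)
    spread[n] = statistics.stdev(means)
    print(f"n = {n:2d}: mean of means = {statistics.mean(means):.3f}, "
          f"std of means = {spread[n]:.3f}")
```

Plotting each list of means as a histogram would be expected to show shapes moving from skewed toward bell-shaped as n increases.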
Expected Value of Sample Mean

• Let $X_1, X_2, \ldots, X_n$ represent a random sample from a population

• The sample mean value of these observations is defined as

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Standard Error of the Mean
• Different samples of the same size from the same population will yield
different sample means
• A measure of the variability in the mean from sample to sample is given by
the Standard Error of the Mean:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

• Note that the standard error of the mean decreases as the sample size
increases
If sample values are not independent
(continued)

• If the sample size n is not a small fraction of the population size N,
then individual sample members are not distributed independently
of one another
• Thus, observations are not selected independently
• A correction is made to account for this:

$\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$ or $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
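The standard error with and without the finite population correction can be wrapped in one helper. This is a minimal sketch; the function name `standard_error` and the numbers used (σ = 3, n = 36, N = 100) are illustrative assumptions rather than values tied to this slide.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population
    correction sqrt((N - n) / (N - 1)) when a population size N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_large_pop = standard_error(sigma=3, n=36)          # population treated as very large
se_small_pop = standard_error(sigma=3, n=36, N=100)   # hypothetical population of 100
print(round(se_large_pop, 4), round(se_small_pop, 4))
```

The correction always reduces the standard error, and it approaches 1 as N grows relative to n.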
If the Population is Normal

• If a population is normal with mean μ and standard deviation σ, the
sampling distribution of $\bar{X}$ is also normally distributed with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

• If the sample size n is not small relative to the population size N, then

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
Z-value for Sampling Distribution of the Mean

• Z-value for the sampling distribution of $\bar{X}$:

$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

where: $\bar{X}$ = sample mean
μ = population mean
$\sigma_{\bar{X}}$ = standard error of the mean
Sampling Distribution Properties

$\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased)

[Figure: a normal population distribution centered at μ, and below it a normal sampling distribution of $\bar{x}$ centered at $\mu_{\bar{x}}$ (has the same mean).]
Sampling Distribution Properties

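• For sampling with replacement:
As n increases, $\sigma_{\bar{x}}$ decreases

[Figure: sampling distributions of $\bar{x}$ centered at μ; the curve for the larger sample size is narrower than the curve for the smaller sample size.]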
If the Population is not Normal- Central Limit Theorem
We can apply the Central Limit Theorem:

– Even if the population is not normal,
– sample means from the population will be approximately normal as
long as the sample size is large enough.
Properties of the sampling distribution:

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Central Limit Theorem

As the sample size gets large enough (n ↑), the sampling distribution becomes
almost normal regardless of the shape of the population.
If the Population is not Normal
(continued)
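Sampling distribution properties:

Central Tendency: $\mu_{\bar{x}} = \mu$
Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

[Figure: a non-normal population distribution with mean μ, and sampling distributions of $\bar{x}$ centered at $\mu_{\bar{x}}$ that become normal as n increases; the larger sample size gives the narrower curve.]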
How Large is Large Enough?

• For most distributions, n > 25 will give a sampling distribution that is
nearly normal
• For normal population distributions, the sampling distribution of the mean
is always normally distributed
Example

• Suppose a large population has mean μ = 8 and standard deviation σ = 3.
Suppose a random sample of size n = 36 is selected.

• What is the probability that the sample mean is between 7.8 and 8.2?
Example

Solution:
• Even if the population is not normally distributed, the central limit
theorem can be used (n > 25)
• … so the sampling distribution of $\bar{x}$ is approximately normal
• … with mean $\mu_{\bar{x}} = 8$
• … and standard deviation

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
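Example (continued)
Solution (continued)

$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.1554 + 0.1554 = 0.3108$

[Figure: the sampling distribution of $\bar{X}$ (centered at $\mu_{\bar{X}} = 8$, with 7.8 and 8.2 marked) is standardized to the standard normal distribution ($\mu_Z = 0$, with -0.4 and 0.4 marked); the population distribution (μ = 8) may have any shape.]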
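The probability for the sample mean can be verified numerically. This is a minimal sketch that assumes the normal approximation from the Central Limit Theorem and uses the standard-library `statistics.NormalDist` class; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8.0, 3.0, 36

# Standard error of the mean: sigma / sqrt(n)
se = sigma / math.sqrt(n)

# Sampling distribution of the sample mean (approximately normal by the CLT)
xbar = NormalDist(mu=mu, sigma=se)

z_low = (7.8 - mu) / se
z_high = (8.2 - mu) / se
p = xbar.cdf(8.2) - xbar.cdf(7.8)   # P(7.8 < sample mean < 8.2)

print(round(se, 2), round(z_low, 2), round(z_high, 2), round(p, 4))
```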
Distribution of Sample Mean, proportion,
and variance
Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT
IIT ROORKEE

Acceptance Intervals
Goal: determine a range within which sample means are likely to occur, given a
population mean and variance
• By the Central Limit Theorem, we know that the distribution of $\bar{X}$ is
approximately normal if n is large enough, with mean μ and standard
deviation $\sigma_{\bar{X}}$
• Let $z_{\alpha/2}$ be the z-value that leaves area α/2 in the upper tail of the normal
distribution (i.e., the interval $-z_{\alpha/2}$ to $z_{\alpha/2}$ encloses probability 1 – α)
• Then
$\mu \pm z_{\alpha/2}\,\sigma_{\bar{X}}$
is the interval that includes $\bar{X}$ with probability 1 – α
Sampling Distributions of Sample Proportions

Sampling Distributions of Sample Proportions
P = the proportion of the population having some characteristic
• Sample proportion (p̂) provides an estimate of P:

$\hat{p} = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$

• 0 ≤ p̂ ≤ 1
• p̂ has a binomial distribution, but can be approximated by a normal
distribution when nP(1 – P) > 5
Sampling Distribution of $\hat{p}$

• Normal approximation:

Properties: $E(\hat{p}) = P$ and $\sigma_{\hat{p}}^2 = \operatorname{Var}\left(\frac{X}{n}\right) = \frac{P(1 - P)}{n}$

(where P = population proportion)

[Figure: sampling distribution $P(\hat{p})$ plotted for $\hat{p}$ between 0 and 1.]
Z-Value for Proportions

Standardize p̂ to a Z value with the formula:

$Z = \frac{\hat{p} - P}{\sigma_{\hat{p}}} = \frac{\hat{p} - P}{\sqrt{\frac{P(1 - P)}{n}}}$
Example

• If the true proportion of voters who support Proposition A is
P = .4, what is the probability that a sample of size 200 yields
a sample proportion between .40 and .45?
• i.e.:
if P = .4 and n = 200, what is
P(.40 ≤ p̂ ≤ .45) ?
Example (continued)

• if P = .4 and n = 200, what is P(.40 ≤ p̂ ≤ .45) ?

Find $\sigma_{\hat{p}}$: $\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}} = \sqrt{\frac{.4(1 - .4)}{200}} = .03464$

Convert to standard normal: $P(.40 \le \hat{p} \le .45) = P\left(\frac{.40 - .40}{.03464} \le Z \le \frac{.45 - .40}{.03464}\right) = P(0 \le Z \le 1.44)$
Example
(continued)

if P = .4 and n = 200, what is P(.40 ≤ p̂ ≤ .45) ?

Use standard normal table: P(0 ≤ Z ≤ 1.44) = .4251

[Figure: the sampling distribution of p̂ between .40 and .45 is standardized to the standardized normal distribution between Z = 0 and Z = 1.44; the shaded area is .4251.]
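The proportion example can be reproduced numerically. This is a minimal sketch under the normal approximation, using the standard-library `statistics.NormalDist` class; because the Z value is not rounded to 1.44 here, the result differs from the table value .4251 in the fourth decimal place.

```python
import math
from statistics import NormalDist

P, n = 0.4, 200

# Standard error of the sample proportion: sqrt(P(1 - P) / n)
se_p = math.sqrt(P * (1 - P) / n)

z_low = (0.40 - P) / se_p
z_high = (0.45 - P) / se_p

std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(se_p, 5), round(z_high, 2), round(prob, 4))
```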
Sampling Distributions of Sample Variance

Sample Variance
• Let $x_1, x_2, \ldots, x_n$ be a random sample from a population. The
sample variance is

$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

• the square root of the sample variance is called the sample
standard deviation
• the sample variance is different for different random samples from
the same population
Sampling Distribution of Sample Variances
• The sampling distribution of $s^2$ has mean $\sigma^2$

$E(s^2) = \sigma^2$

• If the population distribution is normal then

$\frac{(n - 1)s^2}{\sigma^2}$

has a $\chi^2$ distribution with n – 1 degrees of freedom
The Chi-square Distribution

• The chi-square distribution is a family of distributions, depending on
degrees of freedom: d.f. = n – 1

[Figure: three chi-square density curves, each plotted over $\chi^2$ from 0 to 28, for three different degrees of freedom.]
Degrees of Freedom (df)
Idea: Number of observations that are free to vary after the sample
mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7
Let X2 = 8
What is X3?
If the mean of these three values is 8.0, then X3 must be 9
(i.e., X3 is not free to vary)

Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary for a
given mean)
Chi-square Example

• A commercial freezer must hold a selected temperature with little
variation. Specifications call for a standard deviation of no more than 4
degrees (a variance of 16 degrees²).
• A sample of 14 freezers is to be tested
• What is the upper limit (K) for the sample variance such that the
probability of exceeding this limit, given that the population standard
deviation is 4, is less than 0.05?
Finding the Chi-square Value

$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$ is chi-square distributed with (n – 1) = 13 degrees of freedom

• Use the chi-square distribution with area 0.05 in the upper tail:
$\chi^2_{13} = 22.36$ (α = .05 and 14 – 1 = 13 d.f.)

[Figure: chi-square density with upper-tail probability α = .05 to the right of $\chi^2_{13} = 22.36$.]
Chi-square Example
(continued)

$\chi^2_{13} = 22.36$ (α = .05 and 14 – 1 = 13 d.f.)

So: $P(s^2 > K) = P\left(\frac{(n - 1)s^2}{16} > \chi^2_{13}\right) = 0.05$

or $\frac{(n - 1)K}{16} = 22.36$ (where n = 14)

so $K = \frac{(22.36)(16)}{(14 - 1)} = 27.52$

If $s^2$ from the sample of size n = 14 is greater than 27.52, there is
strong evidence to suggest the population variance exceeds 16.
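The arithmetic for the upper limit K can be written as a small helper. This sketch takes the tabulated critical value 22.36 as given rather than computing it, since the Python standard library has no chi-square quantile function (SciPy's `scipy.stats.chi2.ppf` could supply it if that package is available); the function name `variance_upper_limit` is hypothetical.

```python
# Tabulated chi-square value with 13 d.f. and upper-tail area 0.05 (from the slide)
CHI2_13_UPPER_05 = 22.36

def variance_upper_limit(chi2_crit, sigma2, n):
    """Solve (n - 1) * K / sigma2 = chi2_crit for K."""
    return chi2_crit * sigma2 / (n - 1)

# Specification variance 16 (standard deviation 4), sample of 14 freezers
K = variance_upper_limit(CHI2_13_UPPER_05, sigma2=16.0, n=14)
print(round(K, 2))
```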
Summary
• Introduced sampling distributions
• Described the sampling distribution of sample means
– For normal populations
– Using the Central Limit Theorem
• Described the sampling distribution of sample proportions
• Introduced the chi-square distribution
• Examined sampling distributions for sample variances
• Calculated probabilities using sampling distributions
Thank You
Confidence Interval Estimation: Single
Population
Dr. A. Ramesh
Department of Management Studies
IIT ROORKEE

Goals
After completing this lecture, you should be able to:
• Distinguish between a point estimate and a confidence interval estimate
• Construct and interpret a confidence interval estimate for a single
population mean using both the Z and t distributions
• Form and interpret a confidence interval estimate for a single population
proportion
• Create confidence interval estimates for the variance of a normal
population

2
Confidence Intervals
• Confidence Intervals for the Population Mean, μ
– when Population Variance σ2 is Known
– when Population Variance σ2 is Unknown
• Confidence Intervals for the Population Proportion, P (large samples)
• Confidence interval estimates for the variance of a normal population

3
Definitions

• An estimator of a population parameter is


– a random variable that depends on sample information . . .
– whose value provides an approximation to this unknown parameter

• A specific value of that random variable is called an estimate

4
Point and Interval Estimates

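• A point estimate is a single number
• A confidence interval provides additional information about
variability

[Figure: an interval running from the lower confidence limit to the upper
confidence limit, with the point estimate inside it; the distance between
the two limits is the width of the confidence interval]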
5
Point Estimates

We can estimate a population parameter with a sample statistic
(a point estimate):

Population Parameter      Sample Statistic (Point Estimate)
Mean         μ            x̄
Proportion   P            p̂

6
Unbiasedness

• A point estimator θ̂ is said to be an unbiased estimator of the
parameter θ if the expected value, or mean, of the sampling
distribution of θ̂ is θ:

$E(\hat{\theta}) = \theta$
• Examples:
– The sample mean x̄ is an unbiased estimator of μ
– The sample variance s² is an unbiased estimator of σ²
– The sample proportion p̂ is an unbiased estimator of P

7
Unbiasedness
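(continued)
• θ̂₁ is an unbiased estimator, θ̂₂ is biased:

[Figure: sampling distributions of θ̂₁ and θ̂₂; the distribution of θ̂₁ is
centered on θ, while the distribution of θ̂₂ is centered away from θ]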
8
Bias

• Let θ̂ be an estimator of θ

• The bias in θ̂ is defined as the difference between its mean and θ:

$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$
• The bias of an unbiased estimator is 0

9
Most Efficient Estimator
• Suppose there are several unbiased estimators of θ
• The most efficient estimator, or the minimum variance unbiased estimator,
of θ is the unbiased estimator with the smallest variance
• Let θ̂₁ and θ̂₂ be two unbiased estimators of θ, based on the same number
of sample observations. Then,
– θ̂₁ is said to be more efficient than θ̂₂ if
$\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$
– The relative efficiency of θ̂₁ with respect to θ̂₂ is the ratio of
their variances:

$\text{Relative Efficiency} = \frac{\mathrm{Var}(\hat{\theta}_2)}{\mathrm{Var}(\hat{\theta}_1)}$

10
Confidence Intervals

• How much uncertainty is associated with a point estimate of a population


parameter?

• An interval estimate provides more information about a population


characteristic than does a point estimate

• Such interval estimates are called confidence intervals

11
Confidence Interval Estimate

• An interval gives a range of values:


– Takes into consideration variation in sample statistics from sample to
sample
– Based on observation from 1 sample
– Gives information about closeness to unknown population
parameters
– Stated in terms of level of confidence
• Can never be 100% confident

12
Confidence Interval and Confidence Level

• If P(a < θ < b) = 1 - α, then the interval from a to b is called a
100(1 - α)% confidence interval of θ.
• The quantity (1 - α) is called the confidence level of the interval
(α between 0 and 1)

– In repeated samples of the population, the true value of the
parameter θ would be contained in 100(1 - α)% of intervals
calculated this way.
– The confidence interval calculated in this manner is written as
a < θ < b with 100(1 - α)% confidence

13
Estimation Process

[Figure: a random sample is drawn from a population whose mean μ is
unknown; the sample mean is x̄ = 50, which leads to the statement "I am 95%
confident that μ is between 40 and 60."]

14
Confidence Level, (1 - α)
(continued)
• Suppose confidence level = 95%
• Also written (1 - α) = 0.95
• A relative frequency interpretation:
– From repeated samples, 95% of all the confidence intervals that can
be constructed will contain the unknown true parameter
• A specific interval either will contain or will not contain the true
parameter
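
The relative frequency interpretation can be illustrated with a small simulation. This is a sketch only: the population values (μ = 50, σ = 10), the sample size and the number of trials are hypothetical choices and do not come from the lecture.

```python
import random
from statistics import NormalDist, mean

random.seed(0)

# Hypothetical population and sampling plan (assumed for illustration)
mu, sigma, n = 50.0, 10.0, 25
trials = 2000

# Half-width of a 95% interval when sigma is known
z = NormalDist().inv_cdf(0.975)
half_width = z * sigma / n ** 0.5

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = mean(sample)
    # Does this particular interval contain the true mean?
    if xbar - half_width <= mu <= xbar + half_width:
        hits += 1

coverage = hits / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```

The printed share should be close to 0.95, while each individual interval either contains μ or does not.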

15
General Formula

• The general formula for all confidence intervals is:

Point Estimate ± (Reliability Factor)(Standard Error)

• The value of the reliability factor depends on the desired level of confidence

16
Confidence Intervals

[Diagram: confidence intervals are developed for three parameters: the
population mean (σ² known and σ² unknown), the population proportion, and
the population variance]

17
Confidence Interval for μ (σ2 Known)
• Assumptions
– Population variance σ2 is known
– Population is normally distributed
– If population is not normal, use large sample
• Confidence interval estimate:

$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

(where $z_{\alpha/2}$ is the normal distribution value for a probability of α/2 in each tail)

18
Margin of Error
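• The confidence interval,

$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

• Can also be written as $\bar{x} \pm ME$,
where ME is called the margin of error:

$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$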

19
Reducing the Margin of Error

$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

The margin of error can be reduced if

• The population standard deviation can be reduced (σ↓)

• The sample size is increased (n↑)

• The confidence level is decreased, (1 – α) ↓

20
Finding the Reliability Factor, $z_{\alpha/2}$
• Consider a 95% confidence interval:

$1 - \alpha = .95$, so $\alpha/2 = .025$ in each tail

[Figure: standard normal curve with area .025 in each tail. In Z units the
limits are z = -1.96 and z = 1.96 around 0; in X units they are the lower
and upper confidence limits around the point estimate]

• Find z.025 = 1.96 from the standard normal distribution table


21
Common Levels of Confidence

• Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level    Confidence Coefficient, 1 - α    $z_{\alpha/2}$ value
80%                 .80                              1.28
90%                 .90                              1.645
95%                 .95                              1.96
98%                 .98                              2.33
99%                 .99                              2.58
99.8%               .998                             3.09
99.9%               .999                             3.29
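
The reliability factors in this table can be reproduced without a printed table. The sketch below uses only the Python standard library; the dictionary name `z_values` is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1
levels = (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999)

z_values = {}
for level in levels:
    alpha = 1 - level
    # z value that leaves area alpha/2 in the upper tail
    z_values[level] = std_normal.inv_cdf(1 - alpha / 2)
    print(f"{level:.1%}  z = {z_values[level]:.3f}")
```

Rounded values from a printed table may differ slightly in the last digit from the values computed this way.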
22
Intervals and Level of Confidence
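[Figure: sampling distribution of the mean, centered at
$\mu_{\bar{x}} = \mu$, with area 1 - α in the middle and α/2 in each tail;
below it, confidence intervals built around different sample means x̄₁, x̄₂, ...]

• Intervals extend from $LCL = \bar{x} - z\frac{\sigma}{\sqrt{n}}$ to
$UCL = \bar{x} + z\frac{\sigma}{\sqrt{n}}$
• 100(1 - α)% of intervals constructed contain μ; 100(α)% do not.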
23
Example

• A sample of 11 circuits from a large normal population has a mean


resistance of 2.20 ohms. We know from past testing that the population
standard deviation is 0.35 ohms.

• Determine a 95% confidence interval for the true mean resistance of the
population.

24
Example
(continued)

• A sample of 11 circuits from a large normal population has a mean resistance
of 2.20 ohms. We know from past testing that the population standard
deviation is .35 ohms.
• Solution:

$\bar{x} \pm z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(.35/\sqrt{11}) = 2.20 \pm .2068$

$1.9932 < \mu < 2.4068$
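
The same interval can be computed in Python. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sigma = 11, 2.20, 0.35   # values from the circuit example
conf_level = 0.95

# Reliability factor z_{alpha/2}, about 1.96 for 95% confidence
z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

margin = z * sigma / sqrt(n)      # margin of error
lower, upper = xbar - margin, xbar + margin

print(f"{lower:.4f} < mu < {upper:.4f}")
```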

25
Interpretation

• We are 95% confident that the true mean resistance is


between 1.9932 and 2.4068 ohms
• Although the true mean may or may not be in this interval,
95% of intervals formed in this manner will contain the true
mean

26
Confidence Interval Estimation: Single
Population-II
Dr. A. Ramesh
Department of Management Studies
IIT ROORKEE

1
Student’s t Distribution
• Consider a random sample of n observations
– with mean x̄ and standard deviation s
– from a normally distributed population with mean μ

• Then the variable

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

follows the Student’s t distribution with (n - 1) degrees of freedom

2
Confidence Interval for μ (σ2 Unknown)

• If the population standard deviation σ is unknown, we can substitute


the sample standard deviation, s
• This introduces extra uncertainty, since s is variable from sample to
sample
• So we use the t distribution instead of the normal distribution

3
Confidence Interval for μ (σ Unknown)
(continued)
• Assumptions
– Population standard deviation is unknown
– Population is normally distributed
– If population is not normal, use large sample
• Use Student’s t Distribution
• Confidence Interval Estimate:

$\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$

where $t_{n-1,\alpha/2}$ is the critical value of the t distribution with n - 1 d.f. and an area of α/2 in each tail

4
Margin of Error
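• The confidence interval,

$\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$

• Can also be written as $\bar{x} \pm ME$,
where ME is called the margin of error. Because σ is unknown, the sample
standard deviation s is used:

$ME = t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$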

5
Student’s t Distribution

• The t is a family of distributions


• The t value depends on degrees of freedom (d.f.)
– Number of observations that are free to vary after sample mean has
been calculated
d.f. = n - 1

6
Student’s t Distribution
Note: t → Z as n increases

[Figure: standard normal curve (t with df = ∞) overlaid with t curves for
df = 13 and df = 5, all centered at 0. t-distributions are bell-shaped and
symmetric, but have ‘fatter’ tails than the normal]
7
Student’s t Table

Upper Tail Area

df     .10      .05      .025
1      3.078    6.314    12.706
2      1.886    2.920    4.303
3      1.638    2.353    3.182

The body of the table contains t values, not probabilities

Let: n = 3, so df = n - 1 = 2
α = .10, so α/2 = .05

[Figure: t curve with area α/2 = .05 to the right of t = 2.920]
8
t distribution values
With comparison to the Z value

Confidence Level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)    Z
.80                 1.372          1.325          1.310          1.282
.90                 1.812          1.725          1.697          1.645
.95                 2.228          2.086          2.042          1.960
.99                 3.169          2.845          2.750          2.576

Note: t → Z as n increases
9
Example

A random sample of n = 25 has x̄ = 50 and s = 8. Form a 95%
confidence interval for μ

– d.f. = n – 1 = 24, so $t_{n-1,\alpha/2} = t_{24,.025} = 2.0639$

The confidence interval is

$\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$

$50 - (2.0639)\frac{8}{\sqrt{25}} < \mu < 50 + (2.0639)\frac{8}{\sqrt{25}}$

$46.698 < \mu < 53.302$
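
The t-based interval can be reproduced as follows. This sketch assumes SciPy is available for the t critical value; the variable names are illustrative.

```python
from math import sqrt
from scipy.stats import t

n, xbar, s = 25, 50.0, 8.0    # values from the example
conf_level = 0.95

# Critical value t_{n-1, alpha/2} with n - 1 = 24 degrees of freedom
t_crit = t.ppf(1 - (1 - conf_level) / 2, df=n - 1)

margin = t_crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"t = {t_crit:.4f}: {lower:.3f} < mu < {upper:.3f}")
```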
10
Confidence Intervals for the
Population Proportion

• An interval estimate for the population proportion ( P ) can


be calculated by adding an allowance for uncertainty to the
sample proportion ( p̂ )

12
Confidence Intervals for the Population
Proportion, p
(continued)

• Recall that the distribution of the sample proportion is approximately
normal if the sample size is large, with standard deviation

$\sigma_P = \sqrt{\frac{P(1-P)}{n}}$
• We will estimate this with sample data:

$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

13
Confidence Interval Endpoints

• Upper and lower confidence limits for the population proportion are
calculated with the formula

$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• where
– $z_{\alpha/2}$ is the standard normal value for the level of confidence desired
– p̂ is the sample proportion
– n is the sample size
– nP(1−P) > 5

14
Example

• A random sample of 100 people shows that 25 are left-


handed.
• Form a 95% confidence interval for the true proportion of
left-handers

15
Example (continued)

• A random sample of 100 people shows that 25 are left-handed. Form a
95% confidence interval for the true proportion of left-handers.

$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

$\frac{25}{100} - 1.96\sqrt{\frac{.25(.75)}{100}} < P < \frac{25}{100} + 1.96\sqrt{\frac{.25(.75)}{100}}$

$0.1651 < P < 0.3349$
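
A short Python check of this interval, using only the standard library (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

n, left_handed = 100, 25
conf_level = 0.95

p_hat = left_handed / n                       # sample proportion
z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

# Estimated standard error of the sample proportion
std_error = sqrt(p_hat * (1 - p_hat) / n)
lower = p_hat - z * std_error
upper = p_hat + z * std_error

print(f"{lower:.4f} < P < {upper:.4f}")
```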

16
Interpretation

• We are 95% confident that the true percentage of left-handers in the


population is between
16.51% and 33.49%.

• Although the interval from 0.1651 to 0.3349 may or may not contain the true
proportion, 95% of intervals formed from samples of size 100 in this manner
will contain the true proportion.

17
Confidence Intervals for the Population
Variance

 Goal: Form a confidence interval for the population variance, σ2

• The confidence interval is based on the sample variance, s2

• Assumed: the population is normally distributed

19
Confidence Intervals for the Population Variance
(continued)

The random variable

$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$

follows a chi-square distribution with (n – 1)
degrees of freedom

20
Confidence Intervals for the Population Variance

The 100(1 - α)% confidence interval for the population variance is

$\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}$

21
Example

You are testing the speed of a batch of computer processors. You
collect the following data (in MHz):

Sample size       17
Sample mean       3004
Sample std dev    74

Assume the population is normal. Determine the 95%
confidence interval for $\sigma_x^2$

22
Finding the Chi-square Values

• n = 17 so the chi-square distribution has (n – 1) = 16 degrees of
freedom
• α = 0.05, so use the chi-square values with area 0.025 in each tail:

$\chi^2_{n-1,\,\alpha/2} = \chi^2_{16,\,0.025} = 28.85$

$\chi^2_{n-1,\,1-\alpha/2} = \chi^2_{16,\,0.975} = 6.91$

[Figure: chi-square density with 16 d.f.; probability α/2 = .025 lies to
the left of $\chi^2_{16} = 6.91$ and probability α/2 = .025 lies to the
right of $\chi^2_{16} = 28.85$]

23
Calculating the Confidence Limits

• The 95% confidence interval is

$\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}$

$\frac{(17-1)(74)^2}{28.85} < \sigma^2 < \frac{(17-1)(74)^2}{6.91}$

$3037 < \sigma^2 < 12683$

Converting to standard deviation, we are 95% confident that the population standard
deviation of CPU speed is between 55.1 and 112.6 MHz
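
The limits can be verified in Python. This is a sketch that assumes SciPy is installed; because it uses unrounded chi-square values, the limits may differ from the slide in the last digit.

```python
from math import sqrt
from scipy.stats import chi2

n, s = 17, 74.0        # sample size and sample standard deviation (MHz)
alpha = 0.05
df = n - 1

# Chi-square values with area alpha/2 in the upper and lower tails
chi2_upper = chi2.ppf(1 - alpha / 2, df)   # about 28.85
chi2_lower = chi2.ppf(alpha / 2, df)       # about 6.91

var_lower = df * s ** 2 / chi2_upper
var_upper = df * s ** 2 / chi2_lower
sd_lower, sd_upper = sqrt(var_lower), sqrt(var_upper)

print(f"{var_lower:.0f} < sigma^2 < {var_upper:.0f}")
print(f"{sd_lower:.1f} < sigma < {sd_upper:.1f}")
```

Note that the larger chi-square value produces the lower limit, because it appears in the denominator.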

24
Finite Populations

• If the sample size is more than 5% of the population size (and


sampling is without replacement) then a finite population correction
factor must be used when calculating the standard error

25
Finite Population Correction Factor

• Suppose sampling is without replacement and the sample size is large


relative to the population size
• Assume the population size is large enough to apply the central limit
theorem
• Apply the finite population correction factor when estimating the
population variance

$\text{finite population correction factor} = \frac{N-n}{N-1}$

26
Estimating the Population Mean

• Let a simple random sample of size n be taken from a population
of N members with mean μ
• The sample mean is an unbiased estimator of the population mean μ
• The point estimate is:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

27
Finite Populations: Mean

• If the sample size is more than 5% of the population size, an unbiased
estimator for the variance of the sample mean is

$\hat{\sigma}^2_{\bar{x}} = \frac{s^2}{n}\left(\frac{N-n}{N-1}\right)$
• So the 100(1-α)% confidence interval for the population mean is

$\bar{x} - t_{n-1,\alpha/2}\,\hat{\sigma}_{\bar{x}} < \mu < \bar{x} + t_{n-1,\alpha/2}\,\hat{\sigma}_{\bar{x}}$

28
Estimating the Population Proportion

• Let the true population proportion be P


• Let p̂ be the sample proportion from n observations from a simple
random sample
• The sample proportion, p̂ , is an unbiased estimator of the population
proportion, P

29
Finite Populations: Proportion

• If the sample size is more than 5% of the population size, an unbiased
estimator for the variance of the sample proportion is

$\hat{\sigma}^2_{\hat{p}} = \frac{\hat{p}(1-\hat{p})}{n}\left(\frac{N-n}{N-1}\right)$
• So the 100(1-α)% confidence interval for the population proportion is

$\hat{p} - z_{\alpha/2}\,\hat{\sigma}_{\hat{p}} < P < \hat{p} + z_{\alpha/2}\,\hat{\sigma}_{\hat{p}}$
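
The effect of the finite population correction can be sketched in Python. The function name `proportion_ci_finite` and the numbers used (N = 1000, n = 100, p̂ = 0.25) are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci_finite(p_hat, n, N, conf_level=0.95):
    """Confidence interval for a proportion with the (N - n)/(N - 1) correction."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Estimated variance of the sample proportion, scaled by the correction factor
    var_hat = p_hat * (1 - p_hat) / n * (N - n) / (N - 1)
    margin = z * sqrt(var_hat)
    return p_hat - margin, p_hat + margin

# Hypothetical case: the sample is 10% of the population
lower, upper = proportion_ci_finite(p_hat=0.25, n=100, N=1000)
print(f"{lower:.4f} < P < {upper:.4f}")
```

With these assumed numbers the interval is narrower than the uncorrected one, since the correction factor is below 1 whenever n > 1.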

30
Lecture Summary
• Introduced the concept of confidence intervals
• Discussed point estimates
• Developed confidence interval estimates
• Created confidence interval estimates for the mean (σ2
known)
• Introduced the Student’s t distribution
• Determined confidence interval estimates for the mean (σ2
unknown)

31
Lecture Summary
(continued)
• Created confidence interval estimates for the proportion
• Created confidence interval estimates for the variance of a normal
population
• Applied the finite population correction factor to form confidence
intervals when the sample size is not small relative to the population size

32
Thank You

34
Hypothesis Testing
Class Objectives

• Developing Null and Alternative Hypotheses

• Type I and Type II Errors- Explanation

• Population Mean: Sigma Known

• Population Mean: Sigma Unknown

• Population Proportion
Hypothesis Testing

• Hypothesis testing can be used to determine whether a statement about


the value of a population parameter should or should not be rejected.

• The null hypothesis, denoted by H0 , is a tentative assumption about a


population parameter

• The alternative hypothesis, denoted by Ha, is the opposite of what is stated


in the null hypothesis

• The hypothesis testing procedure uses data from a sample to test the two
competing statements indicated by H0 and Ha.
Developing Null and Alternative Hypotheses

• It is not always obvious how the null and alternative hypotheses should be
formulated

• Care must be taken to structure the hypotheses appropriately so that the test
conclusion provides the information the researcher wants

• The context of the situation is very important in determining how the hypotheses
should be stated

• In some cases it is easier to identify the alternative hypothesis first. In other


cases the null is easier

• Correct hypothesis formulation will take practice


Developing Null and Alternative Hypotheses

Alternative Hypothesis as a Research Hypothesis


•Many applications of hypothesis testing involve an attempt to gather evidence in support of a research
hypothesis

• In such cases, it is often best to begin with the alternative hypothesis and make it the conclusion that
the researcher hopes to support

• The conclusion that the research hypothesis is true is made if the sample data provide sufficient
evidence to show that the null hypothesis can be rejected
Developing Null and Alternative Hypotheses

Alternative Hypothesis as a Research Hypothesis

• Example: A new manufacturing method is believed to be better than the current method.

• Alternative Hypothesis:

– The new manufacturing method is better.

• Null Hypothesis:

– The new method is no better than the old method.


Developing Null and Alternative Hypotheses

• Alternative Hypothesis as a Research Hypothesis

• Example: A new bonus plan is developed in an attempt to increase sales

• Alternative Hypothesis:

– The new bonus plan increases sales

• Null Hypothesis:

– The new bonus plan does not increase sales


Developing Null and Alternative Hypotheses

• Alternative Hypothesis as a Research Hypothesis

• Example:

– A new drug is developed with the goal of lowering Cholesterol-level more


than the existing drug

• Alternative Hypothesis:

– The new drug lowers Cholesterol-level more than the existing drug

• Null Hypothesis:

– The new drug does not lower Cholesterol-level more than the existing
drug
Developing Null and Alternative Hypotheses

• Null Hypothesis as an assumption to be challenged

• We might begin with a belief or assumption that a statement about the value of a population
parameter is true

• We then use a hypothesis test to challenge the assumption and determine if there is statistical
evidence to conclude that the assumption is incorrect

• In these situations, it is helpful to develop the null hypothesis first


Developing Null and Alternative Hypotheses

• Null Hypothesis as an Assumption to be Challenged

• Example:

– The label on a milk bottle states that it contains 1000 ml

• Null Hypothesis:

– The label is correct. µ ≥ 1000 ml

• Alternative Hypothesis:

– The label is incorrect. µ < 1000 ml


Null and Alternative Hypotheses about a Population Mean μ

• The equality part of the hypotheses always appears in the null hypothesis

• In general, a hypothesis test about the value of a population mean μ must take one of the following
three forms (where μ0 is the hypothesized value of the population mean)

H0: μ ≥ μ0         H0: μ ≤ μ0         H0: μ = μ0
Ha: μ < μ0         Ha: μ > μ0         Ha: μ ≠ μ0

One-tailed         One-tailed         Two-tailed
(lower-tail)       (upper-tail)
Null and Alternative Hypotheses
• A major hospital in Chennai provides one of the most comprehensive
emergency medical services in the world
• Operating in a multiple hospital system with approximately 10 mobile
medical units, the service goal is to respond to medical emergencies with
a mean time of 8 minutes or less
• The director of medical services wants to formulate a hypothesis test
that could use a sample of emergency response times to determine whether
or not the service goal of 8 minutes or less is being achieved.
Null and Alternative Hypotheses

H0: μ ≤ 8    The emergency service is meeting the response
goal; no follow-up action is necessary.

Ha: μ > 8    The emergency service is not meeting the
response goal; appropriate follow-up action is
necessary.

where: μ = mean response time for the population
of medical emergency requests
Type I Error

• Because hypothesis tests are based on sample data, we must allow for the
possibility of errors

• A Type I error is rejecting H0 when it is true

• The probability of making a Type I error when the null hypothesis is true
as an equality is called the level of significance

• Applications of hypothesis testing that only control the Type I error are
often called significance tests
Type II Error

• A Type II error is accepting H0 when it is false.

• It is difficult to control for the probability of making a Type II error.

• Statisticians avoid the risk of making a Type II error by using “do not reject H0” and not “accept H0”.
Type I and Type II Errors

                        Population Condition

Conclusion              H0 True (μ ≤ 8)       H0 False (μ > 8)

Accept H0               Correct Decision      Type II Error
(Conclude μ ≤ 8)

Reject H0               Type I Error          Correct Decision
(Conclude μ > 8)
Three Approaches for Hypothesis Testing

• P- Value

• Critical Value

• Confidence Interval Value


p-Value Approach to One-Tailed Hypothesis Testing

• The p-value is the probability, computed using the test statistic, that measures the support (or lack of
support) provided by the sample for the null hypothesis

• If the p-value is less than or equal to the level of significance α, the value of the test statistic is in the
rejection region

• Reject H0 if the p-value ≤ α


Lower-Tailed Test About a Population Mean: σ Known

p-Value Approach

[Figure: sampling distribution of the test statistic z. The test statistic
z = -1.46 lies to the left of -z_α = -1.28, so the lower-tail area
p-value = .072 is smaller than α = .10]

p-value < α, so reject H0.
Upper-Tailed Test About a Population Mean: σ Known

p-Value Approach

[Figure: sampling distribution of the test statistic z. The test statistic
z = 2.29 lies to the right of z_α = 1.75, so the upper-tail area
p-value = .011 is smaller than α = .04]

p-value < α, so reject H0.
Critical Value Approach to One-Tailed Hypothesis Testing
• The test statistic z has a standard normal probability distribution.
• We can use the standard normal probability distribution table to find the
z-value with an area of α in the lower (or upper) tail of the distribution.
• The value of the test statistic that establishes the boundary of the
rejection region is called the critical value for the test.
• The rejection rule is:
Lower tail: Reject H0 if z < -z_α
Upper tail: Reject H0 if z > z_α
Lower-Tailed Test About a Population Mean: σ Known

Critical Value Approach

[Figure: sampling distribution of the test statistic z. The rejection
region of area α = .10 lies to the left of -z_α = -1.28; H0 is not
rejected to the right of it]
Upper-Tailed Test About a Population Mean: σ Known
Critical Value Approach

[Figure: sampling distribution of the test statistic z. The rejection
region of area α = .05 lies to the right of z_α = 1.645; H0 is not
rejected to the left of it]
Steps of Hypothesis Testing – p-Value Approach

• Step 1. Develop the null and alternative hypotheses.

• Step 2. Specify the level of significance α.

• Step 3. Collect the sample data and compute the test statistic.

• p-Value Approach

• Step 4. Use the value of the test statistic to compute the p-value.

• Step 5. Reject H0 if p-value ≤ α.


Steps of Hypothesis Testing

Critical Value Approach

• Step 4. Use the level of significance α to determine the critical value and
the rejection rule.

• Step 5. Use the value of the test statistic and the rejection rule to determine
whether to reject H0.


Hypothesis Testing

1
Class Objectives

• Population Mean: Sigma Known –Example

2
One-Tailed Tests About a Population Mean: σ Known

• Example: The mean response time for a random sample
of 30 pizza deliveries is 32 minutes
• The population standard deviation is believed to be 10
minutes.
• The pizza delivery services director wants to perform a
hypothesis test, with an α = 0.05 level of significance, to
determine whether the service goal of 30 minutes or less
is being achieved.

3
Given Values

• Sample
– Sample mean = 32 min
– Sample size = 30
• Population
– α = 0.05
– Hypothesized population mean = 30 min

4
p -Value Approach

5
One-Tailed Tests About a Population Mean:
σ Known
1. Develop the hypotheses.
H0: μ ≤ 30
Ha: μ > 30
2. Specify the level of significance.
α = .05
3. Compute the value of the test statistic.

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{32 - 30}{10/\sqrt{30}} = 1.09$

6
7
One-Tailed Tests About a Population Mean: σ Known

p-Value Approach
4. Compute the p-value.

For z = 1.09, p-value = 0.137

5. Determine whether to reject H0.

• Because p-value = 0.137 > α = .05, we do not reject H0.

• There is not sufficient statistical evidence to infer that the pizza delivery service is not meeting the response
goal of 30 minutes.

8
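One-Tailed Tests About a Population Mean: σ Known
p-Value Approach

[Figure: sampling distribution of the test statistic z. The test statistic
z = 1.09 lies to the left of z_α = 1.645, so the upper-tail area
p-value = 0.137 is larger than α = .05]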

9
Critical Value Approach

10
One-Tailed Tests About a Population Mean: σ Known

Critical Value Approach

4. Determine the critical value and rejection rule.

– For α = .05, z.05 = 1.645

– Reject H0 if z > 1.645

5. Determine whether to reject H0.

– Because z = 1.09 < 1.645, we do not reject H0.
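
Both approaches for the pizza delivery example can be run in Python. This is a minimal sketch using only the standard library; the variable names are illustrative. The unrounded test statistic is about 1.095, which is why the p-value comes out near 0.137.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: H0: mu <= 30, Ha: mu > 30
n, xbar, sigma, mu0, alpha = 30, 32.0, 10.0, 30.0, 0.05
std_normal = NormalDist()

# Test statistic
z = (xbar - mu0) / (sigma / sqrt(n))

# p-value approach: upper-tail area beyond z
p_value = 1 - std_normal.cdf(z)
reject_by_p = p_value <= alpha

# Critical value approach: compare z with z_alpha
z_crit = std_normal.inv_cdf(1 - alpha)
reject_by_crit = z >= z_crit

print(f"z = {z:.3f}, p-value = {p_value:.3f}, z_crit = {z_crit:.3f}")
print("reject H0:", reject_by_p, reject_by_crit)
```

The two approaches always lead to the same decision, because they compare the same tail area in two different ways.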

11
p-Value Approach to Two-Tailed Hypothesis Testing

12
Compute the p-value using the following three steps:

1. Compute the value of the test statistic z.

2. If z is in the upper tail (z > 0), find the area under the standard normal curve to the right of z.
If z is in the lower tail (z < 0), find the area under the standard normal curve to the left of z.

3. Double the tail area obtained in step 2 to obtain the p-value.

The rejection rule:

Reject H0 if the p-value ≤ α.

13
Critical Value Approach to Two-Tailed Hypothesis Testing

• The critical values will occur in both the lower and upper tails of the standard normal curve.

• Use the standard normal probability distribution table to find z_{α/2} (the z-value with an area of α/2
in the upper tail of the distribution).

• The rejection rule is:

Reject H0 if z < -z_{α/2} or z > z_{α/2}.

14
Two-Tailed Tests About a Population Mean:
σ Known

• Example: Milk Carton

• Assume that a sample of 30 milk cartons provides a sample mean of 505 ml.
• The population standard deviation is believed to be 10 ml.
• Perform a hypothesis test, at the 0.03 level of significance, of whether the
population mean is 500 ml, to help determine whether the filling process
should continue operating or be stopped and corrected.

15
Given Values

• Sample
– Sample size = 30
– Sample mean = 505 ml
• Population
– Hypothesized population mean = 500 ml
– Standard deviation = 10 ml
– Significance level = 0.03

16
p –Value approach

17
Two-Tailed Tests About a Population Mean:
σ Known
1. Determine the hypotheses.
H0: μ = 500
Ha: μ ≠ 500
2. Specify the level of significance.
α = .03
3. Compute the value of the test statistic.

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{505 - 500}{10/\sqrt{30}} = 2.74$

18
19
Two-Tailed Tests About a Population Mean:
σ Known
p-Value Approach
4. Compute the p-value.
– For z = 2.74, p-value = 2(1 - .9969) = .0062

5. Determine whether to reject H0.

– Because p-value = .0062 < α = .03, we reject H0.

There is sufficient statistical evidence to infer that the null hypothesis is not true (i.e. the mean filling
quantity is not 500 ml)

20
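Two-Tailed Tests About a Population Mean: σ Known

p-Value Approach

[Figure: standard normal curve with α/2 = .015 in each tail beyond
-z_{α/2} = -2.17 and z_{α/2} = 2.17. The test statistic values z = -2.74
and z = 2.74 lie further out, each cutting off half of the p-value
(.0031 in each tail)]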

21
Critical Value Approach

22
Two-Tailed Tests About a Population Mean: σ Known

• Critical Value Approach

4. Determine the critical value and rejection rule: for α/2 = .03/2 = .015, z.015 = 2.17

Reject H0 if z < -2.17 or z > 2.17

5. Determine whether to reject H0.

Because 2.74 > 2.17, we reject H0.

There is sufficient statistical evidence to infer that the null hypothesis is not true

23
24
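Two-Tailed Tests About a Population Mean: σ Known

Critical Value Approach

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{505 - 500}{10/\sqrt{30}} = 2.74$

[Figure: sampling distribution of the test statistic z. H0 is rejected in
the two tails of area α/2 = .015 each, below -2.17 and above 2.17; H0 is
not rejected between -2.17 and 2.17]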

25
Confidence Interval Approach

26
Confidence Interval Approach to
Two-Tailed Tests About a Population Mean

• Select a simple random sample from the population and use the value of the sample mean to
develop the confidence interval for the population mean μ.

• If the confidence interval contains the hypothesized value μ0, do not reject H0.

• Otherwise, reject H0.

• Actually, H0 should be rejected if μ0 happens to be equal to one of the end points of the confidence
interval.

27
Confidence Interval Approach to Two-Tailed Tests About a Population Mean
The 97% confidence interval for μ is

$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 505 \pm 2.17\,(10/\sqrt{30}) = 505 \pm 3.9619$

or 501.03814 to 508.96186

Because the hypothesized value for the population mean, μ0 = 500 ml, is not in this interval, the
hypothesis-testing conclusion is that the null hypothesis, H0: μ = 500, is rejected.
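
The three approaches for the milk carton example can be compared side by side in Python. This is a sketch using only the standard library; the variable names are illustrative, and the unrounded critical value differs from the table value 2.17 only in later decimals.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: H0: mu = 500, Ha: mu != 500
n, xbar, sigma, mu0, alpha = 30, 505.0, 10.0, 500.0, 0.03
std_normal = NormalDist()
std_error = sigma / sqrt(n)

# Test statistic
z = (xbar - mu0) / std_error

# 1. p-value approach: double the tail area beyond |z|
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_p = p_value <= alpha

# 2. Critical value approach: compare |z| with z_{alpha/2}
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_crit = abs(z) >= z_crit

# 3. Confidence interval approach: is mu0 outside the 97% interval?
ci_lower = xbar - z_crit * std_error
ci_upper = xbar + z_crit * std_error
reject_ci = not (ci_lower < mu0 < ci_upper)

print(f"z = {z:.2f}, p-value = {p_value:.4f}, z_crit = {z_crit:.2f}")
print(f"97% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
print("reject H0:", reject_p, reject_crit, reject_ci)
```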

28
Thanks

29
Hypothesis Testing-III

1
Tests About a Population Mean: σ Unknown
• Test Statistic

$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

This test statistic has a t distribution with n - 1 degrees of freedom.

2
Tests About a Population Mean: σ Unknown

Rejection Rule: p-Value Approach

Reject H0 if p-value ≤ α
Rejection Rule: Critical Value Approach
H0: μ ≥ μ0    Reject H0 if t < -t_α

H0: μ ≤ μ0    Reject H0 if t > t_α

H0: μ = μ0    Reject H0 if t < -t_{α/2} or t > t_{α/2}

3
4
One-Tailed Test About a Population Mean: σ Unknown
Example: Ice Cream Demand
• In an ice cream parlor at IIT Roorkee, the following data represent the
number of ice creams sold in 20 days

Day    No. of Ice Creams Sold      Day    No. of Ice Creams Sold
1      13                          11     12
2      8                           12     11
3      10                          13     11
4      10                          14     12
5      8                           15     10
6      9                           16     12
7      10                          17     7
8      11                          18     10
9      6                           19     11
10     8                           20     8

• Test the hypothesis H0: μ ≤ 10
• Use α = .05 to test the hypothesis.
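
The test statistic for the ice cream data can be computed with the standard library. This sketch reads the hypotheses as H0: μ ≤ 10 against Ha: μ > 10 (an upper-tail test), which is an assumption about the intended direction. The critical value 1.729 is the t table value for an upper-tail area of .05 with 19 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Ice creams sold on days 1 to 20
sales = [13, 8, 10, 10, 8, 9, 10, 11, 6, 8,
         12, 11, 11, 12, 10, 12, 7, 10, 11, 8]
mu0 = 10            # hypothesized mean (assumed H0: mu <= 10)
n = len(sales)

xbar = mean(sales)  # sample mean
s = stdev(sales)    # sample standard deviation (n - 1 in the denominator)

# t statistic with n - 1 = 19 degrees of freedom
t_stat = (xbar - mu0) / (s / sqrt(n))

t_crit = 1.729      # t table value: upper-tail area .05, 19 d.f.
reject = t_stat > t_crit

print(f"mean = {xbar:.2f}, s = {s:.3f}, t = {t_stat:.3f}, reject H0: {reject}")
```

Because the sample mean is below 10, the t statistic is negative and an upper-tail test cannot reject H0.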

5
Given Data

6
7
One-Tailed Test About a Population Mean:
σ Unknown

[Figure: t distribution centered at 0, divided into a "Reject H0" region
and a "Do Not Reject H0" region]

8
Hypothesis Testing – proportion

9
Null and Alternative Hypotheses: Population Proportion

• The equality part of the hypotheses always appears in the null hypothesis.

• In general, a hypothesis test about the value of a population proportion p must take one of the
following three forms (where p0 is the hypothesized value of the population proportion).

H0: p ≥ p0         H0: p ≤ p0         H0: p = p0
Ha: p < p0         Ha: p > p0         Ha: p ≠ p0

One-tailed         One-tailed         Two-tailed
(lower tail)       (upper tail)

10
Tests About a Population Proportion
Test Statistic

$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$

where:

$\sigma_{\bar{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}}$

assuming np > 5 and n(1 – p) > 5

11
Tests About a Population Proportion
Rejection Rule: p-Value Approach
Reject H0 if p-value ≤ α
Rejection Rule: Critical Value Approach
H0: p ≤ p0    Reject H0 if z > z_α

H0: p ≥ p0    Reject H0 if z < -z_α

H0: p = p0    Reject H0 if z < -z_{α/2} or z > z_{α/2}

12
Two-Tailed Test About a Population Proportion
Example: City Traffic Police

For a New Year’s week, the City Traffic Police claimed that 50% of the
accidents would be caused by drunk driving.

A sample of 120 accidents showed that 67 were caused by drunk driving.
Use these data to test the Traffic Police’s claim with α = .05.

13
p –Value Approach

14
Two-Tailed Test About a Population Proportion

1. Determine the hypotheses.
H0: p = .5
Ha: p ≠ .5

2. Specify the level of significance. α = .05

3. Compute the value of the test statistic.

$\sigma_{\bar{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}} = \sqrt{\frac{.5(1 - .5)}{120}} = .045644$

$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{(67/120) - .5}{.045644} = 1.28$
15
Two-Tailed Test About a Population Proportion

4. Compute the p -value.

For z = 1.28, cumulative probability = .8997
p-value = 2(1 - .8997) = .2006

5. Determine whether to reject H0.

Because p-value = .2006 > α = .05, we cannot reject H0.

16
17
Critical Value Approach

18
Two-Tailed Test About a Population Proportion

4. Determine the critical value and rejection rule.

For α/2 = .05/2 = .025, z.025 = 1.96

Reject H0 if z < -1.96 or z > 1.96

5. Determine whether to reject H0.

Because z = 1.278 lies between -1.96 and 1.96, we cannot reject H0.
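
The traffic police example can be reproduced in Python with the standard library. This is a minimal sketch with illustrative variable names. It uses the unrounded test statistic, so the p-value is close to, but not exactly, the .2006 obtained from z = 1.28.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: H0: p = .5, Ha: p != .5
n, drunk_driving, p0, alpha = 120, 67, 0.5, 0.05
std_normal = NormalDist()

p_bar = drunk_driving / n                 # sample proportion
std_error = sqrt(p0 * (1 - p0) / n)       # standard error under H0
z = (p_bar - p0) / std_error              # test statistic

# p-value approach (two-tailed)
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_p = p_value <= alpha

# Critical value approach
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_crit = abs(z) >= z_crit

print(f"z = {z:.3f}, p-value = {p_value:.4f}, z_crit = {z_crit:.2f}")
print("reject H0:", reject_p, reject_crit)
```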

19
Errors in Hypothesis Testing

Dr. A. Ramesh
Department of Management Studies
Indian Institute of Technology Roorkee

1
Example

• We are interested in burning rate of a solid propellant used to power aircrew escape systems

• Burning rate is a random variable that can be described by a probability distribution

• Suppose our interest focus on mean burning rate

• Ho: µ = 50 centimeters per second

• H1: µ ≠ 50 centimeters per second

Reference: Applied statistics and probability for engineers, Douglas C. Montgomery, George C. Runger, John Wiley &
Sons, 2007

2
Value of the null hypothesis

• The value of the null hypothesis can be obtained by

– Past experience or knowledge of the process, or even from the previous tests or experiments

– From some theory or model regarding the process under study

– From external considerations, such as design or engineering specifications, or from contractual obligations

3
Note: for this example n = 10

Note: for this example we will assume σ = 2.5

4
Type I Error

• The true mean burning rate of the propellant could be equal to 50 centimeters per second

• However, for the randomly selected propellant specimens that are tested, we could observe a value of the test statistic x̄ that falls into the critical region (rejection region).

• We would then reject the null hypothesis Ho in favor of the alternative H1 when, in fact, Ho is really true

• This type of wrong conclusion is called a type I error

5
Type I Error

• Rejecting the null hypothesis Ho when it is true is defined as a type I error

6
Type II Error

• Now suppose the true mean burning rate is different from 50 centimeters per second, yet the sample mean x̄ falls in the acceptance region

• In this case we would fail to reject Ho when it is false

• This type of wrong conclusion is called a type II error

7
Type II Error

• Failing to reject the null hypothesis when it is false is defined as a type II error

8
Type I and Type II Errors

                   H0 is correct                           H0 is incorrect
H0 is accepted     Correct decision                        Type II error (β), incorrect acceptance
H0 is rejected     Type I error (α), incorrect rejection   Correct decision

9
Type I error

• In the propellant burning rate example, a type I error will occur when either x̄ > 51.5 or x̄ < 48.5 when the true mean burning rate is µ = 50 centimeters per second

• Suppose the standard deviation of burning rate is σ = 2.5 centimeters per second and n = 10

• The sampling distribution of x̄ then has mean µ = 50 and standard error σ/√n = 2.5/√10 = 0.79.

• Type I error is

$\alpha = P(\bar{x} < 48.5 \text{ when } \mu = 50) + P(\bar{x} > 51.5 \text{ when } \mu = 50)$

10
Where does this number come from?

We will reject the null hypothesis (µ = 50) if our sample mean is in either of these two regions

11
12
Type I error

• Type I error = 0.057434

• This implies that 5.7% of all random samples would lead to rejection of the hypothesis Ho: µ = 50 centimeters per second.

• We can reduce the type I error by widening the acceptance region. If we make the critical values 48 and 52, the value of alpha is 0.0114 (adding 0.0057 and 0.0057).

• Alternatively, keeping the critical values 48.5 and 51.5 and increasing the sample size to n = 16 gives alpha = 0.0164.
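
The Type I error probabilities quoted for the propellant example can be checked numerically. The sketch below uses the standard library's `NormalDist`; the helper name `type_i_error` is an assumption for illustration. The slides round z to two decimals before using the table, so the exact value for the first case comes out near 0.058 rather than 0.0574.

```python
from math import sqrt
from statistics import NormalDist


def type_i_error(mu0, sigma, n, lower, upper):
    """P(x_bar < lower or x_bar > upper) when the true mean equals mu0."""
    sampling_dist = NormalDist(mu0, sigma / sqrt(n))  # distribution of x_bar under H0
    return sampling_dist.cdf(lower) + (1 - sampling_dist.cdf(upper))


alpha_base = type_i_error(50, 2.5, 10, 48.5, 51.5)  # original acceptance region
alpha_wide = type_i_error(50, 2.5, 10, 48.0, 52.0)  # wider acceptance region
alpha_n16 = type_i_error(50, 2.5, 16, 48.5, 51.5)   # larger sample, original region

print(f"n=10, region 48.5 to 51.5: alpha = {alpha_base:.4f}")
print(f"n=10, region 48.0 to 52.0: alpha = {alpha_wide:.4f}")
print(f"n=16, region 48.5 to 51.5: alpha = {alpha_n16:.4f}")
```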

13
TYPE II ERROR

14
[Figure: the pink area is the probability of a Type II error if the actual mean is 52.]

15
Type II Error

• A Type II error will be committed if the sample mean x̄ falls between 48.5 and 51.5 (the critical region boundaries) when µ = 52:

$\beta = P(48.5 \le \bar{x} \le 51.5 \text{ when } \mu = 52) = 0.2643$

• When µ = 50.5, β = 0.8923
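
The two Type II error probabilities above can be reproduced by evaluating the sampling distribution of the sample mean at the assumed true mean. This is a minimal sketch; the helper name `type_ii_error` is hypothetical, and small differences from the slide values come from table rounding.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(true_mu, sigma, n, lower, upper):
    """P(lower <= x_bar <= upper) when the true mean is true_mu."""
    sampling_dist = NormalDist(true_mu, sigma / sqrt(n))
    return sampling_dist.cdf(upper) - sampling_dist.cdf(lower)


beta_52 = type_ii_error(52.0, 2.5, 10, 48.5, 51.5)
beta_50_5 = type_ii_error(50.5, 2.5, 10, 48.5, 51.5)

print(f"beta when mu = 52.0: {beta_52:.4f}")
print(f"beta when mu = 50.5: {beta_50_5:.4f}")
```

The closer the true mean is to the hypothesized value of 50, the harder the difference is to detect, so β is larger for µ = 50.5 than for µ = 52.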

16
17
18
Computing the probability of a type II error may be the most difficult concept

19
For constant n, increasing the acceptance region (hence decreasing α) increases β.

Increasing n, can decrease both types of errors.

20
Type I & II Errors Have an Inverse Relationship

If you reduce the probability of one error, the other one increases, provided everything else is unchanged.

21
Factors Affecting Type II Error

• True value of population parameter
– β increases when the difference between the hypothesized parameter and its true value decreases
• Significance level
– β increases when α decreases
• Population standard deviation
– β increases when σ increases
• Sample size
– β increases when n decreases
22
How to Choose between Type I and Type II Errors

• Choice depends on the cost of the errors

• Choose smaller Type I Error when the cost of rejecting the maintained hypothesis is high

– A criminal trial: convicting an innocent person

• Choose larger Type I Error when you have an interest in changing the status quo

23
Calculating the probability of Type II Error

Ho: µ = 8.3
H1: µ < 8.3

Determine the probability of Type II error if µ = 7.4 at 5% significance level. σ = 3.1 and n = 60.

24
Solution:

A Type II error will be made when Z ≥ -1.645, because in that case we fail to reject Ho even though the true mean is µ = 7.4.
β = 0.2729
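
The value of β for this lower-tail test can be worked out in two steps: find the critical sample mean under Ho, then find the probability of landing in the non-rejection region under the true mean. The sketch below assumes a hypothetical helper `beta_lower_tail` and uses only the standard library.

```python
from math import sqrt
from statistics import NormalDist


def beta_lower_tail(mu0, true_mu, sigma, n, alpha):
    """Type II error probability for H0: mu = mu0 against H1: mu < mu0."""
    std_error = sigma / sqrt(n)
    # Critical sample mean: reject H0 when x_bar falls below this value.
    x_crit = mu0 + NormalDist().inv_cdf(alpha) * std_error
    # Beta is the chance of NOT rejecting when the mean is really true_mu.
    return 1 - NormalDist(true_mu, std_error).cdf(x_crit)


beta = beta_lower_tail(mu0=8.3, true_mu=7.4, sigma=3.1, n=60, alpha=0.05)
print(f"beta = {beta:.4f}")
```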
25
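Solving for Type II Errors: Example

$H_0: \mu \ge 12$
$H_a: \mu < 12$

With α = .05 the critical value is $Z_c = -1.645$, so the critical sample mean is

$\bar{X}_c = \mu + Z_c \dfrac{\sigma}{\sqrt{n}} = 12 + (-1.645)\dfrac{0.10}{\sqrt{60}} = 11.979$

Rejection region: if $\bar{X} < 11.979$, reject Ho.
Non-rejection region: if $\bar{X} \ge 11.979$, do not reject Ho.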
26
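Type II Error for Example with µ = 11.99 Kg

[Figure: sampling distributions of X̄ under Ho (µ = 12) and under the true mean µ = 11.99, split at the critical value into "Reject Ho" and "Do Not Reject Ho" regions.]

                           Reject Ho                   Do Not Reject Ho
Ho is True (µ = 12)        Type I error, α = .05       Correct decision, 95%
Ho is False (µ = 11.99)    Correct decision, 19.77%    Type II error, β = .8023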

27
28
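Type II Error for Demonstration with µ = 11.96 Kg

[Figure: sampling distributions of X̄ under Ho (µ = 12) and under the true mean µ = 11.96, split at the critical value into "Reject Ho" and "Do Not Reject Ho" regions.]

                           Reject Ho                   Do Not Reject Ho
Ho is True (µ = 12)        Type I error, α = .05       Correct decision, 95%
Ho is False (µ = 11.96)    Correct decision, 92.92%    Type II error, β = .0708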
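
The two demonstrations above (true means of 11.99 and 11.96) can be compared side by side in code. This is a sketch under the stated setup (hypothesized mean 12, σ = 0.10, n = 60, α = .05, lower-tail test); the function name `critical_mean_and_beta` is an assumption. The slides round the critical mean to 11.979 before computing β, so the exact results differ from .8023 and .0708 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def critical_mean_and_beta(mu0, true_mu, sigma, n, alpha):
    """Lower-tail test: return the critical sample mean and the Type II error."""
    std_error = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(alpha) * std_error
    beta = 1 - NormalDist(true_mu, std_error).cdf(x_crit)  # P(do not reject | true_mu)
    return x_crit, beta


x_crit, beta_11_99 = critical_mean_and_beta(12, 11.99, 0.10, 60, 0.05)
_, beta_11_96 = critical_mean_and_beta(12, 11.96, 0.10, 60, 0.05)

print(f"critical sample mean = {x_crit:.3f}")
print(f"mu = 11.99: beta = {beta_11_99:.4f}, power = {1 - beta_11_99:.4f}")
print(f"mu = 11.96: beta = {beta_11_96:.4f}, power = {1 - beta_11_96:.4f}")
```

As the true mean moves further from 12, β falls sharply and the power of the test rises.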
29
30
Hypothesis Testing and Decision Making

• We have illustrated hypothesis testing applications referred to as significance tests

• In the tests, we compared the p-value to a controlled probability of a Type I error, α, which is called the level of significance for the test

• With a significance test, we control the probability of making the Type I error, but
not the Type II error
• We recommended the conclusion “do not reject H0” rather than “accept H0”
because the latter puts us at risk of making a Type II error

31
Hypothesis Testing and Decision Making

• With the conclusion “do not reject H0”, the statistical evidence is considered inconclusive

• Usually this is an indication to postpone a decision until further research and testing is
undertaken
• In many decision-making situations the decision maker may want, and in some cases may be
forced, to take action with both the conclusion “do not reject H0 “and the conclusion “reject
H0.”

• In such situations, it is recommended that the hypothesis-testing procedure be extended to include consideration of making a Type II error

32
Power of a test

• The mean response time for a random sample of 40 food orders is 13.25 minutes
• The population standard deviation is believed to be 3.2 minutes.
• The restaurant owner wants to perform a hypothesis test, with α = 0.05 level of significance, to determine whether the service goal of 12 minutes or less is being achieved.

33
Calculating the Probability of a Type II Error

Hypotheses are: $H_0: \mu \le 12$ and $H_a: \mu > 12$

Rejection rule is: Reject H0 if z > 1.645

Value of the sample mean that identifies the rejection region:

$\bar{x} = 12 + 1.645\left(\dfrac{3.2}{\sqrt{40}}\right) = 12.8323$

We will accept H0 when $\bar{x} \le 12.8323$

34
Calculating the Probability of a Type II Error

Probabilities that the sample mean will be in the acceptance region:

Values of µ    z = (12.8323 - µ)/(3.2/√40)    β        1 - β
14.0           -2.31                          .0104    .9896
13.6           -1.52                          .0643    .9357
13.2           -0.73                          .2327    .7673
12.8323         0.00                          .5000    .5000
12.8            0.06                          .5239    .4761
12.4            0.85                          .8023    .1977
12.0001         1.645                         .9500    .0500
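
The table of β and 1 - β values can be regenerated for any set of true means. The sketch below is one way to do it with the standard library; `power_upper_tail` is a hypothetical helper name. Values may differ from the table in the last digit because the slides round z to two decimals.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(mu0, true_mu, sigma, n, alpha):
    """Upper-tail test of H0: mu <= mu0. Returns (critical mean, beta, power)."""
    std_error = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * std_error
    beta = NormalDist(true_mu, std_error).cdf(x_crit)  # P(x_bar <= x_crit | true_mu)
    return x_crit, beta, 1 - beta


for mu in [14.0, 13.6, 13.2, 12.8, 12.4]:
    x_crit, beta, power = power_upper_tail(12, mu, 3.2, 40, 0.05)
    print(f"mu = {mu:5.1f}  beta = {beta:.4f}  power = {power:.4f}")

x_crit, beta_14, power_14 = power_upper_tail(12, 14.0, 3.2, 40, 0.05)
_, beta_12_4, power_12_4 = power_upper_tail(12, 12.4, 3.2, 40, 0.05)
```

Plotting the power values against the true mean gives the power curve for this test.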

35
36
Power of the Test

• The probability of correctly rejecting H0 when it is false is called the power of the test.

• For any particular value of µ, the power is 1 – β.

• We can show graphically the power associated with each value of µ; such a graph is called a power curve.

37
Power Curve

[Figure: power curve. Vertical axis: probability of correctly rejecting the null hypothesis, from 0.00 to 1.00. Horizontal axis: µ, from 11.5 to 14.5. The curve is drawn over the region where H0 is false.]

38
Thank You

39
Hypothesis Testing: Two sample test

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT
IIT ROORKEE

1
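Hypothesis Testing about the Difference in Two Sample Means

[Figure: a sample of size n1 is drawn from Population 1 and a sample of size n2 from Population 2. The sample means $\bar{x}_1 = \sum X_1 / n_1$ and $\bar{x}_2 = \sum X_2 / n_2$ are computed, and their difference $\bar{x}_1 - \bar{x}_2$ is the statistic of interest.]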

2
Two Sample Tests

Test                                       Example
Population means, independent samples      Group 1 vs. independent Group 2
Population means, dependent samples        Same group before vs. after treatment
Population proportions                     Proportion 1 vs. Proportion 2
Population variances                       Variance 1 vs. Variance 2

3
Difference Between Two Means

Population means, independent samples

• σ1² and σ2² known: the test statistic is a z value
• σ1² and σ2² unknown: the test statistic is a value from the Student’s t distribution
– σ1² and σ2² assumed equal
– σ1² and σ2² assumed unequal
4
σ1² and σ2² Known

Population means, independent samples

Assumptions:
• Samples are randomly and independently drawn
• Both population distributions are normal
• Population variances are known

5
σ1² and σ2² Known

Population means, independent samples

When σ1² and σ2² are known and both populations are normal, the variance of $\bar{x}_1 - \bar{x}_2$ is

$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

…and the random variable

$Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

has a standard normal distribution
6
Test Statistic, σ1² and σ2² Known

Population means, independent samples

$H_0: \mu_1 - \mu_2 = D_0$

The test statistic for μ1 – μ2 is:

$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

7
Hypothesis Tests for Two Population Means

Two Population Means, Independent Samples

Lower-tail test:        Upper-tail test:        Two-tail test:
H0: μ1 ≥ μ2             H0: μ1 ≤ μ2             H0: μ1 = μ2
H1: μ1 < μ2             H1: μ1 > μ2             H1: μ1 ≠ μ2
i.e.,                   i.e.,                   i.e.,
H0: μ1 – μ2 ≥ 0         H0: μ1 – μ2 ≤ 0         H0: μ1 – μ2 = 0
H1: μ1 – μ2 < 0         H1: μ1 – μ2 > 0         H1: μ1 – μ2 ≠ 0

8
Decision Rules

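Lower-tail test (area α in the lower tail): Reject H0 if $z < -z_{\alpha}$
Upper-tail test (area α in the upper tail): Reject H0 if $z > z_{\alpha}$
Two-tail test (area α/2 in each tail): Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$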

9
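Hypothesis Testing about the Difference in Two Sample Means

[Figure: sampling distribution of $\bar{X}_1 - \bar{X}_2$, centered at its mean.]

$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$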

10
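Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

• Expected Value: $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$

• Standard Deviation (Standard Error): $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

where: σ1 = standard deviation of population 1
σ2 = standard deviation of population 2
n1 = sample size from population 1
n2 = sample size from population 2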

11
Interval Estimation of µ1 - µ2: σ1 and σ2 Known
• Interval Estimate: $\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

where: 1 - α is the confidence coefficient

12
Problem (σ1 and σ2 Known)
• A product developer is interested in reducing the drying time of a primer paint.
• Two formulations of the paint are tested; formulation 1 is the standard chemistry, and
formulation 2 has a new drying ingredient that should reduce the drying time.
• From experience, it is known that the standard deviation of drying time is 8 minutes, and this
inherent variability should be unaffected by the addition of the new ingredient.
• Ten specimens are painted with formulation 1, and another 10 specimens are painted with
formulation 2; the 20 specimens are painted in random order.
• The two sample average drying times are x̄1 = 121 minutes and x̄2 = 112 minutes, respectively.
• What conclusions can the product developer draw about the effectiveness of the new ingredient, using α = 0.05?
Source: Applied Probability and statistics for Engineers by Douglas C. Montgomery and George C. Runger John Wiley, 3rd Ed. 2003

13
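Problem (σ1 and σ2 Known)

$z = \dfrac{(121 - 112) - 0}{\sqrt{8^2\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}} = 2.52$

Because the standard deviation is known, the test statistic is a z value. With α = .05 in the upper tail, the critical value is 1.645, and the test statistic 2.52 falls in the rejection region.

Decision:
Reject H0 at α = 0.05
Conclusion:
There is evidence of a difference in means: the new ingredient reduces the mean drying time.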
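
The drying time calculation can be sketched in Python. The hypotheses are not legible on the slides; the code assumes H0: µ1 - µ2 = 0 against H1: µ1 > µ2, which is what the upper-tail critical value of 1.645 implies. The function name `two_sample_z` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x_bar1, x_bar2, sigma1, sigma2, n1, n2, d0=0.0):
    """z statistic for the difference of two means with known variances."""
    std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x_bar1 - x_bar2) - d0) / std_error


z = two_sample_z(121, 112, 8, 8, 10, 10)
z_crit = NormalDist().inv_cdf(1 - 0.05)   # upper-tail critical value
p_value = 1 - NormalDist().cdf(z)         # upper-tail p-value

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if z > z_crit else "Cannot reject H0")
```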

16
σ1² and σ2² Unknown, Assumed Equal

Population means, independent samples

Assumptions:
• Samples are randomly and independently drawn
• Populations are normally distributed
• Population variances are unknown but assumed equal

19
σ1² and σ2² Unknown, Assumed Equal

• The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate σ

• Use a t value with (n1 + n2 – 2) degrees of freedom

20
Test Statistic, σ1² and σ2² Unknown, Equal
The test statistic for μ1 – μ2 is:

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$

where t has (n1 + n2 – 2) d.f., and

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

21
Decision Rules

[Figure: lower-tail, upper-tail, and two-tail hypotheses about μ1 and μ2 with their rejection regions.]

23
σ1² and σ2² Unknown, Assumed equal
• Two catalysts are being analyzed to determine how they affect the mean yield of a chemical process.
• Specifically, catalyst 1 is currently in use, but catalyst 2 is acceptable.
• Since catalyst 2 is cheaper, it should be adopted, providing it does not change the process yield.
• A test is run in the pilot plant and results in the data shown in the table.
• Is there any difference between the mean yields?
• Use α = 0.05, and assume equal variances.

Observation Number    Catalyst 1    Catalyst 2
1                     91.50         89.19
2                     94.18         90.95
3                     92.18         90.46
4                     95.39         93.21
5                     91.79         97.19
6                     89.07         97.04
7                     94.72         91.07
8                     89.21         92.75

x̄1 = 92.255    x̄2 = 92.733
s1 = 2.39      s2 = 2.98
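
The pooled-variance t test for the catalyst data can be sketched with the standard library alone. The helper name `pooled_t` is hypothetical, and the critical value 2.145 is the standard table value of t for 14 degrees of freedom with .025 in each tail, entered here as a constant because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

catalyst_1 = [91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21]
catalyst_2 = [89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75]


def pooled_t(sample_1, sample_2):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(sample_1), len(sample_2)
    dof = n1 + n2 - 2
    # Pooled estimate of the common variance.
    sp2 = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / dof
    t = (mean(sample_1) - mean(sample_2)) / sqrt(sp2 / n1 + sp2 / n2)
    return t, dof


T_CRIT = 2.145  # table value t(0.025, 14), two-tailed test at alpha = 0.05

t_stat, dof = pooled_t(catalyst_1, catalyst_2)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
print("Reject H0" if abs(t_stat) > T_CRIT else "Cannot reject H0")
```

With these data the statistic is far inside the non-rejection region, so the sample gives no evidence that the mean yields differ.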
24
Thank You

29
