
Session1 QTII 24

Dr. Pritha Guha
Quantitative Techniques - II [email protected]
About The Course: Course Material
Text Book

• Statistics for Business Decision Making and Analysis, Stine R., Foster D., Pearson.

Other references
• Bowerman B., O’Connell R., Murphy E., Business Statistics in Practice, 8th ed., McGraw Hill Education (India)
• Applied Business Statistics: Making Better Business Decisions, Ken Black, Wiley-India, 7th Edition.
• Statistics for Business and Economics, Anderson, Sweeney, Williams, 11th ed., Cengage.
• The Lady Tasting Tea, David Salsburg, Henry Holt & Company Inc
• Errors, Blunders And Lies, David Salsburg
• Statistics, Freedman, Pisani, Purves, 4th Edition, W. W. Norton & Company
• Business Analytics for Managers, Wolfgang Jank, (2011), Springer
• An Introduction to Statistical Learning: with Applications in R (James, Witten, Hastie, Tibshirani)
• Regression Analysis by Example, Chatterjee and Hadi, Wiley
• Slides, data sets and other materials would be uploaded in AIS.
About The Course: R Help
• Some books:
• The Book of R by Davies
• R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Wickham
• ggplot2: Elegant Graphics for Data Analysis by Wickham
• Introduction to Data Science: Data Analysis and Prediction Algorithms with R by Irizarry
• R in Action by Kabacoff
About the course: Evaluation
• 2 Quizzes (40%)
• 1 Group Assignment (20%)
• End Term (40%)

All the tests are closed book/notes.
The tests will have some MCQs and fill-in-the-blank questions.
You are allowed to bring
• A scientific calculator (mobile phones are not allowed inside the exam hall)
• One page (max A4 size) with notes on both sides for quizzes and two pages (max A4 size) with notes on both sides for the end term.

Exam Dates
• Quiz 1: 11th November 8:30-9:15AM (45 minutes)
• Quiz 2: 4th December 8:30-9:15AM (45 minutes)

Grades
• A+: [90, 100]; A: [80, 90)
• B+: [70, 80); B: [60, 70)
• C+: [50, 60); C: [40, 50)
• D+: [30, 40); D: [20, 30)
• F: [0, 20)
About The Course
• Come to the class a few minutes early.
• Every absence needs to be approved by the Dean’s office or supported with a medical
certificate submitted by the end of the day. Other absences need to be reported
by the end of the day of absence at the latest and will be dealt with on a case-to-case basis.
• Inform me by the end of the day if you are unable to attend a session. Failure to do so
will result in a 2-mark deduction per unexcused absence.
• Keep R open. We will be using R for almost all our calculations.
• If you do not understand something I say, you are going to stop me and ask for
clarification.
• Some examples and the template of the group assignment will be shared in the AIS.
Sampling
There Are So Many Questions...

• How can you evaluate the evidence of global warming?
• How far is it from the Earth to the Sun?
• Does Aspirin reduce the chance of heart attack?
• Which vaccine is most effective against the coronavirus?

...how to answer those?

• Every statistical study is motivated by a question (like one of the above) that directly
relates to reality.
• To answer that question satisfactorily, we have to design a suitable study that produces
data (or information).
• Statistical techniques are then applied to make sense of the data, and conclusions are
drawn (by using those statistical methods on the dataset) which help us to answer the
questions we started off with.
Population, Sample And Related Concepts

Population: All members of a relevant group.
Parameter: A number associated with the population.
Sample: A subset of the population.
Statistic: Calculated from sample and used to make inferences about the population.

A sample should be a good representation of the population!


Choosing a Sample

Random Sampling

• Simple Random Sampling
• Systematic sampling
• Stratified sampling
• Cluster sampling

Non-Random Sampling

• Convenience Sampling
• Snowball Sampling
• Judgmental Sampling
• Quota Sampling
Choosing a Sample: Random Sampling
Random Sampling:
• Before the sample is drawn, it has to be possible to calculate the probability
with which each member of the population will be included in the sample.
• This probability does not have to be the same for all members of the
population!

Random Sampling Techniques:


• Simple Random Sampling (SRS)
• Systematic Sampling
• Cluster Sampling
• Stratified Sampling
Random Sampling: Simple Random Sampling

• Each particular sample of size n has an equal chance of getting selected.


• Two types of Simple Random Sampling:
• Simple Random Sampling With Replacement (SRSWR)
• Simple Random Sampling Without Replacement (SRSWOR)
Example
• Suppose our population is: a, b, c, d, e
• We want to have a sample of size 2.
• List all possible samples when doing sampling with replacement
(SRSWR) and sampling without replacement (SRSWOR).
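One way to work through this example is to let R list the samples. The sketch below is only an illustration: the object names are made up here, and it treats SRSWR samples as ordered pairs.

```r
population <- c("a", "b", "c", "d", "e")

# SRSWR: every ordered pair, the same unit may be drawn twice
srswr <- expand.grid(first = population, second = population,
                     stringsAsFactors = FALSE)

# SRSWOR (ordered): drop the pairs that repeat a unit
srswor_ordered <- srswr[srswr$first != srswr$second, ]

# SRSWOR (unordered): each row is one sample, order ignored
srswor_unordered <- t(combn(population, 2))

nrow(srswr)             # 25
nrow(srswor_ordered)    # 20
nrow(srswor_unordered)  # 10
```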
Random Sampling: Simple Random Sampling
Simple Random Sampling With Replacement (SRSWR)

• Any unit may appear more than once in the sample.


• Number of possible samples: $N^n$, where N = population size, n = sample size.

Simple Random Sampling Without Replacement (SRSWOR)

• Same unit will not appear more than once in the sample.
• Number of possible samples: $^{N}P_{n} = \frac{N!}{(N-n)!}$ (for ordered) and $\binom{N}{n}$ (for
unordered), where N = population size, n = sample size.
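The three counts can be checked numerically. This is a minimal sketch with illustrative values N = 100 and n = 5 (chosen here, not taken from the slides); the variable names are hypothetical.

```r
N <- 100  # population size (illustrative)
n <- 5    # sample size (illustrative)

# SRSWR: N^n ordered samples
count_wr <- N^n

# SRSWOR, ordered: N * (N - 1) * ... * (N - n + 1)
count_wor_ordered <- prod(seq(N, by = -1, length.out = n))

# SRSWOR, unordered: N choose n
count_wor_unordered <- choose(N, n)

count_wr             # 1e+10
count_wor_ordered    # 9034502400
count_wor_unordered  # 75287520
```

Dividing the ordered count by n! gives the unordered count, since each unordered sample can be arranged in n! ways.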
In R

• x = c(1:100)
• For SRSWR:
• sample(x, 5, replace = TRUE)
• For SRSWOR:
• sample(x, 5, replace = FALSE)
Random Sampling:
Simple Random Sampling With and Without Replacement

• Think of sampling from a very large population (Example: Entire nation; visitors to a
public place etc.)
• WR/WOR are practically equivalent: being selected twice is almost impossible. The
population can be considered infinite for all practical purposes.
• A simple random sampling scheme for such an 'infinite' population is defined as:
• Each element of the sample should come from the population.
• The elements are selected independently of each other.
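To see why WR and WOR become practically equivalent, one can compute the chance that sampling with replacement draws some unit more than once. The function name `prob_repeat` is made up for this sketch.

```r
# P(at least one unit appears twice) when drawing n units with replacement
# from a population of size N
prob_repeat <- function(N, n) {
  1 - prod((N - seq_len(n) + 1) / N)
}

prob_repeat(N = 100, n = 5)  # about 0.097
prob_repeat(N = 1e6, n = 5)  # about 0.00001
```

For a fixed sample size the probability shrinks roughly like n(n - 1) / (2N), so it is negligible when N is very large.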
Random Sampling: Systematic Sampling

• Population elements are ordered in a sequence.


• Population Size: N, Sample Size: n, choose $k = N/n$ (systematic sampling interval)
• The first sample element is selected randomly from the first 𝑘 population elements.
• Thereafter, sample elements are selected at a constant interval, 𝑘, from the ordered
sequence frame.
• Example: Suppose a supermarket wants to study buying habits of their customers, then
they can select a sample using systematic sampling.
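The steps above can be sketched in R as follows. The function `systematic_sample` is a hypothetical helper, and the sketch assumes the population is already ordered and indexed 1, ..., N, with N a multiple of n.

```r
# Return the positions of a systematic sample from an ordered population
systematic_sample <- function(N, n) {
  k <- N %/% n               # sampling interval
  start <- sample.int(k, 1)  # random start among the first k elements
  start + k * (0:(n - 1))    # every k-th element thereafter
}

set.seed(1)  # only to make the illustration reproducible
idx <- systematic_sample(N = 100, n = 5)
idx
```

In the supermarket example, this would correspond to picking a random customer among the first k and then every k-th customer after that.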
Random Sampling: Cluster Sampling

• Population is divided into non-overlapping clusters or areas.


• Each cluster is a miniature version of the population.
• A subset of the clusters is selected randomly for the sample.
• Assumptions:
• The spread of data inside each cluster is heterogeneous and similar to that of the
population.
• The clusters are similar entities.
• Example: When sampling for age distribution, households can be considered as
clusters. (But not for income distribution.)
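A minimal sketch of the household example, using a made-up frame of 20 households with 4 members each (all names and sizes are hypothetical): whole clusters are drawn at random and every member of a chosen cluster enters the sample.

```r
# Hypothetical sampling frame: 20 households (clusters), 4 members each
households <- data.frame(
  household = rep(1:20, each = 4),
  member    = rep(1:4, times = 20)
)

set.seed(1)  # only to make the illustration reproducible

# Step 1: select a random subset of the clusters
chosen <- sample(unique(households$household), 4)

# Step 2: keep all members of the selected clusters
cluster_sample <- households[households$household %in% chosen, ]
nrow(cluster_sample)  # 16
```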
Random Sampling: Stratified Sampling

• Population is divided into strata, which are distinct and mutually exclusive subsets of
the population.
• Each stratum is homogeneous within itself and heterogeneous with other strata.
• Sample is selected from each stratum.
• It ensures representation of individuals or items across the total population.
Random Sampling: Stratified Sampling

• Let N = Population size, n = Sample size,
$N_h$ = Size of stratum h, $n_h$ = Sample size from stratum h, $\sigma_h$ = sd of stratum h
• What should be the size of the sample to be chosen from each stratum?
• The allocation of total sample between different strata may be either proportionate or
disproportionate in stratified random sampling.
• Proportional Allocation: $n_h = n \times \dfrac{N_h}{N}$
• Neyman Allocation: $n_h = n \times \dfrac{N_h \sigma_h}{\sum_h N_h \sigma_h}$

• Example of Stratified Random Sampling?
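The two allocation rules can be compared on a small numerical example. The stratum sizes and standard deviations below are hypothetical values chosen only for illustration.

```r
N_h     <- c(500, 300, 200)  # stratum sizes (hypothetical)
sigma_h <- c(10, 20, 40)     # stratum standard deviations (hypothetical)
n       <- 100               # total sample size

# Proportional allocation: n_h = n * N_h / N
n_prop <- n * N_h / sum(N_h)

# Neyman allocation: n_h = n * N_h * sigma_h / sum(N_h * sigma_h)
n_neyman <- n * N_h * sigma_h / sum(N_h * sigma_h)

n_prop           # 50 30 20
round(n_neyman)  # 26 32 42
```

Neyman allocation moves sample units toward the strata with larger variability, here the third stratum, even though it is the smallest.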


Difference Between Cluster and Stratified Sampling

• In cluster sampling the sampling units are clusters; in stratified sampling the sampling
units are individuals inside each stratum.
• In a cluster sample we usually have all members of some clusters. In a stratified sample
we have some members from each stratum.
• Clusters have more variability within than between. Strata have less variability within
than between.
Random Sampling Techniques
Choosing A Sample: Non-random Sampling

▪ Non-Random Sampling Techniques:


▪ Convenience Sampling
▪ Judgement Sampling
▪ Quota Sampling
▪ Snowball Sampling
Non-random Sampling: Convenience Sampling
▪ Non-probability sampling procedure, involving no restrictions.
▪ The investigators have the freedom to choose whomsoever they
find conveniently as their sample.
▪ Convenient and relatively cheaper to undertake
▪ Does not ensure precision due to lack of control mechanisms.
▪ Useful when study being undertaken is exploratory in nature.
▪ Need to be very careful about extrapolation to the whole
population.
Non-random Sampling: Judgement Sampling
▪ Sample elements are selected by the judgement of the
researcher.
▪ The elements, chosen according to some criterion, are such
that as a group they will adequately represent the parent
population.
▪ The sample is as good as the judgement of the researcher.
Non-random Sampling: Quota Sampling
▪ Population is divided into subclasses (similar to strata).
▪ Sample elements are selected from the subclasses until the
quota requirements are satisfied (not randomly).
▪ Useful when no sampling frame is available.
▪ Less expensive, fast data collection method, preparatory work
is minimal.
Non-random Sampling: Snowball Sampling
▪ Survey subjects are selected based on referral from other survey
respondents.
▪ Useful when sampling for a rare characteristic.
▪ “Hard-to-reach” population: Identify Argentinean immigrant
entrepreneurs in Spain (administratively invisible in national
statistics because they have double nationality (non-EU and EU)).
▪ Used Facebook as sampling frame!
▪ Ref: Baltar, F., & Brunet, I. (2012). Social research 2.0: virtual
snowball sampling method using Facebook. Internet Research.
Choosing A Sample: Non-random Sampling
▪ Problem with Non-Random Sampling: Data from non-random
samples are not appropriate for analysis by inferential
statistical methods.
▪ Why may Non-Random Sampling be necessary?
▪ Time constraint
▪ Resource constraint
▪ Non-response
▪ Regulatory issues
Non-random Sampling Techniques
Why Should We Do Random Sampling?
The selection of sample units at random is a guard against investigator
biases, even unconscious ones.

Random sampling techniques make possible the calculation of an
estimate of error due to sampling.

In designing a sampling scheme, it is frequently possible to determine
the sample size necessary to obtain a prescribed error level.
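As a rough illustration of the last two points, the sketch below estimates the sampling error of a mean and the sample size needed for a target margin of error. It assumes a simple random sample, a normal approximation and a known (or guessed) population sd; the function names and numbers are made up for this example and are not from the slides.

```r
# Estimated standard error of the sample mean under simple random sampling
standard_error <- function(x) {
  sd(x) / sqrt(length(x))
}

# Smallest n so that the margin of error of the mean is at most E,
# assuming population sd `sigma` and a normal approximation
required_n <- function(sigma, E, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  ceiling((z * sigma / E)^2)
}

required_n(sigma = 15, E = 2)  # 217
```

Neither calculation is justified for a non-random sample, because the inclusion probabilities are then unknown.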
Problem
An online florist offers three different sizes for Mother’s Day
bouquets: a small arrangement costing Rs. 80, a medium-sized one for
Rs. 100, and a large one with a price tag of Rs. 120. Of all
purchasers, 20% choose the small arrangement, 30% choose medium, and
50% choose large (because they really love Mom!).
Suppose each customer chooses only one flower arrangement and the
choices of the customers are independent of each other.
a) Obtain the distribution/probability distribution of the average
amount spent by a single customer on that day.
b) Obtain the distribution/probability distribution of the average
amount spent by two customers on that day.
c) Obtain the probability distribution of the average amount spent by
100 customers on that day.
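One possible way to set the problem up in R is sketched below; the variable names are hypothetical. Parts (a) and (b) are computed exactly, while for part (c) the sketch gives only the mean and standard deviation of the average, which could be combined with a normal approximation (an assumption here, not something the problem statement prescribes).

```r
amount <- c(80, 100, 120)  # price of small, medium, large
prob   <- c(0.2, 0.3, 0.5) # probability of each choice

# (a) One customer: the distribution is (amount, prob) itself
mu     <- sum(amount * prob)             # mean = 106
sigma2 <- sum(amount^2 * prob) - mu^2    # variance = 244

# (b) Two independent customers: enumerate all ordered pairs of choices
pairs     <- expand.grid(first = 1:3, second = 1:3)
pair_mean <- (amount[pairs$first] + amount[pairs$second]) / 2
pair_prob <- prob[pairs$first] * prob[pairs$second]
dist2     <- tapply(pair_prob, pair_mean, sum)
dist2
#   80   90  100  110  120
# 0.04 0.12 0.29 0.30 0.25

# (c) 100 customers: mean and sd of the average
mean100 <- mu                    # 106
sd100   <- sqrt(sigma2 / 100)    # about 1.56
```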
House Price Data (HW)
• House prices along with house characteristics are
provided in HousePrices.xlsx dataset.
• It consists of a sample of house prices (and associated
house characteristics) for a major US metropolitan area.
• Data information:
• house’s ID
• its selling price (in US$)
• its size (in square feet)
• the number of bedrooms and bathrooms
• the number of offers it has received while being on
the market
• whether or not it has brick walls
• the neighborhood where it is located.
• Perform an exploratory data analysis using R!
