
Lecture#2 - Probability & Statistics - Fall24


PROBABILITY & STATISTICS

MT220
(FALL-2024)

Random Sampling
Lecture#2

By,
Dr. Mehwish Manzur
LAYOUT OF LECTURE

• Collection of Data

• Populations and Samples

• Random Sampling
STEPS INVOLVED IN ANY STATISTICAL RESEARCH

1. Topic and significance of the study
2. Objective of your study
3. Methodology for data collection
   • Source of your data
   • Sampling methodology
   • Instrument for collecting data
4. Analysis of the collected data
5. Results and conclusions
6. Recommendations based on your study
As far as the objectives of your research are concerned, they should be
stated in such a way that you are absolutely clear about the goal of your
study:

EXACTLY WHAT IS IT THAT YOU ARE TRYING TO FIND OUT?

As far as the methodology for DATA COLLECTION is concerned, you need to consider:
• Source of your data (the statistical population)
• Sampling methodology
• Instrument for collecting data
COLLECTION OF DATA

The most important part of statistical work is perhaps the collection of data.
Statistical data are collected in one of two ways:
• By a COMPLETE enumeration of the whole field, called a CENSUS. In many cases
this is too costly and too time-consuming, as it requires a large number of
enumerators and supervisory staff.
• By a PARTIAL enumeration based on a SAMPLE, which saves much time and money.

• Data that have been originally collected (raw data) and have not
undergone any sort of statistical treatment are called PRIMARY data.
• Data that have undergone statistical treatment at least ONCE, i.e. data
that have been collected, classified, tabulated or presented in some form
for a certain purpose, are called SECONDARY data.
5 Traditional Methods of Primary Data Collection
One or more of the following methods are employed to collect primary data:
i) Direct Personal Investigation.
ii) Indirect Investigation.
iii) Collection through Questionnaires.
iv) Collection through Enumerators.
v) Collection through Local Sources.
DIRECT PERSONAL INVESTIGATION
In this method, an investigator collects the information personally from the
individuals concerned. Since the investigator interviews the informants
directly, the information collected is generally considered quite accurate
and complete.

This method is useful for laboratory experiments or localized inquiries.
However, errors are likely to enter the results due to the personal bias of
the investigator.
INDIRECT INVESTIGATION
Sometimes direct sources do not exist, or the informants hesitate to respond for some
reason or other. In such cases, third parties or witnesses who have the information are
interviewed.
Since some informants are likely to give wrong information deliberately, reliance is
not placed on the evidence of one witness only.
This method is useful when the information desired is complex or when there is reluctance
or indifference on the part of the informants. It can be adopted for extensive inquiries.
COLLECTION THROUGH QUESTIONNAIRES
• A questionnaire is an inquiry form comprising a number of pertinent
questions, with space for entering the information asked for.
• The questionnaires are usually sent by mail, and the informants are
requested to complete them and return them to the investigator within a
certain period.
• This method is cheap, fairly expeditious and good for extensive inquiries.
The difficulty is that the majority of respondents (i.e. the persons who are
required to answer the questions) do not care to fill in the questionnaires and
return them to the investigators. Sometimes the questionnaires are returned
incomplete and full of errors.
Students, in spite of these drawbacks, this method is considered the STANDARD
method for routine business and administrative inquiries.
• It is important to note that the questions should be few, brief, very simple,
easy for all respondents to answer, clearly worded and not offensive to any
respondent.
COLLECTION THROUGH ENUMERATORS
Under this method, the information is gathered by employing trained
enumerators who assist the informants in making the entries in the
schedules or questionnaires correctly.
This method gives the most reliable information if the enumerator is
well-trained, experienced and tactful.
Students, it is considered the BEST method when a large-scale
governmental inquiry is to be conducted. It generally cannot be
adopted by a private individual or institution, as its cost would
be prohibitive.
COLLECTION THROUGH LOCAL SOURCES
In this method, there is no formal collection of data; instead, agents or local
correspondents are directed to collect and send the required information, using
their own judgement as to the best way of obtaining it.
This method is cheap and expeditious, but gives only estimates.
COLLECTION OF SECONDARY DATA
Secondary data are usually obtained from the following sources:
i) Official, e.g. the publications of the Statistical Division, Ministry of Finance, the
Federal and Provincial Bureaus of Statistics, Ministries of Food, Agriculture,
Industry, Labour, etc.
ii) Semi-Official, e.g., State Bank of Pakistan, Railway Board, Central Cotton
Committee, Boards of Economic Inquiry, District Councils, Municipalities, etc.
iii) Publications of Trade Associations, Chambers of Commerce, etc.
iv) Technical and Trade Journals and Newspapers.
v) Research Organizations such as universities, and other institutions.
Let us now consider the POPULATION from which we
will be collecting our data.
The problem is that we very rarely have access to
the entire population.

In fact, the goal of the science of Statistics is to draw
conclusions about large populations on the basis of
information contained in small samples.
POPULATION
A population includes all the elements of the data set under study.
Measurable characteristics of the population, such as the mean and
the standard deviation, are known as parameters. For example, all
people living in Pakistan make up the population of Pakistan.

• There are different types of population:
i. Finite Population
ii. Infinite Population
iii. Existent Population
iv. Hypothetical Population
TYPES OF POPULATION
Finite Population
A finite population, also known as a countable population, is one
whose units can be counted. Examples of finite populations are the
employees of a company and the potential consumers in a market.
Infinite Population
An infinite population, also known as an uncountable population,
is one in which counting the units is not possible. An example is
the germs in a patient's body, which cannot in practice be counted.
Existent Population
An existent population is a population of concrete individuals.
In other words, a population whose units are available in solid
form is known as an existent population. Examples are books,
students, etc.
Hypothetical Population
A population whose units are not available in solid form is known
as a hypothetical population. Such a population consists of sets
of observations or outcomes. Examples are the outcomes of rolling
a die and the outcomes of tossing a coin.
Sampled Population And The Target Population

1. Target Population
• The target population (also called the universe) refers to the entire group of individuals,
objects, or events that a researcher is interested in studying. This is the group to which you
ideally want to generalize your findings.
• It is the population you want to draw conclusions about.
• In practice, it is often too large or difficult to study every single member of the target
population, so a sample is taken instead.

Example:
If a researcher wants to study college students' attitudes toward online learning, the target
population could be all college students in the world.
2. Sampled Population
• The sampled population refers to the group from which the sample is actually drawn.
It is a subset of the target population that the researcher can realistically access.
• The sampled population may differ from the target population due to practical
limitations such as time, cost, geography, or other constraints.
Example:
In the same study of college students' attitudes toward online learning, the sampled
population could be college students from a few universities or a specific region
(if the researcher cannot access all students worldwide).
Examples:
1. Health Survey Example:
   • Target Population: All adults in a country.
   • Sampled Population: Adults in selected cities (due to budget constraints,
     the survey is only conducted in specific cities).
2. Customer Feedback Example:
   • Target Population: All customers who purchased a product in the past year.
   • Sampled Population: Customers who responded to an email survey (only
     those who saw the survey and chose to participate).
Question
Suppose we wish to know the opinions of college students in the Punjab
regarding the present examination system. Comment on the sampled and target
populations.
Our population consists of all the students in all the colleges in the Punjab.
Suppose that, on account of a shortage of resources or time, we are able to
conduct the survey in only 5 colleges scattered throughout the province.
In this case, the students of all the colleges constitute the target population,
whereas the students of the 5 colleges from which the sample will be selected
constitute the sampled population.
SAMPLING FRAME

A sampling frame is a complete list of all the elements in the population from
which the sample is drawn.
Examples:
• The complete list of the BCS students of NUTECH on September 23, 2024.
• If a researcher is conducting a survey of university students, the sampling
frame could be a list of all enrolled students at that university, which might
be obtained from the registrar's office.
• If a company wants to survey its customers, the sampling frame could be a
customer database containing the contact information of all customers.
A sampling frame should be free from the following defects. It should:
• not contain inaccurate elements
• not be incomplete
• be free from duplication
• not be out of date.


Sample

A sample is only a part of a statistical population, and hence it can
represent the population only to some extent.
• The larger a properly drawn sample, the more likely it is to represent the
population.
Sampling & Non-Sampling Errors
1. Sampling Error:
The difference between the estimate derived from the sample (i.e. the
statistic) and the true population value (i.e. the parameter) is
technically called the sampling error. For the mean:
Sampling error = X̄ − μ
where X̄ is the sample mean and μ is the population mean.
• Sampling error arises because a sample cannot exactly represent the
population, even if it is drawn in a correct manner.
2. Non-Sampling Error:
Besides sampling errors, there are certain errors which are not
attributable to sampling but arise in the process of data collection, even if a
complete count is carried out.
The main sources of non-sampling errors are:
1. Defects in the sampling frame.
2. Faulty reporting of facts due to personal preferences.
3. Negligence or indifference of the investigators.
4. Non-response to mail questionnaires.
In fact, the longer the time period over which the
survey is conducted, the greater will be the potential
VARIATIONS in attitudes and opinions of the
respondents. Hence, a well-defined cut-off date generally
needs to be established.
Non-Random Sampling

'Non-random sampling' is the kind of sampling in which the population
units are drawn into the sample by using one's personal judgement.

This type of sampling is also known as purposive sampling.

Example:
If a researcher wants to study the eating habits of college students and
decides to survey students in the cafeteria at a specific university, that
would be a convenience sample, a common form of non-random sampling. The
sample might not represent all college students, especially those who do not
eat in the cafeteria or who attend other universities.
Random Sampling

The theory of statistical sampling rests on the assumption
that the selection of the sample units has been carried out in a
random manner.

By random sampling we mean sampling that has been done
by adopting the lottery method.
Types of Random Sampling

• Simple Random Sampling
• Stratified Random Sampling
• Systematic Sampling
• Cluster Sampling
• Multi-stage Sampling etc.
1. Simple Random Sampling
• Definition: Every individual in the population has an equal chance of
being selected, and every possible sample of the same size is equally likely.
• How it works: You randomly choose individuals without any specific
pattern (like picking names from a hat).
• Example: You want to survey students in a school, so you randomly
pick 50 names from the entire student list.
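The lottery idea can be sketched in a few lines of Python. This is only an illustration: the list of 500 student names is a made-up sampling frame, and the seed is fixed purely so the draw can be repeated. `random.sample` draws without replacement, so no student is picked twice.

```python
import random

# Hypothetical sampling frame: 500 student names (an assumption for illustration).
student_list = [f"Student {i}" for i in range(1, 501)]

rng = random.Random(2024)  # fixed seed so the draw can be repeated

# Draw 50 distinct names; every student has the same chance of selection.
srs_sample = rng.sample(student_list, k=50)

print(len(srs_sample), "students selected, e.g.", srs_sample[:3])
```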
2. Stratified Random Sampling
• Definition: The population is divided into groups (strata) based on a
shared characteristic, and random samples are drawn from each group.
• How it works: You first divide the population into subgroups (e.g., male
and female) and then randomly select individuals from each group in
proportion to their size in the population.
• Example: You want to survey both male and female students. If the
school has 60% male and 40% female students, you randomly select
participants from both groups in those proportions.
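A rough Python sketch of proportional stratified sampling follows. The school of 1,000 students, the sample size of 50 and the student labels are all assumed for illustration; only the 60%/40% split comes from the example.

```python
import random

# Hypothetical school of 1,000 students: 60% male, 40% female (assumed figures).
strata = {
    "male": [f"M{i}" for i in range(1, 601)],
    "female": [f"F{i}" for i in range(1, 401)],
}
sample_size = 50
population_size = sum(len(members) for members in strata.values())

rng = random.Random(2024)  # fixed seed for a repeatable draw

stratified_sample = {}
for name, members in strata.items():
    # Proportional allocation: each stratum's share of the sample
    # matches its share of the population.
    n_stratum = round(sample_size * len(members) / population_size)
    stratified_sample[name] = rng.sample(members, k=n_stratum)

print({name: len(chosen) for name, chosen in stratified_sample.items()})
# {'male': 30, 'female': 20}
```

With proportions that do not divide evenly, the rounded stratum sizes may not add up exactly to the intended sample size and would need adjusting.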
3. Systematic Random Sampling
• Definition: You select every k-th individual from a list,
starting from a random point.
• How it works: After choosing a random starting point among the
first k elements, you pick every k-th element (e.g., every 10th
name) in the population list.
• Example: You have a list of 1,000 people and need a sample
of 100, so k = 1000 / 100 = 10. You randomly pick a starting
number between 1 and 10 and then select every 10th person from
that point on.
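As a sketch, the 1,000-person example might look like this in Python. The frame is simply the numbers 1 to 1,000, which is an assumption made for illustration, and the seed is fixed only for repeatability.

```python
import random

# Hypothetical frame: people numbered 1 to 1,000 (an assumption for illustration).
frame = list(range(1, 1001))
sample_size = 100
k = len(frame) // sample_size  # sampling interval: 1000 / 100 = 10

rng = random.Random(2024)  # fixed seed for a repeatable draw
start = rng.randint(1, k)  # random start among the first k positions

# Take the start-th person, then every k-th person after that.
systematic_sample = frame[start - 1::k]

print("start =", start, "sample size =", len(systematic_sample))
```

Restricting the start to the first k positions is what guarantees a sample of exactly 100 here.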
4. Cluster Random Sampling
• Definition: The population is divided into clusters and entire
clusters are randomly selected for the sample.
• How it works: Instead of selecting individuals, you randomly choose
clusters of the population and then collect data from every individual
in the selected clusters.
• Example: You want to survey households in a city. You randomly
pick 5 neighborhoods (clusters) and survey all households within
those neighborhoods.
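The household example could be sketched as follows. The city of 20 neighbourhoods with 30 households each is invented for illustration; only the choice of 5 clusters comes from the example.

```python
import random

# Hypothetical city: 20 neighbourhoods with 30 households each (assumed figures).
neighbourhoods = {
    f"Neighbourhood {n}": [f"N{n}-House{h}" for h in range(1, 31)]
    for n in range(1, 21)
}

rng = random.Random(2024)  # fixed seed for a repeatable draw

# Step 1: randomly choose 5 whole clusters.
chosen_clusters = rng.sample(sorted(neighbourhoods), k=5)

# Step 2: survey every household in each chosen cluster.
cluster_sample = [
    house for name in chosen_clusters for house in neighbourhoods[name]
]

print(chosen_clusters, len(cluster_sample))
```

Note that randomness enters only in the choice of clusters; inside a chosen cluster nothing is left to chance.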
5. Multi-Stage Sampling
• Definition: Sampling is carried out in successive stages, often combining
several sampling methods. It is often used for large populations.
• How it works: You randomly select clusters (as in cluster sampling), and
then within those clusters, you perform another random sampling method
like simple or stratified random sampling.
• Example: For a national survey, you randomly select cities (clusters),
then randomly select schools within those cities, and finally randomly
select students within those schools.
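A minimal sketch of the three stages in Python is given below. The nested frame (10 cities, 8 schools per city, 200 students per school) and the numbers drawn at each stage are assumptions chosen only to keep the example small.

```python
import random

rng = random.Random(2024)  # fixed seed for a repeatable draw

# Hypothetical nested frame: 10 cities -> 8 schools each -> 200 students each.
country = {
    f"City{c}": {
        f"City{c}-School{s}": [f"City{c}-School{s}-Student{t}" for t in range(1, 201)]
        for s in range(1, 9)
    }
    for c in range(1, 11)
}

multistage_sample = []
# Stage 1: randomly select 3 cities.
for city in rng.sample(sorted(country), k=3):
    # Stage 2: randomly select 2 schools within each chosen city.
    for school in rng.sample(sorted(country[city]), k=2):
        # Stage 3: randomly select 10 students within each chosen school.
        multistage_sample.extend(rng.sample(country[city][school], k=10))

print(len(multistage_sample))  # 3 cities x 2 schools x 10 students = 60
```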
Key Differences:

• Simple: Equal chance for everyone.
• Stratified: Random samples are drawn from every group (stratum).
• Systematic: Based on a pattern (every k-th person).
• Cluster: Groups are randomly selected, and everyone in the
selected groups is surveyed.
• Multi-Stage: Combination of different methods, applied in stages.


Example

The following frequency distribution gives the ages of a population of 1000
teenage college students in a particular country. Select a sample of 10 students using the
random numbers table. Find the sample mean age and compare with the population mean
age.

Age (years)   Frequency
17            100
18            200
19            350
20            250
21            100
Total         1000
Step 1: Frequency Distribution Table

Age (years)   Frequency
17            100
18            200
19            350
20            250
21            100
Total         1000
Step 2: Cumulative Frequency Table
Let's compute the cumulative frequencies for these age groups:

Age (years)   Frequency   Cumulative Frequency
17            100         100
18            200         300
19            350         650
20            250         900
21            100         1000
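The cumulative column is just a running total of the frequencies. A short Python sketch, using the figures from the table (the variable names are chosen here for illustration):

```python
from itertools import accumulate

ages = [17, 18, 19, 20, 21]
frequencies = [100, 200, 350, 250, 100]

# Running total of the frequencies gives the cumulative frequency column.
cumulative = list(accumulate(frequencies))

for age, f, cf in zip(ages, frequencies, cumulative):
    print(age, f, cf)
```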
Step 3: Random Numbers Generation
Now, we need 10 random numbers from 1 to 1000 to select our
sample. Using a random number table (or a random number
generator), we might obtain numbers like these:

• Generated Random Numbers: 75, 231, 512, 650, 789, 904, 112,
333, 480, 699
Step 4: Select the Sample Using Cumulative Frequencies
Using these random numbers and matching them to the cumulative frequency table:
• 75 falls in the 17-year group (since 1-100 corresponds to 17 years old).
• 231 falls in the 18-year group (since 101-300 corresponds to 18 years old).
• 512 falls in the 19-year group (since 301-650 corresponds to 19 years old).
• 650 falls in the 19-year group (since 301-650 corresponds to 19 years old).
• 789 falls in the 20-year group (since 651-900 corresponds to 20 years old).
• 904 falls in the 21-year group (since 901-1000 corresponds to 21 years old).
• 112 falls in the 18-year group (since 101-300 corresponds to 18 years old).
• 333 falls in the 19-year group (since 301-650 corresponds to 19 years old).
• 480 falls in the 19-year group (since 301-650 corresponds to 19 years old).
• 699 falls in the 20-year group (since 651-900 corresponds to 20 years old).
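The matching of random numbers to age groups can also be done mechanically. The sketch below uses Python's `bisect_left` as one possible way to look up the range each number falls in; the variable names are chosen here for illustration.

```python
from bisect import bisect_left

ages = [17, 18, 19, 20, 21]
cumulative = [100, 300, 650, 900, 1000]  # from the cumulative frequency table
random_numbers = [75, 231, 512, 650, 789, 904, 112, 333, 480, 699]

# bisect_left finds the first cumulative frequency that is >= the number,
# so 650 maps to age 19 (range 301-650) and 651 would map to age 20.
sample_ages = [ages[bisect_left(cumulative, r)] for r in random_numbers]

print(sample_ages)
# [17, 18, 19, 19, 20, 21, 18, 19, 19, 20]
```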
So the sample we've selected based on these random
numbers is:
• 1 student aged 17
• 2 students aged 18
• 4 students aged 19
• 2 students aged 20
• 1 student aged 21
Step 5: Sample Mean Age
Now, we calculate the mean age of the sample:

Sample mean = (17×1 + 18×2 + 19×4 + 20×2 + 21×1) / 10 = 190 / 10 = 19

So, the sample mean age is 19 years.


Step 6: Population Mean Age
To calculate the population mean age, we use the frequency distribution table:

Population mean = (17×100 + 18×200 + 19×350 + 20×250 + 21×100) / 1000 = 19050 / 1000 = 19.05

So, the population mean age is 19.05 years.


Step 7: Comparison
• Sample Mean Age: 19 years
• Population Mean Age: 19.05 years

The sample mean age is very close to the population mean age, so in this
instance the random sample approximates the population characteristic
well. A different set of random numbers would give a somewhat different
sample mean.
Sampling Error:

Sampling Error = 19 − 19.05 = −0.05

• In this case, the sampling error is −0.05 years, meaning
that the sample underestimated the population mean by
0.05 years. The error is small but still exists, because only a
randomly chosen subset of the students was observed.
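The two means and the sampling error can be checked with a few lines of Python. This is a sketch using the figures from the example; the variable names are chosen here for illustration.

```python
ages = [17, 18, 19, 20, 21]
frequencies = [100, 200, 350, 250, 100]
sample_ages = [17, 18, 19, 19, 20, 21, 18, 19, 19, 20]

# Population mean: frequency-weighted average of the ages.
population_mean = sum(a * f for a, f in zip(ages, frequencies)) / sum(frequencies)

# Sample mean: plain average of the 10 selected ages.
sample_mean = sum(sample_ages) / len(sample_ages)

# Sampling error: sample statistic minus population parameter.
sampling_error = sample_mean - population_mean

print(population_mean, sample_mean, round(sampling_error, 2))
# 19.05 19.0 -0.05
```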
