RM Module 2

1. A sample design is a plan for selecting a sample from a population. It involves defining the population, sampling unit, sample size, and sampling method.
2. Probability sampling methods give every unit in the population a known, non-zero chance of being selected. Common probability methods include simple random sampling, systematic sampling, stratified random sampling, cluster sampling, and multistage sampling.
3. In simple random sampling, each possible sample of a given size has an equal chance of selection; it is best suited to a small, homogeneous population. Systematic sampling selects elements at regular intervals after a random start. Stratified random sampling divides the population into groups and randomly samples from each.

Uploaded by

sumitsuman732

2/13/2024

SAMPLE DESIGN, MEASUREMENT AND DATA COLLECTION

Dr. Swarnambuj Suman
Assistant Professor
Mechanical Engineering Department
NIT Patna

Sample Design
• Quite often we select only a few items from the universe for our study purposes. The items so selected constitute what is technically called a sample.
• The researcher must decide the way of selecting a sample or what is popularly known as the sample design. In other words, a sample design is a definite plan determined before any data are actually collected for obtaining a sample from a given population.
• Researcher must select/prepare a sample design which should be reliable and appropriate for his research study.

Steps in Sample Design

1. Type of universe
• The first step in developing any sample design is to clearly define the set of objects, technically called the Universe, to be studied.
• The universe can be finite or infinite.

2. Sampling unit
• A decision has to be taken concerning a sampling unit before selecting sample.
• Sampling unit may be a geographical one such as state, district, village, etc. or a construction unit such as house, flat, etc., or it may be a social unit such as family, club, school, etc., or it may be an individual.
• The researcher will have to decide one or more of such units that he has to select for his study.

Steps in Sample Design Cont…

3. Source list
• It is also known as ‘sampling frame’ from which sample is to be drawn.
• It should be comprehensive, correct, reliable and appropriate.
• It is extremely important for the source list to be as representative of the population as possible.

4. Size of sample
• This refers to the number of items to be selected from the universe to constitute a sample.
• The size of sample should neither be excessively large, nor too small. It should be optimum.
• An optimum sample is one which fulfills the requirements of efficiency, representativeness, reliability, and flexibility.

Steps in Sample Design Cont…

5. Parameters of interest
• In determining the sample design, one must consider the question of the specific population parameters which are of interest.

6. Budgetary constraint
• Cost considerations

7. Sampling procedure
• Finally, the researcher must decide the type of sample he will use i.e., he must decide about the technique to be used in selecting the items for the sample.

Types of sampling

Sampling
• Probability Sampling
• Non-Probability Sampling


Probability Sampling
• Probability sampling is also known as ‘random sampling’ or ‘chance sampling’.
• A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined.
• When every element in the population does have the same probability of selection, this is known as an 'equal probability of selection' (EPS) design. Such designs are also referred to as 'self-weighting' because all sampled units are given the same weight.

Keeping this in view we can define a simple random sample (or simply a random sample) from a finite population as a sample which is chosen in such a way that each of the $^{N}C_{n}$ possible samples has the same probability, $1/{}^{N}C_{n}$, of being selected.
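The definition of a simple random sample can be checked by brute force on a very small population. The sketch below is illustrative only: the five-unit population and the sample size of 2 are assumptions, not values from the slides. It enumerates every possible sample, confirms there are $^{N}C_{n}$ of them, and shows that each unit's chance of inclusion works out to n/N.

```python
from itertools import combinations
from math import comb

# Hypothetical tiny population (labels are illustrative only).
population = ["A", "B", "C", "D", "E"]
N, n = len(population), 2

# Every subset of size n is one possible sample; there are NCn of them.
all_samples = list(combinations(population, n))

# Under simple random sampling each sample has probability 1 / NCn.
p_sample = 1 / comb(N, n)

# Unit "A" appears in C(N-1, n-1) of the samples, so its chance of
# being included is C(N-1, n-1) / C(N, n) = n / N.
samples_with_A = sum(1 for s in all_samples if "A" in s)
p_unit = samples_with_A * p_sample

print(len(all_samples), p_sample, p_unit)
```

Because every unit has the same inclusion probability n/N, this is also an equal probability of selection (EPS) design.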


Methods used in probability sampling

Probability Sampling
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
• Multi-Stage Sampling

SIMPLE RANDOM SAMPLING
• Applicable when the population is small, homogeneous & readily available.
• All subsets of the frame of a given size are given an equal probability. Each element of the frame thus has an equal probability of selection.
• It provides for the greatest number of possible samples. This is done by assigning a number to each unit in the sampling frame.
• A table of random numbers or a lottery system is used to determine which units are to be selected.
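The lottery procedure described above (number every unit in the frame, then draw numbers at random) can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function name `simple_random_sample`, the 20-unit frame and the seed are all assumptions, and a pseudo-random generator stands in for the random number table.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, lottery style."""
    rng = random.Random(seed)
    # Assign the numbers 1..N to the units of the frame.
    numbered = {i + 1: unit for i, unit in enumerate(frame)}
    # Draw n distinct numbers without replacement.
    drawn = rng.sample(range(1, len(frame) + 1), n)
    return [numbered[i] for i in drawn]

# Hypothetical sampling frame of 20 units.
frame = [f"unit_{i}" for i in range(1, 21)]
sample = simple_random_sample(frame, n=5, seed=42)
print(sample)
```

Fixing the seed only makes the draw repeatable for checking; omit it for an actual draw.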

SYSTEMATIC SAMPLING
• Systematic sampling relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list.
• Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards.
• In this case, k = (population size/sample size).
• An example would be to select every 2nd person from the telephone directory.

Stratified Random Sampling
• The population is divided into two or more groups called strata, according to some criterion, such as geographic location, grade level, age, or income, then subsamples are randomly selected from each stratum.
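The systematic sampling rule (a random start, then every kth element with k = population size / sample size) might be sketched as follows. The function name `systematic_sample` and the 100-entry list standing in for a directory are assumptions made for illustration; integer division is used for k, which is one common convention when N is not a multiple of n.

```python
import random

def systematic_sample(ordered_population, n, seed=None):
    """Pick n elements at a fixed interval k after a random start."""
    N = len(ordered_population)
    k = N // n  # sampling interval, k = population size / sample size
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start among the first k positions
    return [ordered_population[start + i * k] for i in range(n)]

# Hypothetical ordered list standing in for a telephone directory.
directory = list(range(100))
picked = systematic_sample(directory, n=10, seed=1)
print(picked)
```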


Stratified Random Sampling Cont…

• The following three questions are highly relevant in the context of stratified sampling:
(a) How to form strata?
(b) How should items be selected from each stratum?
(c) How many items should be selected from each stratum, or how to allocate the sample size to each stratum?

Proportionate stratified sampling: The no. of sampling units drawn from each stratum is in proportion to the population size of that stratum.

Stratified Random Sampling Cont…

Disproportionate stratified sampling: The no. of sampling units drawn from each stratum is based on analytical considerations, but not in proportion to the size of the population of that stratum. Such a design requires

$$\frac{n_1}{N_1\sigma_1} = \frac{n_2}{N_2\sigma_2} = \dots = \frac{n_k}{N_k\sigma_k}$$

where $\sigma_1, \sigma_2, \dots, \sigma_k$ denote the standard deviations of the k strata, $N_1, N_2, \dots, N_k$ denote the sizes of the k strata and $n_1, n_2, \dots, n_k$ denote the sample sizes of the k strata. This is called ‘optimum allocation’ in the context of disproportionate sampling. The allocation in such a situation results in the following formula for determining the sample sizes of the different strata, with n the total sample size:

$$n_i = \frac{n \cdot N_i\sigma_i}{N_1\sigma_1 + N_2\sigma_2 + \dots + N_k\sigma_k}, \quad i = 1, 2, \dots, k$$
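The two allocation rules can be compared numerically. The sketch below assumes the standard forms: proportional allocation gives each stratum n·N_i/N, and optimum allocation gives each stratum n·N_i·σ_i divided by the sum of N_j·σ_j over all strata. The function names and the stratum sizes and standard deviations are hypothetical values chosen so the arithmetic comes out to whole numbers; in general the results need rounding.

```python
def proportional_allocation(n, sizes):
    """n_i = n * N_i / N for each stratum."""
    total = sum(sizes)
    return [n * N_i / total for N_i in sizes]

def optimum_allocation(n, sizes, sigmas):
    """n_i = n * N_i * sigma_i / sum_j(N_j * sigma_j) for each stratum."""
    weights = [N_i * s_i for N_i, s_i in zip(sizes, sigmas)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: sizes N_i and standard deviations sigma_i.
prop = proportional_allocation(30, [4000, 2400, 1600])
opt = optimum_allocation(84, [5000, 2000, 3000], [15, 18, 5])
print(prop)  # proportional shares of a sample of 30
print(opt)   # optimum shares of a sample of 84
```

Under optimum allocation a stratum with a large standard deviation receives more than its proportional share, which is why the second stratum (2000 units, σ = 18) is sampled more heavily than the third (3000 units, σ = 5).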


Numerical


Cluster Sampling
The population is divided into subgroups (clusters) like families. A
simple random sample is taken of the subgroups and then all
members of the cluster selected are surveyed.
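A minimal sketch of the cluster sampling procedure just described: draw a simple random sample of clusters, then survey every member of each selected cluster. The function name `cluster_sample` and the family data are hypothetical and serve only to illustrate the two steps.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return all of their members."""
    rng = random.Random(seed)
    # Step 1: simple random sample of cluster names.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 2: every member of a selected cluster is surveyed.
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters: families and their members.
families = {
    "family_1": ["Asha", "Ravi"],
    "family_2": ["Meena", "Kiran", "Tara"],
    "family_3": ["Sunil"],
    "family_4": ["Neha", "Arun"],
}
chosen, surveyed = cluster_sample(families, n_clusters=2, seed=7)
print(chosen, surveyed)
```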


Stratified Sampling Vs Cluster Sampling

Multistage Sampling
• This technique is meant for big inquiries extending to a considerably large geographical area like an entire country.
• Under multi-stage sampling the first stage may be to select large primary sampling units such as states, then districts, then towns and finally certain families within towns.
• If the technique of random-sampling is applied at all stages, the sampling procedure is described as multi-stage random sampling.
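Multi-stage random sampling, with random selection applied at every stage, can be sketched with a nested frame of states, districts, towns and families. Everything in the example is an assumption made for illustration: the function name `multistage_sample`, the generated frame, and the number of units drawn at each stage.

```python
import random

# Hypothetical nested frame: state -> district -> town -> list of families.
frame = {
    f"state_{s}": {
        f"district_{s}{d}": {
            f"town_{s}{d}{t}": [f"family_{s}{d}{t}{f}" for f in range(4)]
            for t in range(3)
        }
        for d in range(3)
    }
    for s in range(4)
}

def multistage_sample(frame, n_states, n_districts, n_towns, n_families, seed=None):
    """Apply simple random sampling at each of the four stages."""
    rng = random.Random(seed)
    selected = []
    for state in rng.sample(sorted(frame), n_states):
        districts = frame[state]
        for district in rng.sample(sorted(districts), n_districts):
            towns = districts[district]
            for town in rng.sample(sorted(towns), n_towns):
                for family in rng.sample(towns[town], n_families):
                    selected.append((state, district, town, family))
    return selected

households = multistage_sample(frame, 2, 2, 2, 2, seed=3)
print(len(households))
```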


Non-Probability Sampling
• Non-probability sampling is also known by different names such as deliberate sampling, purposive sampling and judgement sampling.
• In this type of sampling, items for the sample are selected deliberately by the researcher; his choice concerning the items remains supreme.
• In such a design, personal element has a great chance of entering into the selection of the sample.

Non-Probability Sampling
• It is a sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage'/'under covered'), or where the probability of selection can't be accurately determined.
• The selection of elements is non-random.
• Non-probability sampling does not allow the estimation of sampling errors.


Non Probability Sampling

Non Probability Sampling
• Quota Sampling
• Convenience Sampling/Snowball Sampling
• Purposive Sampling/Judgmental Sampling

QUOTA SAMPLING
• The population is first segmented into mutually exclusive sub-groups, just as in stratified sampling.
• Then judgment is used to select subjects or units from each segment based on a specified proportion.
• For example, an interviewer may be told to sample 200 females and 300 males between the age of 45 and 60.


CONVENIENCE SAMPLING
• Sometimes known as grab or opportunity sampling or accidental or haphazard sampling.
• When population elements are selected for inclusion in the sample based on the ease of access, it can be called convenience sampling.
• A type of non probability sampling which involves the sample being drawn from that part of the population which is close to hand. That is, readily available and convenient.

CONVENIENCE SAMPLING
• For example, if the interviewer wants to conduct a survey at a shopping center early in the morning on a given day, the people that he/she could interview would be limited to those present at that given time, which would not represent the views of other members of society in such an area.
• It may give biased results, particularly when the population is not homogeneous.


Judgmental sampling or Purposive sampling
• The researcher chooses the sample based on who they think would be appropriate for the study.
• This is used primarily when there is a limited number of people that have expertise in the area being researched.

Data Classification
Data classification is the process of organizing data into categories for its most effective and efficient use.
The data can be categorized broadly into two groups as following:
1. Linguistic data
2. Numeric data

Linguistic Data
Linguistic data is any content that can be analyzed and presented to further linguistic analysis.
This can include, but is by no means limited to, naturalistic observations fixed as audio or video recordings or notes, surveys, intuitions and examples.

Numeric Data
The data that is in the form of numbers, and not in any language or descriptive form. They are also called quantitative data.

Measurement Scales

Measurement Scales
• Nominal Scale
• Ordinal Scale
• Interval Scale
• Ratio Scale

Nominal Scale
• Nominal scale is simply a system of assigning number symbols to events in order to label them.
• Such numbers cannot be considered to be associated with an ordered scale for their order is of no consequence; the numbers are just convenient labels for the particular class of events and as such have no quantitative value.
• Nominal scales provide convenient ways of keeping track of people, objects and events.
• Example: Player Jersey number, marital status, “Yes or No” answers to a question as “1” and “0”.
• It is widely used in surveys and other ex-post-facto research when data are being classified by major sub-groups of the population.


Ordinal scale
• In those situations when we cannot do anything except set up inequalities, we refer to the data as ordinal data.
• For instance, if one mineral can scratch another, it receives a higher hardness number and on Mohs’ scale the numbers from 1 to 10 are assigned respectively to talc (1), gypsum (2), calcite (3), fluorite (4), apatite (5), feldspar (6), quartz (7), topaz (8), sapphire (9), and diamond (10).
• With these numbers we can write 5 > 2 or 6 < 9 as apatite is harder than gypsum and feldspar is softer than sapphire, but we cannot write for example 10 – 9 = 5 – 4, because the difference in hardness between diamond and sapphire is actually much greater than that between apatite and fluorite.

Interval scale
• When in addition to setting up inequalities we can also form differences, we refer to the data as interval data.
• Temperature readings (in degrees Fahrenheit): 58°, 63°, 70°, 95°, 110°, 126° and 135°.
• In this case, 110° > 70° or 95° < 135° which simply means that 110° is warmer than 70° and that 95° is cooler than 135°.
• And, 95° – 70° = 135° – 110°, in the sense that the same amount of heat is required to raise the temperature of an object from 70° to 95° or from 110° to 135°.


Interval scale
• On the other hand, it would not mean much if we said that 126°F is twice as hot as 63°F, even though 126°/63° = 2.
• To show the reason, we have only to change to the centigrade scale, where the first temperature becomes 5/9 (126 – 32) = 52°, the second temperature becomes 5/9 (63 – 32) = 17° and the first figure is now more than three times the second.
• This difficulty arises from the fact that Fahrenheit and Centigrade scales both have artificial origins (zeros) i.e., the number 0 of neither scale is indicative of the absence of whatever quantity we are trying to measure.

Ratio scale
• When in addition to setting up inequalities and forming differences we can also form quotients (i.e., when we can perform all the customary operations of mathematics), we refer to such data as ratio data.
• Ratio data includes all the usual measurement (or determinations) of length, height, money amounts, weight, volume, area, pressures, etc.
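The interval-scale argument can be verified with a short calculation using the temperature readings from the slides. Only the helper name `fahrenheit_to_celsius` is an addition. The sketch shows that the ratio of two temperatures changes when the unit changes, while equality of differences is preserved.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees centigrade."""
    return (f - 32) * 5 / 9

# Ratios are not meaningful on an interval scale:
ratio_f = 126 / 63
ratio_c = fahrenheit_to_celsius(126) / fahrenheit_to_celsius(63)
print(ratio_f, round(ratio_c, 2))  # the two ratios disagree

# Equal differences stay equal after the change of scale:
diff_low = fahrenheit_to_celsius(95) - fahrenheit_to_celsius(70)
diff_high = fahrenheit_to_celsius(135) - fahrenheit_to_celsius(110)
print(math.isclose(diff_low, diff_high))
```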


Methods of Data Collection
• The task of data collection begins after a research problem has been defined and research design/plan chalked out.

Types of data:
• Primary data: Which are collected afresh and for the first time, and thus happen to be original in character.
• Secondary data: Those which have already been collected by someone else and which have already been passed through the statistical process.

Primary Data Collection
• We collect primary data during the course of doing experiments in an experimental research.
• But, if we do research of the descriptive type and perform surveys, whether sample surveys or census surveys, we can obtain primary data either through observation or through direct communication with respondents in one form or another or through personal interviews.
• There are several methods of collecting primary data, particularly in surveys and descriptive researches. Important ones are:
(i) Observation method,
(ii) Interview method,
(iii) Through questionnaires,
(iv) Through schedules

Observation Method
• Under the observation method, the information is sought by way of investigator’s own direct observation without asking from the respondent.
• Advantage: Subjective bias is eliminated, if observation is done accurately. Secondly, the information obtained under this method relates to what is currently happening; it is not complicated by either the past behaviour or future intentions or attitudes. Thirdly, this method is independent of respondents’ willingness to respond.
• Limitations: Firstly, it is an expensive method. Secondly, the information provided by this method is very limited. Thirdly, sometimes unforeseen factors may interfere with the observational task.

Interview Method
• The interview method of collecting data involves presentation of oral-verbal stimuli and reply in terms of oral-verbal responses. This method can be used through personal interviews and, if possible, through telephone interviews.
• Personal interview method requires a person known as the interviewer asking questions generally in a face-to-face contact to the other person or persons. At times the interviewee may also ask certain questions and the interviewer responds to these, but usually the interviewer initiates the interview and collects the information.


Interview Method
• The method of collecting information through personal interviews is usually carried out in a structured way. As such we call the interviews structured interviews. Such interviews involve the use of a set of predetermined questions and of highly standardized techniques of recording.
• The telephone interview method of collecting information consists of contacting respondents on the telephone itself. It is not a very widely used method, but plays an important part in industrial surveys, particularly in developed regions.

Through Questionnaires
• This method of data collection is quite popular, particularly in case of big enquiries.
• It is being adopted by private individuals, research workers, private and public organisations and even by governments.
• In this method a questionnaire is sent to the persons concerned with a request to answer the questions and return the questionnaire.


Through Schedules
• This method of data collection is very much like the collection of data through questionnaire, with little difference which lies in the fact that schedules (proforma containing a set of questions) are being filled in by the enumerators who are specially appointed for the purpose.
• This method requires the selection of enumerators for filling up schedules or assisting respondents to fill up schedules and as such enumerators should be very carefully selected. The enumerators should be trained to perform their job well and the nature and scope of the investigation should be explained to them thoroughly so that they may well understand the implications of different questions put in the schedule.
• This method of data collection is very useful in extensive enquiries and can lead to fairly reliable results. It is, however, very expensive and is usually adopted in investigations conducted by governmental agencies or by some big organisations. Population census all over the world is conducted through this method.

DIFFERENCE BETWEEN QUESTIONNAIRES AND SCHEDULES
Both questionnaire and schedule are popularly used methods of collecting data in research surveys. There is much resemblance in the nature of these two methods. But from the technical point of view there is difference between the two. The important points of difference are as under:
• The questionnaire is generally sent through mail to informants to be answered as specified in a covering letter without further assistance from the sender. The schedule is generally filled out by the research worker or the enumerator, who can interpret questions when necessary.
• To collect data through questionnaire is relatively cheap and economical since we have to spend money only in preparing the questionnaire and in mailing the same to respondents. Here no field staff is required. To collect data through schedules is relatively more expensive since considerable amount of money has to be spent in appointing enumerators and in imparting training to them. Money is also spent in preparing schedules.


DIFFERENCE BETWEEN QUESTIONNAIRES AND SCHEDULES
• Non-response is usually high in case of questionnaire as many people do not respond and many return the questionnaire without answering all questions. Non-response is generally very low in case of schedules because these are filled by enumerators who are able to get answers to all questions.
• In case of questionnaire, it is not always clear as to who replies, but in case of schedule the identity of respondent is known.
• The questionnaire method is likely to be very slow since many respondents do not return the questionnaire in time despite several reminders, but in case of schedules the information is collected well in time as they are filled in by enumerators.
• Wider and more representative distribution of sample is possible under the questionnaire method, but in respect of schedules there usually remains the difficulty in sending enumerators over a relatively wider area.

Secondary Data Collection
• Secondary data means data that are already available i.e., they refer to the data which have already been collected and analysed by someone else.
• Researcher must be very careful in using secondary data. He must make a minute scrutiny because it is just possible that the secondary data may be unsuitable or may be inadequate in the context of the problem which the researcher wants to study.
• Due to the above mentioned reason, before using a secondary data one must see that they possess following characteristics i.e. reliability, suitability, and adequacy. If answer comes yes then only move further with the data obtained.


Secondary Data Collection

Reliability
The reliability can be tested by finding out such things about the said data:
• Who collected the data?
• What were the sources of data?
• Were they collected by using proper methods?
• At what time were they collected?
• Was there any bias of the compiler?
• What level of accuracy was desired?
• Was it achieved?

Suitability
The data that are suitable for one enquiry may not necessarily be found suitable in another enquiry. The researcher must very carefully scrutinize the definition of various terms and units of collection used at the time of collecting the data from the primary source. Similarly, the object, scope and nature of the original enquiry must also be studied. If the researcher finds differences in these, the data will remain unsuitable for the present enquiry and should not be used.

Adequacy
If the level of accuracy achieved in data is found inadequate for the purpose of the present enquiry, they will be considered as inadequate and should not be used by the researcher.
The data will also be considered inadequate, if they are related to an area which may be either narrower or wider than the area of the present enquiry.


Guidelines for Constructing Questionnaire / Schedule
There are no hard-and-fast rules about how to design a questionnaire, but there are a number of points that can be borne in mind:
1. A well-designed questionnaire should meet the research objectives.
2. It should obtain the most complete and accurate information possible. The questionnaire designer needs to ensure that respondents fully understand the questions and are not likely to refuse to answer, lie to the interviewer or try to conceal their attitudes.
3. A good questionnaire is organized and worded to encourage the respondents to provide unbiased information.
4. A well-designed questionnaire should make it easy for respondents to give the necessary information and for the interviewer to record the answer, and it should be arranged so that sound analysis and interpretation are possible.
5. It should keep the interview brief and to the point and be so arranged that the respondent(s) remain interested throughout the interview.

Steps in Questionnaire Design
1. Decide the information required.
2. Define the target respondents.
3. Choose the method(s) of reaching your target respondents.
4. Decide on the question content.
5. Develop the question wording.
6. Put questions into a meaningful order and format.
7. Check the length of the questionnaire.
8. Pre-test the questionnaire.
9. Develop the final survey form.


Steps in Data Pre Processing
Data Pre-processing refers to the cleaning, transforming, and integrating of data in order to make it ready for analysis. The goal of data preprocessing is to improve the quality of the data and make it more suitable for the specific data analysis.

Steps Involved in Data Preprocessing:

1. Data Cleaning:
The data can have many irrelevant and missing parts. To handle this part, data cleaning is done. It involves handling of missing data, noisy data etc.

(a). Missing Data:
This situation arises when some values are missing from the data. It can be handled in various ways. Some of them are:
• Ignore the tuples: This approach is suitable only when the dataset we have is quite large and multiple values are missing within a tuple.
• Fill the Missing values: There are various ways to do this task. You can choose to fill the missing values manually, by attribute mean or the most probable value.

(b). Noisy Data:
Noisy data is meaningless data that can’t be interpreted by algorithms. It can be generated due to faulty data collection, data entry errors etc. It can be handled in following ways:
• Binning Method: This method works on sorted data in order to smooth it. The whole data is divided into segments of equal size and then various methods are performed to complete the task. Each segment is handled separately. One can replace all data in a segment by its mean, or boundary values can be used to complete the task.
• Regression: Here data can be made smooth by fitting it to a regression function. The regression used may be linear (having one independent variable) or multiple (having multiple independent variables).
• Clustering: This approach groups the similar data in a cluster. The outliers may fall outside the clusters.
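Two of the cleaning techniques above, filling missing values with the attribute mean and smoothing by bin means, can be sketched in plain Python. The function names, the use of None to mark a missing value, and the sample readings are assumptions made for illustration; libraries such as pandas offer equivalent operations.

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Replace each missing entry (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    attribute_mean = mean(observed)
    return [attribute_mean if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the data, cut it into equal-size segments, and replace
    every value in a segment by that segment's mean."""
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), bin_size):
        segment = ordered[i:i + bin_size]
        smoothed.extend([mean(segment)] * len(segment))
    return smoothed

# Hypothetical attribute with one missing reading.
raw = [4, 8, None, 15, 21, 21, 24, 25, 28, 34]
filled = fill_missing_with_mean(raw)
smoothed = smooth_by_bin_means(filled, bin_size=5)
print(filled)
print(smoothed)
```

Smoothing by bin boundaries would follow the same loop but replace each value with the nearer of the segment's minimum and maximum.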

Steps in Data Pre Processing

2. Data Transformation:
This step is taken in order to transform the data into appropriate forms suitable for the analysis process. This involves the following ways:
• Normalization: It is done in order to scale the data values in a specified range (-1.0 to 1.0 or 0.0 to 1.0).
• Attribute Selection: In this strategy, new attributes are constructed from the given set of attributes to help the analysis.

3. Data Reduction:
Data reduction is a crucial step in the data mining process that involves reducing the size of the dataset while preserving the important information. This is done to improve the efficiency of data analysis and to avoid overfitting of the model. Some common steps involved in data reduction are:
• Feature Selection: This involves selecting a subset of relevant features from the dataset. Feature selection is often performed to remove irrelevant or redundant features from the dataset. It can be done using various techniques such as correlation analysis, mutual information, and principal component analysis (PCA).
• Feature Extraction: This involves transforming the data into a lower-dimensional space while preserving the important information. Feature extraction is often used when the original features are high dimensional and complex. It can be done using techniques such as PCA, linear discriminant analysis (LDA), and non-negative matrix factorization (NMF).
• Sampling: This involves selecting a subset of data points from the dataset. Sampling is often used to reduce the size of the dataset while preserving the important information. It can be done using techniques such as random sampling, stratified sampling, and systematic sampling.
• Clustering: This involves grouping similar data points together into clusters. Clustering is often used to reduce the size of the dataset by replacing similar data points with a representative centroid. It can be done using techniques such as k-means, hierarchical clustering, and density-based clustering.
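The normalization step listed under Data Transformation can be illustrated with min-max scaling, one common way of mapping values into a range such as 0.0 to 1.0 or -1.0 to 1.0. The slides do not name a specific method, so the choice of min-max scaling and the function name `min_max_normalize` are assumptions; the sample values reuse the temperature readings quoted earlier.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min
    and the largest maps to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

data = [58, 63, 70, 95, 110, 126, 135]
unit_range = min_max_normalize(data)             # scaled to 0.0 .. 1.0
symmetric = min_max_normalize(data, -1.0, 1.0)   # scaled to -1.0 .. 1.0
print(unit_range)
print(symmetric)
```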