
Data Processing and Univariate

The document discusses research methods for business, including editing raw data to ensure accuracy, coding responses for analysis, and univariate data analysis. It covers editing collected data to check for errors or omissions. Coding involves assigning numeric codes to qualitative responses to categorize the data for analysis. Univariate analysis examines one variable at a time through descriptive statistics like frequencies, percentages, and measures of central tendency.

Uploaded by

lâm nguyễn

Research Methods for Business

Contents...
1. Editing
2. Coding
3. Data entry
4. Univariate data analysis

• Editing detects errors and omissions and corrects them when possible.
• The purpose is to guarantee that data are:
◦ Accurate
◦ Consistent with the intent of the question and other information
◦ Uniformly entered
◦ Complete
◦ Arranged to simplify coding and tabulation


• Field editing is done by the field supervisor, soon after the data have been collected.
• It matters especially in personal interviews and paper-and-pencil observation.
• When an entry is unclear, a callback should be made rather than guessing.
• To validate the results, reinterview some (about 10%) of the respondents, verifying that they participated and that the interviewer performed adequately.

• Central editing is done by the researcher or a senior office supervisor, after all data have been collected.
• Sometimes the correct answer can be detected by reviewing other information in the data set (e.g. marital status).
• Editing can also help detect fake interviews by checking open-ended questions.
• Edited data should be identifiable, for example by using a different color pen or pencil.


Two categories of "don't know" (DK) responses:

• Legitimate DK responses (the respondent does not know the answer):
◦ If the information is meaningful (by the researcher's design) → keep DK as a separate category.
◦ If a DK response is unsatisfactory for the question → redesign the question.
• Failure to get the needed information (the respondent does not want to answer):
◦ Motivate respondents to provide more usable answers (in a direct interview).
◦ Rephrase sensitive questions so that respondents are more willing to answer.

• Missing data occur when participants accidentally skip, refuse to answer, or do not know the answer to an item.
• In longitudinal studies, missing data may result from participants dropping out of the study or being absent for one or more data collection periods.


Ways of handling blank responses:

• Assign the midpoint of the scale (for an interval-scaled item).
• Ignore the blank responses.
• Assign the mean value of the responses of all those who have responded to that item.
• Assign the mean value of the respondent's answers to all other questions measuring the same variable.
• Assign a random number within the range of that scale.
• The best way to enhance the validity of the study: omit the case, provided the sample size is large enough.
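The options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 5-point scale, the sample answers, and all function names are hypothetical, and `None` is assumed to mark a blank response.

```python
import random
from statistics import mean

# Hypothetical data on a 1-5 scale; None marks a blank response.
SCALE_MIN, SCALE_MAX = 1, 5
item_responses = [4, 5, None, 3, 4]     # one item, five respondents
respondent_items = [2, None, 3, 4]      # one respondent, items measuring the same variable


def fill_with_midpoint(values, low=SCALE_MIN, high=SCALE_MAX):
    """Replace each blank with the midpoint of the scale."""
    midpoint = (low + high) / 2
    return [midpoint if v is None else v for v in values]


def fill_with_mean(values):
    """Replace each blank with the mean of the non-blank values in the list."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def fill_with_random(values, rng, low=SCALE_MIN, high=SCALE_MAX):
    """Replace each blank with a random integer within the scale range."""
    return [rng.randint(low, high) if v is None else v for v in values]


def drop_blanks(values):
    """Ignore blank responses (or omit the case when the sample is large)."""
    return [v for v in values if v is not None]


print(fill_with_midpoint(item_responses))   # blank replaced by 3.0
print(fill_with_mean(item_responses))       # blank replaced by the item mean
print(fill_with_mean(respondent_items))     # blank replaced by the respondent's own mean
print(fill_with_random(item_responses, random.Random(0)))
print(drop_blanks(item_responses))
```

Passing the item's responses to `fill_with_mean` gives the "mean of all respondents" option; passing one respondent's answers gives the "mean of the respondent's other items" option.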

Coding = assigning numbers or symbols to answers so that the responses can be grouped into categories.

Ex: If the variable is gender, the partitions are male and female.

◦ Categorization: using rules to partition a body of data.
◦ Categorization sacrifices data detail but is necessary for analysis.
◦ Statistical programs work more efficiently in numeric mode.
◦ Instead of entering male or female, we would use code 0 for male and 1 for female.
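A minimal sketch of this coding step in Python, using the 0/1 gender codes given above. The function name, the sample answers, and the choice to normalize case and whitespace are assumptions made for illustration.

```python
# Coding scheme from the text: 0 for male, 1 for female.
GENDER_CODES = {"male": 0, "female": 1}


def code_responses(raw_answers, codes):
    """Map each raw answer to its numeric code.

    Answers are lower-cased and stripped first (an assumption here).
    An answer outside the coding scheme raises KeyError, which flags it
    for editing instead of silently guessing a code.
    """
    return [codes[answer.strip().lower()] for answer in raw_answers]


raw = ["Male", "female", " FEMALE ", "male"]   # hypothetical raw entries
print(code_responses(raw, GENDER_CODES))       # [0, 1, 1, 0]
```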


• A codebook, or coding scheme, contains each variable in the study and specifies the application of coding rules to the variable.
• It is used by the researcher during data entry and analysis.
• In many statistical programs (e.g. SPSS), the coding scheme is integral to the data file.
• Most codebooks contain the:
◦ Question number and variable name
◦ Descriptors for the response options
◦ Specification as to whether the variable is alphabetic or numeric
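One way to picture a codebook is as a small data structure holding those three pieces of information per variable. The sketch below is an assumption-laden illustration: the class name, field names, question numbers, and the variable name `brand` are hypothetical; only the gender codes and the five brand labels come from this document.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    question_number: int          # hypothetical numbering
    variable_name: str
    var_type: str                 # "numeric" or "alphabetic"
    value_labels: dict = field(default_factory=dict)   # code -> descriptor


codebook = [
    CodebookEntry(1, "gender", "numeric", {0: "male", 1: "female"}),
    CodebookEntry(2, "brand", "numeric",
                  {1: "Benthanh", 2: "Foster", 3: "Saigon", 4: "Heineken", 5: "Tiger"}),
]


def label_for(entries, variable_name, code):
    """Look up the descriptor for a code of a given variable."""
    for entry in entries:
        if entry.variable_name == variable_name:
            return entry.value_labels[code]
    raise KeyError(variable_name)


print(label_for(codebook, "brand", 4))   # Heineken
```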

• Responses to closed questions include scaled items for which answers can be anticipated.
• It is possible to precode the questionnaire during the design stage.
• Precoding is helpful for data entry.


• Use open-ended questions when response categories cannot be prepared in advance.
• Researchers must categorize the responses after the data are collected.
• The variety of answers can be staggering, which hampers categorization.


The categories within a single variable should be:

• Appropriate (for testing hypotheses, showing relationships, or making comparisons).
• Exhaustive (fully capturing all possible responses).
• Mutually exclusive (each specific answer can be placed in only one category).
• Derived from one classification principle (every option in the category set is defined in terms of one concept or construct). Ex: "unemployed salesperson" breaks this rule because it combines employment status with occupation.
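For numeric answers, the "exhaustive" and "mutually exclusive" rules can be checked mechanically. The sketch below assumes hypothetical age brackets and a hypothetical helper `assign_category`; it illustrates the two rules and is not a general categorization tool.

```python
def assign_category(value, bins):
    """Return the single category whose range [low, high) contains value.

    bins is a list of (label, low, high) tuples. A value that matches no
    bin shows the set is not exhaustive; a value that matches more than
    one bin shows the set is not mutually exclusive.
    """
    matches = [label for label, low, high in bins if low <= value < high]
    if not matches:
        raise ValueError(f"{value} fits no category: set is not exhaustive")
    if len(matches) > 1:
        raise ValueError(f"{value} fits {matches}: set is not mutually exclusive")
    return matches[0]


# Hypothetical age brackets that satisfy both rules for ages 0-199.
age_bins = [("under 18", 0, 18), ("18-29", 18, 30), ("30 and over", 30, 200)]
print(assign_category(18, age_bins))   # 18-29

# Hypothetical overlapping brackets that violate mutual exclusivity.
bad_bins = [("young", 0, 30), ("adult", 18, 65)]
```

Calling `assign_category(20, bad_bins)` raises `ValueError`, because 20 falls into both brackets.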


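• Remember to reverse the scores of negatively worded questions.
• All answers should run in the same direction.
• Example with a Likert scale: on a 5-point scale, a negatively worded item scored 1 is recoded as 5, 2 as 4, and so on.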
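Reverse scoring follows a simple rule: the reversed score equals the scale minimum plus the scale maximum minus the original score. A minimal sketch, assuming a 1-5 Likert scale by default (the function name and defaults are illustrative):

```python
def reverse_score(score, scale_min=1, scale_max=5):
    """Reverse a score so that negatively worded items point the same way
    as positively worded ones: reversed = min + max - score."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - score


print([reverse_score(s) for s in [1, 2, 3, 4, 5]])   # [5, 4, 3, 2, 1]
```

The midpoint of an odd-length scale is unchanged, which is a quick sanity check after recoding.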


• Data entry converts gathered data to a medium for viewing and analysis.
• Keyboarding remains a main means of creating and storing a data file.
• Statistical packages such as SPSS or SAS provide the means of data entry.


Example of SPSS data file


• Create your data file in SPSS.


Selecting the analysis method:

• Number of variables to be analyzed at the same time
• Purpose: describing the sample or generalizing to the population
• Measurement scale of the variables


Start → Number of variables?

• One variable → Univariate analysis
• 2 variables → Bivariate analysis
• More than 2 variables → Multivariate analysis


Nominal / ordinal scales

Response category   Frequency   Percentage   Cum. percentage
1 (Benthanh)                1            -                 -
2 (Foster)                  3            1                 1
3 (Saigon)                 45           18                19
4 (Heineken)              120           46                65
5 (Tiger)                  92           35               100
Total                     261         100%

A frequency table serves to:

• Present the distribution of a nominal- or ordinal-scaled variable.
• Help detect mistakes made during data coding or entry.
• Compare the sample distribution with the required distribution of the variable.
• Provide a base for data transformation.
• Check the sampling.
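A frequency table like the one above can be computed directly from the category counts. The sketch below uses the five frequencies from the table; the function name and output layout are assumptions. It prints percentages to one decimal place, so some values can differ by a point from the rounded figures in the table.

```python
from itertools import accumulate


def frequency_table(counts):
    """Return rows of (category, frequency, percentage, cumulative percentage)."""
    total = sum(counts.values())
    running = accumulate(counts.values())
    return [
        (category, n, 100 * n / total, 100 * cum_n / total)
        for (category, n), cum_n in zip(counts.items(), running)
    ]


# Frequencies taken from the table in the text (total 261).
counts = {"Benthanh": 1, "Foster": 3, "Saigon": 45, "Heineken": 120, "Tiger": 92}
rows = frequency_table(counts)

for category, n, pct, cum_pct in rows:
    print(f"{category:<10}{n:>5}{pct:>8.1f}{cum_pct:>8.1f}")
```

A last cumulative percentage that is not exactly 100, or a category code that should not exist, is the kind of coding or entry mistake this table helps to expose.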

Statistic          Value   Std. error (e)
N (sample size)    215
Minimum            1
Maximum            5
Mean               2.25    0.06
Std. deviation     0.83
Skewness           0.57    0.17
Kurtosis           0.45    0.33
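The standard errors in this output depend only on the sample size and, for the mean, on the standard deviation. The sketch below reproduces them, assuming the usual formulas reported by statistical packages for the standard error of the mean, skewness, and kurtosis; the function names are illustrative.

```python
from math import sqrt


def se_mean(std_dev, n):
    """Standard error of the mean: s / sqrt(n)."""
    return std_dev / sqrt(n)


def se_skewness(n):
    """Standard error of sample skewness (depends on n only)."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis (depends on n only)."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n, std_dev = 215, 0.83                    # values from the output above
print(round(se_mean(std_dev, n), 2))      # 0.06
print(round(se_skewness(n), 2))           # 0.17
print(round(se_kurtosis(n), 2))           # 0.33
```

A common rule of thumb, not stated in the source, is to compare each statistic with about twice its standard error when judging departure from normality.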


Ex: The average age of 100 students in a sample is Ā = 24 (s = 5).

The researcher wants to test this estimate in the population:

Null hypothesis H0: μ = 23
Alternative hypothesis H1: μ ≠ 23


S1: State H0 and H1.
S2: Select the statistical test.
S3: Select α (the significance level, normally 0.05).
S4: Determine the critical value at significance level α (one- or two-tailed).
S5: Calculate the test statistic from the data set.
S6: If the test statistic exceeds the critical value (in absolute value for a two-tailed test) → reject H0.


Use the t test or the Z test to test the population mean given the sample mean.

The Z test is employed when:
+ The standard deviation σ of the population is known, for any sample size.
+ The standard deviation σ of the population is unknown and the sample size is > 30.
The t test is employed when:
+ The standard deviation σ of the population is unknown, for any sample size.
+ When the sample size is > 30, the t distribution is approximately the Z distribution.
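The selection rule can be written as a small decision function. This is a sketch of the rule as stated here, with a hypothetical function name; the cutoff of 30 is the one given in the text.

```python
def choose_test(sigma_known, n, large_sample_cutoff=30):
    """Return "Z" or "t" following the rule in the text.

    - sigma known: Z test for any sample size.
    - sigma unknown, n > cutoff: Z test (the t distribution is close to Z).
    - sigma unknown, n <= cutoff: t test.
    """
    if sigma_known or n > large_sample_cutoff:
        return "Z"
    return "t"


print(choose_test(sigma_known=False, n=100))   # Z
print(choose_test(sigma_known=False, n=20))    # t
print(choose_test(sigma_known=True, n=20))     # Z
```

Using the t test with σ unknown remains valid for large samples as well; the function simply returns the Z test there because the two give nearly the same critical values.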


Average age of 100 students in the sample: Ā = 24 (s = 5).
The researcher wants to test the estimate in the population:
Null hypothesis H0: μ = 23
Alternative hypothesis H1: μ ≠ 23

n = 100 > 30 → employ the Z test
Adopted significance level α = 0.05 (two-tailed) → Zc = 1.96
Calculate Z with σ unknown: $Z = (\bar{A} - \mu)\sqrt{n}/s = (24 - 23) \times \sqrt{100}/5 = 2$
Z = 2 > Zc = 1.96 → reject H0
→ At the 95% confidence level, we cannot conclude that μ = 23.
For a proportion variable: $Z = (p - \pi)/\sqrt{pq/n}$, where q = 1 − p.
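The worked example can be checked numerically. The sketch below uses only the Python standard library; the function names are illustrative, the p-value is an addition not shown in the source, and the proportion example uses made-up numbers (p = 0.6, π = 0.5, n = 100) with q assumed to be 1 − p.

```python
from math import sqrt
from statistics import NormalDist


def z_for_mean(sample_mean, mu0, s, n):
    """Z = (sample mean - mu0) * sqrt(n) / s, with s estimating sigma."""
    return (sample_mean - mu0) * sqrt(n) / s


def z_for_proportion(p, pi0, n):
    """Z = (p - pi0) / sqrt(p * q / n), with q = 1 - p as in the text."""
    return (p - pi0) / sqrt(p * (1 - p) / n)


def two_tailed_critical(alpha):
    """Critical Z value for a two-tailed test at significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


z = z_for_mean(sample_mean=24, mu0=23, s=5, n=100)
z_c = two_tailed_critical(0.05)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # not in the source; shown for reference

print(z)                     # 2.0
print(round(z_c, 2))         # 1.96
print(abs(z) > z_c)          # True, so H0 is rejected

# Hypothetical proportion example.
print(round(z_for_proportion(p=0.6, pi0=0.5, n=100), 2))
```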

Employ the Chi-square test to compare the distribution of categories in the sample with the expected distribution in the population.

S1: State H0 and H1.
S2: Determine α and dF: dF = k – 1 (k = number of categories).
S3: Determine the critical Chi-square value χ²c.
S4: Calculate the Chi-square test value χ².
S5: Accept or reject H0: reject H0 if χ² (test value) > χ²c (critical value). Rejecting H0 means there is a statistically significant difference between the sample distribution and the population distribution of the variable under study.
The Chi-square test is not reliable if the expected count in any cell is < 5.


Occupation     Oi    Ei    Oi − Ei   (Oi − Ei)²   (Oi − Ei)²/Ei
(1)            (2)   (3)   (4)       (5)          (6)
Labour         15    25    −10       100          4
Office staff   20    25    −5        25           1
Manager        30    25    5         25           1
Student        35    25    10        100          4
Total          100   100                          χ² = 10

(1): Categories of occupation of internet users
(2): Observed count; (3): Expected count
H0: There is no difference in the occupation of internet users
k = 4 categories → dF = k − 1 = 3
χ²c = 6.25 (α = 10%)
χ² = 10 > χ²c = 6.25 → reject H0
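The test value in this example can be recomputed in a few lines. The sketch below takes the Manager observed count as 30, the value implied by the Oi − Ei = 5 column and by the observed total of 100. The critical value 6.25 is copied from the text, not computed, and the function name is illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Labour, Office staff, Manager, Student (Manager taken as 30, see above).
observed = [15, 20, 30, 35]
expected = [25, 25, 25, 25]

CRITICAL_VALUE = 6.25          # from the text: dF = 3, alpha = 10%

chi2 = chi_square_statistic(observed, expected)
d_f = len(observed) - 1

print(chi2)                    # 10.0
print(d_f)                     # 3
print(chi2 > CRITICAL_VALUE)   # True, so H0 is rejected
print(min(expected) >= 5)      # True, so the cell-count condition holds
```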