Week 12 - Data Analysis

The document outlines a lecture on data analysis in epidemiological research, focusing on evaluating and preparing datasets for analysis, data cleaning, and statistical tests for categorical data. It details the steps of data analysis, methods for checking invalid values in datasets, and various statistical tests applicable to categorical data. Additionally, it provides examples of using SAS procedures for data analysis and cleaning.

Uploaded by

KinSparkin'

Week 12- Data Analysis

PHEB 689: SAS PROGRAMMING FOR EPIDEMIOLOGICAL RESEARCH
Xiaohui Xu, Ph.D.
Department of Epidemiology and Biostatistics
Part 1 - Evaluate and Prepare Datasets for Analysis
Lecture Outlines
• Understanding the steps of data analysis
• Check invalid values for character variables
o PROC FREQ
o DATA step
• Check invalid values for numeric variables
o PROC MEANS
o PROC UNIVARIATE
o DATA step
• Data cleaning and checking data accuracy again
• Checking data normality and data transformation if necessary
Part 1.1 Data analysis steps
Sources of research data
• Primary sources
• The researcher or team of researchers designs, collects, and
analyzes the data, for the purpose of answering a research
question
• Secondary data sources
• Existing data collected for another purpose, which you use to answer your research question
Steps for data analysis

• Understand the dataset
• Data cleaning
• Data analysis
• Sharing results
Understand the dataset

The information includes:

• The variables' names, types, and attributes (including formats, informats, and labels)
• Definitions, valid values, and units of variables
• How many observations are in the dataset
• How many variables are in the dataset
• When the dataset was created
Two ways to access this information

• Receive the metadata or data dictionary from the data provider

• Run a SAS procedure to get the information
CONTENTS procedure

• The CONTENTS procedure shows the contents of a SAS data set and prints the directory of the SAS library.
PROC CONTENTS <option-1 <...option-n>>;
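A minimal sketch of how PROC CONTENTS might be called to collect the information listed above. The data set name WORK.PATIENTS is an assumption for illustration; substitute your own library and data set.

```sas
/* Assumed data set name: WORK.PATIENTS (not given in the slides). */
/* VARNUM lists variables in the order they were created,          */
/* rather than alphabetically.                                      */
proc contents data=work.patients varnum;
run;

/* List every data set in the WORK library.                        */
/* NODS suppresses the per-variable detail for each member.        */
proc contents data=work._all_ nods;
run;
```

The first call reports the number of observations and variables, the creation date, and each variable's type, length, format, informat, and label. The second call prints only the library directory.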
Part 1.2 Data cleaning
Data cleaning
• Data cleaning is the process of editing, correcting, and structuring data within a data set
so that it’s generally uniform and prepared for analysis
• Data cleaning is one of the important processes involved in data analysis
Activities in data cleaning
• Removal of unwanted observations
• Duplicate observations
• Irrelevant observations
• Fix data structure
• Define missing
• Filter out data outliers
• Removal of invalid values
• Data transformation
Demonstration using Patient.txt

Note: Cody Data cleaning 101

2.1 Checking For Invalid Character Values
• A very simple approach to identifying invalid character values in this file is to use PROC
FREQ to list all the unique values of these variables.
2.1.1 Syntax - FREQ Procedure
PROC FREQ < options > ;
BY variables ;
EXACT statistic-options < / computation-options > ;
OUTPUT <OUT=SAS-data-set > output-options ;
TABLES requests < / options > ;
TEST options ;
WEIGHT variable < / option > ;
2.1.2 Invalid data in Patients

PROC FREQ DATA=PATIENTS;
TITLE "FREQUENCY COUNTS";
TABLES GENDER AE / NOCUM NOPERCENT;
RUN;

Gender
Gender   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1        1           1.02      1                      1.02
F        52          53.06     53                     54.08
M        43          43.88     96                     97.96
f        1           1.02      97                     98.98
x        1           1.02      98                     100.00
Frequency Missing = 3

proc freq data=patients;
table _character_;
run;
2.1.3 Using A Data Step To Identify Invalid Character Values
DATA _NULL_;
INFILE '…\Patients.txt' PAD;
FILE PRINT;
TITLE "LISTING OF INVALID DATA";
input @1 Patno $3.
@11 Gender $1.
@31 Dx $7.
@38 AE $1.;
***CHECK GENDER;
IF GENDER NOT IN ('F','M',' ') THEN PUT PATNO= GENDER=;
***CHECK DX;
IF VERIFY(DX,' 0123456789') NE 0 THEN PUT PATNO= DX=;
***CHECK AE;
IF AE NOT IN ('0','1',' ') THEN PUT PATNO= AE=;
RUN;
2.2 Checking For Invalid Numerical Values
• PROC MEANS and PROC UNIVARIATE can be useful as a first step in data cleaning for
numeric variables.
• Data step;
2.2.1 MEANS Procedure
PROC MEANS <option(s)> <statistic-keyword(s)>;
BY <DESCENDING> variable-1 <<DESCENDING> variable-2 …>
<NOTSORTED>;
CLASS variable(s) </ option(s)>;
FREQ variable;
ID variable(s);
OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)>
<id-group-specification(s)> <maximum-id-specification(s)>
<minimum-id-specification(s)> </ option(s)> ;
TYPES request(s);
VAR variable(s) </ WEIGHT=weight-variable>;
WAYS list;
WEIGHT variable;
Proc means statistic-keyword(s)
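A short sketch of how PROC MEANS and PROC UNIVARIATE might be used as a first pass over numeric variables. The variable names HR, SBP, and DBP are assumptions (the slides do not list the numeric variables in Patients); PATNO is the patient identifier read in the data step above.

```sas
/* HR, SBP, and DBP are assumed numeric variable names.           */
/* N and NMISS show how many values are present and missing;      */
/* MIN and MAX expose values outside the plausible range.         */
proc means data=patients n nmiss min max mean std maxdec=1;
   var hr sbp dbp;
run;

/* PROC UNIVARIATE prints an "Extreme Observations" table with    */
/* the five lowest and five highest values of each variable.      */
/* The ID statement labels those rows with the patient number.    */
proc univariate data=patients;
   id patno;
   var hr sbp dbp;
run;
```

PROC MEANS gives a quick range check, while the extreme-observations table from PROC UNIVARIATE identifies which patients carry the suspicious values.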
Part 2 - Categorical data analysis
Lecture Outlines
• Choosing the correct statistical test for categorical data
• Categorical data analysis - one sample
o Binomial test
o Chi-square goodness of fit test
• Categorical data analysis - two samples
o Two independent samples (2x2 table)
  ▪ Chi-square test (large sample)
  ▪ Fisher’s exact test (small sample)
  ▪ Crude odds ratio and relative risk
o Two matched samples
  ▪ Paired samples - McNemar test and kappa coefficient
  ▪ Odds ratio estimation for a matched-pair case-control study
o Chi-square test for trend (2xN table)
• Stratified tables analysis - the cohort Mantel-Haenszel statistics
Acknowledgement
• The lecture slides are based on several resources, including:
• Online SAS help
• Book: Applied Statistics and the SAS Programming Language, 5th Edition
• Online reference from the UCLA Institute for Digital Research & Education
2.1 Choosing correct statistical tests
Types of variable
• Categorical or nominal
• Gender (male vs. female)
• Ordinal
• Age group (<20, 20-39, 40-59, 60+)
• Smoking (0, 1-4, 5-9, 10+ cigarettes per day)
• Interval (also called numerical)
• Cholesterol levels (mg/dL)
• Weight (pounds)
Statistical tests for categorical data
Number of Dependent Variables | Nature of Independent Variables | Nature of Dependent Variable(s)* | Test(s)
1 | 0 IVs (1 population) | categorical (2 categories) | binomial test
  | | categorical (2+ categories) | Chi-square goodness-of-fit
  | 1 IV with 2 levels (independent groups) | categorical (large sample size) | Chi-square test
  | | small sample size | Fisher’s exact test
  | 1 IV with 2 or more levels (independent groups) | categorical | Chi-square test
Statistical tests for categorical data 2
Number of Dependent Variables | Nature of Independent Variables | Nature of Dependent Variable(s)* | Test(s)
  | 1 IV with 2 levels (dependent/matched groups) | categorical | McNemar test
  | 1 IV with 2 or more levels (dependent/matched groups) | categorical (2 categories) | Conditional logistic regression
  | 2 or more IVs (independent groups) | categorical (2+ categories) | logistic regression
  | 1 interval IV | categorical | simple logistic regression
  | 1 or more interval IVs and/or 1 or more | categorical | multiple logistic regression
Lecture dataset
• HSB data file
• This data file contains 200 observations from a sample of high school students with demographic information about the students.

Variable name | Description
female | Gender of students
ses | Socioeconomic status (1=low 2=middle 3=high)
race | Ethnic background (1=hispanic 2=asian 3=african-amer 4=white)
schtyp | Type of school (1=public 2=private)
prog | Type of program (1=general 2=academic 3=vocational)
read | Reading scores on standardized tests
write | Writing scores on standardized tests
math | Mathematics scores
science | Science scores
socst | Social studies scores
2.2 Test for One Sample Population
2.1 Binomial test (two categories)
• A one sample binomial test allows us to test whether the proportion of successes on a
two-level categorical dependent variable significantly differs from a hypothesized value.

• For example
• H0: P=0.5
Example: Binomial test
• In the HSB data, test whether the proportion of females (female) differs significantly from 50%, i.e., from .5.
• We will use the EXACT statement to produce the exact p-values.
proc freq data = wk11.hsb2;
tables female / binomial(p=.5);
exact binomial;
run;
TABLES Statement
• TABLES requests </ options> ;
• The TABLES statement requests one-way to n-way frequency and crosstabulation tables and statistics for those tables.
• Options:
• BINOMIAL <(binomial-options)>
BIN <(binomial-options)>
• requests the binomial proportion for one-way tables. When you specify this option, by default PROC FREQ
provides the asymptotic standard error, asymptotic Wald and exact (Clopper-Pearson) confidence limits,
and the asymptotic equality test for the binomial proportion.
EXACT statement
• EXACT statistic-options </ computation-options> ;
• The EXACT statement requests exact tests and confidence limits for selected statistics. The statistic-
options identify which statistics to compute, and the computation-options specify options for computing exact
statistics.

• Statistic options
BINOMIAL /BIN
• requests an exact test for the binomial proportion (for one-way tables).
Output
• The results indicate that there is no statistically significant difference (p = .2292). In other words, the proportion of females in this sample does not significantly differ from the hypothesized value of 50%.
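By default, PROC FREQ computes the binomial proportion for the first level of the variable that appears in the one-way table. The sketch below shows how the LEVEL= binomial-option might be used to name the level explicitly. It assumes female is coded 0/1 with 1 = female, which the slides do not state.

```sas
/* Assumption: female is coded 0 = male, 1 = female.               */
/* LEVEL='1' asks for the proportion of observations whose         */
/* formatted value is 1 instead of the first level in the table.   */
proc freq data = wk11.hsb2;
   tables female / binomial(p=.5 level='1');
   exact binomial;
run;
```

With a null value of .5 the two-sided test gives the same p-value for either level, but the reported proportion and its confidence limits then refer to females rather than males.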
2.2 Chi-square goodness of fit (2+ categories)
• A chi-square goodness of fit test allows us to test whether the observed proportions for a
categorical variable differ from hypothesized proportions.
Example
• Suppose we believe that the general population consists of 10% Hispanic, 10% Asian, 10% African American, and 70% White people. We want to test whether the observed proportions in our sample differ significantly from these hypothesized proportions.

proc freq data = wk11.hsb2;

tables race / chisq testp=(10 10 10 70);
run;
Options of Table statement
•CHISQ <(chisq-options)>
• For one-way tables, the CHISQ option provides the Pearson chi-square goodness-of-fit
test. You can also request the likelihood ratio goodness-of-fit test for one-way tables by
specifying the LRCHI chisq-option in parentheses after the CHISQ option.

• chisq-options
• TESTP=(values)| SAS-data-set
• specifies null hypothesis proportions for the one-way chi-square goodness-of-fit
tests.
Output
• These results show that racial composition in our sample does not differ significantly from the hypothesized values that we supplied (chi-square with three degrees of freedom = 5.0286, p = .1697).
2.3 R x C table analysis
3.1 Two + independent sample populations

• Chi-square test (large samples)

• Fisher’s exact test (Small samples)
• Crude Odds Ratio and 95%CI
• Crude Relative Risk estimations and 95%CI
3.1.1 Chi-square test (large samples)
• H0:
• There is no association between the row variable and the column variable
• Methods include:
• Pearson chi-square (Large sample size: differences between the observed and expected frequencies)
• Continuity-adjusted chi-square (Small sample size; similar to Fisher’s exact test)
• Likelihood-ratio chi-square (based on the ratio of the observed to the expected frequencies)

• Mantel-Haenszel chi-square (Linear association; Ordinal categorical variable)

Grouping syntax In table statement
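As an illustration of grouping syntax, variables enclosed in parentheses in a TABLES request are each crossed with the variable outside the parentheses. The sketch below uses the hsb2 variables from this lecture; the particular combination of variables is only an example.

```sas
/* schtyp*(female race) is shorthand for: schtyp*female schtyp*race */
proc freq data = wk11.hsb2;
   tables schtyp*(female race) / chisq;
run;

/* (schtyp prog)*(female race) expands to four two-way tables:      */
/* schtyp*female  schtyp*race  prog*female  prog*race               */
proc freq data = wk11.hsb2;
   tables (schtyp prog)*(female race) / chisq;
run;
```

Options after the slash apply to every table generated by the request.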
Example

• Using the hsb2 data file, let’s see if there is a relationship between the type of school
attended (schtyp) and students’ gender (female).

proc freq data = wk11.hsb2;

tables schtyp*female / chisq;
run;
CHISQ- table statement option
• CHISQ <(chisq-options)>
• For two-way tables, the chi-square tests include the Pearson chi-square,
likelihood ratio chi-square, and Mantel-Haenszel chi-square tests. The chi-
square measures include the phi coefficient, contingency coefficient, and
Cramér’s V.
Output
• These results indicate that there is no statistically significant relationship between the type of school attended and gender (chi-square with one degree of freedom = 0.0470, p = 0.8283).
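Whether the large-sample chi-square test is appropriate depends on the expected cell counts. A minimal sketch of how they might be inspected, using the same table as the example:

```sas
/* EXPECTED prints the expected frequency in each cell.           */
/* NOROW, NOCOL, and NOPERCENT hide the percentages so that the   */
/* observed and expected counts are easier to compare.            */
proc freq data = wk11.hsb2;
   tables schtyp*female / chisq expected norow nocol nopercent;
run;
```

If any expected count is below five, Fisher's exact test is the safer choice. PROC FREQ also prints a warning under the chi-square table when more than 20% of the cells have expected counts below five.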
3.1.2 Fisher’s exact test (small samples)
• Fisher’s exact test is used when you want to conduct a chi-square test, but one or more of your cells has an expected frequency of less than five.
• Fisher’s exact test does not depend on any large-sample distribution assumptions, so it is appropriate even for small sample sizes and for sparse tables.
Example
• Using the hsb2 data file, let’s see if there is a relationship between the type of school attended (schtyp) and students’ race (race).

proc freq data = wk11.hsb2;

tables schtyp*race / fisher;
run;
FISHER - TABLES statement option
• FISHER
• Requests Fisher’s exact test for tables that are larger than 2x2.
• For 2x2 tables, the CHISQ option provides Fisher’s exact test.
Output
• These results suggest that there is not a statistically significant relationship between race and type of school (p = 0.5975). Note that Fisher’s exact test does not have a “test statistic”, but computes the p-value directly.
3.1.3 Crude Odds Ratio
• An unmatched case-control study was conducted to examine the association between brain tumor and benzene exposure. The data are listed below.

                Cases   Controls
Exposure  Yes   50      20
          No    100     130
Using the code to generate the dataset
DATA ODDS;
INPUT OUTCOME $ EXPOSURE $ COUNT;
DATALINES;
CASE 1-YES 50
CASE 2-NO 100
CONTROL 1-YES 20
CONTROL 2-NO 130
;
Code for OR calculations

PROC FREQ DATA=ODDS;

TITLE "Program to Compute an Odds Ratio";
TABLES EXPOSURE*OUTCOME / CHISQ CMH;
WEIGHT COUNT;
RUN;
Options in TABLES statement
• CMH <(cmh-options)>
• For 2x2 tables, the CMH option provides the adjusted Mantel-Haenszel and logit estimates of the odds ratio and relative risks, together with their confidence limits.

• OR <(CL=type | (types))>
ODDSRATIO <(CL=type | (types))>

• Requests the odds ratio and confidence limits for 2x2 tables.
Output
• The OR = 3.25, 95% CI 1.8189-5.8070. Thus the odds of benzene exposure in cases are 3.25 times the odds of benzene exposure in controls.

COMMON ODDS RATIO AND RELATIVE RISKS
Statistic | Method | Value | 95% Confidence Limits
Odds Ratio | Mantel-Haenszel | 3.2500 | 1.8189  5.8070
 | Logit | 3.2500 | 1.8189  5.8070
Relative Risk (Column 1) | Mantel-Haenszel | 1.6429 | 1.3331  2.0246
 | Logit | 1.6429 | 1.3331  2.0246
Relative Risk (Column 2) | Mantel-Haenszel | 0.5055 | 0.3432  0.7446
 | Logit | 0.5055 | 0.3432  0.7446
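To see where the odds ratio comes from, the crude estimate can be worked by hand from the four cell counts. The sketch below is an illustration only: the variable names are hypothetical and the interval is the usual large-sample (logit, or Woolf) interval on the log scale.

```sas
data or_by_hand;
   /* Cell counts from the benzene table */
   a = 50;  b = 20;     /* exposed:   cases, controls */
   c = 100; d = 130;    /* unexposed: cases, controls */

   odds_ratio = (a*d) / (b*c);               /* cross-product ratio      */
   se_log_or  = sqrt(1/a + 1/b + 1/c + 1/d); /* SE of log(OR)            */
   z          = probit(0.975);               /* about 1.96 for a 95% CI  */

   lower = exp(log(odds_ratio) - z*se_log_or);
   upper = exp(log(odds_ratio) + z*se_log_or);

   put odds_ratio= 8.4 lower= 8.4 upper= 8.4;  /* written to the SAS log */
run;
```

The cross-product is (50 x 130) / (20 x 100) = 3.25, and the limits should agree with the Logit row of the PROC FREQ output (1.8189 to 5.8070).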

3.1.4 Crude RR estimations
• A prospective cohort study was conducted to investigate the effect of cholesterol on heart attacks (MI). The data are listed below.

                    Heart Attack
                    Yes   No
Cholesterol  High   20    80
             Low    15    135
Create the dataset using the table
DATA RR;
LENGTH GROUP $ 9;
INPUT GROUP $ OUTCOME $ COUNT;
DATALINES;
HC Y 20
HC N 80
LC Y 15
LC N 135
;
Code for analysis

proc sort data=RR;

by GROUP descending OUTCOME;
run;
PROC FREQ DATA=RR ORDER=DATA;
TITLE "Program to Compute a Relative Risk";
TABLES GROUP*OUTCOME / CMH;
WEIGHT COUNT;
RUN;
Output
• The relative risk is 2.00, 95% CI 1.076-3.717.
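The relative risk can also be checked by hand from the cohort table. This is a sketch with hypothetical variable names, using the standard large-sample interval on the log scale.

```sas
data rr_by_hand;
   /* Cell counts from the cholesterol table */
   a = 20; b = 80;      /* high cholesterol: MI yes, MI no */
   c = 15; d = 135;     /* low cholesterol:  MI yes, MI no */

   risk_high = a / (a + b);              /* 20/100 = 0.20 */
   risk_low  = c / (c + d);              /* 15/150 = 0.10 */
   rel_risk  = risk_high / risk_low;

   /* SE of log(RR) */
   se_log_rr = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d));
   z         = probit(0.975);

   lower = exp(log(rel_risk) - z*se_log_rr);
   upper = exp(log(rel_risk) + z*se_log_rr);

   put rel_risk= 8.4 lower= 8.4 upper= 8.4;  /* written to the SAS log */
run;
```

The risk of MI is 0.20 in the high-cholesterol group and 0.10 in the low-cholesterol group, so the relative risk is 2.0, and the limits should be close to the 1.076 to 3.717 reported by PROC FREQ.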
3.2 Matched sample populations
• McNemar test and kappa coefficient
• Odds Ratio for matched pair case-control study
3.2.1 McNemar test and kappa coefficient
• McNemar test
• A non-parametric test used to analyze paired nominal data.
• To determine if the proportions of categories in two related groups significantly differ from
each other.
• kappa coefficient
• A measure of interrater agreement
• When there is perfect agreement between the two ratings, the kappa coefficient equals +1.
• Here is one possible interpretation of Kappa.
• Poor agreement = Less than 0.20
• Fair agreement = 0.20 to 0.40
• Moderate agreement = 0.40 to 0.60
• Good agreement = 0.60 to 0.80
• Very good agreement = 0.80 to 1.00
Example
To check the change of attitude before and after an educational intervention for the same individual.
PROC FORMAT;
VALUE $OPINION 'P'='Positive'
'N'='Negative';
RUN;
DATA MCNEMAR;
LENGTH AFTER BEFORE $ 1;
INPUT AFTER $ BEFORE $ COUNT;
FORMAT BEFORE AFTER $OPINION.;
DATALINES;
N N 32
N P 30
P N 15
P P 23
;
Code for the analysis

PROC FREQ DATA=MCNEMAR;

TITLE "McNemar's Test for Paired Samples";
TABLES BEFORE*AFTER / AGREE ;
WEIGHT COUNT;
RUN;
Agree- Table statement option
• AGREE <(agree-options)>
• requests tests and measures of classification agreement for square tables.
• This option provides the simple and weighted kappa coefficients along with their
standard errors and confidence limits.
• This option provides McNemar’s test;
Output
• Attitude changed significantly after the educational intervention for the same individuals (p = 0.0253).
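McNemar's test uses only the discordant pairs, that is, the individuals whose opinion changed. A hand calculation under that definition might look like the sketch below; the variable names are hypothetical.

```sas
data mcnemar_by_hand;
   /* Discordant cells from the MCNEMAR data set */
   pos_to_neg = 30;    /* BEFORE = P, AFTER = N */
   neg_to_pos = 15;    /* BEFORE = N, AFTER = P */

   /* McNemar chi-square with 1 degree of freedom */
   chisq = (pos_to_neg - neg_to_pos)**2 / (pos_to_neg + neg_to_pos);
   p     = 1 - probchi(chisq, 1);

   put chisq= 8.4 p= 8.4;   /* written to the SAS log */
run;
```

The statistic is (30 - 15)^2 / 45 = 5.0, which corresponds to the p-value of 0.0253 reported above. The concordant cells (32 and 23) do not enter the test.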
Output-Kappa coefficient
Practice
• Two radiologists read the x-ray to make a diagnosis of a disease. The result is listed in the table.

                     Radiologist 2
                     no    yes
Radiologist 1   no   25    3
                yes  5     50
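One possible way to set up the practice problem is sketched below. The data set and variable names (XRAY, RATER1, RATER2) are made up for illustration; the approach mirrors the McNemar example above.

```sas
/* Hypothetical names: XRAY, RATER1, RATER2 */
data xray;
   input rater1 $ rater2 $ count;
   datalines;
no  no  25
no  yes 3
yes no  5
yes yes 50
;

/* AGREE requests the kappa coefficient and McNemar's test */
proc freq data=xray;
   title "Agreement Between Two Radiologists";
   tables rater1*rater2 / agree;
   weight count;
run;
```

The kappa coefficient in the output can then be read against the interpretation scale given earlier.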
3.2.2 Odds Ratio for matched pair case-control study
• A matched-pair case-control study was conducted to examine the effect of alcohol consumption on liver disease. The data are listed below.

                     Cases
                     present   absent
Controls   present   15        5
           absent    20        60
Create the data using do loop
data a;
do case = 'present','absent';
do control = 'present','absent';
input count @@;
output;
end;
end;
datalines;
15 20
5 60
;
Wrong code for OR

proc freq order=data;

weight count;
table case * control / agree relrisk;
run;
Data transformation
data indiv;
set a;
length response $ 7; /* long enough to hold 'control' without truncation */
retain id 0;
do id=id+1 to id+count;
factor=case; response='case'; output;
factor=control; response='control'; output;
end;
keep id factor response;
run;

proc freq order=data;

table id*factor*response / cmh noprint;
run;
Output
• The correct estimate of the odds ratio from this matched pairs data is 4.0, which is provided by the Mantel-Haenszel estimate from the CMH option in PROC FREQ.
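For matched pairs, the odds ratio is the ratio of the two discordant cell counts, which is why an analysis that ignores the pairing gives the wrong answer. A hand check is sketched below with hypothetical variable names; the interval shown is a common large-sample approximation and is an assumption here, not taken from the PROC FREQ output.

```sas
data matched_or_by_hand;
   /* Discordant pairs from the matched table */
   case_only    = 20;   /* factor present in case, absent in control */
   control_only = 5;    /* factor absent in case, present in control */

   odds_ratio = case_only / control_only;           /* 20/5 = 4.0 */

   /* Large-sample SE of log(OR) based on discordant pairs only */
   se_log_or = sqrt(1/case_only + 1/control_only);
   z         = probit(0.975);
   lower     = exp(log(odds_ratio) - z*se_log_or);
   upper     = exp(log(odds_ratio) + z*se_log_or);

   put odds_ratio= 8.4 lower= 8.4 upper= 8.4;       /* written to the SAS log */
run;
```

The 15 pairs where both members have the factor and the 60 pairs where neither does carry no information about the odds ratio.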
3.3 Chi-square test for trend
• When the group variable is an ordinal categorical factor, chi-square test for trend is used
to detect if the proportion is linearly increasing or decreasing across the N levels of this
ordinal categorical variable.
Example
• Test whether the proportion of “fail” increases linearly from group A through group D.

                        Group
                        A    B    C    D
Test Results   Fail     10   15   14   25
               Pass     90   85   86   75
SAS code for test

DATA TREND;
INPUT RESULT $ GROUP $ COUNT @@;
DATALINES;
FAIL A 10 FAIL B 15 FAIL C 14 FAIL D 25
PASS A 90 PASS B 85 PASS C 86 PASS D 75
;
PROC FREQ DATA=TREND;
TITLE "Chi-square Test for Trend";
TABLES RESULT*GROUP / CHISQ;
WEIGHT COUNT;
RUN;
Output
• There is a significant linear trend in proportions of Fail from group A through D.
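In the CHISQ output, the trend is assessed by the Mantel-Haenszel chi-square (1 degree of freedom), not the Pearson chi-square. As an alternative sketch, PROC FREQ also offers the TREND option, which requests the Cochran-Armitage test for trend in a 2xC or Rx2 table:

```sas
/* Uses the TREND data set created above.                         */
/* GROUP is a character variable, so the default table scores     */
/* are simply the column numbers 1 to 4 for groups A to D.        */
proc freq data=trend;
   title "Cochran-Armitage Test for Trend";
   tables result*group / trend;
   weight count;
run;
```

Both approaches assume the groups are equally spaced and in the intended order; if the groups were not in alphabetical order, ORDER=DATA or numeric scores would be needed.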
3.4 Stratified tables
• Sometimes we need to stratify the study group into several subgroups based on one or
two factors. Then we examine 2*2 tables or R*C tables in each subgroup and generate a
summary across the subgroups.
Stratification - Why?
• To better control the confounding effect of a third factor, we need to examine the association between an exposure and a disease in each stratum of this third variable.
• Objectives of this analysis include:
• Determine whether the association between the exposure and the disease is similar across the strata of the third variable or differs statistically significantly.
• If the associations differ, present the estimate of the exposure's effect on disease separately for each stratum.
• If they do not differ, the cohort Mantel-Haenszel statistics can provide the summary statistics across all the strata.
Example
• We examine the relationship between hours of sleep and the chance of failing a test by gender.

Boys             TEST
                 Fail   Pass
Sleep    Low     20     100
         High    15     150

Girls            TEST
                 Fail   Pass
Sleep    Low     30     100
         High    25     200
Code to create the dataset
DATA ABILITY;
INPUT GENDER $ RESULTS $ SLEEP $ COUNT;
DATALINES;
BOYS FAIL 1-LOW 20
BOYS FAIL 2-HIGH 15
BOYS PASS 1-LOW 100
BOYS PASS 2-HIGH 150
GIRLS FAIL 1-LOW 30
GIRLS FAIL 2-HIGH 25
GIRLS PASS 1-LOW 100
GIRLS PASS 2-HIGH 200
;
Code for analysis
PROC FREQ DATA=ABILITY;
TITLE "Mantel-Haenszel Chi-square Test";
TABLES GENDER*SLEEP*RESULTS/ALL;
WEIGHT COUNT;
RUN;

ALL
requests all tests and measures that are produced by the CHISQ, MEASURES, and CMH
options.
Output
• The Breslow-Day test suggests that the relationship between sleep hours and failing a test does not differ between boys and girls.
Step 2: provide the summary statistics across all the strata
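As a rough illustration of what the summary statistic represents, the Mantel-Haenszel odds ratio can be computed by hand from the two gender-specific tables. The sketch below uses hypothetical variable names and reads one row per stratum.

```sas
data mh_by_hand;
   /* a = low sleep & fail,  b = low sleep & pass,  */
   /* c = high sleep & fail, d = high sleep & pass  */
   input gender $ a b c d;
   n = a + b + c + d;

   stratum_or = (a*d) / (b*c);   /* odds ratio within this stratum */

   /* Sum statements keep running totals across the strata */
   num + a*d/n;
   den + b*c/n;
   mh_or = num / den;            /* pooled estimate after the last row */

   put gender= stratum_or= 8.4 mh_or= 8.4;   /* written to the SAS log */
   datalines;
BOYS 20 100 15 150
GIRLS 30 100 25 200
;
```

The stratum-specific odds ratios are 2.0 for boys and 2.4 for girls, and the pooled value printed on the last row (about 2.2) lies between them. In PROC FREQ, the CMH option (included in ALL) reports this Mantel-Haenszel estimate together with its confidence limits.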
