
Advanced Research Methodology (HVX8001)

Analysis of Data Using SPSS – Advanced Level

Dr. Md Firoz Khan
Department of Chemistry, Faculty of Science, University of Malaya
HP: 0162645381
Outline: The List of Exercises
Exercise: I
SPSS: Demonstration with an example Database for plotting
Exercise: II
Demonstration: Summary Descriptive Statistics
Exercise: III
Demonstration: Correlations analysis, paired t-test, ANOVA
Exercise: IV
Demonstration: Cluster Analysis
Exercise: V
Demonstration: Multiple regression model
Exercise: VI
Demonstration: PCA procedure
Exercise: VII
Demonstration (PCR): Dummy Data
Exercise: VIII
Demonstration: PCA-APCS
Flow of the data analysis using SPSS

Input data → Preprocessing → Data analysis procedures (initial and multivariate) → Output

Preprocessing includes:
- Removal of outliers
- Correction of missing data
- Replacing data below the detection limit with an appropriate value
- Conversion of data dimensions or normalization, if appropriate
Data analysis by SPSS: Exploratory Data Analysis

Initial analysis:
- Basic summary statistics (mean, median, std, etc.)
- Correlation analysis, paired t-test, ANOVA, etc.
- Time-series analysis

Multivariate analysis:
- CA: cluster analysis
- PCA/APCS: principal component analysis/absolute principal component scores
- MLR: multiple linear regression
- PCR: principal component regression
- PLS: partial least squares
A typical research framework and statistical input

- Air pollution monitoring: assessment of the MM power plant (PM2.5); experimental set-up
- Chemical analysis (trace metals, ionic and carbon compositions); biological monitoring (lung function performance)
- Database
- Statistical analysis and air pollution modeling: descriptive statistics, correlation, t-test, ANOVA, p value, cluster analysis, regression; PCA-APCS, PMF, CMB
- Health risk assessment (HRA); toxicity tests (cytotoxicity and DNA damage)
- Validation of the emission sources by bivariate rose plot / Potential Source Contribution Function (PSCF) / Concentration Weighted Trajectory (CWT) / HYSPLIT density model / wind vector by GrADS
- Establishment of appropriate emission sources (hotspots); strategic mitigation plan for the stakeholder (TNBR)
- Research output and impact
Exercise: I

SPSS: Demonstration with an example database for plotting
- Basics of statistics: practice from the previous lecture
- Practice with dummy data
- Prepare plots in SPSS
95% of values lie within 1.96 SDs of the mean: from mean − (1.96 × SD) to mean + (1.96 × SD).
For IQ scores with mean 100 and SD 15.3:
100 − (1.96 × 15.3) ≈ 70 and 100 + (1.96 × 15.3) ≈ 130, so P(score > 130) = 0.025.
95% of people have an IQ between 70 and 130.
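The same limits and tail probability can be cross-checked in SPSS syntax with the built-in CDF.NORMAL function. A minimal sketch, assuming the mean of 100 and SD of 15.3 from the example above (the single dummy case exists only so COMPUTE has a row to work on):

* Tail probability and 95% limits for IQ ~ N(100, 15.3).
DATA LIST FREE / id.
BEGIN DATA
1
END DATA.
COMPUTE lower = 100 - 1.96*15.3.
COMPUTE upper = 100 + 1.96*15.3.
COMPUTE p_above = 1 - CDF.NORMAL(130, 100, 15.3).
EXECUTE.
LIST lower upper p_above.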
Example use of lognormal distribution in our published work
Shape of Data

◼ Shape of data is measured by:
◼ Skewness
◼ Kurtosis
Skewness
◼ Measures asymmetry of data
◼ Positive or right skewed: Longer right tail
◼ Negative or left skewed: Longer left tail
Let $x_1, x_2, \ldots, x_n$ be $n$ observations. Then,

$$\mathrm{Skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left(\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^{3/2}}$$
Kurtosis
◼ Measures the peakedness of the distribution of data. The (excess) kurtosis of a normal distribution is 0.

Let $x_1, x_2, \ldots, x_n$ be $n$ observations. Then,

$$\mathrm{Kurtosis} = \frac{n\sum_{i=1}^{n}(x_i-\bar{x})^4}{\left(\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^2} - 3$$
• Positive or right skewed: longer right tail
• Negative or left skewed: longer left tail
• Large kurtosis → peaked distribution
• Low kurtosis → 'flatter' distribution
• For roughly normal data, skewness lies between −1 and +1 and kurtosis between −3 and +3
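In SPSS, these shape statistics come directly from the DESCRIPTIVES command. A minimal syntax sketch, assuming a hypothetical variable pm25 in the active dataset:

* Summary statistics, including skewness and kurtosis with their standard errors.
DESCRIPTIVES VARIABLES=pm25
  /STATISTICS=MEAN STDDEV MIN MAX SKEWNESS KURTOSIS.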
Exercise: II

Demonstration: Summary Descriptive Statistics
- Practice the basic statistics using dummy data
Correlation
◼ Strength and direction of the relationship
between variables
◼ Scattergrams

[Scatterplots: positive correlation, negative correlation, no correlation]

Example use of correlation plots: Khan et al. (2017), JGR

Linearity of the r value:
r > 0: linear, positive
r < 0: linear, negative
r = 0: no linear relationship
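A minimal SPSS syntax sketch for a Pearson correlation matrix, assuming hypothetical variables pm25, so4, and no3:

* Pearson correlations with two-tailed significance tests.
CORRELATIONS
  /VARIABLES=pm25 so4 no3
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.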
Exercise: III

Demonstration: Correlations analysis, paired t-test, ANOVA
- Practice correlation analysis with dummy data
- Paired t-test
- ANOVA test
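Both tests are single commands in SPSS syntax. A sketch assuming hypothetical paired measurements pre and post, and pm25 measured across a hypothetical grouping variable site:

* Paired t-test on two repeated measurements.
T-TEST PAIRS=pre WITH post (PAIRED).

* One-way ANOVA: does mean pm25 differ between sites?
ONEWAY pm25 BY site
  /STATISTICS=DESCRIPTIVES
  /POSTHOC=TUKEY ALPHA(0.05).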
Cluster Analysis (CA)
◼ Unsupervised pattern recognition
◼ Could involve: hierarchical clustering & non-
hierarchical clustering
◼ Dimensionality not reduced like PCA
◼ Generally views objects as points in n-
dimensional measurement space
◼ Objects aggregated step-wise according to the
similarity of their features
◼ Searches for the distance between objects in the
measurement space
◼ Developed primarily by biologists to determine
similarities between organisms
CA
Hierarchical cluster analysis (HCA), whose primary purpose is to assemble objects based on the characteristics they possess, was performed in this study using Ward's method with Euclidean distance as the measure of similarity. This most common technique produces a number of clusters that can be presented in a chart called a 'dendrogram', also known as a hierarchical tree.

A number of common numerical measures of similarity are available:
◼Correlation
◼Mahalanobis distance
◼Manhattan distance
◼Euclidean distance (most common)
◼Chebyshev distance
◼Minkowski distance (unifies Euclidean, Manhattan and Chebyshev distances)
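A minimal syntax sketch of the HCA described above, using Ward's method (which SPSS pairs with squared Euclidean distance) on hypothetical standardized variables zso4, zno3, and zpb:

* Hierarchical cluster analysis: Ward's method with (squared) Euclidean distance.
CLUSTER zso4 zno3 zpb
  /METHOD=WARD
  /MEASURE=SEUCLID
  /PRINT=SCHEDULE
  /PLOT=DENDROGRAM.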
Exercise: IV

Demonstration: Cluster Analysis
General Linear Model
◼ Linear regression is actually a form of the
General Linear Model where the parameters
are b, the slope of the line, and a, the
intercept.
y = bx + a + ε
◼ A general linear model is any model that describes the data as a linear combination of predictor variables
An example use of the Linear Model
[Khan et al. 2015]
Multiple regression
◼ Multiple regression is used to determine the effect of a
number of independent variables, x1, x2, x3 etc., on a
single dependent variable, y
◼ The different x variables are combined in a linear way
and each has its own regression coefficient:

y = b0 + b1x1 + b2x2 + … + bnxn + ε

◼ The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.
◼ i.e. the amount of variance in y that is accounted for by
each x variable after all the other x variables have been
accounted for
Multiple Linear Regression
• Regression refers to the value of a response variable as a
function of the value of an explanatory variable.
• A regression model is a function that describes the
relationship between response and explanatory variables.
• Commonly referred to as the predictor-predictand method in earth/environmental sciences.
• A simple linear regression has one explanatory variable and
the regression line is straight.
• The linear relationship of variable Y and X can be written as
in the following regression model form
Y= b0 + b1X + e
where 'Y' is the response variable, 'X' is the explanatory variable, 'e' is the residual (error), and b0 and b1 are two parameters: b0 is the intercept and b1 is the slope of the straight line Y = b0 + b1X.
• By 'linear', we are referring to the parameters, not the variables.
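Fitting Y = b0 + b1X is a single REGRESSION command in SPSS. A minimal sketch with hypothetical variables y and x:

* Simple linear regression: least-squares estimates of the intercept and slope.
REGRESSION
  /STATISTICS=COEFF R ANOVA
  /DEPENDENT y
  /METHOD=ENTER x.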
Multiple Linear Regression
❖ Response variable is normally distributed.
❖ Relationship between the two variables is
linear.
❖ Observations of response variable are
independent.
❖ Residual error is normally distributed with
mean 0 and constant standard deviation.
◼ Y is expressed as a function of X
(deterministic portion, Ŷ ) plus the random
errors εi which should sum to 0.
◼ There are two parameters that need to be estimated:
◼ β0 – the intercept; β1 – the slope.
◼ Method: least squares method (LSM) – minimizes the sum of squared errors.

$$SSE = \sum_{i}\left(Y_i - \hat{Y}_i\right)^2 = \sum_{i}\left(Y_i - (\beta_0 + \beta_1 X_i)\right)^2$$

• Involves solving sets of simultaneous equations (linear algebra)
Exercise: V

Simple example use of the MLR model:
Y = A1·X1 + A2·X2 + A3·X3 + … + An·Xn + C
[measured PM10 (μg m-3)] = A1 × [measured NOx (μg m-3)] + A2 × [measured sulphate (μg m-3)] + C (μg m-3). [Stedman et al. 2001]

Demonstration: Multiple regression model
- A simple linear regression model
Output of MLR model

Coefficients(a)
Model          B        Std. Error   Beta    t        Sig.
1 (Constant)   14.427   1.124                12.839   .000
  SO4          1.313    .174         .341    7.549    .000
  NO3          1.908    .359         .240    5.311    .000
a. Dependent Variable: Mass

Thus, the reconstructed MLR model:
[measured PM10 (μg m-3)] = 1.908 × [measured nitrate (μg m-3)] + 1.313 × [measured sulphate (μg m-3)] + 14.427 (μg m-3). [cf. Stedman et al. 2001]
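The table above can be reproduced with an ENTER-method regression. A sketch assuming the demo variables are named Mass, SO4, and NO3 as in the output:

* Multiple linear regression: Mass regressed on SO4 and NO3.
REGRESSION
  /STATISTICS=COEFF R ANOVA
  /DEPENDENT Mass
  /METHOD=ENTER SO4 NO3.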
An example multiple linear regression model

Practice with Dummy mass closure data


P values
◼ P value = the probability of obtaining the observed result (or one more extreme) by chance
◼ i.e. when the null hypothesis is true

◼ α level is set a priori (usually 0.05)

◼ If p < α, we reject the null hypothesis and accept the experimental hypothesis
◼ We are 95% confident that our experimental effect is genuine
◼ If, however, p > α, we fail to reject the null hypothesis
When to use a non-parametric method and when not to:
- Visually normal: use parametric
- Moderately skewed: use parametric
- Severely skewed: use non-parametric
- Outliers: use non-parametric
- Uniformly distributed: use non-parametric
Data Reduction using SPSS
[To be demonstrated in the Advanced Lecture for MSc and PhD students]
Basics of multivariate modeling

Receptor modeling in environmental forensics involves the inference of sources and their contributions through the analysis of chemical data from the ambient environment.

The objectives are to determine:
➢ the number of chemical fingerprints in the system;
➢ the chemical composition of each fingerprint;
➢ the contribution of each fingerprint in each sample.
Multivariate Receptor Modeling

1. Positive Matrix Factorization Model for environmental data analyses
https://www.epa.gov/air-research/positive-matrix-factorization-model-environmental-data-analyses

2. Chemical Mass Balance (CMB) Model
https://www3.epa.gov/scram001/receptor_cmb.htm

3. Unmix 6.0 Model for environmental data analyses
https://www.epa.gov/air-research/unmix-60-model-environmental-data-analyses

4. Principal Component Analysis/Absolute Principal Component Scores (PCA/APCS)
http://www.sciencedirect.com/science/article/pii/0004698185901325
Data mining: conversion of large data sets into smaller groups

Widely used models:
◼ PCA/absolute principal component scores (APCS): a simplified model; weighted APCS deals with the 'zero score' problem but lacks a non-negativity requirement
◼ Positive Matrix Factorization (PMF): a complicated and robust model; lower uncertainty; stops producing zero factor scores; requires component loadings and scores to be non-negative; capable of identifying sources without any prior knowledge of the sources

Other available models:
◼ EPA's Chemical Mass Balance (CMB)
◼ Unmix
◼ Artificial Neural Networks source receptor modelling
Principal Component Analysis (PCA)

❑ It is a way of identifying patterns in data, and expressing


the data in such a way as to highlight their similarities
and differences.

❑ Principal component analysis (PCA) is also a technique


used to emphasize variation and bring out strong
patterns in a dataset. It's often used to make data easy
to explore and visualize.

Objectives of PCA
a) To transform an original set of variables into a new
set of uncorrelated variables called principal
components
b) To rank components in order of the amount of
variance that they account for
c) To see if the first few components account for most
of the variation in the original data
d) If (c) is true, then to make use of a smaller number
of transformed variables
e) If (c) is true, subsequent data analysis can be
simplified because the data set is smaller
f) To seek an underlying meaning of the first few
components (must be approached with care)
PCA/MLRA

The receptor model is addressed with the following formula:

$$Z_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij}$$

where $Z_{ij}$ is the normalized data, $g_{ik}$ the source contribution, $f_{kj}$ the source profile, and $e_{ij}$ the measurement error.
[Figure: the data matrix is decomposed into source contributions and source profiles]
Factor loading using the PCA procedure

❑ A large set of data was used
❑ Four small groups were obtained
❑ Variables are highly correlated within their respective group
❑ The least correlation is observed among the groups
❑ Each group indicates similar properties, nature, sources, etc.
PCA

◼ The first PC (PC1) is the best fit straight line in the multi-
dimensional space, the scores represent the distance along the
line and the loadings the angle (direction) of the straight line
◼ PC1 explains the largest amount of data variance & subsequent
PCs explain decreasing amounts of data variance
◼ The lower the PC number, the greater the signal and the lower the noise
◼ Each PC describes a portion of the data, so all PCs add up to 100%
◼ If data reduction is good, you need fewer PCs to explain all the relevant data
◼ PC plots can simplify large or difficult datasets & show the main
trends and are easier to visualize than tables of numbers
Preparation of database
Common problems:
◼ Systematic bias: analysis by different labs or different methods
◼ Presence of data below the detection limit (DL)
◼ Presence of coelution (non-target analytes that elute at the same time as a target analyte)
◼ Data entry errors; identify outliers
◼ Noisy data
◼ Missing data
◼ Exclude variables if missing > 50%

Preparation of database (continued)

- Replace data below the DL with DL/2
- Replace missing data with the average value of nearby data, or simply the average of the variable concentration
- Normalize the data or convert it to a unitless/zero-centered mean
- Ensure an adequate number of data points and variables
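These replacement rules translate directly into syntax. A minimal sketch, assuming a hypothetical analyte pb with a detection limit of 0.05:

* Replace values below the detection limit (DL = 0.05) with DL/2.
IF (pb < 0.05) pb = 0.05/2.
EXECUTE.

* Replace remaining missing values with the series mean.
RMV /pb_filled=SMEAN(pb).

* Standardize to zero mean and unit SD; /SAVE creates the z-score variable Zpb_filled.
DESCRIPTIVES VARIABLES=pb_filled /SAVE.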
Adequate number of data points

◼ The number of data points must be greater than the number of variables
◼ The number of data points should be at least 5 times the number of variables
◼ N ≥ 100 samples (P. K. Hopke)
◼ N > (30 + p + 3)/2 (Henry et al. 1984)
◼ N = 50 (source unknown)
◼ N = 30 (magic number!)
◼ Suitability test (KMO and Bartlett's test): our suggestion!
Optimization of factor number

◼ Eigenvalue > 1
◼ Variance (%) ≈ 10 or greater
◼ Interpretable factor profiles
◼ At least one variable should respond significantly to each factor
◼ Exclude a variable if it does not respond to any factor!
Exercise: VI

Activities: PCA procedure
- Follow the example data and use it in PCA to reduce the data into small groups, with the least correlation among the groups

Demonstration: PCA procedure

PCA – PCR – APCS – MLR, step by step
Step 1: Get Data
◼ Suitable data (N)
◼ Missing values
Step 2: Normalize the Data in Excel
Step 3: Upload the Normalized Data into SPSS (File > Open)
Step 4: Make Sure the Data Are Numeric
Step 5: Test the Suitability of the Data
◼ KMO and Bartlett's test
Step 6: Check the KMO Value in the Output File
Step 7: Run PCA on the Normalized Data
◼ Check all the important information one by one
◼ Select the covariance method and Varimax rotation
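Steps 5-7 can also be run as a single FACTOR command. A minimal syntax sketch, assuming hypothetical standardized species zso4, zno3, znh4, and zpb:

* PCA with KMO/Bartlett test, eigenvalue > 1 extraction, Varimax rotation,
* and regression-method factor scores saved as FAC1_1, FAC2_1, ...
FACTOR
  /VARIABLES=zso4 zno3 znh4 zpb
  /PRINT=INITIAL KMO EXTRACTION ROTATION
  /CRITERIA=MINEIGEN(1)
  /EXTRACTION=PC
  /ROTATION=VARIMAX
  /SAVE=REG(ALL).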
PCA Results
- Eigenvalue > 1: 7 components!

PCA Results – Unrotated Factor Loading
PCA Results – Rotated Factor Loading (the important information)
Step 8: Explanation of Factor Loading
◼ Factor loading > 0.7
◼ Explain based on the significant variables
◼ Refer to published papers to explain the sources – this needs a lot of reading
Step 9: Copy and Paste the Factor Scores into an Excel Sheet
Principal component regression (PCR)
Principal component regression (PCR) is a combination of PCA (principal component analysis) and OLS (ordinary least squares) regression. PCR is one of the best approaches to studying the statistical relationships between air pollutants and meteorological factors. PCR can reduce the multicollinearity in the data set, because multicollinearity among the independent variables produces invalid results in terms of the model's predictions and the determination of the significant independent variables. Factors with eigenvalues greater than 1.0 are chosen, as such factors are considered significant, in order to fully understand the correlation relationships between the variables. The significant factors, consisting of the independent variables obtained from the PCA, are then regressed against the dependent variables using OLS regression.
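A minimal sketch of this PCR workflow in syntax form: save the component scores, then regress the dependent variable on them. The names are hypothetical (standardized predictors ztemp, zrh, zws; dependent o3), and the sketch assumes two factors emerge:

* Stage 1: PCA on the standardized predictors; save factor scores (FAC1_1, FAC2_1, ...).
FACTOR
  /VARIABLES=ztemp zrh zws
  /CRITERIA=MINEIGEN(1)
  /EXTRACTION=PC
  /ROTATION=VARIMAX
  /SAVE=REG(ALL).

* Stage 2: OLS regression on the saved scores; the scores are mutually
* uncorrelated, so the multicollinearity problem disappears.
REGRESSION
  /STATISTICS=COEFF R ANOVA
  /DEPENDENT o3
  /METHOD=ENTER FAC1_1 FAC2_1.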
Exercise: VII

Demonstration (PCR): Dummy Data


Limitation of PCR

- Input data: normalization
- PCA: factor loadings and factor scores for PC1, PC2, PC3, …
- Rotation by Varimax to obtain meaningful PCs
- MLR: regress PC1, PC2, PC3, … against a dependent variable
- Limitation: negative mass concentrations can appear (unrealistic)

Corrections for PCA (APCS):
- Introduce an artificial sample with zero concentration for all variables
- Execute PCA and calculate the APCS for each PC
- Regress the APCS against the dependent variable
- Determine the contribution of each PC with less uncertainty
APCS-MLR Step by Step

Step 10: Prepare a New Raw Data Set, Adding a Zero Sample at the End of the Rows
Step 11: Normalize the Zero Sample
- z = (X − mean)/SD
- Use '$' to fix the cells holding the average and standard deviation, e.g. paste the formula = (H3-H$632)/H$633
Step 12: Run PCA for the Second Time
Exercise: VIII

Demonstration: PCA-APCS
Step 13: Copy and Paste the Factor Scores (Zero Sample) into an Excel Sheet, as in Step 9
Step 14: Subtract the Factor Score for the Zero Sample (Step 13) from Each Sample in Step 9
◼ The revised factor scores are recognized here as the APCS (Step 9 − Step 13)
◼ Factor score minus the 'zero factor score' = APCS
Step 15: Run MLR Using PM2.5 Mass as the Dependent Variable and Each APCS as an Independent Variable
Step 16: Convert the APCS into Factor Mass by Multiplying by the Respective Regression Coefficients
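Steps 14-16 reduce to simple arithmetic once the zero-sample score and the regression coefficient are known. A sketch in syntax, using hypothetical values (a zero-sample score of -2.41 on factor 1 and a regression coefficient B1 = 3.85):

* Step 14: APCS = factor score minus the zero-sample factor score.
COMPUTE apcs1 = FAC1_1 - (-2.41).
* Step 16: factor mass = APCS multiplied by its regression coefficient (Step 15).
COMPUTE mass_f1 = 3.85 * apcs1.
EXECUTE.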
Conversion of APCS into Mass Concentration
- APCS × regression coefficient (B column)
- Delete 'negative mass' values from the data set

[Figure: correlation of input and predicted PM2.5 mass]
[Figure: % distribution of PM2.5 mass contributed by F1, F2, F3, F4, F5, and F6]
Assignment:
A review on current perspectives of principal component analysis followed by
an absolute principal component analysis in environmental application

Thank you for your attendance

For any further inquiries, please contact me:
[email protected], [email protected]
Acknowledgement

www.utsc.utoronto.ca/~phanira/WebResearchMethods/
https://www.nemoursresearch.org/open/StatClass/January200
https://www.stat.auckland.ac.nz/~balemi/Multivariate
