Gea Cheatsheet

GEA1000 BEST Cheatsheet

Quantitative reasoning with data (National University of Singapore)

3 types of research questions:
1) Makes an estimate about the population
2) Tests a claim about the population
3) Compares 2 sub-populations/investigates a relationship between 2 variables in the population

Generalisability Criteria: (to generalise findings from sample to population)
1) Good sampling frame (equal to or larger than the population)
2) Probability-based sampling (to minimise selection bias)
3) Use large sample size (to reduce variability/random errors in sample)
4) Minimise non-response rate (to reduce non-response bias)

Types of Biases and Errors associated with different sampling methods:
1) Selection bias (imperfect sampling frame, non-probability sampling)
2) Non-response bias (disinterested, inconvenient, sensitive info)

Categorical variable:
1) Ordinal (categories have an ordering, often represented by numbers)
2) Nominal (no intrinsic ordering)

Numerical variables:
1) Discrete
2) Continuous

Summary statistics:
Spread of data/measure of dispersion: standard deviation, variance, IQR
Central tendencies: mean, mode, median

Types of study design:
1) Experimental design
Blinding: subjects don't know whether they are in the treatment or control group
Double-blinding: as above, but both subjects and assessors are blinded
Placebo effect: a response observed when subjects receive a placebo treatment but still show some positive effects
2) Observational studies (still have control & treatment groups)
Researchers don't directly manipulate one variable to cause an effect on the other
**Observational studies do not provide convincing evidence of a causal relationship; they can only provide evidence of association

Simpson's Paradox:
A phenomenon where a trend that appears in more than half of the groups disappears/reverses when the groups are combined (direction of association reversed)
**Simpson's paradox implies that a confounder is present, but a confounder does not always produce Simpson's paradox

Confounders:
A third variable associated with both the independent and dependent variables whose relationship we are investigating
**To show association between the suspected confounder and the IV/DV, compare rate(IV/DV | suspected confounder) across the values of the suspected confounder (always change the condition behind the bar)

Effects of confounders on a study:
Must measure a variable to check if it is a confounder
BUT need to collect data on lots of variables to identify THE one
= not feasible (costly & difficult to analyse)
= may still have hidden confounders affecting results of the study (able to obtain limited conclusion)

Data slicing: (used for observational studies ONLY)
● Sample size will not change because of slicing
● Simply categorising data using a 3rd variable (suspected confounder)
● Slicing to investigate presence of Simpson's paradox

To remove effects of a confounder, use randomised assignment, which removes the association between the confounder and the independent variable (the treatment), so the confounder is no longer associated with both variables

4 main types of probability sampling:
**Important: every unit in the sampling frame must have a known non-zero probability of being selected (the probabilities need not be equal)

1) Simple Random Sampling (SRS)
● Units selected randomly without replacement, with equal chance of being selected
● Different samples selected from the same sampling frame using SRS would be different; any variability is due to chance
Pro: good representation of the population
Con: subject to non-response; accessibility of information

2) Systematic sampling
● Apply a selection interval k: pick a random starting point within the first k units (by SRS), then select every k-th unit after it
Pro: simpler than SRS, as there is no need to know the exact number of sampling units
Con: may under-represent the population if the list is not random (potential selection bias)

3) Stratified Sampling
● Break down population into strata (each similar in nature but may vary in size); the sample is generated by doing SRS within each stratum (may need to take a weighted average)
Pro: good representative sample for each stratum
Con: need info on sampling frame & strata

4) Cluster Sampling
● Break down population into clusters, then SRS a fixed number of clusters
Pro: simpler, less time-consuming, less costly
Con: HIGH VARIABILITY due to dissimilar clusters/small number of clusters
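
The four probability sampling methods above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original sheet: the sampling frame of 100 unit IDs, the stratum names (`year1`, `year2`), the cluster names (`hall0` to `hall4`) and all helper function names are made up, and the cluster sketch assumes every unit in a chosen cluster is kept.

```python
import random


def simple_random_sample(frame, n, rng):
    """SRS: n units drawn without replacement, each equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, k, rng):
    """Random starting point within the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return frame[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """SRS of n_per_stratum units inside every stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """SRS of whole clusters (assumption: all units in a chosen cluster are kept)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


rng = random.Random(0)             # seeded only so the sketch is repeatable
frame = list(range(1, 101))        # hypothetical sampling frame of 100 unit IDs

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)

strata = {"year1": frame[:60], "year2": frame[60:]}            # strata may differ in size
stratified = stratified_sample(strata, 5, rng)

clusters = {f"hall{i}": frame[i * 20:(i + 1) * 20] for i in range(5)}
clustered = cluster_sample(clusters, 2, rng)

print(srs, systematic, stratified, clustered, sep="\n")
```

Re-running with a different seed gives different samples from the same frame, which is the chance variability the SRS notes mention.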

2 main types of non-probability sampling:

1) Convenience sampling
2) Volunteer sampling (unlikely to represent the population of interest)
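
Data slicing and Simpson's paradox, covered earlier in this sheet, are easier to see with numbers. The sketch below uses made-up counts (not from the sheet): two hypothetical treatments `A` and `B`, sliced by a third variable with two categories, `group_X` and `group_Y`. Treatment A has the higher success rate in every slice, yet the lower rate once the slices are combined.

```python
# Made-up counts for illustration only: (successes, total) per treatment,
# sliced by a third variable (a suspected confounder with two categories).
counts = {
    "group_X": {"A": (90, 100), "B": (800, 1000)},
    "group_Y": {"A": (300, 1000), "B": (20, 100)},
}


def rate(successes, total):
    """Proportion of successes among the units counted."""
    return successes / total


# Success rate of each treatment within each slice.
sliced_rates = {
    group: {treatment: rate(*cell) for treatment, cell in by_treatment.items()}
    for group, by_treatment in counts.items()
}

# Success rate of each treatment after combining the slices.
combined_rates = {}
for treatment in ("A", "B"):
    successes = sum(counts[group][treatment][0] for group in counts)
    total = sum(counts[group][treatment][1] for group in counts)
    combined_rates[treatment] = rate(successes, total)

# Slicing only re-categorises the data, so the overall sample size is unchanged.
total_units = sum(n for by_treatment in counts.values() for _, n in by_treatment.values())

for group, rates in sliced_rates.items():
    print(group, {t: round(r, 3) for t, r in rates.items()})
print("combined", {t: round(r, 3) for t, r in combined_rates.items()})
print("total units:", total_units)
```

The reversal happens here because the third variable is associated with both the treatment received and the outcome, which is what makes it a confounder.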

Symmetry rule
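
Only the heading survives here. As a hedged reminder, the symmetry rule for rates is usually stated as: rate(A | B) > rate(A | NB) if and only if rate(B | A) > rate(B | NA), and likewise with < or = in place of >. The check below runs on a made-up 2x2 table of counts; the table and variable names are illustrative assumptions, not the sheet's.

```python
# Made-up 2x2 table of counts for illustration only.
# Keys are (A status, B status); "NA" means not A, "NB" means not B.
counts = {
    ("A", "B"): 30,
    ("A", "NB"): 20,
    ("NA", "B"): 10,
    ("NA", "NB"): 40,
}

# Rates of A, conditioning on B and on not-B.
rate_A_given_B = counts[("A", "B")] / (counts[("A", "B")] + counts[("NA", "B")])
rate_A_given_NB = counts[("A", "NB")] / (counts[("A", "NB")] + counts[("NA", "NB")])

# Rates of B, conditioning on A and on not-A.
rate_B_given_A = counts[("A", "B")] / (counts[("A", "B")] + counts[("A", "NB")])
rate_B_given_NA = counts[("NA", "B")] / (counts[("NA", "B")] + counts[("NA", "NB")])

# Both comparisons point the same way, as the symmetry rule says they must.
a_more_common_with_b = rate_A_given_B > rate_A_given_NB
b_more_common_with_a = rate_B_given_A > rate_B_given_NA
print(a_more_common_with_b, b_more_common_with_a)
```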

Data visualisation for One variable EDA (univariate data)
1) Histograms
2) Boxplots

Histograms
● Shape of graph: peaks (e.g. unimodal/bimodal), skewness
● Center: mean, median, mode
● Spread about central tendency: range, standard deviation
● Deviations from the pattern: outliers

Histograms vs Bar Graphs
1) Histograms show the distribution of a numerical variable across a number line, while bar graphs make comparisons across categories of a variable
2) The ordering of bars in a histogram cannot be changed, unlike in a bar graph
3) There are no gaps between bars in a histogram, unlike in a bar graph

Boxplots (features of a boxplot: min, Q1, median, Q3, max)

Probability
● 0 ≤ P(E) ≤ 1
● P(S) = 1 if S is the entire sample space
● Mutually exclusive events E and F: P(E ∪ F) = P(E) + P(F)
● Uniform probability: each outcome has probability 1/(size of sample space)
● Conditional probability: P(E | F) = P(E ∩ F) / P(F)
● Given that E and F are mutually exclusive events that make up the entire sample space, P(G) = P(G | E) × P(E) + P(G | F) × P(F)
● Independent events A and B (two equivalent definitions): P(A | B) = P(A); P(A ∩ B) = P(A) × P(B)

Using Linear Regression to predict one variable based on the other for NON-LINEAR models

Correlation Coefficient (r)
● A measure of linear association (direction and strength)
● Strength of association (by size of r): 0 to 0.3 (weak), 0.3 to 0.7 (moderate), 0.7 to 1.0 (strong)
● As r approaches -1 or 1, data points fall more closely to the regression line
* r = 0 does not necessarily imply that there is no association between the 2 variables. r = 0 or a small r value could be due to a non-linear relationship between the variables

Properties of Correlation Coefficient (r)
r value is not affected by:
1) Interchanging the x and y axes (swapping the 2 variables)
2) Adding a constant to all values of a variable
3) Multiplying all values of a variable by a positive number
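
The three invariance properties of r, and the warning that r = 0 does not rule out a non-linear association, can be checked numerically. This is a small sketch using the standard Pearson formula; the function name `pearson_r` and the data values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 1, 4, 3, 6]

r = pearson_r(xs, ys)
r_swapped = pearson_r(ys, xs)                      # 1) swap the two variables
r_shifted = pearson_r([x + 10 for x in xs], ys)    # 2) add a constant to one variable
r_scaled = pearson_r([3 * x for x in xs], ys)      # 3) multiply one variable by a positive number

# A perfect but non-linear (quadratic) relationship: y = x squared.
r_quadratic = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r, r_swapped, r_shifted, r_scaled, r_quadratic)
```

In the last case y is completely determined by x, yet r comes out as 0 because the relationship is not linear.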
Limitations of Correlation Coefficient (r)
1) Correlation between 2 variables can only suggest a statistical relationship (the value of one variable can be used to obtain the AVERAGE value of the other variable), NOT a causal relationship
2) r does not measure non-linear association between 2 variables. r could be very small in cases where the association between the 2 variables is non-linear
3) Outliers may increase OR decrease the strength of the correlation. Sometimes, removing outliers has minimal effect on r.

Boxplot outliers
A point is considered an outlier if it is < Q1 − 1.5 × IQR or > Q3 + 1.5 × IQR

Confidence Intervals
Interpreting the confidence interval (e.g. 95% CI: 0.254 ± 0.0191):
"We are 95% confident (confidence is not the same as chance) that the population parameter we are looking for lies within the confidence interval"
● A confidence interval is constructed for each simple random sample.
● The confidence interval differs from sample to sample because the sample proportion is different for each sample.

Data visualisation for Two variable EDA (bivariate data)
1) Scatter plots (to get an idea of the pattern formed between 2 variables)
2) Correlation Coefficient (to check for linear relationship)
3) Linear Regression (to fit a line to data for making predictions)

Scatter Plots
● Direction: positive, negative, neutral
● Form: linear vs non-linear
● Strength of linear relationship between 2 variables (RELATED to gradient of regression line but NOT EQUIVALENT)
● Deviations from the pattern: outliers

Linear Regression
● Using one variable to predict another
● Models the relationship between variables X and Y by a straight line Y = mX + b
● Use the least squares method to determine the best-fit line for the data.

Least Squares Method
● Define the i-th residual of the observation: e_i = difference between the observed and predicted outcome.
● Want to minimise e_1² + e_2² + ... + e_n², where n = no. of data points

Properties of Regression Line
1) Regression line always passes through the point of averages (x̄, ȳ)
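
The least squares method described in the Linear Regression notes can be sketched as follows. The closed-form slope and intercept used here are the standard least squares solution for a straight line Y = mX + b; the function name `least_squares_line` and the data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept b of the least squares line Y = mX + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar      # forces the line through the point of averages
    return m, b


def sum_squared_residuals(xs, ys, m, b):
    """Sum of squared residuals e_i = observed - predicted."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 1, 4, 3, 6]

m, b = least_squares_line(xs, ys)
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
sse = sum_squared_residuals(xs, ys, m, b)

# Any other line gives a larger sum of squared residuals, for example:
sse_steeper = sum_squared_residuals(xs, ys, m + 0.1, b)
sse_higher = sum_squared_residuals(xs, ys, m, b + 0.1)

print(m, b, sse, sse_steeper, sse_higher)
```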

Properties of confidence intervals

1) Smaller sample size, larger random error, wider confidence interval
2) Higher confidence level, wider confidence interval
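
Both properties can be checked with a short sketch. It assumes the usual large-sample interval for a proportion, p ± z × sqrt(p(1 − p)/n), which the sheet does not spell out. The sample proportion 0.254 is taken from the sheet's example, while the sample sizes 2000 and 500 are hypothetical; with n = 2000 the margin happens to come out close to 0.0191.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat, n, level=0.95):
    """Large-sample confidence interval for a population proportion (assumed formula)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)        # about 1.96 for a 95% level
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def width(interval):
    """Length of a (lower, upper) interval."""
    return interval[1] - interval[0]


base = proportion_ci(0.254, 2000)                     # hypothetical n = 2000
smaller_n = proportion_ci(0.254, 500)                 # 1) smaller sample size
higher_level = proportion_ci(0.254, 2000, level=0.99) # 2) higher confidence level

print("95%, n=2000:", base, "margin", width(base) / 2)
print("95%, n=500: ", smaller_n)
print("99%, n=2000:", higher_level)
```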
