Dependence and Interdependence Methods
• Supervised learning:
§ Supervised learning is a type of machine learning where the model is trained on labeled data. In this context,
the data used for training includes both the input features (independent variables) and the correct output
labels (dependent variables). The goal of supervised learning is to learn a mapping from inputs to the correct
outputs (predictions).
§ Features:
o Labeled data: the training dataset includes both the input data and the corresponding output labels.
o Prediction task: the model is trained to predict the output based on input features.
o Goal: to learn a relationship between the input and the output so that the model can make accurate
predictions on unseen data.
§ Focuses on predicting an outcome (dependent variable) from input features (independent variables), making
it inherently related to dependence methods, as sketched below.
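A minimal supervised-learning sketch in Python (assuming scikit-learn is available; the labeled toy data is invented purely for illustration):

```python
# Minimal supervised-learning sketch (assumes scikit-learn is installed;
# the toy dataset below is invented purely for illustration).
from sklearn.linear_model import LinearRegression

# Labeled data: input features X (independent variable) and known outputs y (dependent variable).
X = [[1.0], [2.0], [3.0], [4.0]]   # input features
y = [2.1, 3.9, 6.2, 8.1]           # correct output labels

model = LinearRegression()
model.fit(X, y)                    # learn the mapping from inputs to outputs

print(model.predict([[5.0]]))      # predict on unseen data
```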
• Unsupervised learning:
§ Unsupervised learning is a type of machine learning where the model is trained on unlabeled data. In this
case, the training data only includes the input features, but no corresponding output labels. The model tries
to identify patterns, structures, or relationships within the data without explicit guidance about what the
correct output should be.
§ Features:
o Unlabeled data: the training dataset contains only the input data without any known output labels.
o Pattern discovery: the goal is to explore the data and identify underlying patterns or groupings (data
structures).
o Goal: to find structure or relationships within the data, such as grouping similar data points or reducing
the dimensionality of the data.
§ Focuses on finding patterns or structures in data without a dependent variable, and often deals with
interdependence methods (e.g., clustering, feature reduction) or discovering the relationships among features, as sketched below.
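A minimal unsupervised-learning sketch (again assuming scikit-learn; the unlabeled toy points are invented for illustration):

```python
# Minimal unsupervised-learning sketch (assumes scikit-learn; toy data
# invented for illustration).
from sklearn.cluster import KMeans

# Unlabeled data: only input features, no output labels.
X = [[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [7.9, 8.1]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)     # discover groupings without any guidance
print(labels)                      # e.g., [0 0 1 1] -- two clusters found in the data
```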
2. Dependence methods
• These methods analyze causal or associative relationships between variables → supervised learning.
• There are two types of variables:
§ Independent variable.
§ Dependent variable (whose value depends on the value of the independent variable).
• They are used for prediction (of the dependent variable as a function of the independent variable values).
• Types of dependence methods (a short sketch follows this list):
§ Regression:
o Linear regression: examines the linear relationship between two continuous (numerical) variables, one
dependent variable and one independent variable.
o Logistic regression: examines the relationship between one categorical (not numerical) dependent variable
and one continuous (numerical) independent variable.
o Multiple regression: examines the relationship between one continuous (numerical) dependent
variable and two or more independent variables (numerical and/or categorical).
o Analysis of variance (ANOVA): examines how one or more independent categorical (not numerical)
variables affect a continuous (numerical) dependent variable.
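A minimal sketch of these dependence methods (assuming numpy, scipy, and scikit-learn; all toy arrays are invented for illustration):

```python
# Minimal sketch of the dependence methods above (assumes numpy, scipy and
# scikit-learn; all toy data below is invented for illustration).
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: continuous Y from one continuous X.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
slope, intercept, r, p, stderr = stats.linregress(x, y)

# Logistic regression: categorical Y (0/1) from one continuous X.
X_cont = np.array([[0.5], [1.5], [3.0], [4.5], [5.5], [6.5]])
y_cat = np.array([0, 0, 0, 1, 1, 1])
logit = LogisticRegression().fit(X_cont, y_cat)

# Multiple regression: continuous Y from two or more independent variables.
X_multi = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]])
y_multi = np.array([2.2, 4.5, 6.1, 8.6])
multi = LinearRegression().fit(X_multi, y_multi)

# One-way ANOVA: does a categorical factor (three groups) affect a continuous Y?
group_a = [5.1, 4.9, 5.3]
group_b = [6.8, 7.0, 6.5]
group_c = [5.0, 5.2, 4.8]
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

print(slope, intercept, logit.predict([[2.0]]), multi.coef_, f_stat, p_anova)
```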
3. Interdependence methods
• These methods do not assume causality (only associative relationships between the variables).
• All variables are equally important (there is no split into dependent and independent variables).
• They are used to explore the data architecture and identify patterns or associations that allow us to structure the data →
unsupervised learning.
• Types of interdependence methods:
§ Methods used to analyze linear relationships (a sketch of these metrics follows this list):
o Correlation (r): measures the strength and direction of the relationship between two variables without
assuming causality or establishing a dependent relationship between them. It's used in linear
regression.
o Goodness of fit (measures how well a model's predicted values match the observed values, i.e., how
accurately the model predicts the dependent variable based on the independent variables):
- Determination coefficient (R²): measures the percentage (%) of the variance of the dependent
variable (Y) that can be explained by the theoretical linear model built by regressing the
dependent variable on the independent variable(s).
- Adjusted R-squared: a variation of R² that adjusts for the number of predictors (independent
variables) in the model. It's used to avoid overestimating the goodness of fit when multiple
independent variables are included.
- Sum of Squared Errors (SSE): a measure of the total error in a regression model. It is the sum of
the squared differences between the actual (observed) values and the predicted values.
- Root Mean Squared Error (RMSE): a measure of the average magnitude of the residuals (errors)
between the observed values and the predicted values. It is the square root of the average of the
squared errors.
- Null model (or intercept-only model): a baseline model that only predicts the mean (average) of
the dependent variable for all observations. It assumes that the independent variables have no
effect on the dependent variable.
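A sketch computing these fit metrics from their definitions with plain numpy (toy data invented for illustration; a least-squares line is fitted first):

```python
# Sketch computing the fit metrics above with plain numpy (toy data invented
# for illustration; a simple least-squares line is fitted first).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])          # observed values
n, k = len(y), 1                                  # n observations, k predictors

slope, intercept = np.polyfit(x, y, 1)            # fit y = slope*x + intercept
y_pred = slope * x + intercept                    # predicted values

r = np.corrcoef(x, y)[0, 1]                       # correlation coefficient
sse = np.sum((y - y_pred) ** 2)                   # Sum of Squared Errors
sst = np.sum((y - y.mean()) ** 2)                 # error of the null (mean-only) model
r2 = 1 - sse / sst                                # determination coefficient R²
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # adjusted R²
rmse = np.sqrt(sse / n)                           # Root Mean Squared Error

print(r, r2, adj_r2, sse, rmse)
```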
§ Probability-based methods (a sketch follows this item):
o Chi-Square (χ²): determines whether there is a significant association or relationship between two
categorical variables. It compares the observed frequencies of the data to the frequencies we would
expect if the variables were independent, testing whether the distribution of one variable is related to
the distribution of the other. The chi-square test of independence is considered an interdependence
method because it is specifically designed to test whether two categorical variables are independent of
each other; in other words, it tests whether the variables are independent or associated, not whether
we can predict or model a relationship between a dependent variable and one or more independent
variables.
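A minimal chi-square test of independence (assuming scipy; the contingency table of observed frequencies is invented for illustration):

```python
# Minimal chi-square test of independence (assumes scipy; the 2x2 contingency
# table of observed frequencies below is invented for illustration).
from scipy.stats import chi2_contingency

# Rows: group A / group B; columns: outcome present / outcome absent.
observed = [[30, 10],
            [20, 25]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)   # small p-value -> the two variables are likely associated
print(expected)        # frequencies expected if the variables were independent
```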
§ Cluster (conglomerate) analysis: these are not methods that try to classify; they try to understand the data structure and
validate that it is not an artefact (the events inside the groups should present high intragroup homogeneity).
The aim is to group data based on their homogeneity within the sample's heterogeneity, not to predict the
classification of a data event into a certain group (that is the role of classification methods). A sketch follows this list:
o K-means clustering: a partitioning method, meaning that it divides the data into a set number of
clusters (denoted by k), where each data point belongs to one and only one cluster. The main goal of K-
Means is to partition the data into k groups such that the data points within each group (or cluster) are
as similar as possible. It can be visualized with a heatmap.
o Hierarchical clustering: a clustering method that builds a tree-like structure called a dendrogram to
represent the hierarchical relationships between clusters. Unlike K-Means, hierarchical clustering does
not require the number of clusters to be specified in advance. The algorithm produces a hierarchy of
clusters, which can be cut at any level to obtain a desired number of clusters. It can be visualized with a
heatmap.
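A minimal clustering sketch (assuming scikit-learn and scipy; the toy 2-D points are invented for illustration):

```python
# Minimal clustering sketch (assumes scikit-learn and scipy; toy 2-D points
# invented for illustration).
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9], [4.0, 4.2]])

# K-means: k must be chosen in advance; every point gets exactly one cluster.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Hierarchical clustering: build the full dendrogram, then cut it at 2 clusters.
tree = linkage(X, method="ward")
hc_labels = fcluster(tree, t=2, criterion="maxclust")

print(km_labels, hc_labels)
```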
§ Dimensionality reduction: it is used to reduce the number of features or variables in a dataset while
retaining as much of the important information as possible. A sketch follows this list:
o Principal component analysis (PCA): a linear dimensionality reduction method that identifies the
directions (called principal components) along which the data varies the most and projects the data
into a lower-dimensional space along those directions. The PCA axes explain the variability between the
samples.
o t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and
Projection (UMAP): these are non-linear dimensionality reduction methods.
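A minimal PCA sketch (assuming scikit-learn; the toy three-feature dataset is invented for illustration):

```python
# Minimal PCA sketch (assumes scikit-learn; the toy 3-feature dataset is
# invented for illustration).
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 0.1],
              [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.3],
              [3.1, 3.0, 0.6]])

pca = PCA(n_components=2)        # keep the two most informative directions
X_low = pca.fit_transform(X)     # project 3-D data into 2-D

# Each ratio is the fraction of the samples' variability explained by that axis.
print(pca.explained_variance_ratio_)
print(X_low)
```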