ML2 Summary — Machine Learning

Topic 1 – Regression and Classification

Data Matrix: a table with variables in columns and observations in rows. n rows ⟺ n observations; q columns ⟺ q variables.
Y = response variable, X = predictor variables.

Linear Regression: Y = β0 + β1x1 + ... + βKxK + ε. It assumes a linear relationship and a normally distributed dependent variable; this assumption is violated for binary classification.
Example: Score = β0 + β1 × HoursStudied + ε
- β0 = intercept: the expected test score when hours studied is zero.
- β1 = coefficient of hours studied: how much the test score is expected to change with each additional hour of study.
- ε = error term: accounts for other factors affecting the test score (natural aptitude, exam difficulty, ...).

Residuals: the difference between the actual value and the value predicted by the model.
P-value: a value below 0.05 means the null hypothesis is rejected and there is a strong relationship between the dependent variable y and the predictor x.

Logistic Regression (Bernoulli response): η(x) = β0 + β1x1 + ... + βpxp is the linear predictor. The logistic function transforms this linear combination of inputs into a value between 0 and 1.
Example: β0 = −1.75, β1 = 0.011, x1 = 50 → η = −1.75 + 0.011 × 50 = −1.2
p = 1 / (1 + e^1.2) ≈ 0.231 → 23.1% predicted probability of winning an Oscar (e ≈ 2.71828). Since this is below 50%, we predict "no Oscar".
Decision rule: predict WinsOscar if P(y = WinsOscar | X = x) > δ, where δ = 0.5, i.e. a 50% predicted probability of y being 1.
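A minimal sketch of the Oscar example above, using the coefficients from the notes (β0 = −1.75, β1 = 0.011, x1 = 50); the helper name logistic_prob is mine:

```python
import math

def logistic_prob(eta):
    """Map a linear predictor eta to a probability via the logistic function."""
    return 1.0 / (1.0 + math.exp(-eta))

# Numbers from the Oscar example above
beta0, beta1, x1 = -1.75, 0.011, 50
eta = beta0 + beta1 * x1          # -1.2
p = logistic_prob(eta)            # ~0.231

delta = 0.5                       # decision threshold
prediction = "WinsOscar" if p > delta else "NoOscar"
print(round(eta, 3), round(p, 3), prediction)   # -1.2 0.231 NoOscar
```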
AIC: measures how well a model fits the data it was generated from; the lower the better.

Confusion matrix (be careful with imbalanced data):

                 Truth = 0 (−)   Truth = 1 (+)   Truth = 2
  Pred = 0 (−)      23 (TN)          7 (FN)          6
  Pred = 1 (+)       3 (FP)         27 (TP)          4
  Pred = 2           3                1             26

Error (misclassification) rate = (wrongly classified) / (number of samples)
= (7 + 6 + 3 + 4 + 3 + 1) / 100 = 24 / 100 = 0.24

Optimal threshold (loan example, Y = 1: not repaid — costly, Y = 0: repaid):
FP cost L(0,1) = L(truth: repaid, predicted: default) = 10'000 → rejecting a loan (predicting default) that would have been repaid.
FN cost L(1,0) = L(truth: default, predicted: repaid) = 100'000 → granting a loan (predicting repayment) to a customer who then defaults.

ROC curve: evaluates all possible thresholds.
True positive rate / recall = TP / (TP + FN) — the higher the better.
False positive rate = FP / (FP + TN) — the lower the better.
Precision = TP / (TP + FP) — the higher the better.
At threshold 0, the classifier labels every instance as positive (1).
AUC: how well the classifier separates + from −; the larger the better. Example: AUC = 0.8 means the classifier ranks two randomly chosen test data points (one positive, one negative) correctly with probability 80%.
AUPRC: the larger the better.
ROC compares the false positives to the total number of true negatives (true 0's). PR compares the false positives to the total number of predicted positives (predicted 1's) → more relevant with imbalanced data or when double-checking predicted positives is costly.
Example: in medical diagnosis (many more 0's than 1's) the PR curve is usually more relevant than the ROC curve because of the imbalanced data and the high cost of false positives.
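A sketch tying the numbers above together: the error rate and binary rates from the confusion matrix, plus the standard cost-based decision threshold for the loan losses (the notes only give the two losses; the threshold formula L(0,1)/(L(0,1)+L(1,0)) is the usual decision-theoretic rule, shown here as an illustration):

```python
import numpy as np

# 3-class confusion matrix from the notes: rows = predictions, columns = truth
cm = np.array([[23, 7, 6],
               [3, 27, 4],
               [3, 1, 26]])
error_rate = 1 - np.trace(cm) / cm.sum()        # (7+6+3+4+3+1)/100 = 0.24

# Binary metrics for the class-0 / class-1 block of the table
TN, FN, FP, TP = 23, 7, 3, 27
recall = TP / (TP + FN)                          # true positive rate
fpr = FP / (FP + TN)                             # false positive rate
precision = TP / (TP + FP)

# Cost-based threshold for the loan example: predict "default" (y = 1) whenever
# p > L(0,1) / (L(0,1) + L(1,0)), so costly false negatives push the threshold down
L01, L10 = 10_000, 100_000
optimal_threshold = L01 / (L01 + L10)            # ~0.091, far below 0.5
print(error_rate, recall, fpr, precision, optimal_threshold)
```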
Topic 2 – Trees, Random Forest, Boosting

Tree structure: root node → internal nodes → leaf (terminal) nodes. If the split condition is true, go to the left branch.
Binary tree labels (2 classes): Negative (left) and Positive (right). Ternary tree labels (3 classes) are written as "A/B/C".

Regression trees (predict numbers): continuous or quantitative outcomes; the output is a numerical value. The tree reduces the variation (MSE) of the target variable within each node.

Classification trees (categorize items into classes): categorical outcomes; the output is a class label. Ideally, each node should contain data points that belong to a single class.

Building a regression tree: the goal is to minimize the residual sum of squares (RSS), i.e. the sum of squared differences between the observed values and the values predicted by the model. Variance is the spread of a set of data points, i.e. how far each value is from the mean. Steps:
1. Selecting the best split: choose the split for which the target variable is most similar within each group, i.e. the outcomes (like having a side effect or not) are more consistent within each group.
2. Evaluating split quality: the reduction in variance or mean squared error (MSE).
3. Creating terminal nodes: the predicted value for each leaf is the mean of the target variable over the observations within that leaf.
4. Recursively splitting: the process is applied to each resulting node until the stopping criteria are met for all nodes.
5. Pruning: making the tree shorter, for instance to avoid overfitting; this can be guided by cross-validation to find the optimal tree size. Note: only a finite number of α values needs to be considered, since there are only finitely many relevant subtrees, and α is chosen by cross-validation.
Regression: m = p/3 and minimum node size nmin = 5.

Building classification trees: the partition is built analogously to regression trees. However, both in the greedy algorithm for finding the partition and in the pruning step, the RSS is replaced by the total cost, where the mean squared error is replaced by another impurity measure such as the Gini index. Gini lies in [0, 1]; the lower the better — it evaluates the quality of a split. Classification: m = p and minimum node size nmin = 1.

Tuning parameter α balances model goodness of fit against model size (complexity); α is chosen by cross-validation.

Gini & MCE worked example (assume α = 0.5):
MCE / probability of class "no side effect" for "old": 10 / (10 + 50) = 0.167
Probability of class "side effect" for "old": 50 / (10 + 50) = 0.833
Gini(old) = 10/60 · (1 − 10/60) + 50/60 · (1 − 50/60) = 0.167 · 0.833 + 0.833 · 0.167 ≈ 0.28
Total cost: 60 · 0.28 + 40 · 0 = 16.8
Best split variable: 16.8 (age) < 47.5 (sex) → split on age.
Cost-complexity Cα(T): left (keep the split), 3 leaves → 40·0 + 50·0 + 10·0 + 0.5·3 = 1.5; right (pruned), 2 leaves → 60·0.167 + 40·0 + 0.5·2 ≈ 11.0
Should we prune? 1.5 < 11.0 → don't prune.

Signs of overfitting:
1. Complexity of the tree: if the tree has many splits, it may be too finely tuned to the training data.
2. Leaf size: many leaves containing very few instances.
3. Performance on validation data: the model performs significantly better on the training data than on the validation data.
4. Lack of pruning: pruning reduces the size of a tree to prevent overfitting; if the tree is not pruned (cp = 0, i.e. no complexity penalty for adding another split), it may overfit.
5. Depth of the tree: the length of the longest path from the root node to a leaf; it measures how many "levels" of decision nodes exist in the tree.
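A minimal sketch reproducing the worked example above. The counts (10/50 in the "old" node) are from the notes; treating the other node as a pure node of 40 observations is my reading of the "40 · 0" term, and the cost-complexity comparison uses the misclassification error as impurity, as in the example:

```python
def gini(counts):
    """Gini impurity of a node from its class counts."""
    n = sum(counts)
    return sum((c / n) * (1 - c / n) for c in counts)

def mce(counts):
    """Misclassification error of a node from its class counts."""
    return 1 - max(counts) / sum(counts)

# Side-effect example: node "old" holds 10 "no side effect" and 50 "side effect";
# the other node is assumed pure (40 observations, impurity 0).
g_old = gini([10, 50])                      # ~0.278 (the notes round to 0.28)
total_cost = 60 * g_old + 40 * 0            # ~16.7 (notes: 16.8)

# Cost-complexity pruning with alpha = 0.5
alpha = 0.5
cost_keep  = 40 * 0 + 50 * 0 + 10 * 0 + alpha * 3        # 3 pure leaves -> 1.5
cost_prune = 60 * mce([10, 50]) + 40 * 0 + alpha * 2     # 2 leaves -> ~11.0
print(round(total_cost, 1), cost_keep < cost_prune)      # 16.7 True -> don't prune
```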
Cost-complexity pruning avoids overfitting: increasing α increases the penalty for complexity. The goal is to find the α that minimizes impurity without sacrificing too much predictive accuracy. This is typically done by cross-validation: the tree is pruned at each α, and the α that gives the best cross-validated performance is chosen.

Random Forest: build many trees and then aggregate them, i.e. take the average of, say, 1000 slightly different trees with slightly different predictions. Advantages: high accuracy, robustness, low bias, and feature importance. σ² is the variance of a single tree, ρ is the correlation between two trees → how similar two trees are.

Boosting often gives the highest predictive accuracy for "structured / tabulated" data. Warning: in contrast to Random Forest, boosting will overfit if you add too many trees. Tuning parameters: number of trees M (the most important parameter), learning rate ν (usually the smaller the better, ≤ 0.1), and tree-related parameters (maximal depth of the trees, minimal number of samples per leaf).
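A hedged sketch contrasting the two ensembles; the data are synthetic, sklearn's GradientBoostingClassifier stands in for "boosting", and all parameter values are illustrative, not recommendations from the course:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data, just for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random Forest: many decorrelated trees, averaged; adding trees does not overfit
rf = RandomForestClassifier(n_estimators=1000, max_features="sqrt",
                            random_state=0).fit(X_tr, y_tr)

# Boosting: the number of trees M and the learning rate are the key tuning
# parameters; too many trees *can* overfit, unlike Random Forest
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0).fit(X_tr, y_tr)

print(rf.score(X_te, y_te), gb.score(X_te, y_te))
```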
Bootstrap: drawing N samples from the original dataset with replacement.

High variance: the model is sensitive to the training data, potentially capturing too much noise and specific detail; it performs well on the training data but poorly on new or unseen data (overfitting). High bias: the model makes simplistic assumptions about the data structure, which can lead to generally incorrect predictions (underfitting). Low variance: less sensitive to the training data and therefore more stable and consistent, but it may fail to capture complexities in the data. Low bias: makes correct predictions, as it is not overly simplified and tries to capture the true underlying relationship in the data.

The trees in a forest are not independent, and reducing the correlation between the trees gives better accuracy. For each split, only m randomly selected variables are considered, which reduces the correlation by increasing the diversity in the forest. A high correlation means that the trees tend to make very similar decisions or have similar structures, while a low correlation indicates that the trees make more independent decisions.

OOB error rate: an estimate of the model's performance on unseen data, computed during the training process. It is calculated from the predictions made by each tree in the Random Forest ensemble on its out-of-bag samples (the samples not used for training that tree). In contrast, the error rate calculated from the confusion matrix is based on the predictions made by the trained model on the entire dataset.
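A small sketch of the distinction just described: the OOB estimate versus the error obtained by re-predicting the data the forest was trained on (synthetic data, illustrative parameters):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, just to illustrate the two error estimates
X, y = make_classification(n_samples=500, n_features=10, random_state=1)

rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=1)
rf.fit(X, y)

oob_error = 1 - rf.oob_score_          # estimate of performance on unseen data
resub_error = 1 - rf.score(X, y)       # error from re-predicting the training set
print(oob_error, resub_error)          # the resubstitution error is optimistic (often ~0)
```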
Topic 3 – PCA

If one (or several) variable has a much higher variance than the other variables, it will dominate the analysis, i.e. the first PC will essentially just be this variable with the large variance. Does the clustering change if you run it on scaled data? Yes, it changes, because clustering algorithms use distances and these depend on the scaled dimensions. We call the data scaled if all variables have mean 0 and variance 1.

PCA: used to analyze large datasets containing a high number q of dimensions/features per observation. Goal: reduce the dimensionality of the data by keeping only the principal components. Many variables are hard to visualize for q > 3; collinearity (highly correlated variables) or more variables than observations can cause problems. We reduce the dimension while accounting for as much as possible of the variation (= information) in the data.

Covariance: describes whether two variables are linked to each other. Positive: if one increases, the other increases as well. Negative: if one increases, the other decreases. Zero: there is no relationship; one does not affect the other. In PCA, when two variables have the same distance but in opposite directions, the covariance between them is indeed 0.

After normalization (0–1) and mean centering, the mean is zero. The first principal component has the largest variance, the second component the second largest variance, and so on. The vectors are called loadings and the values of the data points are called scores: loadings describe the variable contributions to the components, while scores give the positions of the data points in component space.

Steps to perform PCA: 1. Standardization; 2. Covariance matrix; 3. Eigen-decomposition; 4. Sort by eigenvalues; 5. Choose your principal components.

When all variables are positively correlated, the first principal component is often some kind of average of the variables. The other principal components then give important information about the remaining patterns or shapes.

Example (two stocks, Nestlé and Novartis): Comp1 is a factor that affects both variables positively, since both loadings are positive and high. Comp2 has opposite signs, which captures the difference in behavior between the two: if Nestlé performs well, Novartis would perform poorly.

The signs of the PC loadings are arbitrary: the sign of the loadings, i.e. the coefficients assigned to each variable in a principal component, is arbitrary. This means the direction in which the eigenvector (the principal component) points in the multidimensional space is not important; whether an eigenvector has a positive or negative sign, it still represents the same axis of variation in the data.
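A minimal sketch of the PCA steps on synthetic data, showing scores, loadings, and the 80% cumulative-variance rule of thumb used just below for choosing the number of PCs (data and values are made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Small synthetic data matrix: n observations in rows, q variables in columns
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 0] *= 10                      # one variable with much larger variance

# Standardize (mean 0, variance 1), then apply PCA
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)

scores = pca.transform(X_scaled)   # positions of the observations in PC space
loadings = pca.components_         # variable contributions to each PC (rows = PCs)

# Keep enough PCs to reach a cumulative explained variance of at least 80%
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_pc = int(np.argmax(cum_var >= 0.80)) + 1
print(cum_var.round(2), n_pc)
```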

How many PCs?
1. The cumulative proportion of explained variance should be at least 80%.
2. Keep only the PCs before the elbow in the scree plot.

Topic 4 – Multidimensional Scaling, Distances, t-SNE

Goal of MDS: represent q-dimensional data in a low-dimensional space while preserving the distances between points as much as possible.
Key differences between PCA and MDS: PCA uses quantitative data, MDS handles quantitative and qualitative data. PCA reduces the number of variables in the data while retaining as much information (variance) as possible, whereas MDS aims to visualize the structure of the data by representing the distances or dissimilarities between points.

Distances for continuous data:
Euclidean distance: the straight-line distance between two points, calculated as the square root of the sum of the squared differences between corresponding values of the two points. It serves as a way to maintain the distances between data points when mapping them from a higher-dimensional space to a lower-dimensional space, ideally 2D. In summary, we reconstruct the points X from the distances between all points.
Manhattan distance: the sum of the absolute differences of the Cartesian coordinates.
R: function "dist"; Python: scipy.spatial.distance.pdist

Classical MDS (example: maps):
- Approximately preserves Euclidean distances. Closely related to PCA.
- We only keep the q̃ largest eigenvalues and the corresponding eigenvectors.
- How to choose the dimension q̃: an alternative is to look at a scree plot, which shows the eigenvalues in descending order.
- R: cmdscale; Python: sklearn.manifold.MDS(metric=True)

The stress criterion S is the goodness-of-fit statistic that non-metric MDS tries to minimize; it varies between 0 and 1, with values near 0 indicating a better fit. Rule of thumb for judging the fit of non-metric MDS:
- S = 0%: perfect
- S = 5%: good
- S = 10%: fair
- S = 20%: poor

Distances for binary / nominal data:
Simple matching distance (SMD): useful for determining how similar two data sets are. SMD = (number of variables in which the units disagree) / (total number of variables), so a value close to 0 indicates high similarity and a value close to 1 indicates little or no similarity.
Jaccard distance (JD), for Boolean/binary data: used in situations where 0 and 1 are not equally important / informative (asymmetry); only mutual presence (both = 1) is counted as a match. JD = (number of variables in which the units disagree) / (number of variables, ignoring those where both units are 0).

When to use JD over SMD: Jaccard distance is more suitable when dealing with sets of varying sizes. It accounts for the relative size of the intersection and union, making it robust to variations in set sizes. If you are interested in emphasizing the common elements between sets rather than the overall proportion of matching elements, Jaccard distance is the better choice. Example: comparing customers by the products they bought (1 = bought, 0 = did not buy); the 1s are more important.
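A minimal sketch of the two binary distances just defined; the two example "customers" are made up:

```python
def simple_matching_distance(a, b):
    """Fraction of variables in which the two units disagree."""
    disagreements = sum(x != y for x, y in zip(a, b))
    return disagreements / len(a)

def jaccard_distance(a, b):
    """Like SMD, but (0, 0) pairs are ignored (mutual absence is not a match)."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == 0 and y == 0)]
    if not pairs:
        return 0.0
    disagreements = sum(x != y for x, y in pairs)
    return disagreements / len(pairs)

# Two customers, 1 = bought the product, 0 = did not buy (made-up example)
u = [1, 0, 0, 1, 0, 0]
v = [1, 1, 0, 0, 0, 0]
print(simple_matching_distance(u, v))   # 2/6 = 0.333
print(jaccard_distance(u, v))           # 2/3 = 0.667
```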

Non-metric MDS:
- Approximately preserves the ranking of distances.
- Gives a low-dimensional representation.
- Slower than classical MDS but relies on fewer assumptions.
- R: isoMDS; Python: sklearn.manifold.MDS(metric=False)

Gower distance: measures how different two records are. The records may contain a combination of logical, categorical, numerical, or text data. The distance is always a number between 0 (identical) and 1 (maximally dissimilar).
R: function "daisy" in package "cluster"; Python: gower.gower_matrix(df)
Worked example:
binary_diff = 2 → number of binary variables in which the units disagree
categorical_diff = 2 → number of categorical variables in which the units disagree
continuous_diff = |4.0 − 5.5| / 4.1 + |2.2 − 4.0| / 4.3 = 0.3659 + 0.4186 = 0.7845
Gower distance = (binary_diff + categorical_diff + continuous_diff) / cnt_variables = 4.7845 / 8 ≈ 0.598

t-SNE: t-distributed stochastic neighbor embedding is a dimension-reduction technique that is useful for visualizing high-dimensional data. It focuses more on the local structure while still trying to keep (parts of) the global structure. Perplexity is typically set between 5 and 50 and controls the separation.
- Emphasis on preserving small distances; good for visualization of high-dimensional data.
- R: Rtsne in package Rtsne; Python: sklearn.manifold.TSNE
- If p (similarity in high dimension) equals q (similarity in low dimension), then log(p/q) = log(1) = 0, so the cost function becomes zero.
- Large p (similar in high dimension) and small q (dissimilar in low dimension) → big penalty in the cost function.
- Small p (dissimilar in high dimension) and large q (similar in low dimension) → small penalty.
- For this reason, the KL divergence is asymmetric.

MDS vs t-SNE: t-SNE is like arranging items on a board so that similar things end up close together; great for visualizing and clustering data with complex patterns, such as grouping similar images or text documents. Classical MDS is like placing items on a board so that their distances match real-world relationships; useful when you want to keep the original distances between data points, as in geographic mapping or network analysis.
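A manual Gower-style calculation that reproduces the arithmetic of the worked example above. The mismatch counts, continuous values (4.0/5.5 and 2.2/4.0) and ranges (4.1, 4.3) come from the notes; the concrete record layout (3 binary, 3 categorical, 2 continuous variables) is an assumption made to reach 8 variables:

```python
def gower_distance(rec1, rec2, kinds, ranges):
    """Manual Gower distance: average per-variable dissimilarity in [0, 1].

    kinds[i] is 'binary', 'categorical' or 'continuous'; ranges[i] is the
    range of variable i (only used for continuous variables)."""
    total = 0.0
    for x, y, kind, rng in zip(rec1, rec2, kinds, ranges):
        if kind == "continuous":
            total += abs(x - y) / rng
        else:                      # binary or categorical: 0 if equal, 1 if not
            total += 0.0 if x == y else 1.0
    return total / len(rec1)

# Assumed layout: 3 binary, 3 categorical, 2 continuous variables (8 in total)
kinds  = ["binary"] * 3 + ["categorical"] * 3 + ["continuous"] * 2
ranges = [None] * 6 + [4.1, 4.3]
rec1 = [1, 0, 1, "red", "A", "X", 4.0, 2.2]
rec2 = [0, 0, 0, "red", "B", "Y", 5.5, 4.0]
print(gower_distance(rec1, rec2, kinds, ranges))   # ~0.598, as in the notes
```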
Topic 5 – Clustering

Cluster analysis: summarizing data. Focus → Agglomerative: build up clusters from individual observations. Divisive: start with the whole group of observations and split off clusters.

Scaling and distances. Bottom line: whether to scale depends on the context.
- If variables are not scaled, the variable with the largest range has the most weight; distances depend on the scales of the variables.
- Scaling gives every variable equal weight.
- Scale if the variables measure different units (kg, meter, sec, ...) or if you explicitly want equal weight for each variable.
- Don't scale if the units are the same for all variables.
- Often it is better to scale.

Data types: Continuous data, e.g. 3.2 cm, 10.08 cm, 4.8 cm. Categorical data — Binary: 0 or 1, a feature is present or not; Nominal: e.g. red, blue, green, several categories without an ordering; Ordinal: e.g. 1st, 2nd, 3rd, the ordering matters (1st is closer to 2nd than to 3rd, but we don't know by how much).

Distances between observations:
- Euclidean distance (continuous data)
- Manhattan distance (continuous data)
- Simple matching distance (discrete data)
- Jaccard distance (discrete data)
- Gower distance (mixed data)

Distances between clusters (linkage):
1. Single linkage: distance between two clusters = minimal distance over all point pairs from the two clusters; suitable for finding stretched-out clusters.
2. Complete linkage: distance between two clusters = maximal distance over all point pairs from the two clusters; suitable for finding compact but not necessarily well-separated clusters.
3. Average linkage: distance between two clusters = average distance over all point pairs from the two clusters; a compromise between complete and single linkage.

Agglomerative / hierarchical clustering: Why should we cut at the large drop in the dendrogram? Because there the separation/distance between the clusters is the biggest → best clustering.

Interpreting clusters: look at the positions of the cluster centers (= means of the variables per cluster) or at cluster representatives. If scaled data is used, it is often better for interpretability to look at the original data instead of the scaled data. Apply a dimension-reduction technique (such as PCA or t-SNE), plot the reduced-dimensional data, and label/color the points according to the cluster they belong to. In k-means clustering, the mean is the middle point of each group (cluster); it tells you where the center of the group is.
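A short sketch of agglomerative clustering with the linkages listed above, cutting the tree into a chosen number of clusters; the data and the choice of 2 clusters are made up:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Made-up continuous data: 20 observations, 2 (scaled) variables
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(10, 2)), rng.normal(5, 1, size=(10, 2))])

d = pdist(X, metric="euclidean")      # pairwise distances between observations

# Distance between clusters: "single", "complete" or "average" linkage
Z = linkage(d, method="average")

# Cut the tree where the merge heights show a large drop -> here, 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```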

Hierarchical clustering: first, build a hierarchical sequence of nested clusters. Start with n separate clusters (n = number of observations) and merge clusters until only one cluster is left; then choose the number of clusters.
- Observations that are grouped together at some point cannot be separated later.
- By cutting the tree at a certain height, one obtains a number of clusters.
- The results depend on how we measure distances between observations and between clusters.

K-means: the cluster centers are the means; these can be arbitrary points in space, i.e. they do not have to be observations. K-medoids: the cluster centers are observations. Medoid: a specific data point in a cluster that has the smallest average distance to all other points in the cluster. Partitioning around medoids (PAM) is the most common k-medoids method.

What happens to the WGSS if the number of clusters becomes larger? The WGSS gets smaller. Use the number of clusters after the last big drop in the WGSS: before 4 clusters, the WGSS drops a lot with every cluster we add; after 4 clusters, the WGSS only gets marginally smaller with every new cluster added, as we start breaking up clusters.

Speed: agglomerative clustering is O(n³); k-means clustering is O(n·k·i·d).

Silhouette plot: S(i) large → well clustered; S(i) small → badly clustered; S(i) negative → assigned to the wrong cluster. An average above 0.5 is acceptable.

PAM vs k-means: PAM is better at handling outliers than k-means. PAM can work with various types of distances, not just Euclidean, unlike k-means. PAM makes it easier to identify representative objects for each cluster, which is useful for interpretation. Like k-means, the outcome of PAM depends on the initial values chosen. PAM requires more computational resources than k-means.
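A small sketch comparing the WGSS (inertia) and the average silhouette for several cluster counts, in the spirit of the elbow and silhouette rules above; the data and parameters are made up:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Made-up data with three blob-like groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in (0, 4, 8)])

# Compare WGSS (inertia) and silhouette for several numbers of clusters
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wgss = km.inertia_                       # within-group sum of squares
    sil = silhouette_score(X, km.labels_)    # average silhouette, > 0.5 is acceptable
    print(k, round(wgss, 1), round(sil, 2))
```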
GMM-based clustering: a new point x is assigned to the component it lies "within", e.g. the normal distribution around μ2.

DBSCAN: a clustering method based on density. It identifies clusters as areas of high density separated by areas of low density. Advantages: arbitrary shapes — it can find clusters of any shape; outlier resistance — DBSCAN is not easily affected by outliers; no need for a cluster number — unlike other methods, you don't have to specify the number of clusters in advance. Small ε: all points are noise points. Large ε: you could end up with just one big group, because most points will be close to each other and all points become core points. DBSCAN is good for finding points that don't fit into any group.

In logistic regression, when all coefficients (including the intercept term) are 0, the model predicts the log-odds of the outcome as 0. Since logistic regression uses the logistic function to map predicted log-odds to probabilities, this results in a probability of 0.5, not 1.

Review questions (Yes / No):
- The pruning of regression trees increases their bias and reduces their variance. (Yes)
1. Classical multidimensional scaling (MDS) minimizes the distances between the reconstructed locations. (No)
2. The goal of non-metric multidimensional scaling is to respect both the distances and the ranking of the distances. (No)
3. For non-metric multidimensional scaling, using scaled data gives the same results as using the original (un-scaled) data. (Yes)
4. For k-means, the cluster centers are necessarily close to data points. (No)
5. When applying t-distributed stochastic neighbor embedding (t-SNE), points that are far apart in the original space have a large weight in the Kullback-Leibler divergence which is minimized by t-SNE. (No)
6. We assume that the variance of one variable is a lot larger than the variances of all other variables. In this case, if we use partitioning around medoids (PAM) for clustering, the variable with the largest variance will dominate the analysis. (Yes)
7. In binary classification, linear discriminant analysis and logistic regression are called "linear" methods since the decision boundary depends linearly on the predictor variables. (Yes)
8. A dendrogram obtained from applying the complete-linkage clustering method always has a depth of 2, where n denotes the number of samples. (No)
9. A random forest consisting of 50 trees is grown. The split chosen in the root node does not have to be the same for every tree. (Yes)
10. A benefit of a random forest regressor compared to a single regression tree is that it reduces the bias of the predictions. (No)
11. When applying PCA to unscaled data, the 1st principal component is likely to have a high contribution from the variable that has the highest variance in the dataset. (Yes)
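A minimal DBSCAN sketch illustrating the ε behavior described above (noise points are labeled −1); the data and the eps/min_samples values are made up:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: two dense groups plus a few isolated points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(50, 2)),
               rng.normal(5, 0.3, size=(50, 2)),
               rng.uniform(-10, 15, size=(5, 2))])

# eps (the ε above) controls the neighborhood size; min_samples the required density
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
labels = db.labels_                 # -1 marks noise points that fit no cluster
print(set(labels), list(labels).count(-1))
```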
