DSEnd
1. In-Sample Risk
Definition:
In-sample risk is the error measured on the training data. It tells how well the model fits the
data it learned from. Low in-sample risk means the model fits the training data well, but it
does not guarantee good generalization: an overfitted model can have very low in-sample risk
and still predict poorly on new data. It must always be compared with out-of-sample risk.
Points:
Definition:
PRESS (Predicted Residual Error Sum of Squares) is used to check how good a regression model
is at predicting unseen data. It removes one data point at a time, refits the model on the
remaining points, and predicts the removed one. The squared prediction errors are added.
Lower PRESS means better predictive power.
Points:
1. Form of cross-validation.
9. Computationally expensive.
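The leave-one-out procedure described above can be sketched in plain Python for a one-variable model. This is a minimal illustration, not a reference implementation: the helper names `fit_line` and `press` and the sample data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def press(xs, ys):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(xs)):
        # Refit without point i, then predict point i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(train_x, train_y)
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total


# Hypothetical data: one exactly linear response, one noisy response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_linear = [3.0, 5.0, 7.0, 9.0, 11.0]
ys_noisy = [3.0, 5.5, 6.5, 9.5, 10.5]

print(press(xs, ys_linear))  # essentially zero
print(press(xs, ys_noisy))   # larger, because held-out points are missed
```

The model is refitted once per data point, which is why PRESS becomes expensive on large datasets.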
Definition:
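AIC (Akaike Information Criterion) helps compare different regression models fitted to the
same data. It gives a score based on model accuracy and number of parameters. Lower AIC is
better. It balances accuracy with simplicity: each extra parameter adds to the penalty, so
AIC rises unless the added parameter improves the fit enough.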
Points:
2. k = number of parameters.
3. L = likelihood function.
4. Penalizes complexity.
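Using the symbols from the points above (k parameters, likelihood L), the standard form is AIC = 2k - 2 ln(L). The sketch below computes it for least-squares residuals under an assumed Gaussian error model. The function names and residual values are hypothetical, and conventions differ on whether the error variance counts toward k.

```python
import math


def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), where log_likelihood is ln(L)."""
    return 2 * k - 2 * log_likelihood


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood for a set of residuals."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


# Hypothetical residuals shared by two models with the same fit quality.
residuals = [0.1, -0.2, 0.15, -0.05, 0.1, -0.1]
ll = gaussian_log_likelihood(residuals)

aic_small = aic(ll, k=2)  # e.g. intercept + one slope
aic_large = aic(ll, k=3)  # one extra parameter, no better fit
print(aic_large - aic_small)  # the penalty for the extra parameter
```

With identical fit, the larger model scores exactly 2 points worse, which is the complexity penalty in action.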
Definition:
Categorical variables like gender or city names must be converted into numbers to use in
regression. We use techniques like One-Hot Encoding. This turns categories into 0s and 1s.
This helps the model process them correctly.
Points:
8. Increases dimensionality.
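A minimal sketch of one-hot encoding in plain Python is given below. The function name `one_hot` and the city list are assumptions for illustration; in practice a library encoder would normally be used.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical categorical column.
cities = ["Pune", "Delhi", "Pune", "Mumbai"]
categories, encoded = one_hot(cities)

print(categories)  # ['Delhi', 'Mumbai', 'Pune']
for row in encoded:
    print(row)
```

Each category becomes its own column, which is why the number of features grows with the number of distinct categories.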
Definition:
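R² tells how much of the total variation in output is explained by the input variables. For a
least-squares model with an intercept, evaluated on its training data, it ranges from 0 to 1.
Higher values mean better fit. R² = 0 means the model explains none of the variation. R² = 1
means perfect prediction.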
Points:
1. Formula: R² = 1 - (RSS/TSS).
5. Can’t go above 1.
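The formula R² = 1 - (RSS/TSS) can be checked with a few lines of Python. The function name and the sample values are assumptions for the example.

```python
def r_squared(y_true, y_pred):
    """R² = 1 - RSS/TSS."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return 1 - rss / tss


y_true = [2.0, 4.0, 6.0, 8.0]

perfect = r_squared(y_true, y_true)                  # 1.0: every point predicted exactly
mean_only = r_squared(y_true, [5.0, 5.0, 5.0, 5.0])  # 0.0: no better than the mean
reversed_fit = r_squared(y_true, [8.0, 6.0, 4.0, 2.0])  # -3.0: worse than the mean
print(perfect, mean_only, reversed_fit)
```

The last case shows that predictions worse than simply guessing the mean give a negative value, so the 0 to 1 range applies to a fitted least-squares model rather than to arbitrary predictions.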
Points:
4. Reduces overfitting.
7. Nested Models
Definition:
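A nested model is a simpler model contained inside a more complex one: it is obtained by
restricting the larger model, usually by setting some coefficients to zero. Example: a model
with fewer features is nested in the full model. Nested models let us test whether the extra
terms improve the fit. Used in hypothesis testing.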
Points:
Definition:
Parameter estimation means finding the best-fitting line (slope and intercept) for the data.
We use the least squares method, which minimizes the sum of squared errors. The estimated
values are then used for predictions.
Points:
3. Formula: Y = β0 + β1X.
9. Influenced by multicollinearity.
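For the single-predictor model Y = β0 + β1X, the least-squares estimates have a closed form: β1 = Sxy / Sxx and β0 = mean(Y) - β1 * mean(X). A minimal sketch follows, where the function name and the data are assumptions for illustration.

```python
def least_squares(xs, ys):
    """Closed-form estimates for Y = b0 + b1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1


# Hypothetical data generated from y = 1 + 2x.
b0, b1 = least_squares([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(b0, b1)  # 1.0 2.0

# Use the estimates to predict a new point.
prediction = b0 + b1 * 5.0
```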
9. Bias-Variance Tradeoff
Definition:
Bias means errors due to wrong assumptions. Variance means errors due to too much
sensitivity to data. A good model balances both. High bias = underfitting. High variance =
overfitting.
Points:
Definition:
Python allows building regression models using sklearn. Fit data to find coefficients. Then
predict and check performance using R², MAE, MSE, etc.
Steps/Points:
9. Try cross-validation.
Definition:
A confidence interval (CI) gives a range of values for a parameter or mean prediction, showing
how uncertain we are. In regression, it shows where the true value is likely to fall. A 95% CI
means that if we repeated the sampling many times, about 95% of the intervals built this way
would contain the true value.
Points:
2. Model Validation
Definition:
Model validation checks if the model works well on new data. It ensures the model is not
just memorizing training data. Common methods include train-test split and cross-validation.
Points:
1. Prevents overfitting.
3. Prediction Interval
Definition:
A prediction interval tells where a new data point is likely to fall. It’s wider than a confidence
interval because it includes both model error and random error in new data.
Points:
7. Helps in decision-making.
Definition:
We test if one model is better than the other using F-tests or t-tests. Hypothesis testing
checks if the added variables improve the model significantly.
Points:
9. Reduces overfitting.
5. ANOVA in Regression
Definition:
ANOVA (Analysis of Variance) checks whether the regression model explains a significant
amount of the variance in the data. It breaks down total variation into two parts: explained
and residual.
Points:
2. F-test is used.
Definition:
GLMs are an extension of linear models that can handle different types of data (like binary or
count). They use link functions to connect predictors to response.
Points:
1. Examples: logistic, Poisson regression.
7. Variable Selection
Definition:
Variable selection means choosing only important features for the model. It removes
irrelevant or redundant features, improving model accuracy and simplicity.
Points:
6. Reduces multicollinearity.
Points:
9. Multicollinearity
Definition:
Multicollinearity means that some input variables are highly related. It makes coefficient
estimates unstable and hard to interpret.
Points:
Definition:
A confidence interval gives a range for the average (mean) prediction. A prediction interval
gives a range for a single new observation. At the same confidence level and the same input
values, the prediction interval is always wider.
Points:
7. CI is narrow, PI is wide.
8. Different formulas.
Definition:
Random Forest is an ensemble learning method that uses multiple decision trees to classify
data. Each tree votes, and the majority vote is taken. It improves accuracy and reduces
overfitting.
Points:
Definition:
Bayes’ Rule is a probability-based method. It calculates the chance of a class given the
features. It’s used in Naive Bayes classifier.
Points:
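Bayes' Rule states P(class | features) = P(features | class) * P(class) / P(features). A small worked sketch follows. The spam scenario and all probabilities are made-up numbers used only for illustration.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' Rule: P(class | feature) = P(feature | class) * P(class) / P(feature)."""
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of mail is spam; a given word appears in
# 60% of spam messages and 5% of non-spam messages.
p_spam = 0.2
p_word_given_spam = 0.6
p_word_given_ham = 0.05

# Total probability of seeing the word in any message.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
print(p_spam_given_word)  # about 0.75
```

A Naive Bayes classifier applies the same rule with several features, multiplying their likelihoods under the assumption that the features are independent given the class.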
Definition:
Logistic Regression predicts class probabilities using a sigmoid function. LDA (Linear
Discriminant Analysis) assumes each class is normally distributed with a shared covariance
and finds the projection that maximizes class separability.
Points:
Definition:
KNN classifies based on closest neighbors using a distance formula. Distance metric decides
how “closeness” is calculated. Common ones are Euclidean and Manhattan.
Points:
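A minimal KNN sketch is shown below, with the distance metric passed in as a parameter so Euclidean and Manhattan can be swapped. The function names, the toy training set, and k = 3 are assumptions for illustration. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k=3, distance=math.dist):
    """train is a list of (point, label) pairs; math.dist is Euclidean."""
    # Keep the k training points closest to the query.
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    # Majority vote among their labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_predict(train, (1.2, 1.5)))                      # A
print(knn_predict(train, (6.4, 6.4), distance=manhattan))  # B
```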
Definition:
SVM finds the best line or hyperplane that separates classes with the largest margin. It can
also work with non-linear data using kernels.
Points:
6. Classification Metrics
Definition:
Metrics like Accuracy, Precision, Recall, and F1-score help evaluate classification models.
Each gives different insight into performance.
Points:
1. Accuracy = (TP+TN)/(Total).
4. F1 = 2*(Precision*Recall)/(Precision + Recall).
8. F1 balances both.
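The formulas above can be computed directly from confusion-matrix counts. The sketch below uses made-up counts for an imbalanced dataset and assumes there is at least one predicted positive and one actual positive, so no division by zero occurs.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of predicted positives, how many are right
    recall = tp / (tp + fn)     # of actual positives, how many are found
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical imbalanced case: 20 positives among 1000 samples.
accuracy, precision, recall, f1 = classification_metrics(tp=5, fp=5, fn=15, tn=975)
print(accuracy)   # 0.98
print(precision)  # 0.5
print(recall)     # 0.25
print(f1)         # about 0.333
```

Accuracy looks excellent here even though three quarters of the positives are missed, which is why precision, recall, and F1 are reported alongside it.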
Definition:
SoftMax converts raw output scores into probabilities for each class. It’s used in multinomial
logistic regression and neural networks when there are more than two classes.
Points:
1. Outputs sum to 1.
4. Helps in decision-making.
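A minimal SoftMax sketch in plain Python follows. Subtracting the maximum score before exponentiating is a common numerical-stability step and does not change the result. The function name and the scores are assumptions for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))  # 1.0 up to rounding

predicted_class = probs.index(max(probs))  # index of the most likely class
```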
Points:
3. Introduces randomness.
9. sklearn: SGDClassifier.
Definition:
Sklearn allows you to build, train, and compare many classifiers like KNN, SVM, RF, Logistic,
etc., using accuracy and F1-score.
Steps/Points:
5. Compare results.
Definition:
Choosing the right evaluation metric is crucial. For imbalanced data, accuracy can be
misleading. Metrics affect which model looks best.
Points:
Definition:
K-Means groups data by assigning each point to the nearest centroid. Then, new centroids
are calculated. This process repeats. One iteration means assigning and updating once.
Steps:
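One K-Means iteration, as described above (assign, then update), can be sketched as follows. The function name, the points, and the starting centroids are assumptions for illustration. A cluster that receives no points keeps its old centroid here, which is one of several possible conventions.

```python
def kmeans_iteration(points, centroids):
    """One assignment step followed by one centroid update."""
    clusters = [[] for _ in centroids]
    for p in points:
        # Squared Euclidean distance to every centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        clusters[dists.index(min(dists))].append(p)

    new_centroids = []
    for cluster, old in zip(clusters, centroids):
        if cluster:
            # Mean of each coordinate across the cluster's points.
            new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
        else:
            new_centroids.append(old)  # empty cluster: keep the old centroid
    return clusters, new_centroids


# Hypothetical 2-D data and initial centroids.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0)]
start = [(0.0, 0.0), (10.0, 10.0)]

clusters, new_centroids = kmeans_iteration(points, start)
print(clusters)
print(new_centroids)  # [(1.5, 1.0), (8.5, 8.5)]
```

Repeating this function until the centroids stop moving gives the full algorithm.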
Definition:
This method merges clusters based on the closest points. In single linkage, the shortest
distance between any two points in two clusters is used.
Steps:
Definition:
K-Means is partitional, Hierarchical is tree-based. K-Means divides data into K groups.
Hierarchical builds a tree of clusters.
Comparison:
3. K-Means is faster.
8. Hierarchical is deterministic.
Definition:
Complete linkage merges clusters based on the farthest distance between any two points.
The dendrogram shows how clusters form.
Steps:
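The difference between single and complete linkage comes down to which pairwise distance defines the gap between two clusters. A minimal sketch follows, with hypothetical clusters on a line so the numbers are easy to verify by hand. `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import product


def single_linkage(a, b):
    """Shortest distance between any point in a and any point in b."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Farthest distance between any point in a and any point in b."""
    return max(math.dist(p, q) for p, q in product(a, b))


# Hypothetical clusters.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (6.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # 2.0 (from x=1 to x=3)
print(complete_linkage(cluster_a, cluster_b))  # 6.0 (from x=0 to x=6)
```

At each step the agglomerative algorithm merges the pair of clusters with the smallest linkage distance, and the dendrogram records the height at which each merge happened.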
Definition:
Partitioning clustering (like K-Means) divides data into non-overlapping clusters. Hierarchical
builds a tree (dendrogram) by joining points step-by-step.
Points:
3. Partitioning is faster.
Definition:
Each clustering method is suitable for different tasks. K-Means is fast for big data.
Hierarchical is used when structure matters.
Applications:
Definition:
K-Means is a simple and fast clustering method but has some drawbacks like being sensitive
to initial centroids.
Pros:
Cons:
6. Needs number of clusters in advance.
7. Sensitive to initial centroids.
8. Not good for non-spherical clusters.
9. Can converge to local minimum.
10. Doesn’t handle outliers well.
Definition:
Choosing the best number of clusters (k) is critical. Too few clusters oversimplify data; too
many overfit.
Methods:
Definition:
Distance metric defines how similarity is calculated between points. It affects cluster shapes
and results.
Common Metrics:
1. Euclidean (straight-line).
2. Manhattan (grid-based).
3. Cosine (angle-based).
4. Mahalanobis (adjusts for feature scale and correlation).
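The first three metrics can be written in a few lines each. In this sketch the function names and the two sample vectors are assumptions. Mahalanobis is omitted because it also needs the inverse covariance matrix of the data.

```python
import math


def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Grid distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


# Hypothetical vectors pointing in the same direction with different lengths.
a, b = (3.0, 4.0), (6.0, 8.0)

print(euclidean(a, b))        # 5.0
print(manhattan(a, b))        # 7.0
print(cosine_distance(a, b))  # 0.0, because the angle between them is zero
```

The same pair of points is far apart under Euclidean and Manhattan distance but identical under cosine distance, which shows how the choice of metric changes which points count as similar.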
Types:
1. K-Means (partitional).
2. Hierarchical (agglomerative/divisive).
3. DBSCAN (density-based).
5. Mean-Shift (centroid-based).
Definition:
TD Learning is a method where an agent learns by updating its value estimates based on the
difference between predictions over time. It combines ideas from Monte Carlo and Dynamic
Programming.
Points:
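The simplest form of this idea is the TD(0) update: V(s) ← V(s) + α [r + γ V(s') - V(s)]. The bracketed term is the TD error, the difference between successive predictions. A minimal sketch follows, where the state names, reward, step size α, and discount γ are assumptions for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update in place and return the TD error."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical value table with two states.
values = {"s1": 0.0, "s2": 1.0}

error = td0_update(values, "s1", reward=0.5, next_state="s2")
print(error)         # about 1.4  (0.5 + 0.9 * 1.0 - 0.0)
print(values["s1"])  # about 0.14 (0.0 + 0.1 * 1.4)
```

The update happens after a single step, without waiting for the episode to end (unlike Monte Carlo) and without a model of the environment (unlike Dynamic Programming).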
Definition:
Monte Carlo uses full episodes for learning. Dynamic Programming (DP) requires a model of
the environment and solves using value updates.
Comparison:
Definition:
Planning uses a model of the environment to decide actions before taking them. It simulates
outcomes to find better policies.
Points:
4. Dimensions of RL Problems
Definition:
RL problems vary by state space, action types, time horizons, etc. Each dimension defines
the complexity of the learning task.
Key Dimensions:
3. Deterministic vs stochastic.
5. Single-agent vs multi-agent.
9. Model-free vs model-based.
Definition:
Tabular methods store values in a table. Approximate methods use functions to estimate
values when the state space is too large.
Comparison:
6. Eligibility Traces
Definition:
Eligibility traces help credit past states for current rewards. They mix TD and Monte Carlo
learning and speed up learning.
Points:
6. Faster convergence.
Definition:
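Both are planning methods using dynamic programming. Value iteration repeatedly applies the
Bellman optimality update to the state values and reads the policy off at the end. Policy
iteration alternates between evaluating the current policy and improving it greedily.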
Comparison:
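A minimal value iteration sketch on a tiny, made-up MDP is shown below. The transition format, state names, rewards, and discount factor are all assumptions chosen to keep the example small. States with no actions are treated as terminal.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """transitions[state][action] = list of (probability, next_state, reward)."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: value stays 0
                continue
            # Bellman optimality update: best expected return over actions.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical two-state MDP: "go" reaches the goal with reward 1,
# "stay" loops back with reward 0.
mdp = {
    "start": {"go": [(1.0, "goal", 1.0)], "stay": [(1.0, "start", 0.0)]},
    "goal": {},
}

values = value_iteration(mdp)
print(values)  # {'start': 1.0, 'goal': 0.0}
```

Policy iteration would reach the same values here, but by alternating a full evaluation of a fixed policy with a greedy improvement step.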
8. Exploration vs Exploitation
Definition:
Exploration is trying new actions. Exploitation is choosing the best-known action. RL needs
to balance both for learning.
Points:
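One common way to balance the two is the epsilon-greedy rule: with probability ε pick a random action (explore), otherwise pick the action with the highest estimated value (exploit). A minimal sketch follows, where the function name and the Q-values are assumptions for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


# Hypothetical value estimates for three actions.
q_values = [0.1, 0.9, 0.3]

greedy_action = epsilon_greedy(q_values, epsilon=0.0)  # always action 1
random_action = epsilon_greedy(q_values, epsilon=1.0)  # any of 0, 1, 2
print(greedy_action, random_action)
```

A typical design choice is to start with a large ε and decrease it over time, so the agent explores early and exploits once its estimates are more reliable.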
9. Q-Learning
Definition:
Q-learning is a model-free RL method. It learns action values (Q-values) to find the best
policy. It’s an off-policy algorithm.
Points:
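The Q-learning update is Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') - Q(s, a)]. The max over next actions is what makes it off-policy: the target assumes the best next action, whatever action the agent actually takes next. A minimal sketch with a made-up Q-table follows. The state and action names, α, and γ are assumptions for illustration.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place."""
    # Best value available from the next state (0 if it has no actions).
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical Q-table: q[state][action] = estimated return.
q = {
    "s1": {"left": 0.0, "right": 0.0},
    "s2": {"left": 1.0, "right": 2.0},
}

q_update(q, "s1", "right", reward=1.0, next_state="s2")
print(q["s1"]["right"])  # about 1.4 (target 1 + 0.9 * 2 = 2.8, half-step toward it)
```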
10. Application of RL
Definition:
RL is used where learning from feedback is needed. It helps in decision-making tasks with
trial-and-error approach.
Applications:
8. Industrial automation.