
UNIT 1: Linear Regression & Model Analysis

1. In-Sample Risk

Definition:
In-sample risk is the error measured on the training data. It tells how well the model fits the
data it learned from. Low in-sample risk means the model fits training data well. But low in-
sample risk doesn’t always mean good generalization. We must balance it with out-of-
sample risk.

Points:

1. Shows model performance on training data.

2. Lower value means better training fit.

3. Cannot measure future prediction performance.

4. Overfitting happens if this is too low.

5. Useful for initial evaluation of models.

6. Must compare with test data error too.

7. Used with cross-validation to avoid overfitting.

8. Can be calculated as mean squared error.

9. Common in model comparison.

10. Should not be the only metric used.

2. PRESS (Predicted Residual Error Sum of Squares)

Definition:
PRESS is used to check how good a regression model is at predicting unseen data. It removes
one data point at a time, fits the model, and predicts the removed one. The squared errors
are added. Lower PRESS means better predictive power.

Points:

1. Form of cross-validation.

2. Tests model’s prediction on unseen point.

3. One point left out each time.

4. Squared prediction error is noted.

5. All such errors are summed.


6. Result = PRESS score.

7. Lower score = better model.

8. Detects overfitting or underfitting.

9. Computationally expensive.

10. Used in model comparison.
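
A minimal leave-one-out sketch of PRESS with sklearn; the toy X and y values are made up for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut

    # Toy data (hypothetical values, for illustration only)
    X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
    y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

    press = 0.0
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        # squared error of the single held-out prediction
        press += (y[test_idx][0] - model.predict(X[test_idx])[0]) ** 2

    print("PRESS:", press)  # lower = better predictive power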

3. Akaike Information Criterion (AIC)

Definition:
AIC helps compare different regression models. It gives a score based on model accuracy and
number of parameters. Lower AIC is better. It balances accuracy with simplicity. Too many
parameters will increase AIC.

Points:

1. Formula: AIC = 2k - 2ln(L).

2. k = number of parameters.

3. L = likelihood function.

4. Penalizes complexity.

5. Helps avoid overfitting.

6. Lower AIC = better model.

7. Used in regression and time series.

8. Not used alone — just for comparison.

9. Easy to compute with software.

10. Prefers simple but accurate models.
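
As a sketch, statsmodels reports AIC directly for a fitted OLS model (the data below is randomly generated, just for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=100)

    model = sm.OLS(y, sm.add_constant(X)).fit()
    print("AIC:", model.aic)  # compare models fitted to the same data; lower is better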

4. Categorical Features in Regression

Definition:
Categorical variables like gender or city names must be converted into numbers to use in
regression. We use techniques like One-Hot Encoding. This turns categories into 0s and 1s.
This helps the model process them correctly.

Points:

1. Regression needs numeric input.


2. Categorical data must be encoded.

3. One-Hot Encoding creates new columns.

4. Example: “Male” and “Female” become 2 columns.

5. Dummy encoding uses n-1 columns to avoid the dummy variable trap (perfect collinearity).

6. Avoids false relationships.

7. Helps capture category effects.

8. Increases dimensionality.

9. Done with pandas or sklearn.

10. Important step in preprocessing.
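
A small pandas sketch of one-hot/dummy encoding; the column names here are made up for illustration:

    import pandas as pd

    df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune"], "salary": [50, 60, 55]})

    # drop_first=True keeps n-1 dummy columns (dummy encoding), avoiding redundancy
    encoded = pd.get_dummies(df, columns=["city"], drop_first=True)
    print(encoded)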

5. Coefficient of Determination (R²)

Definition:
R² tells how much of the total variation in the output is explained by the input variables. For
a model with an intercept, evaluated on its training data, it ranges from 0 to 1; on new data it
can even be negative. Higher values mean a better fit. R² = 0 means the model does no better
than predicting the mean. R² = 1 means perfect prediction.

Points:

1. Formula: R² = 1 - (RSS/TSS).

2. RSS = residual sum of squares.

3. TSS = total sum of squares.

4. Higher R² = better model.

5. Can’t go above 1.

6. Can be negative for poor models.

7. Not good for non-linear models.

8. Doesn’t tell about overfitting.

9. Use adjusted R² for multiple inputs.

10. Sklearn gives R² directly.

6. Cross-Validation in Linear Models


Definition:
Cross-validation divides data into parts to test model reliability. One part is used to test,
others to train. This helps avoid overfitting. Common type: k-fold cross-validation. It
improves trust in model performance.

Points:

1. Splits data into k equal parts.

2. Trains on k-1 parts, tests on 1.

3. Repeats k times, changes test part.

4. Reduces overfitting.

5. Gives stable performance estimate.

6. Works for small datasets too.

7. Used in model selection.

8. Avoids bias from single test set.

9. Helps tune parameters.

10. Built-in in sklearn.
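
A minimal 5-fold cross-validation sketch with sklearn on synthetic data:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=0.3, size=50)

    scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
    print("mean R2:", scores.mean(), "+/-", scores.std())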

7. Nested Models

Definition:
A nested model is a simpler model contained inside a more complex one. Example: a model with
fewer features is nested in the full model. Nested models help compare model performance and
are used in hypothesis testing.

Points:

1. Model A ⊆ Model B (nested).

2. Used to test if adding variables helps.

3. ANOVA can compare nested models.

4. Simpler model is null hypothesis.

5. Useful in stepwise regression.

6. Checks if added variables are useful.

7. If not, avoid complex models.

8. Used to reduce model complexity.


9. Affects p-values in regression.

10. Found in variable selection process.

8. Parameter Estimation in Linear Regression

Definition:
Parameter estimation means finding the best line (slope and intercept) that fits the data. We
use the least squares method to minimize the squared errors. These values are then used for predictions.

Points:

1. Parameters: slope (β1), intercept (β0).

2. Use least squares to minimize error.

3. Formula: Y = β0 + β1X.

4. Solved using matrix algebra.

5. Estimated using training data.

6. Estimates may have confidence intervals.

7. Affects prediction accuracy.

8. Larger data = better estimates.

9. Influenced by multicollinearity.

10. Done using sklearn or statsmodels.
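
A small NumPy sketch of the least squares solution beta = (X'X)^(-1) X'y on toy numbers (the first column of ones stands for the intercept):

    import numpy as np

    # toy data; the column of ones represents the intercept term
    X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]], dtype=float)
    y = np.array([2.1, 3.9, 6.2, 8.1])

    # least squares solution, computed stably instead of inverting X'X directly
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("intercept (b0), slope (b1):", beta)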

9. Bias-Variance Tradeoff

Definition:
Bias means errors due to wrong assumptions. Variance means errors due to too much
sensitivity to data. A good model balances both. High bias = underfitting. High variance =
overfitting.

Points:

1. High bias = too simple model.

2. High variance = too complex model.

3. Must find balance between both.

4. Goal: low total error.


5. Regularization reduces variance.

6. Complex models need more data.

7. Bias causes systematic errors.

8. Variance causes random errors.

9. Cross-validation helps balance.

10. Key in model selection.

10. Linear Regression in Python

Definition:
Python allows building regression models using sklearn. Fit data to find coefficients. Then
predict and check performance using R², MAE, MSE, etc.

Steps/Points:

1. Import LinearRegression from sklearn.

2. Load data with pandas.

3. Split data with train_test_split.

4. Fit model with .fit(X, y).

5. Predict with .predict(X_test).

6. Measure R², MAE, MSE.

7. Visualize with matplotlib.

8. Check residuals for errors.

9. Try cross-validation.

10. Export model if needed.
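
A hedged end-to-end sketch of these steps; the file name "housing.csv" and the target column "price" are placeholders, not from these notes:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("housing.csv")               # hypothetical dataset
    X, y = df.drop(columns=["price"]), df["price"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("R2 :", r2_score(y_test, y_pred))
    print("MAE:", mean_absolute_error(y_test, y_pred))
    print("MSE:", mean_squared_error(y_test, y_pred))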


UNIT 2: Linear Models & Inference
1. Confidence Interval in Regression

Definition:
A confidence interval (CI) gives a range of values for a prediction or parameter, showing how
uncertain we are. In regression, it shows where the true value is likely to fall. A 95% CI
means that if the sampling and estimation were repeated many times, about 95% of such intervals
would contain the true value.

Points:

1. Used to estimate prediction accuracy.

2. Wider CI means more uncertainty.

3. Depends on data size and variation.

4. Formula uses standard error.

5. Can be for slope, intercept, or prediction.

6. Common confidence level: 95%.

7. Helps in statistical inference.

8. Important for decision-making.

9. Shows reliability of regression results.

10. Can be computed in Python using statsmodels.

2. Model Validation

Definition:
Model validation checks if the model works well on new data. It ensures the model is not
just memorizing training data. Common methods include train-test split and cross-validation.

Points:

1. Prevents overfitting.

2. Confirms real-world performance.

3. Uses test data not seen by model.

4. Cross-validation improves reliability.

5. Can use metrics like MAE, RMSE, R².

6. Needed before using model in production.


7. Validates assumptions in regression.

8. Can show underfitting or overfitting.

9. Includes visual checks like residual plots.

10. Key for model trustworthiness.

3. Prediction Interval

Definition:
A prediction interval tells where a new data point is likely to fall. It’s wider than a confidence
interval because it includes both model error and random error in new data.

Points:

1. CI = range for average prediction.

2. PI = range for new individual prediction.

3. Wider due to more uncertainty.

4. Depends on confidence level (e.g., 95%).

5. Useful for risk analysis.

6. Computed with standard error and t-score.

7. Helps in decision-making.

8. Shows reliability of future predictions.

9. Can be visualized on scatter plots.

10. Calculated using statsmodels in Python.
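
A statsmodels sketch that returns both intervals for new points (synthetic data; the obs_ci_* columns give the wider prediction interval):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, size=60)
    y = 3 + 2 * x + rng.normal(scale=1.5, size=60)

    results = sm.OLS(y, sm.add_constant(x)).fit()

    new_X = sm.add_constant(np.array([2.0, 5.0, 8.0]))
    frame = results.get_prediction(new_X).summary_frame(alpha=0.05)  # 95% level
    # mean_ci_* = confidence interval, obs_ci_* = prediction interval
    print(frame[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])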

4. Compare Two Models Using Hypothesis Testing

Definition:
We test if one model is better than the other using F-tests or t-tests. Hypothesis testing
checks if the added variables improve the model significantly.

Points:

1. Null Hypothesis: simpler model is enough.

2. Alternative Hypothesis: complex model is better.

3. Use F-test to compare.


4. Check if added variables are useful.

5. p-value < 0.05 = reject null.

6. Must use nested models.

7. Used in feature selection.

8. Helps simplify model.

9. Reduces overfitting.

10. Common in multiple regression.
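
A hedged sketch of this F-test using statsmodels' anova_lm; the variables x1, x2 and y are synthetic, and x2 is irrelevant by construction:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(3)
    df = pd.DataFrame({"x1": rng.normal(size=80), "x2": rng.normal(size=80)})
    df["y"] = 1 + 2 * df["x1"] + rng.normal(scale=0.5, size=80)

    small = smf.ols("y ~ x1", data=df).fit()        # null: the simpler model is enough
    big = smf.ols("y ~ x1 + x2", data=df).fit()     # alternative: the extra variable helps

    print(anova_lm(small, big))                     # p < 0.05 would favour the larger model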

5. ANOVA in Regression

Definition:
ANOVA (Analysis of Variance) checks whether the regression model explains a significant amount
of the variance in the data. It breaks down the total variation into parts: explained and residual.

Points:

1. Total = Explained + Residual.

2. F-test is used.

3. Checks if regression is useful.

4. Helps compare models.

5. High F = model is good.

6. Low p-value means variables are important.

7. ANOVA table shows SS, df, MS, F.

8. Found in statsmodels summary.

9. Used in model selection.

10. Works for both simple and multiple regression.

6. Generalized Linear Models (GLM)

Definition:
GLMs are an extension of linear models that can handle different types of data (like binary or
count). They use link functions to connect predictors to response.

Points:
1. Examples: logistic, Poisson regression.

2. Can model binary outcomes.

3. Not limited to normal distribution.

4. Uses link functions like logit, log.

5. Flexible for real-world problems.

6. Common in healthcare, insurance, etc.

7. Built using statsmodels or R.

8. GLM = linear predictor + link + distribution.

9. Better for non-normal responses.

10. Used in classification problems.
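
A minimal statsmodels GLM sketch (logistic regression through the logit link) on synthetic binary data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    X = sm.add_constant(rng.normal(size=(100, 1)))
    p = 1 / (1 + np.exp(-(0.5 + 1.5 * X[:, 1])))    # true probabilities
    y = rng.binomial(1, p)                          # binary response

    glm = sm.GLM(y, X, family=sm.families.Binomial()).fit()
    print(glm.params)                               # intercept and slope on the logit scale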

7. Variable Selection

Definition:
Variable selection means choosing only important features for the model. It removes
irrelevant or redundant features, improving model accuracy and simplicity.

Points:

1. Helps avoid overfitting.

2. Improves model interpretation.

3. Methods: forward, backward, stepwise.

4. Uses p-values, AIC, BIC.

5. Can use Lasso for automatic selection.

6. Reduces multicollinearity.

7. Faster model training.

8. Uses domain knowledge too.

9. Important in high-dimensional data.

10. Done with feature selection libraries in Python.

8. Confidence Interval in Python


Definition:
In Python, we use statsmodels to calculate confidence intervals for regression coefficients or
predictions. It shows the reliability of estimates.

Points:

1. Import OLS from statsmodels.api.

2. Fit model and call .conf_int().

3. Gives lower and upper bounds.

4. Default is 95% confidence.

5. Can adjust level as needed.

6. Works for slope, intercept, predictions.

7. Useful for reporting.

8. Can be visualized with plots.

9. Helps interpret results.

10. Easy and reliable for exams.
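
A short sketch of points 1-4 on synthetic data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    X = sm.add_constant(rng.normal(size=(100, 2)))
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=100)

    results = sm.OLS(y, X).fit()
    print(results.conf_int(alpha=0.05))   # 95% lower/upper bound for each coefficient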

9. Multicollinearity

Definition:
Multicollinearity means that some input variables are highly related. It makes coefficient
estimates unstable and hard to interpret.

Points:

1. Affects multiple regression.

2. Leads to large standard errors.

3. Hard to know which variable is useful.

4. Detected using VIF (Variance Inflation Factor).

5. VIF > 10 = serious multicollinearity.

6. Solved by removing variables.

7. PCA can help reduce dimensions.

8. Regularization (like Ridge) helps.

9. Mainly harms inference (coefficient interpretation); overall predictions can stay accurate.


10. Must be fixed for good model explanation.
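
A short VIF sketch with statsmodels; here x2 is deliberately generated as a near copy of x1, so its VIF comes out large:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(7)
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.05, size=100)      # almost identical to x1
    X = sm.add_constant(np.column_stack([x1, x2]))

    for i in range(1, X.shape[1]):                  # skip the constant column
        print(f"VIF for x{i}:", variance_inflation_factor(X, i))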

10. Confidence vs Prediction Intervals

Definition:
Confidence interval gives a range for average prediction. Prediction interval gives a range for
a new single prediction. Prediction interval is always wider.

Points:

1. CI = average value range.

2. PI = new value range.

3. PI includes more error.

4. CI is more reliable for means.

5. PI is used for forecasting.

6. Both need confidence level (like 95%).

7. CI is narrow, PI is wide.

8. Different formulas.

9. Both important in regression.

10. Statsmodels can calculate both.


UNIT 3: Classification
1. Random Forest Classifier

Definition:
Random Forest is an ensemble learning method that uses multiple decision trees to classify
data. Each tree votes, and the majority vote is taken. It improves accuracy and reduces
overfitting.

Points:

1. Combines many decision trees.

2. Each tree sees random data and features.

3. More trees = better accuracy.

4. Uses majority voting for classification.

5. Reduces overfitting compared to one tree.

6. Relatively robust to noisy data (sklearn's implementation still needs missing values imputed).

7. Handles both classification and regression.

8. Important for large datasets.

9. Feature importance can be seen.

10. Built with RandomForestClassifier in sklearn.
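
A minimal RandomForestClassifier sketch on sklearn's built-in iris data (illustrative only):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print("accuracy:", rf.score(X_test, y_test))
    print("feature importances:", rf.feature_importances_)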

2. Bayes' Rule in Classification

Definition:
Bayes’ Rule is a probability-based method. It calculates the chance of a class given the
features. It’s used in Naive Bayes classifier.

Points:

1. Formula: P(A|B) = [P(B|A) * P(A)] / P(B).

2. Predicts probability of class.

3. Assumes features are independent.

4. Used in spam filters and text classification.

5. Very fast and simple algorithm.

6. Works well with large text data.


7. Handles multiclass problems easily.

8. P(A) is prior, P(B|A) is likelihood.

9. Naive Bayes is its application.

10. sklearn has GaussianNB for this.
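
A minimal GaussianNB sketch on sklearn's built-in iris data (illustrative only):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    nb = GaussianNB().fit(X_train, y_train)
    print("accuracy:", nb.score(X_test, y_test))
    print("P(class | features) for first test row:", nb.predict_proba(X_test[:1]))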

3. Logistic Regression vs LDA

Definition:
Logistic Regression predicts class using a sigmoid function. LDA (Linear Discriminant Analysis)
assumes data is normally distributed and maximizes class separability.

Points:

1. Logistic: directly models probability.

2. LDA: models data distribution first.

3. Logistic doesn’t assume equal variance.

4. LDA works well with small datasets.

5. Logistic is more flexible.

6. LDA is good when assumptions are true.

7. Logistic used in binary/multiclass classification.

8. LDA used in pattern recognition.

9. Both output probabilities.

10. Use sklearn for both models.

4. Distance Metrics in KNN

Definition:
KNN classifies based on closest neighbors using a distance formula. Distance metric decides
how “closeness” is calculated. Common ones are Euclidean and Manhattan.

Points:

1. Euclidean: straight-line distance.

2. Manhattan: block-wise path.

3. Affects neighbor selection.


4. Important in high dimensions.

5. Affects model accuracy.

6. Standardize data before using.

7. Cosine distance for text data.

8. Choose based on data nature.

9. sklearn allows distance choice.

10. Closer neighbors = stronger influence.

5. Support Vector Machines (SVM)

Definition:
SVM finds the best line or hyperplane that separates classes with the largest margin. It can
also work with non-linear data using kernels.

Points:

1. Maximizes margin between classes.

2. Works well in high dimensions.

3. Uses kernel trick for non-linear data.

4. Common kernels: linear, RBF, polynomial.

5. Can be used for regression too.

6. Less prone to overfitting.

7. Needs proper scaling of data.

8. Sensitive to parameter choice (C, gamma).

9. sklearn: SVC class.

10. Visualizable for 2D data.

6. Classification Metrics

Definition:
Metrics like Accuracy, Precision, Recall, and F1-score help evaluate classification models.
Each gives different insight into performance.

Points:
1. Accuracy = (TP+TN)/(Total).

2. Precision = TP / (TP + FP).

3. Recall = TP / (TP + FN).

4. F1 = 2*(Precision*Recall)/(Precision + Recall).

5. Use confusion matrix to calculate.

6. Precision important for false positives.

7. Recall important for false negatives.

8. F1 balances both.

9. ROC-AUC also useful.

10. Use sklearn’s classification_report.
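
A quick sketch using sklearn's confusion_matrix and classification_report on made-up labels:

    from sklearn.metrics import classification_report, confusion_matrix

    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]       # invented labels, for illustration only
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

    print(confusion_matrix(y_true, y_pred))        # rows = actual, columns = predicted
    print(classification_report(y_true, y_pred))   # precision, recall, F1 per class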

7. SoftMax for Multi-Class Classification

Definition:
SoftMax converts outputs into probabilities for each class. It’s used in logistic regression and
neural networks when there are more than two classes.

Points:

1. Outputs sum to 1.

2. Gives probability of each class.

3. Generalization of sigmoid function.

4. Helps in decision-making.

5. Used in neural nets' output layer.

6. Good for multiclass problems.

7. Each class has its own score.

8. Max probability is predicted class.

9. Used by multinomial LogisticRegression in sklearn and in TensorFlow/Keras output layers.

10. Log loss uses SoftMax outputs.
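
A small NumPy sketch of the SoftMax computation itself (the scores are arbitrary example numbers):

    import numpy as np

    scores = np.array([2.0, 1.0, 0.1])        # raw class scores (logits)
    probs = np.exp(scores - scores.max())     # subtract the max for numerical stability
    probs = probs / probs.sum()

    print(probs, probs.sum())                 # probabilities that sum to 1
    print("predicted class:", probs.argmax()) # the class with the highest probability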

8. Stochastic Gradient Descent (SGD)


Definition:
SGD is a method to train models by updating weights using one data point at a time. It’s fast
and good for large datasets.

Points:

1. One sample is used per iteration.

2. Faster than batch gradient descent.

3. Introduces randomness.

4. Good for online learning.

5. Used in Logistic Regression, SVM.

6. Needs learning rate tuning.

7. May not always converge smoothly.

8. Common in deep learning.

9. sklearn: SGDClassifier.

10. Combines speed with decent accuracy.

9. Compare Multiple Classification Models in Sklearn

Definition:
Sklearn allows you to build, train, and compare many classifiers like KNN, SVM, RF, Logistic,
etc., using accuracy and F1-score.

Steps/Points:

1. Use train_test_split for data.

2. Fit models like SVC, RandomForest.

3. Predict on test data.

4. Evaluate using accuracy, F1.

5. Compare results.

6. Use cross-validation if needed.

7. Plot confusion matrix.

8. Visualize using ROC curve.

9. Choose best based on metric.


10. Use pipelines for preprocessing.

10. Effect of Evaluation Metric on Model Selection

Definition:
Choosing the right evaluation metric is crucial. For imbalanced data, accuracy can be
misleading. Metrics affect which model looks best.

Points:

1. Accuracy not enough for imbalanced data.

2. Use precision-recall for fraud/spam detection.

3. F1-score balances both.

4. ROC-AUC for probabilistic outputs.

5. Metric must match business goal.

6. Different models rank differently by metric.

7. Choose metric before modeling.

8. Confusion matrix helps pick metric.

9. Sklearn supports many metrics.

10. Visual plots help understand better.


UNIT 4: Clustering
1. One Iteration of K-Means Clustering (k=2)

Definition:
K-Means groups data by assigning each point to the nearest centroid. Then, new centroids
are calculated. This process repeats. One iteration means assigning and updating once.

Steps:

1. Initial centroids: (1,2) and (5,4).

2. Compute distances to centroids.

3. Assign each point to the closest centroid.

4. Group 1: (1,2), (2,1); Group 2: (4,5), (5,4).

5. Compute mean of each group.

6. New centroids: Group 1 → (1.5, 1.5), Group 2 → (4.5, 4.5).

7. One iteration is complete.

8. Distances use Euclidean formula.

9. Next iteration repeats this.

10. Helps in learning clustering step-by-step.
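
As a check, the same iteration can be reproduced with a few lines of NumPy (same points and initial centroids as above):

    import numpy as np

    points = np.array([[1, 2], [2, 1], [4, 5], [5, 4]], dtype=float)
    centroids = np.array([[1, 2], [5, 4]], dtype=float)        # initial centroids

    # Step 1: assign each point to its nearest centroid (Euclidean distance)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Step 2: recompute each centroid as the mean of its assigned points
    new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
    print(labels)          # expected: [0 0 1 1]
    print(new_centroids)   # expected: [[1.5 1.5] [4.5 4.5]]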

2. Single Linkage Hierarchical Clustering

Definition:
This method merges clusters based on the closest points. In single linkage, the shortest
distance between any two points in two clusters is used.

Steps:

1. Start with all points as separate clusters.

2. Find closest pair of clusters.

3. Merge them into one.

4. Repeat until all points are in one cluster.

5. Plot these merges in a dendrogram.

6. Points: (1,2), (2,1), (4,5), (5,4).

7. Merge (1,2) and (2,1) first.


8. Then (4,5) and (5,4).

9. Finally, merge both groups.

10. Dendrogram shows all these steps.

3. K-Means vs Hierarchical Clustering

Definition:
K-Means is partitional, Hierarchical is tree-based. K-Means divides data into K groups.
Hierarchical builds a tree of clusters.

Comparison:

1. K-Means needs number of clusters (k).

2. Hierarchical doesn’t need k initially.

3. K-Means is faster.

4. Hierarchical gives dendrogram.

5. K-Means better for large data.

6. Hierarchical shows nested structure.

7. K-Means is sensitive to initial points.

8. Hierarchical is deterministic.

9. K-Means updates centroids.

10. Use based on data size and clarity needed.

4. Complete Linkage Dendrogram

Definition:
Complete linkage merges clusters based on the farthest distance between any two points.
The dendrogram shows how clusters form.

Steps:

1. Points: (1,2), (1.5,1.8), (5,8), (8,8).

2. Calculate all pairwise distances.

3. Merge closest points: (1,2) & (1.5,1.8).

4. Merge (5,8) & (8,8).


5. Compute distances between merged groups.

6. Use maximum distance to merge.

7. Repeat until one cluster.

8. Dendrogram reflects these steps.

9. Shape of dendrogram depends on linkage.

10. Helpful for cluster selection.

5. Partitioning vs Hierarchical Clustering

Definition:
Partitioning clustering (like K-Means) divides data into non-overlapping clusters. Hierarchical
builds a tree (dendrogram) by joining points step-by-step.

Points:

1. Partitioning gives flat clusters.

2. Hierarchical gives nested clusters.

3. Partitioning is faster.

4. Hierarchical is more informative.

5. Partitioning needs predefined k.

6. Hierarchical can visualize cluster formation.

7. Partitioning uses centroids.

8. Hierarchical uses distances.

9. Partitioning can’t undo steps.

10. Hierarchical is flexible for small data.

6. K-Means vs Hierarchical – Real-World Use

Definition:
Each clustering method is suitable for different tasks. K-Means is fast for big data.
Hierarchical is used when structure matters.

Applications:

1. K-Means for customer segmentation.


2. K-Means for image compression.

3. Hierarchical for genetic studies.

4. Hierarchical for document clustering.

5. K-Means in market analysis.

6. Hierarchical for social network analysis.

7. K-Means for sensor grouping.

8. Hierarchical for biological taxonomy.

9. Choose based on data size and clarity.

10. Use dendrogram when relationships matter.

7. Advantages & Limitations of K-Means

Definition:
K-Means is a simple and fast clustering method but has some drawbacks like being sensitive
to initial centroids.

Pros:

1. Simple and easy to implement.

2. Works well on large datasets.

3. Fast and scalable.

4. Produces tight clusters.

5. Easy to interpret results.

Cons:

6. Needs number of clusters in advance.

7. Sensitive to initial centroids.

8. Not good for non-spherical clusters.

9. Can converge to local minimum.

10. Doesn’t handle outliers well.

8. Optimal Number of Clusters in K-Means

Definition:
Choosing the best number of clusters (k) is critical. Too few clusters oversimplify data; too
many overfit.
Methods:

1. Elbow method: plot SSE vs k.

2. Look for "elbow" in the curve.

3. Silhouette score: measures how well clusters are formed.

4. Gap statistic: compares with random data.

5. Try different k values.

6. Visual inspection helps.

7. Domain knowledge can guide.

8. Combine methods for confidence.

9. Sklearn has silhouette_score.

10. No fixed k — depends on data.
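
A short sklearn sketch of the elbow and silhouette ideas from points 1-3 above (make_blobs generates synthetic data with 4 true clusters):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

    for k in range(2, 7):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        # inertia_ is the SSE used for the elbow plot; a higher silhouette is better
        print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))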

9. Distance Metric in Clustering

Definition:
Distance metric defines how similarity is calculated between points. It affects cluster shapes
and results.

Common Metrics:

1. Euclidean (straight-line).

2. Manhattan (grid-based).

3. Cosine (angle-based).

4. Mahalanobis (scale-sensitive).

5. Choice affects clustering outcome.

6. Euclidean good for continuous data.

7. Cosine used in text clustering.

8. Normalize data before using.

9. Important in hierarchical linkage.

10. Use based on data type.

10. Types of Clustering Algorithms


Definition:
Clustering groups similar data points. Different algorithms suit different data types and goals.

Types:

1. K-Means (partitional).

2. Hierarchical (agglomerative/divisive).

3. DBSCAN (density-based).

4. OPTICS (density with ordering).

5. Mean-Shift (centroid-based).

6. Gaussian Mixture (probabilistic).

7. Spectral Clustering (graph-based).

8. Fuzzy C-Means (soft clustering).

9. BIRCH (for large datasets).

10. Choice depends on data shape and size.

UNIT 5: Reinforcement Learning (RL)


1. Temporal Difference (TD) Learning

Definition:
TD Learning is a method where an agent learns by updating its value estimates based on the
difference between predictions over time. It combines ideas from Monte Carlo and Dynamic
Programming.

Points:

1. Learns from incomplete episodes.

2. Updates after every step.

3. TD Error = reward + γ * V(next) – V(current).

4. No need for full environment knowledge.

5. Faster than Monte Carlo.

6. Example: TD(0) is a basic form.

7. Used in Q-learning and SARSA.

8. Balances prediction and reward.

9. Common in games and simulations.

10. Improves over time with exploration.

2. Monte Carlo vs Dynamic Programming

Definition:
Monte Carlo uses full episodes for learning. Dynamic Programming (DP) requires a model of
the environment and solves using value updates.

Comparison:

1. MC: Learns from sample episodes.

2. DP: Needs environment transition probabilities.

3. MC: Doesn’t need environment model.

4. DP: More theoretical, exact.

5. MC: Slower convergence.

6. DP: Works with full knowledge.

7. MC: Useful in unknown environments.

8. DP: Needs complete info.

9. MC: Uses averages for learning.

10. MC better for real-time learning.


3. Planning in RL

Definition:
Planning uses a model of the environment to decide actions before taking them. It simulates
outcomes to find better policies.

Points:

1. Uses environment model for decision-making.

2. Opposite of model-free RL.

3. Example: value iteration, policy iteration.

4. Combines prediction and control.

5. Faster learning with models.

6. Common in robotics, games.

7. Used in hybrid RL approaches.

8. Helps optimize long-term reward.

9. Requires accurate model.

10. Dyna-Q combines planning with Q-learning.

4. Dimensions of RL Problems

Definition:
RL problems vary by state space, action types, time horizons, etc. Each dimension defines
the complexity of the learning task.

Key Dimensions:

1. Finite vs infinite time horizon.

2. Discrete vs continuous states/actions.

3. Deterministic vs stochastic.

4. Fully observable vs partially observable.

5. Single-agent vs multi-agent.

6. Stationary vs changing environments.

7. Episodic vs continuous tasks.


8. Short-term vs long-term goals.

9. Model-free vs model-based.

10. Online vs offline learning.

5. Tabular vs Approximate Methods

Definition:
Tabular methods store values in a table. Approximate methods use functions to estimate
values when the state space is too large.

Comparison:

1. Tabular stores exact values.

2. Approximate uses neural nets or functions.

3. Tabular works for small problems.

4. Approximate needed for big problems.

5. Approximate generalizes to unseen states.

6. Tabular is faster but limited.

7. Approximate can be unstable.

8. Tabular is easy to debug.

9. Deep Q-Networks use approximation.

10. Choose based on problem size.

6. Eligibility Traces

Definition:
Eligibility traces help credit past states for current rewards. They mix TD and Monte Carlo
learning and speed up learning.

Points:

1. Keep memory of recently visited states.

2. Help update multiple states at once.

3. Combine future and past info.

4. Used in TD(λ) methods.


5. λ controls how far back credit is given.

6. Faster convergence.

7. Used in SARSA(λ), Q(λ).

8. Improve learning efficiency.

9. Bridge between TD and MC.

10. Needs careful λ tuning.

7. Value Iteration vs Policy Iteration

Definition:
Both are planning methods using dynamic programming. Value iteration updates values
directly. Policy iteration updates policy based on values.

Comparison:

1. Value iteration: update values first.

2. Policy iteration: improve policy step-by-step.

3. Value iteration is faster.

4. Policy iteration gives better interpretability.

5. Both need environment model.

6. Used in grid-world tasks.

7. Value iteration combines policy + value update.

8. Policy iteration alternates between steps.

9. Both give optimal policy.

10. Used in model-based RL.

8. Exploration vs Exploitation

Definition:
Exploration is trying new actions. Exploitation is choosing the best-known action. RL needs
to balance both for learning.

Points:

1. Exploration helps discover better actions.


2. Exploitation gives immediate reward.

3. Balance is key for success.

4. Too much exploration = waste time.

5. Too much exploitation = stuck in local optima.

6. ε-greedy is common strategy.

7. Decay ε over time.

8. Softmax and UCB are alternatives.

9. Important in all RL algorithms.

10. Impacts learning speed and quality.

9. Q-Learning

Definition:
Q-learning is a model-free RL method. It learns action values (Q-values) to find the best
policy. It’s an off-policy algorithm.

Points:

1. Learns Q(s, a) for each state-action.

2. Update: Q ← Q + α [r + γ max Q’ – Q].

3. γ = discount factor, α = learning rate.

4. Uses max future value.

5. Doesn’t need environment model.

6. Works well in discrete state-action spaces.

7. Needs exploration (e.g., ε-greedy).

8. Converges to optimal policy.

9. Foundation of Deep Q Networks.

10. Easy to implement and understand.
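
A minimal tabular sketch of this update rule on a made-up 5-state chain environment (the states, actions, and reward here are invented for illustration):

    import numpy as np

    n_states, n_actions = 5, 2             # toy chain; action 0 = left, 1 = right
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.9, 0.1

    rng = np.random.default_rng(0)
    for _ in range(2000):
        s = 0
        while s != n_states - 1:                       # episode ends at the last state
            # epsilon-greedy action selection
            a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: Q <- Q + alpha * [r + gamma * max Q' - Q]
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next

    print(Q)   # action 1 (right) should end up preferred in every state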

10. Application of RL
Definition:
RL is used where learning from feedback is needed. It helps in decision-making tasks through a
trial-and-error approach.

Applications:

1. Game playing (e.g., AlphaGo, Atari).

2. Robotics (path planning, movement).

3. Traffic signal control.

4. Stock market trading bots.

5. Energy management (HVAC systems).

6. Chatbots for personalized replies.

7. Self-driving cars (lane control).

8. Industrial automation.

9. Smart recommendation systems.

10. Reduces manual tuning by learning optimal actions.
