Week 7. Intro To ML. Regression
Regression
• Regression vs Classification
• Evaluation
What is Machine Learning (ML)?
• Machine learning is the process of extracting knowledge from data, combining elements of statistics, AI, and computer science. It is widely used in daily life, from personalized recommendations (Netflix, Amazon) to scientific research (DNA analysis, cancer treatment).
• Earlier intelligent systems relied on manually coded rules ("if-else" conditions), but these were limited in flexibility and required expert knowledge. Machine learning, in contrast, allows models to learn patterns from data without explicit programming. A key example is face detection, which was once unsolvable with rule-based methods but is now achieved through ML algorithms trained on large datasets.
What is Machine Learning (ML)?
Machine learning (ML) is a branch of artificial intelligence that
enables computers to learn patterns from data and make
predictions without explicit programming.
Application of Machine Learning
• Healthcare – Disease diagnosis, personalized treatments, drug discovery, patient outcome
prediction.
• Finance – Fraud detection, stock predictions, credit scoring, algorithmic trading.
• E-Commerce – Product recommendations, targeted ads, customer sentiment analysis, chatbots.
• Transportation – Self-driving cars, traffic prediction, predictive vehicle maintenance.
• Manufacturing – Quality control, supply chain optimization, automation, predictive maintenance.
• Cybersecurity – Threat detection, spam filtering, fraud prevention, malware analysis.
• Education – AI tutors, automated grading, student performance tracking.
• Agriculture – Crop monitoring, disease detection, smart irrigation, automated harvesting.
• Entertainment – Video/music recommendations, AI-generated content, face/speech recognition.
• Government – Smart cities, disaster prediction, surveillance, traffic flow optimization.
Machine Learning Algorithms
Supervised Learning
• Regression: Linear Regression, Polynomial Regression, Support Vector Regression, Decision Tree Regression, Random Forest
• Classification: Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Decision Tree, Random Forest, Naïve Bayes
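As a minimal sketch of the two supervised settings listed above, the example below fits one regression model and one classification model with scikit-learn; the synthetic data and variable names are purely illustrative and not from the lecture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # two made-up input features

# Regression: the target is a continuous number.
y_reg = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
reg = LinearRegression().fit(X, y_reg)
print("regression prediction:", reg.predict(X[:1]))      # a real number

# Classification: the target is a discrete class label.
y_clf = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_clf)
print("classification prediction:", clf.predict(X[:1]))  # a label, 0 or 1
```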
Detecting Multicollinearity
• Correlation Matrix: Examining the correlation matrix among the independent variables is a common way to detect multicollinearity. High correlations (close to 1 or -1) indicate potential multicollinearity.
• VIF (Variance Inflation Factor): VIF quantifies how much the variance of an estimated regression coefficient is inflated when predictors are correlated. A high VIF (typically above 10) suggests multicollinearity.
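A small sketch of both checks, assuming pandas and statsmodels are available; the column names and the deliberately correlated data are made up for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=200)   # deliberately correlated with x1
x3 = rng.normal(size=200)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Correlation matrix: values close to 1 or -1 hint at multicollinearity.
print(X.corr())

# VIF: values above ~10 are commonly read as a sign of multicollinearity.
X_const = add_constant(X)   # add an intercept column before computing VIF
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X.columns,
)
print(vif)
```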
Model Evaluation: Regression
Evaluation: Mean Squared Error (MSE)
Mean squared error (MSE) measures error in statistical models as the average squared difference between observed and predicted values.
Evaluation: Root Mean Squared Error (RMSE)
• RMSE is the square root of MSE. Taking the square root brings the error back to the original units of the target variable, making RMSE easier to interpret.
• A lower RMSE indicates better model performance, and it is directly comparable to the original units of the target. RMSE is sensitive to large errors (outliers), so it penalizes models with big mistakes more heavily than other metrics.
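A minimal sketch of computing both metrics with scikit-learn; the observed and predicted values are toy numbers, not lecture data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.5, 10.0])   # observed values (made up)
y_pred = np.array([2.5, 5.5, 7.0, 11.0])   # model predictions (made up)

mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # back in the target's original units
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```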
Evaluation: Mean Absolute Error (MAE)
• Mean absolute error (MAE) measures the average absolute difference between observed and predicted values. Because errors are not squared, MAE is less sensitive to outliers than MSE and RMSE.
Evaluation: R-Squared (R²)
• R² tells us how well the model explains the variance in the target variable. It represents the proportion of the variance in the target variable that is explained by the model.
• R² typically ranges from 0 to 1. A higher R² indicates that the model explains a larger proportion of the variance, which usually means a better model.
• An R² of 0 means the model explains none of the variance, and R² = 1 means the model explains all of the variance perfectly.
• Negative R² values can occur when the model performs worse than a simple mean-based model.
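A short sketch of R² with scikit-learn, using toy numbers to show a good fit, the mean baseline (R² = 0), and a fit worse than the baseline (negative R²).

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0])

good_pred = np.array([2.8, 5.2, 7.4, 9.9])
print(r2_score(y_true, good_pred))          # close to 1: most variance explained

mean_pred = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_pred))          # 0: no better than predicting the mean

bad_pred = np.array([10.0, 3.0, 12.0, 4.0])
print(r2_score(y_true, bad_pred))           # negative: worse than the mean baseline
```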
Evaluation: Adjusted R-Squared