Modern Predictive Modelling (Regression)
2. Classification Metrics
o Accuracy
o Recall
o F1-Score
o ROC-AUC: Area under Receiver Operating Characteristic curve
Plots true positive rate vs false positive rate at various thresholds.
Range is 0 to 1 (0.5 is random, 1 is perfect)
Threshold-independent metric
Good for imbalanced classes
o Precision-Recall Curves: Plots precision vs recall at various thresholds
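As an illustration, the metrics above can be computed with scikit-learn; the labels and scores below are made-up values for demonstration only:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, precision_recall_curve

# Hypothetical true labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3])

# ROC-AUC: threshold-independent; 0.5 is random, 1.0 is perfect
auc = roc_auc_score(y_true, y_score)

# ROC curve: true positive rate vs false positive rate at each threshold
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)

# Precision-recall curve: precision vs recall at each threshold
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)

print(f"ROC-AUC: {auc:.4f}")
```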
Model Selection Techniques
1. Cross-validation
o K-fold
o Stratified K-fold
o Time-series cross-validation
2. Hyperparameter Optimization
o Grid Search
o Random Search
o Bayesian Optimization
o Neural Architecture Search
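A minimal sketch of the first two technique families using scikit-learn on a synthetic dataset; the fold count and parameter grid below are illustrative choices, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score

# Synthetic binary classification data for demonstration
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Stratified K-fold preserves class proportions in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# Grid search: exhaustively try every combination in the parameter grid
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={'C': [0.01, 0.1, 1, 10]}, cv=cv)
grid.fit(X, y)
print(scores.mean(), grid.best_params_)
```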
Primary Uses
1. Prediction and forecasting (machine learning applications)
2. Measuring variable relationships and influences
3. Data-driven decision making
Linear Regression
Linear regression predicts continuous values by modelling linear relationships
between variables. It's one of the most widely used statistical techniques in data
science.
The general form of the regression model:

Y_i = f(X_i, β) + e_i

Where:
Y_i = Dependent variable
f = Function
X_i = Independent variable
β = Unknown parameters
e_i = Error term
Step 4
To achieve the best-fitted line, we have to minimise the value of the loss function. To
minimise the loss function, we use a technique called gradient descent.
Gradient Descent
Step 5
Once the loss function is minimized, we get the final equation for the best-fitted line
and we can predict the value of Y for any given X.
Logistic Regression
Overview
Logistic regression is used for classification problems, particularly binary outcomes. It
predicts categorical variables by calculating probabilities.
Types of Logistic Regression
1. Binary Logistic Regression
Two possible outcomes (Yes/No, 0/1)
Most common form
Step 1
To achieve the binary separation, we first determine the best-fitted line by following the Linear Regression steps.
Step 2
The regression line we get from Linear Regression is highly susceptible to outliers, so it will not do a good job of separating the two classes.
Thus, the predicted value is converted into a probability by feeding it to the sigmoid function.
The logistic regression hypothesis generalises the linear regression hypothesis by passing its output through the logistic function, also known as the sigmoid function (an activation function).
The equation of the sigmoid: S(x) = 1 / (1 + e^(-x))
Thus, if we feed the output ŷ value to the sigmoid function, it returns a probability value between 0 and 1.
Step 3
Finally, the output value of the sigmoid function gets converted into 0 or 1 (discrete values) based on the threshold value. We usually set the threshold to 0.5. In this way, we get the binary classification.
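Steps 2 and 3 can be sketched as follows; the raw ŷ values below are made-up inputs for illustration:

```python
import numpy as np

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x)): squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw outputs (y-hat) from the regression line
y_hat = np.array([-3.0, -0.5, 0.0, 1.2, 4.0])

probs = sigmoid(y_hat)               # probabilities between 0 and 1
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 -> discrete 0 or 1

print(probs.round(3), labels)
```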
Requirements
1. Binary/categorical dependent variable
2. Independent predictor variables
3. Low/no multicollinearity
4. Large sample size
Key Differences
Aspect              Linear Regression        Logistic Regression
Output              Continuous values        Categorical values
Purpose             Prediction               Classification
Function            Best-fit line            Sigmoid curve
Loss calculation    Mean Squared Error       Maximum Likelihood
Application         Quantitative response    Binary/categorical response
Modern Applications
Linear Regression
Price prediction
Sales forecasting
Resource allocation
Performance analysis
Logistic Regression
Spam detection
Medical diagnosis
Credit risk assessment
Customer behaviour prediction
1. House Price Prediction (Linear Regression)
Problem Statement
Predicting house prices from property characteristics.
Implementation
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Feature preparation
X = df[['sqft', 'bedrooms', 'age', 'location_score']]
y = df['price']
# Split into train/test sets and scale the features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Train model
model = LinearRegression()
model.fit(X_train_scaled, y_train)
# Example prediction
new_house = [[1900, 3, 5, 8]]  # sqft, bedrooms, age, location_score
new_house_scaled = scaler.transform(new_house)
predicted_price = model.predict(new_house_scaled)
2. Medical Diagnosis (Logistic Regression)
Problem Statement
Predicting the likelihood of a disease based on patient symptoms and characteristics.
Implementation
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
df = pd.DataFrame(patient_data)
# Prepare features
X = df[['age', 'blood_pressure', 'cholesterol']]
y = df['has_disease']
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Train model
model = LogisticRegression()
model.fit(X_scaled, y)
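A sketch of using the fitted model for prediction. Since patient_data is not shown above, the snippet below generates its own synthetic patient records (with a toy "disease" rule, not real medical data) so it runs end to end:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic patient data: age, blood pressure, cholesterol (illustrative values)
rng = np.random.default_rng(0)
X = rng.normal([55, 130, 210], [10, 15, 30], size=(100, 3))
y = (X[:, 1] + X[:, 2] > 340).astype(int)  # toy disease rule for demonstration

scaler = StandardScaler()
model = LogisticRegression().fit(scaler.fit_transform(X), y)

# Predict for a hypothetical new patient
new_patient = scaler.transform([[54, 145, 240]])
prob = model.predict_proba(new_patient)[0, 1]  # probability of disease
label = model.predict(new_patient)[0]          # 0/1 at the default 0.5 threshold
print(round(prob, 3), label)
```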
Further Reading
- Statistical Learning Theory
- Advanced Regression Techniques
- Machine Learning Applications
- Model Optimization Methods