Logistic regression
| Aspect | Multinomial Logistic Regression | Ordinal Logistic Regression |
|---|---|---|
| Mathematical Approach | One-vs-All or Softmax Function | Cumulative Logit Model |
| Implementation | Scikit-learn (multi_class='multinomial') | Statsmodels (OrderedModel) |
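The "Softmax Function" in the multinomial column converts a vector of raw class scores into probabilities that sum to 1. A minimal NumPy sketch of the idea (not scikit-learn's internals):

```python
import numpy as np

def softmax(z):
    """Map raw class scores z to probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw scores for 3 classes
probs = softmax(scores)
print(probs)                        # highest score -> highest probability
print(probs.sum())                  # probabilities sum to 1
```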
| Customer ID | Age | Annual Income (₹) | Purchased (Yes=1, No=0) |
|---|---|---|---|
| 1 | 22 | 2,50,000 | 0 |
| 2 | 35 | 6,00,000 | 1 |
| 3 | 28 | 3,50,000 | 0 |
| 4 | 42 | 9,00,000 | 1 |
| 5 | 30 | 4,50,000 | 1 |
Objective
We aim to build a Logistic Regression model that predicts whether a customer
will buy a laptop (Purchased = 1) or not (Purchased = 0) based on their age
and annual income.
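Concretely, the model passes a weighted sum of Age and Annual Income through the sigmoid function to get a purchase probability. A hand-worked sketch; the coefficients below are made up for illustration, not fitted values:

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical (NOT fitted) coefficients, chosen only for illustration
b0, b_age, b_income = -8.0, 0.1, 0.00001
age, income = 35, 600000

z = b0 + b_age * age + b_income * income  # linear score: -8 + 3.5 + 6 = 1.5
p = sigmoid(z)                            # probability of Purchased = 1
print(p)                                  # ≈ 0.82, so predict Purchased = 1
```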
Solution Approach
1. Data Preprocessing
   - Convert categorical data (if any) into numerical form.
2. Train-Test Split
   - Split the dataset into a training set (80%) and a test set (20%).
```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Sample dataset
data = {'Age': [22, 35, 28, 42, 30],
        'Annual_Income': [250000, 600000, 350000, 900000, 450000],
        'Purchased': [0, 1, 0, 1, 1]}
df = pd.DataFrame(data)

# Features and target
X, y = df[['Age', 'Annual_Income']], df['Purchased']

# Train-test split (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
```
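To judge those predictions, the usual next step is comparing them against the held-out labels with accuracy_score. A self-contained sketch that restates the tiny dataset so it runs on its own (with only one test sample, the score is illustrative, not meaningful):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Dataset restated so the snippet is self-contained
data = {'Age': [22, 35, 28, 42, 30],
        'Annual_Income': [250000, 600000, 350000, 900000, 450000],
        'Purchased': [0, 1, 0, 1, 1]}
df = pd.DataFrame(data)
X, y = df[['Age', 'Annual_Income']], df['Purchased']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# With 5 rows, the 20% test split holds a single sample
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
```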
| ID | (feature 1) | (feature 2) | Label (0/1) |
|---|---|---|---|
| 1 | 2 | 50 | 0 |
| 2 | 5 | 65 | 0 |
| 3 | 7 | 80 | 1 |
| 4 | 9 | 85 | 1 |
| 5 | 6 | 70 | 1 |
```python
# Making predictions
y_pred = model.predict(X_test)
```
| Customer | Age | Income (₹) | Time Spent (mins) | Purchased (1=Yes, 0=No) |
|---|---|---|---|---|
| 1 | 22 | 3,00,000 | 5 | 0 |
| 2 | 30 | 5,50,000 | 12 | 1 |
| 3 | 40 | 8,00,000 | 20 | 1 |
| 4 | 26 | 4,00,000 | 7 | 0 |
| 5 | 35 | 7,00,000 | 15 | 1 |
```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Creating the dataset
data = {'Age': [22, 30, 40, 26, 35],
        'Income': [300000, 550000, 800000, 400000, 700000],
        'Time_Spent': [5, 12, 20, 7, 15],
        'Purchased': [0, 1, 1, 0, 1]}
df = pd.DataFrame(data)

X, y = df[['Age', 'Income', 'Time_Spent']], df['Purchased']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Making predictions
y_pred = model.predict(X_test)
```
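Beyond hard 0/1 predictions, predict_proba exposes the model's estimated purchase probabilities, which is often more useful for ranking customers. A self-contained sketch restating the three-feature dataset:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Dataset restated so the snippet is self-contained
data = {'Age': [22, 30, 40, 26, 35],
        'Income': [300000, 550000, 800000, 400000, 700000],
        'Time_Spent': [5, 12, 20, 7, 15],
        'Purchased': [0, 1, 1, 0, 1]}
df = pd.DataFrame(data)
X, y = df[['Age', 'Income', 'Time_Spent']], df['Purchased']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba = model.predict_proba(X_test)  # columns: P(class 0), P(class 1)
print(proba)
print(proba.sum(axis=1))             # each row sums to 1
```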
```python
import warnings
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

warnings.simplefilter("always", ConvergenceWarning)

clf = LogisticRegression(max_iter=50)  # deliberately low iteration cap
clf.fit(X_train, y_train)
```
If you see a ConvergenceWarning such as "lbfgs failed to converge", increase max_iter (or standardize the features so the solver converges in fewer iterations).
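To see the remedy in action, the following self-contained sketch (using synthetic data from make_classification as a stand-in, not the tutorial's dataset) triggers the warning with a tiny iteration budget, then refits with a larger one:

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Very low max_iter: lbfgs stops early and emits ConvergenceWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    LogisticRegression(max_iter=1).fit(X, y)
warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("Warned at max_iter=1:", warned)

# Larger budget: the solver converges before hitting the cap
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Iterations used with max_iter=1000:", clf.n_iter_)
```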
Problem Statement:
A company wants to predict which type of device (Laptop, Tablet, or Mobile)
a customer will buy based on their age and income.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Sample dataset
data = {
    'Age': [22, 30, 40, 26, 35, 50, 28, 45, 33, 38],
    'Income': [300000, 550000, 800000, 400000, 700000, 900000, 350000,
               600000, 480000, 750000],
    'Device': ['Laptop', 'Mobile', 'Laptop', 'Tablet', 'Mobile', 'Laptop',
               'Tablet', 'Mobile', 'Tablet', 'Laptop']
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Encode the target labels (Laptop/Mobile/Tablet -> integers)
le = LabelEncoder()
X = df[['Age', 'Income']]
y = le.fit_transform(df['Device'])

# Split the dataset into train and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
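The training and scoring step can be sketched as follows (dataset restated so the snippet runs on its own; recent scikit-learn versions apply the multinomial formulation by default with solver='lbfgs', so multi_class need not be passed explicitly):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Dataset restated so the snippet is self-contained
data = {
    'Age': [22, 30, 40, 26, 35, 50, 28, 45, 33, 38],
    'Income': [300000, 550000, 800000, 400000, 700000, 900000, 350000,
               600000, 480000, 750000],
    'Device': ['Laptop', 'Mobile', 'Laptop', 'Tablet', 'Mobile', 'Laptop',
               'Tablet', 'Mobile', 'Tablet', 'Laptop']
}
df = pd.DataFrame(data)

le = LabelEncoder()
X = df[['Age', 'Income']]
y = le.fit_transform(df['Device'])  # Laptop=0, Mobile=1, Tablet=2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# lbfgs optimizes the multinomial (softmax) objective for multiclass targets
model = LogisticRegression(solver='lbfgs', max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Model Accuracy:", acc)
# List all 3 labels explicitly: the tiny test set may not contain every class
print(classification_report(y_test, y_pred, labels=[0, 1, 2],
                            target_names=le.classes_, zero_division=0))
```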
Expected Output
```
Model Accuracy: 0.90

Classification Report:
              precision    recall  f1-score   support

     accuracy                           0.90         5
    macro avg       0.93      0.93      0.93         5
 weighted avg       0.90      0.90      0.90         5
```
90% accuracy means the model correctly predicted 90% of test cases.
High precision and recall indicate a good classification model.
Key Takeaways
✔ Used Multinomial Logistic Regression for multiple classes (Laptop,
Mobile, Tablet).
✔ Encoded categorical labels using LabelEncoder().
✔ Split dataset into training (80%) and testing (20%).
✔ Trained logistic regression using multi_class='multinomial' (deprecated since scikit-learn 1.5; multinomial is now the default for lbfgs).
✔ Evaluated model using accuracy and classification report.
Next Steps
Would you like to: 1️⃣ Visualize Decision Boundaries? 📊
2️⃣ Hyperparameter Tune the Model? 🔧
3️⃣ Use Real-World E-Commerce Data? 🛒
Let me know how you'd like to proceed! 🚀
You said:
solver='lbfgs' → Suitable for multinomial classification. explain in detail
ChatGPT said:
Understanding solver='lbfgs' in Multinomial Logistic Regression
The solver parameter in Scikit-learn's LogisticRegression specifies the
optimization algorithm used to minimize the cost function and find the best
model parameters.
1. What is LBFGS?
✅ LBFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno
algorithm) is an advanced quasi-Newton optimization algorithm.
✅ It is used for solving logistic regression, neural networks, and other
machine learning problems efficiently.
✅ It is particularly well-suited for multinomial classification, where we
have multiple categories to predict.
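In code, choosing this optimizer is a single argument. A minimal sketch on a synthetic 3-class problem (make_classification is used here as a stand-in dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class dataset
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=42)

# lbfgs (the default solver) jointly optimizes the multinomial
# (softmax) cross-entropy across all classes
clf = LogisticRegression(solver='lbfgs', max_iter=1000)
clf.fit(X, y)

print("Classes:", clf.classes_)
print("Coef shape:", clf.coef_.shape)  # one weight vector per class: (3, 6)
print("Training accuracy:", clf.score(X, y))
```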
| Solver | Best For | Notes |
|---|---|---|
| newton-cg | Multinomial problems | Second-order optimization, better for high-dimensional data; uses more memory than lbfgs |