PS Project - Jupyter Notebook

This Jupyter notebook document outlines the steps to build a logistic regression model for a weather prediction problem. It loads and explores a weather dataset, preprocesses the data, splits it into training and test sets, trains a logistic regression model on the training set, evaluates the model's performance on the test set using various metrics, and interprets the model coefficients. The document provides code examples for each step of the predictive modeling process.

11/12/2023, 20:22 PS Project - Jupyter Notebook

In [3]: import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

In [4]: # Step 1: Load your dataset
df = pd.read_csv('weather.csv')  # Replace 'weather.csv' with the path to your dataset

In [5]: # Step 2: Explore and clean the dataset
# (Assuming your target variable is 'RainTomorrow')
df.dropna(inplace=True)  # Drop rows with missing values; you might want a more careful strategy (e.g. imputation)
df['RainTomorrow'].value_counts()  # Check the balance of classes

Out[5]: No 1274
Yes 416
Name: RainTomorrow, dtype: int64
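
The class counts above are imbalanced, so it helps to know what accuracy a trivial model would reach before reading any later score. The following is a minimal sketch of that arithmetic, using only the two counts printed in Out[5]; the variable names are illustrative and not part of the notebook.

```python
# Class counts copied from the value_counts() output above.
class_counts = {"No": 1274, "Yes": 416}

total = sum(class_counts.values())
majority_label = max(class_counts, key=class_counts.get)

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = class_counts[majority_label] / total
positive_share = class_counts["Yes"] / total

print(f"Rows after dropna: {total}")
print(f"Majority class: {majority_label}")
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
print(f"Share of 'Yes': {positive_share:.3f}")
```

Any accuracy reported for the model is best read against this roughly 75% baseline rather than against 50%.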

In [6]: # Step 3: Feature Engineering (if needed)
# No specific feature engineering is done in this example

In [7]: # Step 4: Data Preprocessing
X = pd.get_dummies(df.drop('RainTomorrow', axis=1))  # One-hot encode the categorical features
y = df['RainTomorrow'].map({'Yes': 1, 'No': 0})  # Convert the target variable to binary (1 = Yes, 0 = No)
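
To make the effect of this preprocessing concrete, here is a small sketch on a toy frame. The column names `Humidity` and `WindDir` are hypothetical stand-ins, not taken from `weather.csv`.

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
toy = pd.DataFrame({
    "Humidity": [70, 45, 88],
    "WindDir": ["N", "SW", "N"],
    "RainTomorrow": ["Yes", "No", "Yes"],
})

# Numeric columns pass through unchanged; each category becomes its own indicator column.
X_toy = pd.get_dummies(toy.drop("RainTomorrow", axis=1))

# Labels other than 'Yes'/'No' would map to NaN, so check for them on real data.
y_toy = toy["RainTomorrow"].map({"Yes": 1, "No": 0})

print(X_toy.columns.tolist())  # ['Humidity', 'WindDir_N', 'WindDir_SW']
print(y_toy.tolist())          # [1, 0, 1]
```

For a linear model it can be worth passing `drop_first=True` to `pd.get_dummies` so that the indicator columns of one feature are not perfectly collinear.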

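In [8]: # Step 5: Data Splitting
# test_size=0.2 is inferred from the 338 test rows (of 1690) reported later; the rest of this line is cut off in the source
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)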
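
With imbalanced classes, a stratified split keeps the share of positives the same in the training and test sets. The sketch below uses a synthetic label vector with the same class counts as Out[5]; `stratify=` and `random_state=0` are assumptions added for illustration and are not visible in the notebook's call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the notebook's class counts (1274 'No', 416 'Yes').
y_demo = np.array([0] * 1274 + [1] * 416)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo preserves the class ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

print("Test rows:", len(y_te))
print("Positive share (train):", round(y_tr.mean(), 3))
print("Positive share (test):", round(y_te.mean(), 3))
```

The notebook's own test set holds 95 positives out of 338 rows (about 28%) against about 25% overall, which suggests its split was not stratified.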

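In [14]: from sklearn.preprocessing import StandardScaler

# Step 4: Data Preprocessing (feature scaling)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit the scaler on the training data only
X_test_scaled = scaler.transform(X_test)  # Reuse the training statistics on the test data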

Out[14]: LogisticRegression(max_iter=1000)

In [15]: # Step 6: Choose a Classification Model
model = LogisticRegression()

localhost:8888/notebooks/Data_Science_Course_Rune/PS Project.ipynb 1/6



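In [16]: # Step 7: Model Training
# Note: this fits on the unscaled X_train, not X_train_scaled; together with the
# default max_iter=100, that likely contributes to the ConvergenceWarning below
model.fit(X_train, y_train)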

/opt/anaconda3/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Out[16]: LogisticRegression()
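
The warning above recommends scaling the data or raising `max_iter`. One common way to do both without handling scaled copies by hand is a scikit-learn pipeline. The following is a sketch on synthetic data, not the notebook's method; the data set, the exaggerated feature scales and the names `pipe` and `test_accuracy` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weather features.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# Put a few columns on very different scales, as raw measurements often are.
scales = np.array([1, 10, 100, 1000, 1, 1, 1, 1, 1, 1])
X_demo = X_demo * scales

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# The scaler is fit on the training data only and reapplied at predict time.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

test_accuracy = pipe.score(X_te, y_te)
n_iter = pipe.named_steps["logisticregression"].n_iter_[0]

print("Test accuracy:", round(test_accuracy, 3))
print("lbfgs iterations used:", n_iter)
```

Because the scaler lives inside the pipeline, `pipe.predict` and `pipe.predict_proba` can be called on raw test features directly, which removes the risk of fitting on scaled data and predicting on unscaled data or the reverse.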

In [17]: from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Step 8: Model Evaluation
y_pred = model.predict(X_test)

# Create a confusion matrix (rows = true class, columns = predicted class)
cm = confusion_matrix(y_test, y_pred)

# Visualize the confusion matrix using seaborn
# (the source line is cut off after annot_kws={"size":, so the annotation font size is left at its default)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

In [18]: from sklearn.metrics import roc_curve, auc

# Get predicted probabilities for the positive class
y_pred_proba = model.predict_proba(X_test)[:, 1]

# Calculate ROC curve and AUC
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure(figsize=(8, 8))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')  # Chance-level diagonal
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()
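
The arrays returned by `roc_curve` can also be used to pick a decision threshold other than the default 0.5. The sketch below uses Youden's J statistic (TPR minus FPR) on synthetic data; the choice of this criterion, the data set and the names `best_threshold` and `y_pred_custom` are assumptions for illustration, not something the notebook does.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the weather data.
X_demo, y_demo = make_classification(n_samples=600, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, proba)

# Youden's J: the ROC point farthest above the chance diagonal.
j_scores = tpr - fpr
best = int(np.argmax(j_scores))
best_threshold = thresholds[best]

# Apply the chosen threshold instead of the default 0.5 used by predict().
y_pred_custom = (proba >= best_threshold).astype(int)

print("Chosen threshold:", round(float(best_threshold), 3))
print("TPR / FPR at that point:", round(tpr[best], 3), round(fpr[best], 3))
```

In practice the threshold would be chosen on a validation set rather than the test set, so that the reported test metrics stay unbiased.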

In [26]: from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Example metrics calculation
y_pred = model.predict(X_test)
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
print("AUC-ROC Score:", roc_auc_score(y_test, y_pred_proba))  # Uses the probabilities from the ROC cell

Precision: 1.0
Recall: 0.9473684210526315
F1 Score: 0.972972972972973
AUC-ROC Score: 0.9998267273121074
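
These scores can be reproduced by hand from confusion-matrix counts. The counts in the sketch below are inferred rather than read from the notebook: a precision of 1.0 implies no false positives, and a recall of 0.947 with the 95 positive and 243 negative test rows listed in the classification report implies 90 true positives and 5 false negatives.

```python
# Confusion-matrix counts inferred from the printed metrics (an assumption,
# since the heatmap itself is not reproduced in this printout).
tn, fp, fn, tp = 243, 0, 5, 90

precision = tp / (tp + fp)                      # Share of predicted positives that are correct
recall = tp / (tp + fn)                         # Share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("Precision:", precision)
print("Recall:", round(recall, 4))
print("F1 Score:", round(f1, 4))
print("Accuracy:", round(accuracy, 4))
```

The results agree with the values printed by scikit-learn, which confirms that every error on this test set is a missed rainy day rather than a false alarm.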

In [19]: print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.985207100591716
Classification Report:
              precision    recall  f1-score   support

           0       0.98      1.00      0.99       243
           1       1.00      0.95      0.97        95

    accuracy                           0.99       338
   macro avg       0.99      0.97      0.98       338
weighted avg       0.99      0.99      0.99       338

In [20]: # Step 9: Model Interpretation
# Coefficients for logistic regression model (one per column of X)
coefficients = pd.DataFrame({'Feature': X.columns, 'Coefficient': model.coef_[0]})

# Sort coefficients by absolute value, largest first
coefficients = coefficients.reindex(coefficients['Coefficient'].abs().sort_values(ascending=False).index)

# Plot coefficients
# (the palette name is cut off in the source, so seaborn's default colors are used here)
plt.figure(figsize=(12, 6))
sns.barplot(x='Coefficient', y='Feature', data=coefficients)
plt.xlabel('Coefficient Value')
plt.ylabel('Feature')
plt.title('Logistic Regression Coefficients')
plt.show()
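
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. The sketch below shows the idea on synthetic data; the feature names `f0` to `f3` and the `OddsRatio` column are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in with hypothetical feature names.
X_arr, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=["f0", "f1", "f2", "f3"])

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

odds = pd.DataFrame({"Feature": X_demo.columns, "Coefficient": clf.coef_[0]})

# exp(coefficient) = multiplicative change in the odds per one-unit increase.
odds["OddsRatio"] = np.exp(odds["Coefficient"])

# Order by coefficient magnitude, largest first.
odds = odds.sort_values("Coefficient", key=np.abs, ascending=False)

print(odds.to_string(index=False))
```

Coefficient magnitudes are only comparable across features when the features share a scale, so a ranking like this is more meaningful for a model trained on standardized inputs.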

In [ ]: # Step 10: Fine-Tuning and Optimization
# For logistic regression, fine-tuning may involve adjusting the regularization strength (the C parameter)

# Step 11: Deployment (Not shown in code, as it depends on your deployment environment)

# Step 12: Continuous Monitoring and Updating
# Monitor model performance over time and update as needed
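
Step 10 can be sketched as a cross-validated grid search over the regularization strength. This is one possible approach under stated assumptions, not the notebook's own code: the synthetic data, the grid of `C` values, the F1 scoring choice and the step names `scale` and `clf` are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weather features.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# Scaling sits inside the pipeline so each CV fold is scaled independently.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Smaller C means stronger regularization.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_demo, y_demo)

print("Best C:", search.best_params_["clf__C"])
print("Mean cross-validated F1:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is the pipeline refit on all supplied data with the selected `C`, and it can be evaluated on a held-out test set in the same way as `model` above.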
