Lecture 10 - Logistic Regression - Part 2 - Jupyter Notebook

The document outlines a Jupyter Notebook lecture on Logistic Regression, detailing the process of importing a dataset, cleaning the data, and preparing it for analysis. It includes steps for handling missing data, converting categorical features, and splitting the dataset into training and testing sets. Finally, it demonstrates training a logistic regression model and evaluating its performance using precision, recall, and F1-score metrics.



Lecture 10-Part2

Logistic Regression
In [14]: import pandas as pd
         import numpy as np
         import matplotlib.pyplot as plt
         import seaborn as sns
         %matplotlib inline

The Data
Import the Dataset.

In [17]: data = pd.read_csv('Downloads/Facebook.csv')
         data.head(5)

Out[17]:
             Names                                              emails       Country  Time Spent on Site       Salary
0    Martina Avila                                 [email protected]      Bulgaria           25.649648  55330.06006
1    Harlan Barnes                                 [email protected]        Belize           32.456107  79049.07674
2  Naomi Rodriquez  vulputate.mauris.sagittis@ametconsectetueradip...        Algeria          20.945978  41098.60826
3  Jade Cunningham                                 [email protected]  Cook Islands           54.039325  37143.35536
4     Cedric Leach                                 [email protected]        Brazil           34.249729  37355.11276


In [18]: data.head()

Out[18]:
             Names                                              emails       Country  Time Spent on Site       Salary
0    Martina Avila                                 [email protected]      Bulgaria           25.649648  55330.06006
1    Harlan Barnes                                 [email protected]        Belize           32.456107  79049.07674
2  Naomi Rodriquez  vulputate.mauris.sagittis@ametconsectetueradip...        Algeria          20.945978  41098.60826
3  Jade Cunningham                                 [email protected]  Cook Islands           54.039325  37143.35536
4     Cedric Leach                                 [email protected]        Brazil           34.249729  37355.11276

Missing Data
We can check for missing data with isnull(), and use seaborn to create a simple heatmap to see where values are missing.

In [19]: data.isnull()

Out[19]:
     Names  emails  Country  Time Spent on Site  Salary  Clicked
0    False   False    False               False   False    False
1    False   False    False               False   False    False
2    False   False    False               False   False    False
3    False   False    False               False   False    False
4    False   False    False               False   False    False
..     ...     ...      ...                 ...     ...      ...
494  False   False    False               False   False    False
495  False   False    False               False   False    False
496  False   False    False               False   False    False
497  False   False    False               False   False    False
498  False   False    False               False   False    False

499 rows × 6 columns
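The heatmap mentioned above is a one-liner; a minimal sketch (every value here is False, so the plot would render as a single solid color, confirming there is no missing data):

In [ ]: # Missing entries (True) would show up as contrasting bands in the plot.
        sns.heatmap(data.isnull(), yticklabels=False, cbar=False, cmap='viridis')
        plt.show()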


Explore the dataset


In [38]: click = data[data['Clicked']==1]
         no_click = data[data['Clicked']==0]

In [39]: print("Total num of data =", len(data))

         print("Number of customers who clicked on Ad =", len(click))
         print("Percentage Clicked =", 1.*len(click)/len(data)*100.0, "%")

         print("Did not Click =", len(no_click))
         print("Percentage who did not Click =", 1.*len(no_click)/len(data)*100.0, "%")

Total num of data = 499
Number of customers who clicked on Ad = 250
Percentage Clicked = 50.1002004008016 %
Did not Click = 249
Percentage who did not Click = 49.899799599198396 %
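The same class balance can be read off in one line with pandas; a sketch (not in the original notebook):

In [ ]: # Fraction of rows per class: roughly 50/50, so the target is balanced.
        data['Clicked'].value_counts(normalize=True)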

Data Cleaning
If a numeric column such as Salary had missing values, we could fill them in instead of just dropping the affected rows, for example with the column mean (imputation), or more cleverly with a group-wise mean; a sketch follows below. This dataset is complete, so cleaning reduces to dropping the text columns (Names, emails, Country) that we won't use as model features.
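For reference, a minimal imputation sketch (hypothetical here, since this dataset has no missing values; it would have to run before Country is dropped below):

In [ ]: # Hypothetical: fill missing salaries with the overall mean...
        data['Salary'] = data['Salary'].fillna(data['Salary'].mean())
        # ...or, smarter, with the mean salary within each country.
        data['Salary'] = data.groupby('Country')['Salary'].transform(lambda s: s.fillna(s.mean()))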

In [20]: data.drop(['Names', 'emails', 'Country'], axis=1, inplace=True)

In [21]: data.head()

Out[21]:
   Time Spent on Site       Salary  Clicked
0           25.649648  55330.06006        0
1           32.456107  79049.07674        1
2           20.945978  41098.60826        0
3           54.039325  37143.35536        1
4           34.249729  37355.11276        0


In [22]: data.dropna(inplace=True)

Converting Categorical Features


In [40]: data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 499 entries, 0 to 498
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Time Spent on Site 499 non-null float64
1 Salary 499 non-null float64
2 Clicked 499 non-null int64
dtypes: float64(2), int64(1)
memory usage: 15.6 KB
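Since the only categorical column, Country, was already dropped above, nothing is left to encode; had it been kept, one-hot encoding would be the usual route. A sketch (not part of the original notebook):

In [ ]: # Hypothetical: expand Country into 0/1 indicator columns,
        # dropping the first level to avoid redundant (collinear) dummies.
        dummies = pd.get_dummies(data['Country'], drop_first=True)
        data = pd.concat([data.drop('Country', axis=1), dummies], axis=1)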

Logistic Regression model


Train Test Split
In [41]: from sklearn.model_selection import train_test_split

In [42]: X_train, X_test, y_train, y_test = train_test_split(data.drop('Clicked', axis=1),
                                                             data['Clicked'], test_size=0.2,
                                                             random_state=101)
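With an almost exactly 50/50 target, a plain random split is fine; for imbalanced targets, stratifying preserves the class ratio in both halves. A sketch (not from the original notebook):

In [ ]: # Hypothetical variant: keep the clicked/not-clicked proportions
        # identical in the train and test sets.
        X_train, X_test, y_train, y_test = train_test_split(
            data.drop('Clicked', axis=1), data['Clicked'],
            test_size=0.2, random_state=101, stratify=data['Clicked'])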


In [43]: X_train

Out[43]:
     Time Spent on Site       Salary
187           46.995205  89227.57988
55            27.432028  40814.47633
457           25.366808  37192.01715
57            47.070590  80709.83902
308           43.880448  77371.64859
..                  ...          ...
63            31.518373  35277.25683
326           42.903343  78401.67203
337           37.278453  50158.74558
11            34.530898  30221.93714
351           30.391102  59519.43092

399 rows × 2 columns

In [44]: y_train

Out[44]: 187 1
55 0
457 0
57 1
308 1
..
63 0
326 1
337 0
11 0
351 1
Name: Clicked, Length: 399, dtype: int64


In [45]: X_test

Out[45]:
     Time Spent on Site       Salary
246           19.919153  30201.25465
491           37.173216  63750.41558
330           43.750975  50777.99687
453           29.156654  39394.28363
155           30.730586  47012.72759
..                  ...          ...
98            12.866031  27148.27919
183           23.653926  29808.11365
72            26.410241  55388.71453
367           44.661437  75426.28108
405           30.916826  19123.46645

100 rows × 2 columns

In [46]: y_test

Out[46]: 246 0
491 1
330 1
453 0
155 0
..
98 0
183 0
72 0
367 1
405 0
Name: Clicked, Length: 100, dtype: int64

Training and Predicting


In [47]: from sklearn.linear_model import LogisticRegression

In [48]: logmodel = LogisticRegression()
         logmodel.fit(X_train, y_train)

Out[48]: LogisticRegression()

In [49]: predictions = logmodel.predict(X_test)
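predict() returns hard 0/1 labels by thresholding at 0.5; the underlying click probabilities are also available. A sketch (not from the original notebook):

In [ ]: # Column 1 of predict_proba holds P(Clicked = 1) for each test row.
        probabilities = logmodel.predict_proba(X_test)[:, 1]
        probabilities[:5]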

Let's move on to evaluate our model!


Evaluation

We can check precision, recall, and F1-score using a classification report.

In [36]: from sklearn.metrics import classification_report

In [37]: print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.94      0.89      0.92        57
           1       0.87      0.93      0.90        43

    accuracy                           0.91       100
   macro avg       0.91      0.91      0.91       100
weighted avg       0.91      0.91      0.91       100

In [51]: from sklearn.metrics import classification_report, confusion_matrix

         cm = confusion_matrix(y_test, predictions)
         sns.heatmap(cm, annot=True, fmt="d")

Out[51]: <AxesSubplot:>
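The report's headline numbers can be recovered directly from these counts: for class 1, precision = TP/(TP+FP) and recall = TP/(TP+FN). A sketch (not from the original notebook):

In [ ]: # For binary labels [0, 1], ravel() yields counts in the order tn, fp, fn, tp.
        tn, fp, fn, tp = cm.ravel()
        precision = tp / (tp + fp)                            # ≈ 0.87 for class 1
        recall = tp / (tp + fn)                               # ≈ 0.93 for class 1
        f1 = 2 * precision * recall / (precision + recall)    # ≈ 0.90
        print(precision, recall, f1)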


