
Assignment
You have been provided with a dataset containing information about customers of an e-commerce company. The task is
to build a binary classification model using logistic regression to predict whether a customer will make a purchase or not
based on their demographic and browsing behavior data. The dataset consists of the following features:

email
address
avatar
time on app
time on website
length of membership
yearly amount spent

The target variable is: Purchase (binary: 1 if the customer made a purchase over $450, 0 otherwise)

Instructions: Load the dataset and perform any necessary data preprocessing steps. Split the data into training and
testing sets (e.g., 80% training, 20% testing). Train a logistic regression model using the training data. Evaluate the
model's performance on the testing data using appropriate evaluation metrics (e.g., accuracy, precision, recall, F1-score).

Provide a brief summary of the model's performance and any insights you gather from the results. Note: You can use any
programming language or machine learning libraries of your choice. The aim of this problem is to assess your ability to
quickly understand the problem, preprocess the data, build a logistic regression model, evaluate its performance, and
derive meaningful insights from the results within a limited timeframe.

In [340]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler

%matplotlib inline

In [341]:

df = pd.read_csv("scpl_folder/data.csv")


In [342]:

df.head(10)

Out[342]:

                              Email                                             Address         Avatar  \
0                 [email protected]  16338 Scott Corner Suite 727West Alexandra, AR...        SeaGreen
1                 [email protected]      672 Jesus Roads Apt. 443Thompsonland, WY 69228    LightSkyBlue
2                 [email protected]   38678 Sean Drive Suite 293Karentown, IA 78306-...        DarkGray
3                 [email protected]     0128 Sampson Loop Suite 943Hoffmanton, MO 02122     SaddleBrown
4  acampbell@sanchez-velasquez.info     5791 Jessica CoveMckinneyborough, OK 64460-7536           Wheat
5                 [email protected]   88995 Edwards Row Suite 456North Jo, DE 02062-...          Sienna
6                 [email protected]   9991 Macdonald SquaresVasquezborough, WY 73586...          Purple
7                 [email protected]       2595 James Creek Apt. 571Millerberg, HI 82236   PaleVioletRed
8                 [email protected]  399 Jeremy Skyway Suite 377North Keithville, I...   PaleTurquoise
9                 [email protected]                 PSC 2490, Box 2120APO AE 15445-2876           Black

   Time on App  Time on Website  Length of Membership  Yearly Amount Spent
0        10.16            37.76                  4.78               521.24
1        13.46            37.24                  2.94               503.98
2        12.01            36.53                  4.71               576.48
3        10.10            38.04                  4.24               418.60
4        11.45            37.58                  2.59               420.74
5        10.74            37.46                  3.86               476.19
6        10.97            36.61                  2.87               404.82
7        11.76            37.92                  3.53               482.14
8        12.19            36.15                  3.78               494.55
9        12.88            37.44                  1.56               419.94

(The remaining address-derived columns were cut off at the right edge of the original page rendering.)

In [343]:

df.columns = ['Email', 'Address', 'Avatar', 'Time on App', 'Time on Website',
              'Length of Membership', 'Yearly Amount Spent', 'Clean_Address_Loc',
              'Clean_Address_County']

In [344]:

df.loc[df['Yearly Amount Spent'] > 450, 'purchase'] = 1
df.loc[df['Yearly Amount Spent'] <= 450, 'purchase'] = 0


In [345]:

df.describe()

Out[345]:

Time on App Time on Website Length of Membership Yearly Amount Spent purchase

count 500.000000 500.000000 500.00000 500.000000 500.000000

mean 12.052620 37.060480 3.53336 499.314240 0.730000

std 0.994418 1.010555 0.99926 79.314764 0.444404

min 8.510000 33.910000 0.27000 256.670000 0.000000

25% 11.390000 36.347500 2.93000 445.037500 0.000000

50% 11.980000 37.070000 3.53500 498.890000 1.000000

75% 12.752500 37.720000 4.13000 549.312500 1.000000

max 15.130000 40.010000 6.92000 765.520000 1.000000

In [346]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Email 500 non-null object
1 Address 500 non-null object
2 Avatar 500 non-null object
3 Time on App 500 non-null float64
4 Time on Website 500 non-null float64
5 Length of Membership 500 non-null float64
6 Yearly Amount Spent 500 non-null float64
7 Clean_Address_Loc 500 non-null object
8 Clean_Address_County 500 non-null object
9 purchase 500 non-null float64
dtypes: float64(5), object(5)
memory usage: 39.2+ KB

In [347]:

df.corr()

Out[347]:

                      Time on App  Time on Website  Length of Membership  Yearly Amount Spent  purchase
Time on App              1.000000         0.082285              0.029240             0.499315  0.353636
Time on Website          0.082285         1.000000             -0.047443            -0.002601  0.003681
Length of Membership     0.029240        -0.047443              1.000000             0.809184  0.601839
Yearly Amount Spent      0.499315        -0.002601              0.809184             1.000000  0.737246
purchase                 0.353636         0.003681              0.601839             0.737246  1.000000


Vectorize the words


In [289]:

df = df.drop(['Email', 'Address', 'Clean_Address_Loc'], axis=1)
df

Out[289]:

           Avatar  Time on App  Time on Website  Length of Membership  Yearly Amount Spent  Clean_Address_County  purchase
0        SeaGreen        10.16            37.76                  4.78               521.24                    AR       1.0
1    LightSkyBlue        13.46            37.24                  2.94               503.98                    WY       1.0
2        DarkGray        12.01            36.53                  4.71               576.48                    IA       1.0
3     SaddleBrown        10.10            38.04                  4.24               418.60                    MO       0.0
4           Wheat        11.45            37.58                  2.59               420.74                    OK       0.0
..            ...          ...              ...                   ...                  ...                   ...       ...
495    DodgerBlue        12.94            36.73                  4.56               544.41                    UT       1.0
496       OldLace        11.83            36.84                  3.61               502.09                    MI       1.0
497        Purple        11.68            38.72                  3.59               463.59                    MT       1.0
498      Moccasin        12.75            36.71                  3.28               548.28                    Bo       1.0
499     PeachPuff        12.13            38.19                  4.02               597.74                    SC       1.0

500 rows × 7 columns

In [290]:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             max_features=2)

In [291]:

def vect(df, col):
    # Count-vectorize the text column; with max_features=2 only the two most
    # frequent tokens survive, producing two count/indicator columns.
    dfw = vectorizer.fit_transform(df[col])
    dfw = dfw.toarray()
    # Drop the original text column and append the two new columns.
    df = np.hstack((df.drop(col, axis=1), np.reshape(dfw, (-1, 2))))
    df = pd.DataFrame(df)
    print(df)
    return df


In [292]:

df = vect(df, 'Avatar')
df.columns = ['Time on App', 'Time on Website', 'Length of Membership', 'Yearly Amount Spent',
              'Clean_Address_County', 'purchase', 'avatar1', 'avatar2']
df = vect(df, 'Clean_Address_County')
df.columns = ['Time on App', 'Time on Website', 'Length of Membership', 'Yearly Amount Spent',
              'purchase', 'avatar1', 'avatar2', 'clc1', 'clc2']
df.head()

0 1 2 3 4 5 6 7
0 10.16 37.76 4.78 521.24 AR 1.0 0 0
1 13.46 37.24 2.94 503.98 WY 1.0 0 0
2 12.01 36.53 4.71 576.48 IA 1.0 0 0
3 10.1 38.04 4.24 418.6 MO 0.0 0 0
4 11.45 37.58 2.59 420.74 OK 0.0 0 0
.. ... ... ... ... .. ... .. ..
495 12.94 36.73 4.56 544.41 UT 1.0 0 0
496 11.83 36.84 3.61 502.09 MI 1.0 0 0
497 11.68 38.72 3.59 463.59 MT 1.0 0 0
498 12.75 36.71 3.28 548.28 Bo 1.0 0 0
499 12.13 38.19 4.02 597.74 SC 1.0 0 0

[500 rows x 8 columns]


0 1 2 3 4 5 6 7 8
0 10.16 37.76 4.78 521.24 1.0 0 0 0 0
1 13.46 37.24 2.94 503.98 1.0 0 0 0 0
2 12.01 36.53 4.71 576.48 1.0 0 0 0 0
3 10.1 38.04 4.24 418.6 0.0 0 0 0 0
4 11.45 37.58 2.59 420.74 0.0 0 0 0 0
.. ... ... ... ... ... .. .. .. ..
495 12.94 36.73 4.56 544.41 1.0 0 0 0 0
496 11.83 36.84 3.61 502.09 1.0 0 0 0 0
497 11.68 38.72 3.59 463.59 1.0 0 0 0 0
498 12.75 36.71 3.28 548.28 1.0 0 0 1 0
499 12.13 38.19 4.02 597.74 1.0 0 0 0 1

[500 rows x 9 columns]

Out[292]:

   Time on App  Time on Website  Length of Membership  Yearly Amount Spent  purchase  avatar1  avatar2  clc1  clc2
0        10.16            37.76                  4.78               521.24       1.0        0        0     0     0
1        13.46            37.24                  2.94               503.98       1.0        0        0     0     0
2        12.01            36.53                  4.71               576.48       1.0        0        0     0     0
3        10.10            38.04                  4.24               418.60       0.0        0        0     0     0
4        11.45            37.58                  2.59               420.74       0.0        0        0     0     0
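Note on the vectorization above: because vectorizer was built with max_features=2, only the two most frequent tokens in each column are kept as count columns and every other Avatar colour or county is discarded, which is why most avatar1/avatar2/clc1/clc2 values are 0. A minimal standalone sketch on toy data (not from this dataset) illustrating the behaviour:

from sklearn.feature_extraction.text import CountVectorizer

# Toy column: "blue" and "red" are the two most frequent tokens, so
# max_features=2 keeps only them and drops "green" entirely.
toy = ["blue", "red", "blue", "green", "red", "blue"]
vec = CountVectorizer(analyzer="word", max_features=2)
counts = vec.fit_transform(toy).toarray()
print(vec.get_feature_names_out())  # ['blue' 'red']
print(counts)                       # one count column per kept token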

In [294]:

train, test = np.split(df.sample(frac=1), [int(0.8 * len(df))])

In [295]:

train.shape, test.shape

Out[295]:

((400, 9), (100, 9))


In [296]:

train = pd.DataFrame(train)
train.columns = df.columns
print(train.head())

test = pd.DataFrame(test)
test.columns = df.columns
print(test.head())

Time on App Time on Website Length of Membership Yearly Amount Spent \


235 11.08 37.96 4.72 517.17
11 12.6 37.37 3.47 501.93
55 12.36 38.04 3.31 468.91
473 11.47 35.68 1.81 374.27
201 12.52 37.15 2.67 487.38

purchase avatar1 avatar2 clc1 clc2


235 1.0 1 0 0 0
11 1.0 0 0 0 0
55 1.0 0 0 1 0
473 0.0 0 0 0 0
201 1.0 0 0 0 0
Time on App Time on Website Length of Membership Yearly Amount Spent \
97 12.91 36.05 3.49 547.71
496 11.83 36.84 3.61 502.09
57 11.17 35.63 5.46 587.57
334 13.29 38.63 3.87 543.34
95 11.33 35.46 4.54 568.72

purchase avatar1 avatar2 clc1 clc2


97 1.0 0 0 0 0
496 1.0 0 0 0 0
57 1.0 0 0 0 0
334 1.0 0 0 0 0
95 1.0 0 0 0 0


In [297]:

def column_to_move(df):
    # Move the 'purchase' target to the last column so features and target
    # can later be separated positionally.
    column_to_move = df.pop("purchase")
    df.insert(8, "purchase", column_to_move)
    return df

train = column_to_move(train)
test = column_to_move(test)

train, test


Out[297]:

( Time on App Time on Website Length of Membership Yearly Amount Spent \


235 11.08 37.96 4.72 517.17
11 12.6 37.37 3.47 501.93
55 12.36 38.04 3.31 468.91
473 11.47 35.68 1.81 374.27
201 12.52 37.15 2.67 487.38
.. ... ... ... ...
439 13.15 36.62 2.49 470.45
339 11.56 35.98 1.48 282.47
432 12.7 35.36 4.0 553.6
160 11.75 36.94 0.8 298.76
375 12.43 37.63 4.33 532.72

avatar1 avatar2 clc1 clc2 purchase


235 1 0 0 0 1.0
11 0 0 0 0 1.0
55 0 0 1 0 1.0
473 0 0 0 0 0.0
201 0 0 0 0 1.0
.. ... ... ... ... ...
439 0 0 0 0 1.0
339 0 0 0 0 0.0
432 0 0 0 0 1.0
160 0 0 0 0 0.0
375 0 0 0 0 1.0

[400 rows x 9 columns],


Time on App Time on Website Length of Membership Yearly Amount Spent \
97 12.91 36.05 3.49 547.71
496 11.83 36.84 3.61 502.09
57 11.17 35.63 5.46 587.57
334 13.29 38.63 3.87 543.34
95 11.33 35.46 4.54 568.72
.. ... ... ... ...
393 11.54 37.53 2.92 431.62
210 12.05 38.51 2.85 409.09
215 11.67 37.34 4.26 567.48
461 12.5 38.05 4.64 616.85
368 11.41 36.38 4.04 541.05

avatar1 avatar2 clc1 clc2 purchase


97 0 0 0 0 1.0
496 0 0 0 0 1.0
57 0 0 0 0 1.0
334 0 0 0 0 1.0
95 0 0 0 0 1.0
.. ... ... ... ... ...
393 0 0 0 0 0.0
210 0 0 0 0 0.0
215 0 0 0 0 1.0
461 0 0 0 0 1.0
368 0 0 0 0 1.0

[100 rows x 9 columns])


In [305]:

def resample(dataframe, oversample=False):
    # Split into features (all but the last column) and target (last column).
    x = dataframe[dataframe.columns[:-1]].values
    y = dataframe[dataframe.columns[-1]].astype('int').values

    if oversample:
        # Randomly duplicate minority-class rows to balance the classes.
        ros = RandomOverSampler()
        x, y = ros.fit_resample(x, y)

    data = np.hstack((x, np.reshape(y, (-1, 1))))
    return data, x, y

In [306]:

train, X_train, y_train = resample(train, oversample=True)
test, X_test, y_test = resample(test, oversample=False)

In [332]:

from sklearn.linear_model import LogisticRegression

lg_model = LogisticRegression(solver='lbfgs', max_iter=100)
lg_model.fit(X_train, y_train)

C:\Users\creat\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Out[332]:

▾ LogisticRegression
LogisticRegression()
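The ConvergenceWarning above indicates lbfgs ran out of iterations, most likely because the features sit on very different scales (Yearly Amount Spent is roughly 40x larger than the time features). A hedged sketch of one way to address it, using the StandardScaler already imported at the top of the notebook (the *_scaled variable names are illustrative, not part of the original notebook):

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Fit the scaler on the training features only, then apply the same
# transformation to the test features to avoid data leakage.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

lg_model_scaled = LogisticRegression(solver='lbfgs', max_iter=1000)
lg_model_scaled.fit(X_train_scaled, y_train)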

In [333]:

y_pred = lg_model.predict(X_test)

In [334]:

from sklearn.metrics import classification_report

In [335]:

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.93      1.00      0.96        26
           1       1.00      0.97      0.99        74

    accuracy                           0.98       100
   macro avg       0.96      0.99      0.97       100
weighted avg       0.98      0.98      0.98       100
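Beyond the per-class precision/recall/F1 above, a confusion matrix makes explicit how the few misclassifications are distributed across the two classes. A short sketch, assuming y_test and y_pred from the cells above:

from sklearn.metrics import confusion_matrix, accuracy_score

# Rows are true classes (0, 1); columns are predicted classes.
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))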


Summary
1. Created and cleaned the address field to get a reliable data point for spotting similarity in user behaviour -- built 2 more columns: Clean_Address_Loc and Clean_Address_County
2. Loaded the data and cleaned the column names
3. Built the purchase column based on the condition mentioned (1 if Yearly Amount Spent > $450, else 0)
4. Looked at the basic stats before cleaning and resampling the data (describe and info)
5. Vectorized Clean_Address_County and Avatar to use them in the regression
6. Split into train and test as mentioned (80:20)
7. Resampled -- oversampled the training data so the minority class is represented well enough for a generalizable model
8. Built a logistic regression model and predicted on the test data with it
9. Reported the evaluation metrics [ accuracy 0.98, weighted F1-score 0.98 ]

Insights
1. The higher the time spent on the App and the length of membership, the higher the probability the customer will make a purchase
2. The App is more effective than the Website for purchase conversion
3. Length of Membership has the highest impact on purchase (see the coefficient sketch below)
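One way to back these insights with numbers is to refit the model on standardized features and compare coefficient magnitudes, since on a common scale larger absolute coefficients indicate stronger influence on the purchase probability. A hedged sketch (the feature_names list assumes the column order used above and was not part of the original notebook):

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Standardize so coefficient magnitudes are comparable across features.
feature_names = ['Time on App', 'Time on Website', 'Length of Membership',
                 'Yearly Amount Spent', 'avatar1', 'avatar2', 'clc1', 'clc2']
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)

model = LogisticRegression(max_iter=1000).fit(X_train_std, y_train)
coefs = pd.Series(model.coef_[0], index=feature_names)
print(coefs.sort_values(key=np.abs, ascending=False))  # largest influence first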
