Advertising in ML
September 1, 2024
1) Import libraries
Let’s begin by importing the following Python libraries: NumPy, Pandas, Seaborn, and Matplotlib Pyplot, and enabling the Matplotlib inline magic.
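A minimal import cell covering those libraries (reconstructed here, since the original cell isn’t preserved):

[ ]: import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline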
2) Import dataset
The second step is to import the dataset. Import the CSV file into Jupyter Notebook using pd.read_csv() and pass the file path as its argument.
[4]: df = pd.read_csv(r'D:\Jupyter\Advertising\advertising.csv')
This loads the dataset into a Pandas dataframe. You can review the dataframe using the head() command and clicking “Run”, or by navigating to Cell > Run All from the top menu. Here are the first 10 rows:
[6]: df.head(10)
[6]:    Daily Time Spent on Site  Age  Area Income  Daily Internet Usage  \
0                          68.95   35     61833.90                256.09
1                          80.23   31     68441.85                193.77
2                          69.47   26     59785.94                236.50
3                          74.15   29     54806.18                245.89
4                          68.37   35     73889.99                225.58
5                          59.99   23     59761.56                226.74
6                          88.91   33     53852.85                208.36
7                          66.00   48     24593.33                131.76
8                          74.53   30     68862.00                221.51
9                          69.88   20     55642.32                183.82

                           Ad Topic Line              City  Male     Country  \
0     Cloned 5thgeneration orchestration       Wrightburgh     0     Tunisia
1     Monitored national standardization         West Jodi     1       Nauru
2       Organic bottom-line service-desk          Davidton     0  San Marino
3  Triple-buffered reciprocal time-frame    West Terrifurt     1       Italy
4          Robust logistical utilization      South Manuel     0     Iceland
5        Sharable client-driven software         Jamieberg     1      Norway
6             Enhanced dedicated support       Brandonstad     0     Myanmar
7               Reactive local challenge  Port Jefferybury     1   Australia
8         Configurable coherent function        West Colin     1     Grenada
9     Mandatory homogeneous architecture        Ramirezton     1       Ghana

             Timestamp  Clicked on Ad
0  2016-03-27 00:53:11              0
1  2016-04-04 01:39:02              0
2  2016-03-13 20:35:42              0
3  2016-01-10 02:31:19              0
4  2016-06-03 03:36:18              0
5  2016-05-19 14:30:17              0
6  2016-01-28 20:59:32              0
7  2016-03-07 01:40:15              1
8  2016-04-18 09:33:42              0
9  2016-07-11 01:42:51              0
3) Remove features
Next, we need to remove the features that can’t be parsed by this algorithm: the non-numerical variables Ad Topic Line, City, Country, and Timestamp. The binary Male variable is also dropped here, leaving the five numeric columns shown below.
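The removal cell itself is not shown above; one minimal way to drop these columns with pandas (the column names match the head(10) output) is:

[ ]: #Drop the text-based and binary columns so only numeric features remain
df = df.drop(columns=['Ad Topic Line', 'City', 'Male', 'Country', 'Timestamp'])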
[12]: df.head()
[12]:    Daily Time Spent on Site  Age  Area Income  Daily Internet Usage  \
0                          68.95   35     61833.90                256.09
1                          80.23   31     68441.85                193.77
2                          69.47   26     59785.94                236.50
3                          74.15   29     54806.18                245.89
4                          68.37   35     73889.99                225.58

   Clicked on Ad
0              0
1              0
2              0
3              0
4              0
[14]: df.shape
[14]: (1000, 5)
[16]: df.columns
[16]: Index(['Daily Time Spent on Site', 'Age', 'Area Income',
       'Daily Internet Usage', 'Clicked on Ad'],
      dtype='object')
[18]: df.describe()
[28]: df.describe(include='all')
[28]:       Daily Time Spent on Site          Age   Area Income  \
freq                            NaN          NaN           NaN
mean                      65.000200    36.009000  55000.000080
std                       15.853615     8.785562  13414.634022
min                       32.600000    19.000000  13996.500000
25%                       51.360000    29.000000  47031.802500
50%                       68.215000    35.000000  57012.300000
75%                       78.547500    42.000000  65470.635000
max                       91.430000    61.000000  79484.800000
4) Scale data
Next, we will import the Scikit-learn function StandardScaler, which is used to standardize features by removing the mean (so every variable is centered at zero) and scaling to unit variance. The mean and standard deviation are computed when the scaler is fitted, stored, and then used later with the transform method (which recreates the dataframe’s values in their transformed form).
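The import itself is a single line (StandardScaler lives in Scikit-learn’s sklearn.preprocessing module):

[ ]: from sklearn.preprocessing import StandardScaler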
After importing StandardScaler, we can assign it to a new variable, fit the function to the features contained in the dataframe, and transform those values under a new variable name.
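The original cell is not preserved here; a minimal sketch, using the scaled_data name that the later cells rely on (scaler is an assumed variable name):

[ ]: #Fit the scaler to the dataframe and store the standardized values
scaler = StandardScaler()
scaler.fit(df)
scaled_data = scaler.transform(df)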
Take careful note of the next line of code, as this is where we reshape the dataframe’s features into a defined number of components. For this exercise, we want to find the components that have the most impact on data variability. By setting the number of components to 2 (n_components=2), we’re asking PCA to find the two components that best explain the variability in the data. The number of components can be modified according to your requirements, but two components are the simplest to interpret and visualize on a scatterplot.
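That line, together with its import (PCA lives in Scikit-learn’s sklearn.decomposition module; pca is the variable name the following cells use):

[ ]: from sklearn.decomposition import PCA
#Ask PCA for the two components that explain the most variance
pca = PCA(n_components=2)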
Next, we need to fit the two components to our scaled data and recreate the dataframe’s values using the transform method.
[30]: pca.fit(scaled_data)
scaled_pca = pca.transform(scaled_data)
Let’s check the transformation using the shape command to compare the two datasets.
[32]: #Query the number of rows and columns in the scaled dataframe
scaled_data.shape
[32]: (1000, 5)
[34]: #Query the number of rows and columns in the scaled PCA dataframe
scaled_pca.shape
[34]: (1000, 2)
We can see that the scaled data has been compressed from 1,000 rows with 5 columns to 1,000 rows with 2 columns using PCA.
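As an added check (not part of the original walkthrough), you can inspect how much of the total variability the two retained components capture via the fitted model’s explained_variance_ratio_ attribute:

[ ]: #Fraction of total variance explained by each of the two components
pca.explained_variance_ratio_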
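Version 1: Visualized plot
The original plotting cell is not preserved here, but a minimal sketch of this first version (coloring each point directly by the Clicked on Ad column, without a legend) would look like this:

[ ]: plt.figure(figsize=(10,8))
#Color each point by its Clicked on Ad outcome (no legend yet)
plt.scatter(scaled_pca[:, 0], scaled_pca[:, 1], c=df['Clicked on Ad'])
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.show()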
The two components are color-coded to delineate the outcome of Clicked on Ad (Clicked/Did not click). Keep in mind that components don’t correspond to a single variable but rather a combination of variables.
Finally, we can modify the code to add a color legend. This is a more advanced set of code and requires the use of a for-loop in Python and RGB color codes that can be found at Rapidtables.com.
Version 2: Visualized plot with color legend
[49]: plt.figure(figsize=(10,8))
legend = df['Clicked on Ad']
#Add indigo and yellow RGB colors
colors = {0: '#4B0082', 1: '#FFFF00'}
labels = {0: 'Did not click', 1: 'Clicked'}
#Use a for-loop to plot each outcome class in its own color
for t in np.unique(legend):
    ix = np.where(legend == t)
    plt.scatter(scaled_pca[ix, 0], scaled_pca[ix, 1], c=colors[t], label=labels[t])
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.legend()
plt.show()
From this visualization, we can see the clear separation of outcomes with the aid of a color legend in the top-right corner. The output of PCA is now ready for further analysis using a supervised learning technique such as logistic regression or k-nearest neighbors.
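As a pointer for that next step, here is a minimal sketch (an addition here, not part of the original exercise) that feeds the two components into a logistic regression model; X_train, X_test, y_train, y_test, and model are illustrative names. Note that Clicked on Ad was included in the data passed to PCA above, so a rigorous model would recompute the components from the four feature columns only.

[ ]: from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
#Hold out 30% of the rows for testing (illustrative split)
X_train, X_test, y_train, y_test = train_test_split(
    scaled_pca, df['Clicked on Ad'], test_size=0.3, random_state=10)
model = LogisticRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)  #Mean accuracy on the held-out rows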