7/13/23, 2:16 AM
Sales Prediction Project Task 2 - Jupyter Notebook
Project Report: Sales Prediction using Python
Submitted by: Mr. Omkar Balwant Jadhav
Introduction
The objective of this project is to develop a machine learning model that predicts sales from the advertising spend on three channels: TV, Radio, and Newspaper. The dataset used for this project consists of historical data that includes the advertising expenditure on each channel and the corresponding sales figures.
Collecting and Filtering Data
+ The dataset contains 200 records. The columns of interest are TV, Radio, Newspaper, and Sales; the CSV file also contains an unnamed row-index column ('Unnamed: 0'), so pandas loads five columns in total.
+ Before proceeding with the analysis, it is important to perform exploratory data analysis to gain insights into the data distribution, identify any missing values, and check for correlations between variables.
+ The data should be preprocessed by handling missing values (if any), handling outliers, and scaling the features (if required).
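The preprocessing steps listed above can be sketched with pandas and scikit-learn. This is a minimal illustration on the five rows that data.head() prints later in this notebook, not on the full dataset. The 1.5 × IQR outlier rule and StandardScaler are common choices assumed here; the original notebook does not run these steps.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Sample: the first five rows of the dataset, including the unnamed index column
raw = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3, 4, 5],
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})

# Drop the unnamed column: it is only a row counter, not an advertising feature
clean = raw.drop(columns=["Unnamed: 0"])

# Missing values: count them per column, then drop incomplete rows (none here)
missing_per_column = clean.isnull().sum()
clean = clean.dropna()

# Outliers (assumed rule): flag rows where any feature lies outside 1.5 * IQR of the quartiles
features = ["TV", "Radio", "Newspaper"]
q1 = clean[features].quantile(0.25)
q3 = clean[features].quantile(0.75)
iqr = q3 - q1
outside = (clean[features] < q1 - 1.5 * iqr) | (clean[features] > q3 + 1.5 * iqr)
outlier_rows = outside.any(axis=1)
print("Rows flagged as outliers:", int(outlier_rows.sum()))

# Scaling: standardize each feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(clean[features])
print("Scaled feature matrix shape:", X_scaled.shape)
```

With only five rows, the quartile bounds are very tight. That is why the single low Radio value (10.8) is flagged. Whether to drop, cap, or keep flagged rows in the full dataset is a judgment call.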
In [36]:
import pandas as pd
data = pd.read_csv("C:\\Users\\Omkar\\Downloads\\Sales Prediction Data.csv")
In [37]: data.shape
Out[37]: (200, 5)
In [38]: data.head()
Out[38]:
   Unnamed: 0     TV  Radio  Newspaper  Sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
3           4  151.5   41.3       58.5   18.5
4           5  180.8   10.8       58.4   12.9
Exploratory Data Analysis
Exploratory Data Analysis (EDA) refers to the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
It is good practice to understand the data first and gather as many insights from it as possible. EDA is all about making sense of the data at hand.
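The data-collection notes call for checking correlations between variables. One quick summary statistic is the Pearson correlation between each channel and Sales. The sketch below uses a synthetic stand-in for the CSV so that it runs on its own. The coefficients 0.045 and 0.19 and the noise level are assumptions chosen only for illustration. On the real data, the equivalent call would be data.corr().

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with the same columns as the real data (values are made up)
rng = np.random.default_rng(0)
n = 200
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 115, n)
# Assumed relationship: Sales depends on TV and Radio, not on Newspaper
sales = 3 + 0.045 * tv + 0.19 * radio + rng.normal(0, 1.5, n)
df = pd.DataFrame({"TV": tv, "Radio": radio, "Newspaper": newspaper, "Sales": sales})

# Pearson correlation of every feature with Sales, strongest first
corr_with_sales = df.corr()["Sales"].drop("Sales").sort_values(ascending=False)
print(corr_with_sales)
```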
In [39]: data.info()
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype
 0   Unnamed: 0  200 non-null    int64
 1   TV          200 non-null    float64
 2   Radio       200 non-null    float64
 3   Newspaper   200 non-null    float64
 4   Sales       200 non-null    float64
dtypes: float64(4), int64(1)
memory usage: 7.9 KB
In [40]: data.describe()
Out[40]:
       Unnamed: 0          TV       Radio   Newspaper       Sales
count  200.000000  200.000000  200.000000  200.000000  200.000000
mean   100.500000  147.042500   23.264000   30.554000   14.022500
std     57.879185   85.854236   14.846809   21.778621    5.217457
min      1.000000    0.700000    0.000000    0.300000    1.600000
25%     50.750000   74.375000    9.975000   12.750000   10.375000
50%    100.500000  149.750000   22.900000   25.750000   12.900000
75%    150.250000  218.825000   36.525000   45.100000   17.400000
max    200.000000  296.400000   49.600000  114.000000   27.000000
In [41]: data.isnull().sum()
Out[41]:
Unnamed: 0    0
TV            0
Radio         0
Newspaper     0
Sales         0
dtype: int64
Data Visualization
In [42]:
import matplotlib.pyplot as plt

# Scatter plot of TV vs Sales
plt.scatter(data['TV'], data['Sales'])
plt.xlabel('TV')
plt.ylabel('Sales')
plt.title('TV vs Sales')
plt.show()

# Scatter plot of Radio vs Sales
plt.scatter(data['Radio'], data['Sales'])
plt.xlabel('Radio')
plt.ylabel('Sales')
plt.title('Radio vs Sales')
plt.show()

# Scatter plot of Newspaper vs Sales
plt.scatter(data['Newspaper'], data['Sales'])
plt.xlabel('Newspaper')
plt.ylabel('Sales')
plt.title('Newspaper vs Sales')
plt.show()
[Scatter plot: TV vs Sales (x-axis: TV, y-axis: Sales)]
[Scatter plot: Radio vs Sales (x-axis: Radio, y-axis: Sales)]
[Scatter plot: Newspaper vs Sales (x-axis: Newspaper, y-axis: Sales)]
In [43]:
import matplotlib.pyplot as plt

# Set the figure size
plt.figure(figsize=(10, 6))

# Plot the bar plots for 'TV', 'Radio', and 'Newspaper' columns
# (each bar's x position is the spend on that channel and its height is Sales)
plt.bar(data['TV'], data['Sales'], color='red', alpha=0.5, label='TV')
plt.bar(data['Radio'], data['Sales'], color='green', alpha=0.5, label='Radio')
plt.bar(data['Newspaper'], data['Sales'], color='blue', alpha=0.5, label='Newspaper')

# Add labels and title to the plot
plt.xlabel('Advertisement Medium')
plt.ylabel('Sales')
plt.title('Sales vs Advertisement Medium')

# Add a legend
plt.legend()

# Show the plot
plt.show()
[Bar plot: Sales vs Advertisement Medium, with overlaid bars for TV, Radio, and Newspaper (x-axis: Advertisement Medium, y-axis: Sales)]
Machine Learning Process
Step 1: Import the necessary libraries
In [44]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
Step 2: Load and preprocess the data
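In [45]:
data = pd.read_csv("C:\\Users\\Omkar\\Downloads\\Sales Prediction Data.csv")

# Split the data into features (X) and target variable (y)
# Note: X still includes the 'Unnamed: 0' row-index column, which carries no advertising information
X = data.drop('Sales', axis=1)
y = data['Sales']

# Assumed: the train-test split call is not visible in this export;
# test_size and random_state here are illustrative, not the author's values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Out[45]: 200 rows × 5 columns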
Step 3: Train the model
+ Several regression models can be considered for sales prediction, such as linear
regression, decision tree regression, random forest regression, and support vector
regression.
+ The data can be divided into training and testing sets with a simple train-test split, or with k-fold cross-validation, which rotates through several train/test partitions.
+ The selected regression model can then be trained on the training set.
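The model-selection step described above can be sketched as a 5-fold cross-validation loop over the four regression models the list names. This is an illustration on synthetic data, and the generating coefficients are assumptions. It is not the notebook's own procedure, which trains only a linear regression on a single train-test split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the advertising data (assumed relationship, made-up noise)
rng = np.random.default_rng(1)
n = 200
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 115, n),
})
y = 3 + 0.045 * X["TV"] + 0.19 * X["Radio"] + rng.normal(0, 1.5, n)

# The four candidate models; SVR is wrapped with scaling because it is scale-sensitive
models = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR()),
}

# 5-fold cross-validation, scored by R-squared on each held-out fold
cv = KFold(n_splits=5, shuffle=True, random_state=0)
mean_r2 = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    mean_r2[name] = scores.mean()
    print(f"{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Shuffling before splitting (shuffle=True) avoids folds that simply follow the file's row order. Placing the scaler inside the pipeline means each fold's scaler is fitted only on that fold's training part.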
In [46]:
# Create a linear regression model
model = LinearRegression()

# Train the model on the training data
model.fit(X_train, y_train)
Out[46]: LinearRegression()
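LinearRegression fits an ordinary least-squares model: predicted Sales equals an intercept plus one coefficient per feature column. Printing intercept_ and coef_ against the column names is one way to see which features drive the predictions, which the conclusion asks about. The minimal sketch below runs on synthetic data, with assumed true coefficients of 0.045 (TV), 0.19 (Radio), and 0 (Newspaper). With the notebook's X, coef_ would also contain an entry for the 'Unnamed: 0' index column, because that column was not dropped.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a known, assumed linear relationship
rng = np.random.default_rng(2)
n = 200
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 115, n),
})
y = 3 + 0.045 * X["TV"] + 0.19 * X["Radio"] + rng.normal(0, 1.5, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# One coefficient per column: change in predicted Sales per unit of spend,
# holding the other columns fixed
coefficients = pd.Series(model.coef_, index=X.columns)
print("Intercept:", round(model.intercept_, 3))
print(coefficients.round(4))
```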
Step 4: Evaluate the model
+ The trained model needs to be evaluated to assess its performance and generalization
capabilities.
+ Evaluation metrics like mean squared error (MSE), root mean squared error (RMSE), and R-squared (R²) can be used to quantify the performance of the model.
+ Comparing the model's performance on the training and testing sets can help identify overfitting or underfitting issues.
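For reference, the three metrics listed above are defined below for n test points with true values y_i, predictions ŷ_i, and mean ȳ. As a worked example, the MSE this notebook reports for the test set (about 3.199) corresponds to an RMSE of roughly 1.79, in the same units as Sales.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

\mathrm{RMSE} = \sqrt{3.1990} \approx 1.789
```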
In [47]:
# Make predictions on the test data
y_pred = model.predict(X_test)

# Calculate evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)
Mean Squared Error: 3.1990044685889063
R-squared: 0.898648915141708
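Step 4 also suggests comparing training and testing performance, which the cell above does not do. The minimal sketch below reports MSE, RMSE, and R² on both splits. It uses synthetic stand-in data (the generating coefficients and noise level are assumptions) and a hypothetical helper named report, so its numbers are not this notebook's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed relationship, made-up noise)
rng = np.random.default_rng(3)
n = 200
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 115, n),
})
y = 3 + 0.045 * X["TV"] + 0.19 * X["Radio"] + rng.normal(0, 1.5, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

def report(split_name, X_part, y_part):
    """Hypothetical helper: return MSE, RMSE and R-squared of the fitted model on one split."""
    pred = model.predict(X_part)
    mse = mean_squared_error(y_part, pred)
    metrics = {"MSE": mse, "RMSE": np.sqrt(mse), "R2": r2_score(y_part, pred)}
    print(split_name, {k: round(v, 3) for k, v in metrics.items()})
    return metrics

train_metrics = report("train", X_train, y_train)
test_metrics = report("test", X_test, y_test)
```

A test error much larger than the training error would point to overfitting. Poor scores on both splits would point to underfitting.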
Step 5: Predict sales for new data
+ Once the best model is selected and trained, it can be deployed for real-world predictions.
+ New data points can be provided as input to the trained model to predict sales based on
the advertising expenditures on TV, Radio, and Newspaper.
In [48]:
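# Create a new DataFrame for new data
# NOTE: these iris-style columns do not match the features the model was trained on
# (TV, Radio, Newspaper, Unnamed: 0), as the FutureWarning below reports,
# so the resulting predictions are not meaningful sales forecasts
new_data = pd.DataFrame({'SepalLengthcm': [5.2, 6.1, 4.9],
                         'SepalWidthcm': [3.1, 2.8, 3.5],
                         'PetalLengthcm': [1.7, 4.7, 1.5],
                         'PetalWidthcm': [0.5, 1.6, 0.4]})

# Predict sales for the new data
predictions = model.predict(new_data)
print("Predictions:", predictions)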
Predictions: [3.37175054 3.93001816 3.35128999]
C:\Users\Omkar\anaconda3\lib\site-packages\sklearn\base.py:493: FutureWarning: The feature names should match those that were passed during fit. Starting version 1.2, an error will be raised.
Feature names unseen at fit time:
- PetalLengthcm
- PetalWidthcm
- SepalLengthcm
- SepalWidthcm
Feature names seen at fit time, yet now missing:
- Newspaper
- Radio
- TV
- Unnamed: 0

  warnings.warn(message, FutureWarning)
Conclusion
Summarize the key findings of the project, including the best-performing model, important features affecting sales, and the model's predictive capabilities.
Highlight any insights or recommendations based on the analysis that could help optimize advertising strategies and maximize sales.