Python For Business Decision Making Asm2

The document discusses analyzing sales data from burger stores between January 2014 to September 2015. It cleans the data by changing data types, filtering dates, and checking for missing values, duplicates and outliers. It then calculates descriptive statistics on the monthly aggregated data, including the mean price, total quantity and total sales by month. No outliers were found in the data.

Python_for_Business_Decision_Making_Asm2

May 3, 2023

Date: date of the transactions (grouped by day)
Price: unit price
Qty: quantity of products sold
Item: name of the item
Holiday: holiday flag (0 = non-holiday, 1 = holiday)
Is Weekend: weekend flag (0 = weekday, 1 = weekend)
Is Schoolbreak: school-break flag (0 = non-schoolbreak, 1 = schoolbreak)
total_sales: total sales value for the day
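The preview rows later in the notebook suggest that total_sales is simply price multiplied by qty. A quick sanity check on a few rows copied from that preview (an assumption: the full file follows the same rule):

```python
import pandas as pd

# Rows copied from the dataset preview shown later in the notebook (assumed
# to be representative of the full file).
sample = pd.DataFrame({
    "price": [15.5, 15.5, 14.5],
    "qty": [72, 76, 90],
    "total_sales": [1116, 1178, 1305],
})

# The preview suggests total_sales = price * qty; verify on the sample rows
assert (sample["price"] * sample["qty"] == sample["total_sales"]).all()
print("total_sales equals price * qty on every sample row")
```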

[ ]: from google.colab import drive
stores = drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call
drive.mount("/content/drive", force_remount=True).

[ ]: import pandas as pd
from scipy import stats
import warnings
warnings.filterwarnings('ignore')


[ ]: stores = pd.read_csv('/content/drive/MyDrive/Python for Da/Burger_store.csv')
stores.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 608 entries, 0 to 607
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 608 non-null object
1 price 608 non-null float64
2 qty 608 non-null int64
3 item 608 non-null object
4 holiday 608 non-null int64
5 is_weekend 608 non-null int64
6 is_schoolbreak 608 non-null int64
7 total_sales 608 non-null int64

dtypes: float64(1), int64(5), object(2)
memory usage: 38.1+ KB

[ ]: # change 'date' from object type to datetime
stores['date'] = pd.to_datetime(stores['date'])
stores.info()
stores

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 608 entries, 0 to 607
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 608 non-null datetime64[ns]
1 price 608 non-null float64
2 qty 608 non-null int64
3 item 608 non-null object
4 holiday 608 non-null int64
5 is_weekend 608 non-null int64
6 is_schoolbreak 608 non-null int64
7 total_sales 608 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(5), object(1)
memory usage: 38.1+ KB

[ ]: date price qty item holiday is_weekend is_schoolbreak \


0 2014-01-01 15.5 72 BURGER 1 0 0
1 2014-01-02 15.5 76 BURGER 1 0 0
2 2014-01-03 15.5 68 BURGER 1 0 0
3 2014-01-04 15.5 74 BURGER 0 1 0
4 2014-01-05 15.5 70 BURGER 0 1 0
.. … … … … … … …
603 2015-08-27 14.5 92 BURGER 0 0 1
604 2015-08-28 14.5 90 BURGER 0 0 1
605 2015-08-29 14.5 68 BURGER 0 1 1
606 2015-08-30 14.5 64 BURGER 0 1 1
607 2015-08-31 14.5 90 BURGER 0 0 1

total_sales
0 1116
1 1178
2 1054
3 1147
4 1085
.. …
603 1334
604 1305
605 986

606 928
607 1305

[608 rows x 8 columns]

[ ]: # the date column is already datetime; re-converting is a harmless no-op
stores['date'] = pd.to_datetime(stores['date'])

# create a new column with the month and year information
stores['month_year'] = stores['date'].dt.strftime('%B %Y')
stores.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 608 entries, 0 to 607
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 608 non-null datetime64[ns]
1 price 608 non-null float64
2 qty 608 non-null int64
3 item 608 non-null object
4 holiday 608 non-null int64
5 is_weekend 608 non-null int64
6 is_schoolbreak 608 non-null int64
7 total_sales 608 non-null int64
8 month_year 608 non-null object
dtypes: datetime64[ns](1), float64(1), int64(5), object(2)
memory usage: 42.9+ KB

[ ]: # convert the data type of the month_year column to datetime
stores['month_year'] = pd.to_datetime(stores['month_year'], format='%B %Y')

# filter the records from January 2014 to September 2015
start_date = pd.to_datetime('2014-01-01')
end_date = pd.to_datetime('2015-09-30')
mask = (stores['month_year'] >= start_date) & (stores['month_year'] <= end_date)
stores = stores.loc[mask]

# group by month and year and calculate the mean price, total quantity,
# and total sales
monthly_price = stores.groupby('month_year')['price'].mean().round(2)
monthly_qty = stores.groupby('month_year')['qty'].sum()
monthly_total_sales = stores.groupby('month_year')['total_sales'].sum()

# combine the results into a single DataFrame
monthly_summary = pd.concat([monthly_price, monthly_qty, monthly_total_sales], axis=1)

# display the result
print(monthly_summary)

price qty total_sales


month_year
2014-01-01 15.50 2780 43090
2014-02-01 15.50 2268 35154
2014-03-01 15.50 2390 37045
2014-04-01 15.13 2348 35498
2014-05-01 14.50 2604 37758
2014-06-01 14.50 2480 35960
2014-07-01 14.50 2666 38657
2014-08-01 14.95 2528 37766
2014-09-01 15.50 2334 36177
2014-10-01 15.50 2348 36394
2014-11-01 15.50 2302 35681
2014-12-01 14.73 2884 42197
2015-01-01 14.00 2910 40740
2015-02-01 14.00 2492 34888
2015-03-01 14.00 2802 39228
2015-04-01 15.07 2394 35948
2015-05-01 16.00 2194 35104
2015-06-01 16.00 2228 35648
2015-07-01 16.00 2374 37984
2015-08-01 15.08 2518 37864
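The strftime round-trip above (datetime, to a 'January 2014' string, back to datetime) works, but pandas can produce a monthly key directly with dt.to_period('M'). A sketch on toy data (the real stores frame is not reused here):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-05", "2014-01-20", "2014-02-03"]),
    "qty": [70, 72, 76],
})

# to_period('M') collapses each date to its month, e.g. Period('2014-01', 'M')
df["month_year"] = df["date"].dt.to_period("M")
monthly_qty = df.groupby("month_year")["qty"].sum()
print(monthly_qty.tolist())  # → [142, 76]
```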

0.1 1.b. Clean data:


[ ]: # Checking for missing data
print("Number of missing values in each column:\n", stores.isnull().sum())

Number of missing values in each column:


date 0
price 0
qty 0
item 0
holiday 0
is_weekend 0
is_schoolbreak 0
total_sales 0
month_year 0
dtype: int64

[ ]: # Checking for duplicate data
print("Number of duplicated records: ", len(stores[stores.duplicated()]))

Number of duplicated records: 0
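No missing values or duplicates were found here; had either appeared, a typical cleaning step would look like this sketch on a small hypothetical frame (not the real Burger_store.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one exact duplicate row and one missing price
df = pd.DataFrame({
    "date": ["2014-01-01", "2014-01-01", "2014-01-02"],
    "price": [15.5, 15.5, np.nan],
    "qty": [72, 72, 76],
})

df = df.drop_duplicates()                                # drop exact duplicates
df["price"] = df["price"].fillna(df["price"].median())   # impute missing price
print(len(df), int(df.isnull().sum().sum()))             # → 2 0
```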

[ ]: # Checking for outlier data


import seaborn as sns
import matplotlib.pyplot as plt

[ ]: # Calculate IQR for 'price' variable


Q1 = stores['price'].quantile(0.25)
Q3 = stores['price'].quantile(0.75)
IQR = Q3 - Q1

# Define outliers
price_outliers = (stores['price'] < (Q1 - 1.5 * IQR)) | (stores['price'] > (Q3 + 1.5 * IQR))

# Print number of outliers


print("Number of outliers for 'price':", price_outliers.sum())

# Create box plot


sns.boxplot(x=stores['price'])

Number of outliers for 'price': 0

[ ]: <Axes: xlabel='price'>

[ ]: # Calculate IQR for 'qty' variable
Q1 = stores['qty'].quantile(0.25)
Q3 = stores['qty'].quantile(0.75)
IQR = Q3 - Q1

# Define outliers
qty_outliers = (stores['qty'] < (Q1 - 1.5 * IQR)) | (stores['qty'] > (Q3 + 1.5 * IQR))

# Print number of outliers


print("Number of outliers for 'qty':", qty_outliers.sum())

# Create box plot


sns.boxplot(x=stores['qty'])

Number of outliers for 'qty': 0

[ ]: <Axes: xlabel='qty'>

[ ]: # Calculate IQR for 'total_sales' variable
Q1 = stores['total_sales'].quantile(0.25)
Q3 = stores['total_sales'].quantile(0.75)
IQR = Q3 - Q1

# Define outliers
sales_outliers = (stores['total_sales'] < (Q1 - 1.5 * IQR)) | (stores['total_sales'] > (Q3 + 1.5 * IQR))

# Print number of outliers


print("Number of outliers for 'total_sales':", sales_outliers.sum())

# Create box plot


sns.boxplot(x=stores['total_sales'])

Number of outliers for 'total_sales': 0

[ ]: <Axes: xlabel='total_sales'>

[ ]: # calculate sum of all outliers
total_outliers = price_outliers.sum() + qty_outliers.sum() + sales_outliers.sum()

# print sum of all outliers


print("Sum of all outliers: " + str(total_outliers))

Sum of all outliers: 0
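Since every IQR check came back empty, no treatment was needed. If the fences had flagged values, one common remedy is to cap them at the fences with Series.clip; a sketch on a made-up series with one obvious outlier:

```python
import pandas as pd

# Made-up price series with one clear outlier (40.0), to show the treatment
# that would apply had the IQR fences flagged anything in the real columns.
s = pd.Series([14.0, 14.5, 15.5, 15.5, 16.0, 40.0])
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print(int(((s < lower) | (s > upper)).sum()))  # → 1 outlier flagged
clipped = s.clip(lower, upper)                 # cap flagged values at the fences
print(float(clipped.max()) < 40.0)             # → True
```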

1 1.c. Calculate descriptive statistics:


[ ]: # calculate descriptive statistics
summary_stats = monthly_summary.describe().round(2)

# display the result


print(summary_stats)

price qty total_sales


count 20.00 20.00 20.00
mean 15.07 2492.20 37439.05
std 0.66 217.34 2354.58

min 14.00 2194.00 34888.00
25% 14.50 2344.50 35672.75
50% 15.10 2437.00 36719.50
75% 15.50 2619.50 38152.25
max 16.00 2910.00 43090.00
Univariate analysis: The categorical variables are: holiday is_weekend is_schoolbreak
The continuous variables are: price qty total_sales
[ ]: cat_cols =['holiday', 'is_weekend','is_schoolbreak']
con_cols =['price', 'qty','total_sales']

[ ]: for column in cat_cols:
    print("*, Column: ", column)
    print(len(stores[column].unique()), "unique values")

*, Column: holiday
2 unique values
*, Column: is_weekend
2 unique values
*, Column: is_schoolbreak
2 unique values

[ ]: for column in con_cols:
    print("*, Column: ", column)
    print(len(stores[column].unique()))

*, Column: price
4
*, Column: qty
39
*, Column: total_sales
64
Categorical:
[ ]: import sys
sys.path.append("/content/drive/MyDrive/Python for Da/")
import EDA_funcs

[ ]: from EDA_funcs import *


import scipy
from scipy.stats import chi2_contingency
from scipy.stats import chi2
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

[ ]: for cat in cat_cols:
    print('Univariate analysis', cat)
    univariate_analysis_categorical_variable_2(stores, cat)
    print()

[ ]: for con in con_cols:
    print('Univariate analysis', con)
    univariate_analysis_continuous_variable(stores, stores[con])
    check_outlier(stores, stores[con])
    univariate_visualization_analysis_continuous_variable_new(stores[con])
    print()

Univariate analysis price


Describe:
count 608.000000
mean 15.074013
std 0.735843
min 14.000000
25% 14.500000
50% 15.500000
75% 15.500000
max 16.000000
Name: price, dtype: float64
Mode: 0 15.5
Name: price, dtype: float64
Range: 2.0
IQR: 1.0
Var: 0.5414652518858927
Std: 0.73584322507304
Skew: -0.2570967196392115
Kurtosis: -1.4796550712211953

Number of upper outliers: 0
Number of lower outliers: 0
Percentage of outliers: 0.0

Univariate analysis qty
Describe:
count 608.000000
mean 81.980263
std 16.412303
min 38.000000
25% 68.000000
50% 84.000000
75% 92.500000
max 124.000000
Name: qty, dtype: float64
Mode: 0 84
Name: qty, dtype: int64
Range: 86
IQR: 24.5
Var: 269.3636954825284
Std: 16.412303174220504
Skew: -0.13864071557455301
Kurtosis: -0.3265014211143429

Number of upper outliers: 0
Number of lower outliers: 0
Percentage of outliers: 0.0

Univariate analysis total_sales
Describe:
count 608.000000
mean 1231.547697
std 230.822548
min 589.000000
25% 986.000000
50% 1312.000000
75% 1372.000000
max 1736.000000
Name: total_sales, dtype: float64
Mode: 0 1344
Name: total_sales, dtype: int64
Range: 1147
IQR: 386.0
Var: 53279.04879205324
Std: 230.82254827475856
Skew: -0.409031487896684
Kurtosis: -0.5073796235684283

Number of upper outliers: 0
Number of lower outliers: 0
Percentage of outliers: 0.0

Bi-variable analysis (total_sales with others).
Continuous - Continuous
[ ]: for i in range(0, len(con_cols)):
    col1 = con_cols[i]
    col2 = 'total_sales'
    print('Bi-variable analysis', col1, 'and', col2)
    print(stores[[col1, col2]].corr())
    print()

Bi-variable analysis price and total_sales


price total_sales
price 1.000000 -0.108335
total_sales -0.108335 1.000000

Bi-variable analysis qty and total_sales


qty total_sales
qty 1.00000 0.96776
total_sales 0.96776 1.00000

Bi-variable analysis total_sales and total_sales


total_sales total_sales
total_sales 1.0 1.0
total_sales 1.0 1.0

[ ]: sns.pairplot(stores[["total_sales", "price"]])

[ ]: <seaborn.axisgrid.PairGrid at 0x7feb29120fd0>

[ ]: sns.pairplot(stores[["total_sales", "qty"]])

[ ]: <seaborn.axisgrid.PairGrid at 0x7feb28f55450>

Two-variable analysis price and total_sales:
The correlation coefficient between price and total_sales is -0.108, a weak negative
correlation: as price increases, total_sales tends to decrease slightly. With 608
observations even a correlation of this size can be statistically detectable, but
price alone explains very little of the variation in total_sales.
Two-variable analysis qty and total_sales:
The correlation coefficient between qty and total_sales is 0.968, a very strong
positive correlation: as qty increases, total_sales increases almost proportionally.
This is expected, since total_sales is essentially price multiplied by qty, so qty
is by construction a strong predictor of total_sales.
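To put a p-value on a correlation, scipy.stats.pearsonr returns both r and the two-sided p-value. This sketch uses synthetic stand-ins for the two columns (assumed distributions, for API illustration only; on the real data pass stores['price'] and stores['total_sales']):

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic stand-ins for stores['price'] and stores['total_sales'] (assumed
# price levels and effect size; replace with the real columns in practice).
rng = np.random.default_rng(0)
n = 608
price = rng.choice([14.0, 14.5, 15.5, 16.0], size=n)
total_sales = 1230 - 35 * (price - price.mean()) + rng.normal(0, 230, size=n)

r, p = pearsonr(price, total_sales)
print(f"r = {r:.3f}, p = {p:.4g}")
```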
Categorical - Continuous
[ ]: cat_cols

[ ]: ['holiday', 'is_weekend', 'is_schoolbreak']

[ ]: # ANOVA
import statsmodels.api as sm
from statsmodels.formula.api import ols

[ ]: d_melt = stores[['holiday', 'is_weekend', 'is_schoolbreak', 'total_sales']]
d_melt.head()

[ ]: holiday is_weekend is_schoolbreak total_sales


0 1 0 0 1116
1 1 0 0 1178
2 1 0 0 1054
3 0 1 0 1147
4 0 1 0 1085

[ ]: # create the linear regression model (statsmodels was imported above)
model = ols('total_sales ~ holiday + is_weekend + is_schoolbreak', data=stores).fit()

# perform ANOVA
anova_table = sm.stats.anova_lm(model, typ=2)

# print the ANOVA table
print(anova_table)

sum_sq df F PR(>F)
holiday 5.146908e+06 1.0 549.695410 6.209636e-87
is_weekend 2.129255e+07 1.0 2274.067488 6.121558e-207
is_schoolbreak 1.631602e+04 1.0 1.742568 1.873138e-01
Residual 5.655373e+06 604.0 NaN NaN
The ANOVA table shows the results of a linear regression model that investigates the relationship
between total sales and three predictor variables: holiday, is_weekend, and is_schoolbreak.
The table shows that both holiday and is_weekend have a significant effect on total sales,
with very low p-values (6.21e-87 and 6.12e-207, respectively). However, is_schoolbreak does not
have a significant effect on total sales, with a relatively high p-value (1.87e-01).
Overall, these results suggest that holiday and is_weekend are strong predictors of total sales, while
is_schoolbreak does not have a significant effect.
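One way to read the two significant effects is to compare group means of total_sales under each flag: the F-tests say the group means differ, and the means show by how much and in which direction. A sketch on simulated data (effect sizes and directions are assumed for illustration only; on the real data use stores.groupby('holiday')['total_sales'].mean() and so on):

```python
import numpy as np
import pandas as pd

# Simulated flags and sales (assumed effects: holidays lift sales, weekends
# lower them; replace df with the real stores frame in practice).
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "holiday": rng.integers(0, 2, n),
    "is_weekend": rng.integers(0, 2, n),
})
df["total_sales"] = (1200 + 150 * df["holiday"] - 300 * df["is_weekend"]
                     + rng.normal(0, 100, n))

# Mean total_sales per flag level shows each effect's size and direction
print(df.groupby("holiday")["total_sales"].mean().round(1))
print(df.groupby("is_weekend")["total_sales"].mean().round(1))
```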
