
Python Project Submission by - Ravikanth Govindu

STATISTICS FOR DECISION MAKING

Due Date: 27th Mar 2022


Import all Libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(color_codes=True)
%matplotlib inline
from warnings import filterwarnings
filterwarnings("ignore")
pd.set_option('display.float_format', lambda x: '%.2f' % x)
import scipy.stats as stats
from scipy.stats import ttest_1samp, ttest_rel, ttest_ind, chi2_contingency, shapiro

Problem 1 - Wholesale Customers Analysis


Load the Wholesale Customer Analysis Dataset

In [2]:
df1 = pd.read_csv("Wholesale+Customers+Data.csv")

In [7]:
df1.head()

Out[7]:    Buyer/Spender  Channel  Region  Fresh  Milk  Grocery  Frozen  Detergents_Paper  Delicatessen
0  1  Retail  Other  12669  9656  7561   214  2674  1338
1  2  Retail  Other   7057  9810  9568  1762  3293  1776
2  3  Retail  Other   6353  8808  7684  2405  3516  7844
3  4  Hotel   Other  13265  1196  4221  6404   507  1788
4  5  Retail  Other  22615  5410  7198  3915  1777  5185

In [8]:
df1.tail()

Out[8]:      Buyer/Spender  Channel  Region  Fresh   Milk  Grocery  Frozen  Detergents_Paper  Delicatessen
435  436  Hotel   Other  29703  12051  16027  13135    182  2204
436  437  Hotel   Other  39228   1431    764   4510     93  2346
437  438  Retail  Other  14531  15488  30243    437  14841  1867
438  439  Hotel   Other  10290   1981   2232   1038    168  2125
439  440  Hotel   Other   2787   1698   2510     65    477    52

In [10]:
df1.shape

Out[10]: (440, 9)

In [11]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440 entries, 0 to 439
Data columns (total 9 columns):
# Column Non-Null Count Dtype

0 Buyer/Spender 440 non-null int64


1 Channel 440 non-null object
2 Region 440 non-null object
3 Fresh 440 non-null int64
4 Milk 440 non-null int64
5 Grocery 440 non-null int64
6 Frozen 440 non-null int64
7 Detergents_Paper 440 non-null int64
8 Delicatessen 440 non-null int64
dtypes: int64(7), object(2)
memory usage: 31.1+ KB

In [13]:
df1.describe(include="all")

Out[13]: Buyer/Spender Channel Region Fresh Milk Grocery Frozen Detergents_Paper Delicatessen

count 440.00 440 440 440.00 440.00 440.00 440.00 440.00 440.00

unique NaN 2 3 NaN NaN NaN NaN NaN NaN

top NaN Hotel Other NaN NaN NaN NaN NaN NaN

freq NaN 298 316 NaN NaN NaN NaN NaN NaN

mean 220.50 NaN NaN 12000.30 5796.27 7951.28 3071.93 2881.49 1524.87

std 127.16 NaN NaN 12647.33 7380.38 9503.16 4854.67 4767.85 2820.11

min 1.00 NaN NaN 3.00 55.00 3.00 25.00 3.00 3.00

25% 110.75 NaN NaN 3127.75 1533.00 2153.00 742.25 256.75 408.25

50% 220.50 NaN NaN 8504.00 3627.00 4755.50 1526.00 816.50 965.50

75% 330.25 NaN NaN 16933.75 7190.25 10655.75 3554.25 3922.00 1820.25

max 440.00 NaN NaN 112151.00 73498.00 92780.00 60869.00 40827.00 47943.00

In [16]:
df1.isnull().sum()

Out[16]:
Buyer/Spender 0
Channel 0
Region 0
Fresh 0
Milk 0
Grocery 0
Frozen 0
Detergents_Paper 0
Delicatessen 0
dtype: int64
In [18]: df1.columns

Out[18]: Index(['Buyer/Spender', 'Channel', 'Region', 'Fresh', 'Milk', 'Grocery',


'Frozen', 'Detergents_Paper', 'Delicatessen'],
dtype='object')
In [3]:
# To find out the total spending in each record, create a "Total" column as the sum of the six item variables
df1["Total"] = df1["Fresh"] + df1["Milk"] + df1["Grocery"] + df1["Frozen"] + df1["Detergents_Paper"] + df1["Delicatessen"]

In [4]: df1.groupby("Region")["Total"].sum().sort_values(ascending=False)

Out[4]: Region
Other 10677599
Lisbon 2386813
Oporto 1555088
Name: Total, dtype: int64

In [51]:
plt.figure(figsize=(10,5));
x = df1.groupby('Region').agg({'Total': sum}).sort_values(by="Total", ascending=False)
sns.barplot(x='Region', y='Total', data=x.reset_index(), ci=False)

Out[51]: <AxesSubplot:xlabel='Region', ylabel='Total'>


[Bar chart: Total spending by Region (Other, Lisbon, Oporto)]

In [52]:
df1.groupby("Channel")["Total"].sum().sort_values(ascending=False)

Out[52]: Channel
Hotel 7999569
Retail 6619931
Name: Total, dtype: int64

In [53]:
plt.figure(figsize=(10,5));
x = df1.groupby('Channel').agg({'Total': sum}).sort_values(by="Total", ascending=False)
sns.barplot(x='Channel', y='Total', data=x.reset_index(), ci=False)

Out[53]: <AxesSubplot:xlabel='Channel', ylabel='Total'>


[Bar chart: Total spending by Channel (Hotel, Retail)]

In [12]:
df1.groupby(["Region","Channel"])["Total"].sum().sort_values(ascending=False)

Out[12]:
Region  Channel
Other   Hotel    5742077
Retail 4935522
Lisbon Hotel 1538342
Retail 848471
Oporto Retail 835938
Hotel 719150

Name: Total, dtype: int64

In [13]:
df1.groupby(["Region","Channel"]).sum()

Out[13]:                 Buyer/Spender    Fresh     Milk  Grocery  Frozen  Detergents_Paper  Delicatessen    Total
Region  Channel
Lisbon  Hotel    14026   761233   228342   237542  184512   56081   70632  1538342
        Retail    4069    93600   194112   332495   46514  148055   33695   848471
Oporto  Hotel     8988   326215    64519   123074  160861   13516   30965   719150
        Retail    5911   138506   174625   310200   29271  159795   23541   835938
Other   Hotel    48020  2928269   735753   820101  771606  165990  320358  5742077
        Retail   16006  1032308  1153006  1675150  158886  724420  191752  4935522

In [65]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Fresh", hue="Region", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Fresh', size=18)

Out[65]: Text(0.5, 1.0, 'Item - Fresh')

<Figure size 864x504 with 0 Axes>


[Bar chart: Item - Fresh, spending by Channel and Region]

In [67]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Fresh", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Fresh', size=18)

Out[67]: Text(0.5, 1.0, 'Item - Fresh')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Fresh, spending by Channel]

In [68]:
plt.figure(figsize=(12,7));
sns.catplot(x="Region", y="Fresh", kind="bar", ci=None, data=df1)
plt.xlabel("Region", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Fresh', size=18)

Out[68]: Text(0.5, 1.0, 'Item - Fresh')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Fresh, spending by Region]

In [69]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Milk", hue="Region", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Milk', size=18)

Out[69]: Text(0.5, 1.0, 'Item - Milk')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Milk, spending by Channel and Region]

In [70]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Milk", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Milk', size=18)

Out[70]: Text(0.5, 1.0, 'Item - Milk')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Milk, spending by Channel]

In [71]:
plt.figure(figsize=(12,7));
sns.catplot(x="Region", y="Milk", kind="bar", ci=None, data=df1)
plt.xlabel("Region", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Milk', size=18)

Out[71]: Text(0.5, 1.0, 'Item - Milk')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Milk, spending by Region]

In [72]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Grocery", hue="Region", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Grocery', size=18)

Out[72]: Text(0.5, 1.0, 'Item - Grocery')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Grocery, spending by Channel and Region]

In [73]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Grocery", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Grocery', size=18)

Out[73]: Text(0.5, 1.0, 'Item - Grocery')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Grocery, spending by Channel]

In [74]:
plt.figure(figsize=(12,7));
sns.catplot(x="Region", y="Grocery", kind="bar", ci=None, data=df1)
plt.xlabel("Region", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Grocery', size=18)

Out[74]: Text(0.5, 1.0, 'Item - Grocery')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Grocery, spending by Region]

In [75]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Frozen", hue="Region", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Frozen', size=18)

Out[75]: Text(0.5, 1.0, 'Item - Frozen')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Frozen, spending by Channel and Region]

In [76]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Frozen", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Frozen', size=18)

Out[76]: Text(0.5, 1.0, 'Item - Frozen')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Frozen, spending by Channel]

In [77]:
plt.figure(figsize=(12,7));
sns.catplot(x="Region", y="Frozen", kind="bar", ci=None, data=df1)
plt.xlabel("Region", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Frozen', size=18)

Out[77]: Text(0.5, 1.0, 'Item - Frozen')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Frozen, spending by Region]

In [78]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Detergents_Paper", hue="Region", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Detergents_Paper', size=18)

Out[78]: Text(0.5, 1.0, 'Item - Detergents_Paper')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Detergents_Paper, spending by Channel and Region]

In [79]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Detergents_Paper", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Detergents_Paper', size=18)

Out[79]: Text(0.5, 1.0, 'Item - Detergents_Paper')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Detergents_Paper, spending by Channel]

In [80]:
plt.figure(figsize=(12,7));
sns.catplot(x="Region", y="Detergents_Paper", kind="bar", ci=None, data=df1)
plt.xlabel("Region", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Detergents_Paper', size=18)

Out[80]: Text(0.5, 1.0, 'Item - Detergents_Paper')

<Figure size 864x504 with 0 Axes>


[Bar chart: Item - Detergents_Paper, spending by Region]

In [81]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Delicatessen", hue="Region", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Delicatessen', size=18)

Out[81]: Text(0.5, 1.0, 'Item - Delicatessen')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Delicatessen, spending by Channel and Region]

In [82]:
plt.figure(figsize=(12,7));
sns.catplot(x="Channel", y="Delicatessen", kind="bar", ci=None, data=df1)
plt.xlabel("Channel", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Delicatessen', size=18)

Out[82]: Text(0.5, 1.0, 'Item - Delicatessen')


<Figure size 864x504 with 0 Axes>
[Bar chart: Item - Delicatessen, spending by Channel]

In [83]:
plt.figure(figsize=(12,7));
sns.catplot(x="Region", y="Delicatessen", kind="bar", ci=None, data=df1)
plt.xlabel("Region", size=15)
plt.ylabel("Total Spending", size=15)
plt.title('Item - Delicatessen', size=18)

Out[83]: Text(0.5, 1.0, 'Item - Delicatessen')


<Figure size 864x504 with 0 Axes>
In [ ]:
# To find variability, first calculate the standard deviation of all the items
df1.std().sort_values(ascending=False)

# Let's find out the Coefficient of Variation of all the items

print("Coefficient of Variation of Fresh Item is", df1["Fresh"].std() / df1["Fresh"].mean())
print("Coefficient of Variation of Grocery Item is", df1["Grocery"].std() / df1["Grocery"].mean())
print("Coefficient of Variation of Milk Item is", df1["Milk"].std() / df1["Milk"].mean())
print("Coefficient of Variation of Frozen Item is", df1["Frozen"].std() / df1["Frozen"].mean())
print("Coefficient of Variation of Detergents_Paper Item is", df1["Detergents_Paper"].std() / df1["Detergents_Paper"].mean())
print("Coefficient of Variation of Delicatessen Item is", df1["Delicatessen"].std() / df1["Delicatessen"].mean())

Coefficient of Variation of Fresh Item is 1.0539179237473149


Coefficient of Variation of Grocery Item is 1.1951743730016824
Coefficient of Variation of Milk Item is 1.2732985840065414
Coefficient of Variation of Frozen Item is 1.5803323836352914
Coefficient of Variation of Detergents_Paper Item is 1.6546471385005155
Coefficient of Variation of Delicatessen Item is 1.8494068981158382
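
The same coefficients can also be computed in one vectorised step; a minimal sketch, assuming df1 is the wholesale DataFrame loaded above:

In [ ]:
# Sketch: coefficient of variation (std/mean) for the six item columns, sorted high to low
items = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]
print((df1[items].std() / df1[items].mean()).sort_values(ascending=False))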

In [49]:
plt.figure(figsize=(20,20))
sns.boxplot(data=df1.drop(["Total","Buyer/Spender"], axis=1));
plt.title('Box plots for all the items', size=18)

Out[49]: Text(0.5, 1.0, 'Box plots for all the items')

[Box plots for all the items: Fresh, Milk, Grocery, Frozen, Detergents_Paper, Delicatessen]

In [92]:
df1.mean().sort_values(ascending=False)

Out[92]:
Total 33226.14
Fresh 12000.30
Grocery 7951.28
Milk 5796.27
Frozen 3071.93
Detergents_Paper 2881.49
Delicatessen 1524.87
Buyer/Spender 220.50
dtype: float64

In [94]:
df1.describe().T

Out[94]: count mean std min 25% 50% 75% max

Buyer/Spender 440.00 220.50 127.16 1.00 110.75 220.50 330.25 440.00

Fresh 440.00 12000.30 12647.33 3.00 3127.75 8504.00 16933.75 112151.00

Milk 440.00 5796.27 7380.38 55.00 1533.00 3627.00 7190.25 73498.00

Grocery 440.00 7951.28 9503.16 3.00 2153.00 4755.50 10655.75 92780.00

Frozen 440.00 3071.93 4854.67 25.00 742.25 1526.00 3554.25 60869.00

Detergents_Paper 440.00 2881.49 4767.85 3.00 256.75 816.50 3922.00 40827.00

Delicatessen 440.00 1524.87 2820.11 3.00 408.25 965.50 1820.25 47943.00

Total 440.00 33226.14 26356.30 904.00 17448.75 27492.00 41307.50 199891.00

Problem 2 - CMSU Student Survey


Load the Survey Dataset

In [16]:
df2 = pd.read_csv("Survey-1.csv")

In [17]:
df2.head()

Out[17]:  ID  Gender  Age  Class  Major  Grad Intention  GPA  Employment  Salary  Social Networking  Satisfaction  Spending  Computer  Text Messages
0  1  Female  20  Junior  Other       Yes        2.90  Full-Time   50.00  3  350  Laptop  200
1  2  Male    23  Senior  Management  Yes        3.60  Part-Time   25.00  4  360  Laptop   50
2  3  Male    21  Junior  Other       Yes        2.50  Part-Time   45.00  2  4  600  Laptop  200
3  4  Male    21  Junior  CIS         Yes        2.50  Full-Time   40.00  4  6  600  Laptop  250
4  5  Male    23  Senior  Other       Undecided  2.80  Unemployed  40.00  2  4  500  Laptop  100

In [102]:
df2.tail()

Out[102]:  ID  Gender  Age  Class  Major  Grad Intention  GPA  Employment  Salary  Social Networking  Satisfaction  Spending  Computer  Text Messages
57  58  Female  21  Senior     International Business  No   2.40  Part-Time  40.00  3  1000  Laptop   10
58  59  Female  20  Junior     CIS                     No   2.90  Part-Time  40.00  2  4  350  Laptop  250
59  60  Female  20  Sophomore  CIS                     No   2.50  Part-Time  55.00  4  500  Laptop  500
60  61  Female  23  Senior     Accounting              Yes  3.50  Part-Time  30.00  2  3  490  Laptop   50
61  62  Female  23  Senior     Economics/Finance       No   3.20  Part-Time  70.00  2  3  250  Laptop    0

In [103]:
df2.shape

Out[103]: (62, 14)

In [104]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 14 columns):
# Column Non-Null Count Dtype

0 ID 62 non-null int64
1 Gender 62 non-null object
2 Age 62 non-null int64
3 Class 62 non-null object
4 Major 62 non-null object
5 Grad Intention 62 non-null object
6 GPA 62 non-null float64
7 Employment 62 non-null object
8 Salary 62 non-null float64
9 Social Networking 62 non-null int64
10 Satisfaction 62 non-null int64
11 Spending 62 non-null int64
12 Computer 62 non-null object
13 Text Messages 62 non-null int64
dtypes: float64(2), int64(6), object(6)
memory usage: 6.9+ KB

In [105]:
df2.isnull().sum()

Out[105]:
ID 0
Gender 0
Age 0
Class 0
Major 0
Grad Intention 0
GPA 0
Employment 0
Salary 0
Social Networking 0
Satisfaction 0
Spending 0
Computer 0
Text Messages 0
dtype: int64

In [111]:
df2.describe(include="all").T

Out[111]: count unique top freq mean std min 25% 50% 75% max

ID 62.00 NaN NaN NaN 31.50 18.04 1.00 16.25 31.50 46.75 62.00
Gender 62 2 Female 33 NaN NaN NaN NaN NaN NaN NaN

Age 62.00 NaN NaN NaN 21.13 1.43 18.00 20.00 21.00 22.00 26.00
Class 62 3 Senior 31 NaN NaN NaN NaN NaN NaN NaN

Major 62 8 Retailing/Marketing 14 NaN NaN NaN NaN NaN NaN NaN

Grad Intention 62 3 Yes 28 NaN NaN NaN NaN NaN NaN NaN
GPA 62.00 NaN NaN NaN 3.13 0.38 2.30 2.90 3.15 3.40 3.90
Employment 62 3 Part-Time 43 NaN NaN NaN NaN NaN NaN NaN
Salary 62.00 NaN NaN NaN 48.55 12.08 25.00 40.00 50.00 55.00 80.00

Social Networking 62.00 NaN NaN NaN 1.52 0.84 0.00 1.00 1.00 2.00 4.00

Satisfaction 62.00 NaN NaN NaN 3.74 1.21 1.00 3.00 4.00 4.00 6.00

Spending 62.00 NaN NaN NaN 482.02 221.95 100.00 312.50 500.00 600.00 1400.00

Computer 62 3 Laptop 55 NaN NaN NaN NaN NaN NaN NaN

Text Messages 62.00 NaN NaN NaN 246.21 214.47 0.00 100.00 200.00 300.00 900.00

In [114]:
df2.columns

Out[114]:
Index(['ID', 'Gender', 'Age', 'Class', 'Major', 'Grad Intention', 'GPA',
       'Employment', 'Salary', 'Social Networking', 'Satisfaction', 'Spending',
       'Computer', 'Text Messages'],
      dtype='object')

Let us construct various Contingency Tables

In [116]:
# Contingency Table - Gender and Major
pd.crosstab(df2["Gender"], df2["Major"])

Out[116]: Major  Accounting  CIS  Economics/Finance  International Business  Management  Other  Retailing/Marketing  Undecided

Gender

Female  3  3  7  4  4  3  9  0

Male    4  1  4  2  6  4  5  3

In [117]:
# Contingency Table - Gender and Grad Intention
pd.crosstab(df2["Gender"], df2["Grad Intention"])

Out[117]: Grad Intention  No  Undecided  Yes

Gender

Female 9 13 11

Male 3 9 17
# Contingency Table - Gender and Employment
pd.crosstab(df2["Gender"], df2["Employment"])

# Contingency Table - Gender and Computer
pd.crosstab(df2["Gender"], df2["Computer"])

df2["Gender"].value_counts()

print("The probability that a randomly selected CMSU Student is male is", round(29/len(df2["Gender"]), 4))
print("The probability that a randomly selected CMSU Student is female is", round(33/len(df2["Gender"]), 4))

pd.crosstab(df2["Gender"], df2["Major"])

print("The conditional probability of Accounting Major among male students is", round(4/29, 4))
print("The conditional probability of CIS Major among male students is", round(1/29, 4))
print("The conditional probability of Economics/Finance Major among male students is", round(4/29, 4))
print("The conditional probability of International Business Major among male students is", round(2/29, 4))
print("The conditional probability of Management Major among male students is", round(6/29, 4))
print("The conditional probability of Other Major among male students is", round(4/29, 4))
print("The conditional probability of Retailing/Marketing Major among male students is", round(5/29, 4))
print("The conditional probability that a male student has not decided their major is", round(3/29, 4))

print("The conditional probability of Accounting Major among female students is", round(3/33, 4))
print("The conditional probability of CIS Major among female students is", round(3/33, 4))
print("The conditional probability of Economics/Finance Major among female students is", round(7/33, 4))
print("The conditional probability of International Business Major among female students is", round(4/33, 4))
print("The conditional probability of Management Major among female students is", round(4/33, 4))
print("The conditional probability of Other Major among female students is", round(3/33, 4))
print("The conditional probability of Retailing/Marketing Major among female students is", round(9/33, 4))
print("The conditional probability that a female student has not decided their major is", round(0/33, 4))

The conditional probability of International Business Major among female students is 0.1212
The conditional probability of Management Major among female students is 0.1212
The conditional probability of Other Major among female students is 0.0909
The conditional probability of Retailing/Marketing Major among female students is 0.2727
The conditional probability that a female student has not decided their major is 0.0
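
These hand-typed ratios can also be read directly off a row-normalised contingency table; a minimal sketch, assuming df2 is the survey DataFrame loaded above:

In [ ]:
# Sketch: conditional probability of each Major given Gender, from a row-normalised crosstab
cond_major_given_gender = pd.crosstab(df2["Gender"], df2["Major"], normalize="index").round(4)
print(cond_major_given_gender)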

In [133]:
pd.crosstab(df2["Gender"], df2["Grad Intention"])

Out[133]: Grad Intention  No  Undecided  Yes

Gender

Female 9 13 11

Male 3 9 17

P(Grad Intention = Yes ∩ Male) = P(Grad Intention = Yes | Male) × P(Male)

In [134]:
print("The probability that a randomly chosen student is a male and intends to graduate is", 17/29*29/62)

The probability that a randomly chosen student is a male and intends to graduate is 0.27419354838709675

In [135]:
pd.crosstab(df2["Gender"], df2["Computer"])

Out[135]: Computer  Desktop  Laptop  Tablet

Gender

Female 2 29 2

Male 3 26 0

P(not having a laptop ∩ Female) = P(Desktop | Female) × P(Female) + P(Tablet | Female) × P(Female)

In [136]:
print("The probability that a randomly chosen student is a female and does not have a laptop is", 2/33*33/62 + 2/33*33/62)

The probability that a randomly chosen student is a female and does not have a laptop is 0.06451612903225806

In [139]:
pd.crosstab(df2["Gender"], df2["Employment"])

Out[139]: Employment  Full-Time  Part-Time  Unemployed

Gender

Female  3  24  6

Male    7  19  3

In [140]:
df2["Employment"].value_counts()

Out[140]:
Part-Time     43
Full-Time     10
Unemployed     9
Name: Employment, dtype: int64
P(Full-Time Employment ∪ Male) = P(Full-Time) + P(Male) − P(Full-Time Employment | Male) × P(Male)

In [141]:
print("The probability that a randomly chosen student is a male or has full-time employment is", 10/62 + 29/62 - 7/29*29/62)

The probability that a randomly chosen student is a male or has full-time employment is 0.5161290322580645
In [142]:
pd.crosstab(df2["Gender"], df2["Major"])

Out[142]: Major  Accounting  CIS  Economics/Finance  International Business  Management  Other  Retailing/Marketing  Undecided

Gender

Female 3 3 7 4 4 3 9 0

Male    4  1  4  2  6  4  5  3

P(Major = International Business or Management | Female) = P(International Business | Female) + P(Management | Female)

In [58]:
print("The conditional probability that given a female student is randomly chosen, she is majoring in international business or management is", 4/33 + 4/33)

The conditional probability that given a female student is randomly chosen, she is majoring in international business or management is 0.2424242424242424

In [173]:
pd.crosstab(df2["Gender"], df2["Grad Intention"], margins=True)

Out[173]: Grad Intention  No  Undecided  Yes  All

Gender

Female   9  13  11  33

Male     3   9  17  29

All     12  22  28  62

In [62]:
pd.crosstab(df2["Gender"], df2["Grad Intention"]).drop("Undecided", axis=1)

Out[62]: Grad Intention  No  Yes

Gender

Female 9 11

Male 3 17

For two events to be independent, the following condition must be satisfied:

P(A ∩ B) = P(A) × P(B)

So, P(Grad Intention ∩ Female) = P(Grad Intention) × P(Female)

P(Female) = 20/40 = 0.5

P(Grad Intention) = 28/40 = 0.7

P(Grad Intention) × P(Female) = 0.5 × 0.7 = 0.35

P(Grad Intention ∩ Female) = 11/40 = 0.275

Since the product of the individual probabilities (0.35) does not equal the probability of the joint event (0.275), intending to graduate and being female are not independent events.
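
The same independence question can be cross-checked with the chi-square test of independence imported at the top of the notebook; a minimal sketch, using the 2x2 table without the Undecided column:

In [ ]:
# Sketch: chi-square test of independence between Gender and Grad Intention (No/Yes only)
observed = pd.crosstab(df2["Gender"], df2["Grad Intention"]).drop("Undecided", axis=1)
chi2, p_value, dof, expected = chi2_contingency(observed)
print("chi-square p-value:", p_value)  # a small p-value (< 0.05) would also indicate the two are not independent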

In [160]:
df2["GPA"].mean()

Out[160]: 3.129032258064516

In [161]:
df2["GPA"].std()

Out[161]: 0.3773883926969118

In [162]:
stats.norm.cdf(3, loc=df2["GPA"].mean(), scale=df2["GPA"].std())

Out[162]: 0.3662099174094998

In [163]:
print("The probability that if a student is randomly chosen, their GPA is less than 3 is", stats.norm.cdf(3, loc=df2["GPA"].mean(), scale=df2["GPA"].std()))

The probability that if a student is randomly chosen, their GPA is less than 3 is 0.3662099174094998

In [166]:
df2[df2["Gender"]=="Male"].head()

Out[166]:  ID  Gender  Age  Class  Major  Grad Intention  GPA  Employment  Salary  Social Networking  Satisfaction  Spending  Computer  Text Messages
1    2  Male  23  Senior  Management  Yes        3.60  Part-Time   25.00  4  360  Laptop   50
2    3  Male  21  Junior  Other       Yes        2.50  Part-Time   45.00  2  4  600  Laptop  200
3    4  Male  21  Junior  CIS         Yes        2.50  Full-Time   40.00  4  6  600  Laptop  250
4    5  Male  23  Senior  Other       Undecided  2.80  Unemployed  40.00  2  4  500  Laptop  100
11  12  Male  21  Senior  Undecided   No         3.50  Full-Time   37.00  2  3  500  Laptop  100

In [168]:
df2[df2["Gender"]=="Male"]["Salary"].std()

Out[168]: 10.79317427068786

In [169]:
df2[df2["Gender"]=="Male"]["Salary"].mean()

Out[169]: 48.275862068965516
In [ ]:
1 - stats.norm.cdf(50, loc=df2[df2["Gender"]=="Male"]["Salary"].mean(), scale=df2[df2["Gender"]=="Male"]["Salary"].std())

print("The probability that if a male student is randomly chosen, their salary is 50 or more is", 1 - stats.norm.cdf(50, loc=df2[df2["Gender"]=="Male"]["Salary"].mean(), scale=df2[df2["Gender"]=="Male"]["Salary"].std()))

print("The probability that if a female student is randomly chosen, their salary is 50 or more is", 1 - stats.norm.cdf(50, loc=df2[df2["Gender"]=="Female"]["Salary"].mean(), scale=df2[df2["Gender"]=="Female"]["Salary"].std()))

# Let's plot the histograms of GPA, Salary, Spending and Text Messages
plt.figure(figsize=(20,20))
plt.subplot(2,2,1)
sns.distplot(df2['GPA'], kde=True, color=None)
plt.subplot(2,2,2)
sns.distplot(df2['Salary'], kde=True)
plt.subplot(2,2,3)
sns.distplot(df2['Spending'], kde=True)
plt.subplot(2,2,4)
sns.distplot(df2['Text Messages'], kde=True)
[Histograms with KDE for GPA, Salary, Spending and Text Messages]

In [197]:
# Shapiro tests to check whether GPA, Salary, Spending and Text Messages follow a normal distribution
shapiro(df2["GPA"])

Out[197]: ShapiroResult(statistic=0.9685361981391907, pvalue=0.11204058676958084)

In [199]:
shapiro(df2["Salary"])

Out[199]: ShapiroResult(statistic=0.9565856456756592, pvalue=0.028000956401228905)

In [201]:
shapiro(df2["Spending"])

Out[201]: ShapiroResult(statistic=0.8777452111244202, pvalue=1.6854661225806922e-05)

In [202]:
shapiro(df2["Text Messages"])

Out[202]: ShapiroResult(statistic=0.8594191074371338, pvalue=4.324040673964191e-06)

In the case of Spending and Text Messages, the p-value is less than 0.05, so we reject the null hypothesis: we have sufficient evidence to say that the sample data do not come from a normal distribution.

Problem 3 - ABC Asphalt Shingles
Load the Dataset

In [18]:
df3 = pd.read_csv("A+&+B+shingles.csv")

In [19]:
df3.head()

Out[19]: A B

0 0.44 0.14

1 0.61 0.15

2 0.47 0.31

3 0.30 0.16
df3.shape

df3.isnull().sum()

df3.describe()
In [212]:
tstat, pvalue = stats.ttest_1samp(df3["A"], 0.35)

In [214]:
# Since this is a one-tailed test, the p-value is halved
pvalue/2

Out[214]: 0.07477633144907513

The p-value is more than 0.05, so we cannot reject the null hypothesis. We do not have enough evidence to support the claim that the mean value for A shingles is not less than 0.35 pounds per 100 square feet at the 0.05 significance level.

For B Shingles:

H0: mu <= 0.35
Ha: mu > 0.35

In [215]:
tstat, pvalue = stats.ttest_1samp(df3["B"].dropna(), 0.35)

In [216]:
# Since this is a one-tailed test, the p-value is halved
pvalue/2

Out[216]: 0.0020904774003191813

The p-value is less than 0.05, so we reject the null hypothesis. We have enough evidence to support the claim that the mean value for B shingles is not less than 0.35 pounds per 100 square feet at the 0.05 significance level.

Next Question

H0: muA = muB
Ha: muA is not equal to muB

In [218]:
tstat, pvalue = stats.ttest_ind(df3["A"], df3["B"].dropna())

# Since this is a two-tailed test, the full p-value is used
pvalue

Out[218]: 0.2017496571835328
The p-value is more than 0.05, so we cannot reject the null hypothesis. We do not have enough evidence to conclude that the mean values for A shingles differ from the mean values for B shingles.

Assumptions:
- Both populations of A shingles and B shingles follow a normal distribution, and the variances of the two populations are equal.
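
These assumptions can be checked on the sample itself; a minimal sketch, using the Shapiro-Wilk test imported above together with Levene's test for equal variances:

In [ ]:
# Sketch: checking the two-sample t-test assumptions for the shingles data
from scipy.stats import levene

a = df3["A"].dropna()
b = df3["B"].dropna()
print("Shapiro A:", shapiro(a))        # p > 0.05 suggests no evidence against normality
print("Shapiro B:", shapiro(b))
print("Levene A vs B:", levene(a, b))  # p > 0.05 suggests no evidence against equal variances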

In [27]:
df3.plot(kind="box")

Out[27]: <AxesSubplot:>

[Box plots of shingles A and B]

In [ ] :
