Masterclass Data Analysis.ipynb - Colab

Data science

Uploaded by

gloria.n.lorna
1/17/25, 7:39 PM Masterclass Data Analysis.ipynb - Colab

1+1

2+3

Importing Libraries

import pandas as pd
import numpy as np

df=pd.read_csv("/content/supermarket_sales - Sheet1 (2) (1).csv")

df

     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity   Tax 5%      Total  Date       Time   Payment        cogs  gross margin percentage  ...
0    750-67-8428  A       Yangon     Member         Female  Health and beauty            74.69         7  26.1415   548.9715  1/5/2019   13:08  Ewallet      522.83                 4.761905  ...
1    226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories       15.28         5   3.8200    80.2200  3/8/2019   10:29  Cash          76.40                 4.761905  ...
2    631-41-3108  A       Yangon     Normal         Male    Home and lifestyle           46.33         7  16.2155   340.5255  3/3/2019   13:23  Credit card  324.31                 4.761905  ...
3    123-19-1176  A       Yangon     Member         Male    Health and beauty            58.22         8  23.2880   489.0480  1/27/2019  20:33  Ewallet      465.76                 4.761905  ...
4    373-73-7910  A       Yangon     Normal         Male    Sports and travel            86.31         7  30.2085   634.3785  2/8/2019   10:37  Ewallet      604.17                 4.761905  ...
...  ...          ...     ...        ...            ...     ...                            ...       ...      ...        ...  ...        ...    ...             ...                      ...  ...
995  233-67-5758  C       Naypyitaw  Normal         Male    Health and beauty            40.35         1   2.0175    42.3675  1/29/2019  13:46  Ewallet       40.35                 4.761905  ...
996  303-96-2227  B       Mandalay   Normal         Female  Home and lifestyle           97.38        10  48.6900  1022.4900  3/2/2019   17:16  Ewallet      973.80                 4.761905  ...
997  727-02-1313  A       Yangon     Member         Male    Food and beverages           31.84         1   1.5920    33.4320  2/9/2019   13:22  Cash          31.84                 4.761905  ...
998  347-56-2442  A       Yangon     Normal         Male    Home and lifestyle           65.82         1   3.2910    69.1110  2/22/2019  15:33  Cash          65.82                 4.761905  ...
999  849-09-3807  A       Yangon     Member         Female  Fashion accessories          88.34         7  30.9190   649.2990  2/18/2019  13:28  Cash         618.38                 4.761905  ...

1000 rows × 17 columns


df.head()

     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity   Tax 5%      Total  Date       Time   Payment        cogs  gross margin percentage  ...
0    750-67-8428  A       Yangon     Member         Female  Health and beauty            74.69         7  26.1415   548.9715  1/5/2019   13:08  Ewallet      522.83                 4.761905  ...
1    226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories       15.28         5   3.8200    80.2200  3/8/2019   10:29  Cash          76.40                 4.761905  ...
2    631-41-3108  A       Yangon     Normal         Male    Home and lifestyle           46.33         7  16.2155   340.5255  3/3/2019   13:23  Credit card  324.31                 4.761905  ...
3    123-19-1176  A       Yangon     Member         Male    Health and beauty            58.22         8  23.2880   489.0480  1/27/2019  20:33  Ewallet      465.76                 4.761905  ...
4    373-73-7910  A       Yangon     Normal         Male    Sports and travel            86.31         7  30.2085   634.3785  2/8/2019   10:37  Ewallet      604.17                 4.761905  ...


df.tail()

     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity   Tax 5%      Total  Date       Time   Payment        cogs  gross margin percentage  ...
995  233-67-5758  C       Naypyitaw  Normal         Male    Health and beauty            40.35         1   2.0175    42.3675  1/29/2019  13:46  Ewallet       40.35                 4.761905  ...
996  303-96-2227  B       Mandalay   Normal         Female  Home and lifestyle           97.38        10  48.6900  1022.4900  3/2/2019   17:16  Ewallet      973.80                 4.761905  ...
997  727-02-1313  A       Yangon     Member         Male    Food and beverages           31.84         1   1.5920    33.4320  2/9/2019   13:22  Cash          31.84                 4.761905  ...
998  347-56-2442  A       Yangon     Normal         Male    Home and lifestyle           65.82         1   3.2910    69.1110  2/22/2019  15:33  Cash          65.82                 4.761905  ...
999  849-09-3807  A       Yangon     Member         Female  Fashion accessories          88.34         7  30.9190   649.2990  2/18/2019  13:28  Cash         618.38                 4.761905  ...

df.shape

(1000, 17)

df.describe()

        Unit price     Quantity       Tax 5%        Total        cogs  gross margin percentage  gross income      Rating
count  1000.000000  1000.000000  1000.000000  1000.000000  1000.00000             1.000000e+03   1000.000000  1000.00000
mean     55.672130     5.510000    15.379369   322.966749   307.58738             4.761905e+00     15.379369     6.97270
std      26.494628     2.923431    11.708825   245.885335   234.17651             6.131498e-14     11.708825     1.71858
min      10.080000     1.000000     0.508500    10.678500    10.17000             4.761905e+00      0.508500     4.00000
25%      32.875000     3.000000     5.924875   124.422375   118.49750             4.761905e+00      5.924875     5.50000
50%      55.230000     5.000000    12.088000   253.848000   241.76000             4.761905e+00     12.088000     7.00000
75%      77.935000     8.000000    22.445250   471.350250   448.90500             4.761905e+00     22.445250     8.50000
max      99.960000    10.000000    49.650000  1042.650000   993.00000             4.761905e+00     49.650000    10.00000

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Invoice ID 1000 non-null object
1 Branch 1000 non-null object
2 City 1000 non-null object
3 Customer type 1000 non-null object
4 Gender 1000 non-null object
5 Product line 1000 non-null object
6 Unit price 1000 non-null float64
7 Quantity 1000 non-null int64
8 Tax 5% 1000 non-null float64
9 Total 1000 non-null float64
10 Date 1000 non-null object
11 Time 1000 non-null object
12 Payment 1000 non-null object
13 cogs 1000 non-null float64
14 gross margin percentage 1000 non-null float64
15 gross income 1000 non-null float64
16 Rating 1000 non-null float64
dtypes: float64(7), int64(1), object(9)
memory usage: 132.9+ KB

Exploratory Data Analysis

Which is the Busiest Branch?

df['Branch'].value_counts()

        count
Branch
A         340
B         332
C         328

Branch A is the busiest with 340 invoices, followed by Branch B with 332 and Branch C with 328.
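The same answer can be read off programmatically rather than by eye. The sketch below is only an illustration: it rebuilds a stand-in `Branch` column from the three counts printed above instead of loading the CSV, and the names `branch`, `counts`, `busiest` and `share` are assumptions introduced here, not part of the notebook.

```python
import pandas as pd

# Stand-in for df['Branch'], rebuilt from the counts printed above
# (A: 340, B: 332, C: 328). In the notebook this would be df['Branch'].
branch = pd.Series(["A"] * 340 + ["B"] * 332 + ["C"] * 328, name="Branch")

counts = branch.value_counts()   # rows per branch, largest first
busiest = counts.idxmax()        # label of the branch with the most rows
share = counts / counts.sum()    # fraction of all rows per branch

print("Busiest branch:", busiest)
print("Rows:", counts[busiest])
print("Share of all rows:", round(share[busiest], 3))
```

The shares show how close the three branches are: the gap between the first and last is 12 rows out of 1000.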

df['Gender'].value_counts()

        count
Gender
Female    501
Male      499

df['Product line'].unique()

array(['Health and beauty', 'Electronic accessories',
       'Home and lifestyle', 'Sports and travel', 'Food and beverages',
       'Fashion accessories'], dtype=object)

import seaborn as sns


sns.countplot(data=df,x='Product line')

<Axes: xlabel='Product line', ylabel='count'>

print("The number of the Product line is :",df['Product line'].nunique())

The number of the Product line is : 6

Which month had the highest gross income?

import datetime

df['Date'][0]

df['Date']= pd.to_datetime(df['Date'])

df['Date'][0]

Timestamp('2019-01-05 00:00:00')

df['Month']=df['Date'].dt.month
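The conversion above lets pandas infer the date layout. A slightly more defensive variant, sketched here on the five dates visible in `df.head()`, passes an explicit format. The format string `"%m/%d/%Y"` (month first) is an assumption based on values such as `1/27/2019`, and `dates`, `parsed` and `months` are illustrative names.

```python
import pandas as pd

# The five Date strings shown in df.head(); in the notebook this is df['Date'].
dates = pd.Series(["1/5/2019", "3/8/2019", "3/3/2019", "1/27/2019", "2/8/2019"])

# Assumed layout: month/day/year. An explicit format avoids ambiguous
# day-first vs month-first guesses and fails loudly on unexpected values.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")

# Extract the calendar month as an integer (1 = January).
months = parsed.dt.month

print(parsed[0])
print(months.tolist())
```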

df.head()

     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity   Tax 5%      Total  Date        Time   Payment        cogs  gross margin percentage  ...
0    750-67-8428  A       Yangon     Member         Female  Health and beauty            74.69         7  26.1415   548.9715  2019-01-05  13:08  Ewallet      522.83                 4.761905  ...
1    226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories       15.28         5   3.8200    80.2200  2019-03-08  10:29  Cash          76.40                 4.761905  ...
2    631-41-3108  A       Yangon     Normal         Male    Home and lifestyle           46.33         7  16.2155   340.5255  2019-03-03  13:23  Credit card  324.31                 4.761905  ...
3    123-19-1176  A       Yangon     Member         Male    Health and beauty            58.22         8  23.2880   489.0480  2019-01-27  20:33  Ewallet      465.76                 4.761905  ...
4    373-73-7910  A       Yangon     Normal         Male    Sports and travel            86.31         7  30.2085   634.3785  2019-02-08  10:37  Ewallet      604.17                 4.761905  ...


df.groupby('Month')['gross income'].sum()

       gross income
Month
1          5537.708
2          4629.494
3          5212.167
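Reading the totals, month 1 (January) has the highest gross income at 5537.708, ahead of March and February. As a hedged sketch, the same conclusion can be drawn in code. The `monthly` Series below is typed in from the printed output rather than computed from the CSV, and `best_month` and `best_value` are illustrative names.

```python
import pandas as pd

# Monthly gross income, copied from the groupby output above.
# In the notebook this would be df.groupby('Month')['gross income'].sum().
monthly = pd.Series({1: 5537.708, 2: 4629.494, 3: 5212.167}, name="gross income")
monthly.index.name = "Month"

best_month = monthly.idxmax()   # month number with the largest total
best_value = monthly.max()      # the corresponding gross income

print("Best month:", best_month)
print("Gross income:", best_value)

# Ranking all months, highest first, gives the full ordering.
print(monthly.sort_values(ascending=False))
```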
