
Masterclass Data Analysis.ipynb - Colab

Data science

Uploaded by

gloria.n.lorna

1/17/25, 7:39 PM Masterclass Data Analysis.ipynb - Colab

1+1

2+3

Importing Libraries

import pandas as pd
import numpy as np

df=pd.read_csv("/content/supermarket_sales - Sheet1 (2) (1).csv")

df

     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity   Tax 5%      Total  Date       Time   Payment        cogs  gross margin percentage  ...
  0  750-67-8428  A       Yangon     Member         Female  Health and beauty            74.69         7  26.1415   548.9715  1/5/2019   13:08  Ewallet      522.83                 4.761905  ...
  1  226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories       15.28         5   3.8200    80.2200  3/8/2019   10:29  Cash          76.40                 4.761905  ...
  2  631-41-3108  A       Yangon     Normal         Male    Home and lifestyle           46.33         7  16.2155   340.5255  3/3/2019   13:23  Credit card  324.31                 4.761905  ...
  3  123-19-1176  A       Yangon     Member         Male    Health and beauty            58.22         8  23.2880   489.0480  1/27/2019  20:33  Ewallet      465.76                 4.761905  ...
  4  373-73-7910  A       Yangon     Normal         Male    Sports and travel            86.31         7  30.2085   634.3785  2/8/2019   10:37  Ewallet      604.17                 4.761905  ...
...
995  233-67-5758  C       Naypyitaw  Normal         Male    Health and beauty            40.35         1   2.0175    42.3675  1/29/2019  13:46  Ewallet       40.35                 4.761905  ...
996  303-96-2227  B       Mandalay   Normal         Female  Home and lifestyle           97.38        10  48.6900  1022.4900  3/2/2019   17:16  Ewallet      973.80                 4.761905  ...
997  727-02-1313  A       Yangon     Member         Male    Food and beverages           31.84         1   1.5920    33.4320  2/9/2019   13:22  Cash          31.84                 4.761905  ...
998  347-56-2442  A       Yangon     Normal         Male    Home and lifestyle           65.82         1   3.2910    69.1110  2/22/2019  15:33  Cash          65.82                 4.761905  ...
999  849-09-3807  A       Yangon     Member         Female  Fashion accessories          88.34         7  30.9190   649.2990  2/18/2019  13:28  Cash         618.38                 4.761905  ...

1000 rows × 17 columns


df.head()

https://colab.research.google.com/drive/1wsFn-RQqHucuqumJi6GDEyOs7DkOg1ZY?authuser=0#scrollTo=uJbzeViq8UI3&printMode=true 1/4

     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity   Tax 5%      Total  Date       Time   Payment        cogs  gross margin percentage  ...
  0  750-67-8428  A       Yangon     Member         Female  Health and beauty            74.69         7  26.1415   548.9715  1/5/2019   13:08  Ewallet      522.83                 4.761905  ...
  1  226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories       15.28         5   3.8200    80.2200  3/8/2019   10:29  Cash          76.40                 4.761905  ...
  2  631-41-3108  A       Yangon     Normal         Male    Home and lifestyle           46.33         7  16.2155   340.5255  3/3/2019   13:23  Credit card  324.31                 4.761905  ...
  3  123-19-1176  A       Yangon     Member         Male    Health and beauty            58.22         8  23.2880   489.0480  1/27/2019  20:33  Ewallet      465.76                 4.761905  ...
  4  373-73-7910  A       Yangon     Normal         Male    Sports and travel            86.31         7  30.2085   634.3785  2/8/2019   10:37  Ewallet      604.17                 4.761905  ...


df.tail()

     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity   Tax 5%      Total  Date       Time   Payment        cogs  gross margin percentage  ...
995  233-67-5758  C       Naypyitaw  Normal         Male    Health and beauty            40.35         1   2.0175    42.3675  1/29/2019  13:46  Ewallet       40.35                 4.761905  ...
996  303-96-2227  B       Mandalay   Normal         Female  Home and lifestyle           97.38        10  48.6900  1022.4900  3/2/2019   17:16  Ewallet      973.80                 4.761905  ...
997  727-02-1313  A       Yangon     Member         Male    Food and beverages           31.84         1   1.5920    33.4320  2/9/2019   13:22  Cash          31.84                 4.761905  ...
998  347-56-2442  A       Yangon     Normal         Male    Home and lifestyle           65.82         1   3.2910    69.1110  2/22/2019  15:33  Cash          65.82                 4.761905  ...
999  849-09-3807  A       Yangon     Member         Female  Fashion accessories          88.34         7  30.9190   649.2990  2/18/2019  13:28  Cash         618.38                 4.761905  ...

df.shape

(1000, 17)

df.describe()

        Unit price     Quantity       Tax 5%        Total        cogs  gross margin percentage  gross income      Rating
count  1000.000000  1000.000000  1000.000000  1000.000000  1000.00000             1.000000e+03   1000.000000  1000.00000
mean     55.672130     5.510000    15.379369   322.966749   307.58738             4.761905e+00     15.379369     6.97270
std      26.494628     2.923431    11.708825   245.885335   234.17651             6.131498e-14     11.708825     1.71858
min      10.080000     1.000000     0.508500    10.678500    10.17000             4.761905e+00      0.508500     4.00000
25%      32.875000     3.000000     5.924875   124.422375   118.49750             4.761905e+00      5.924875     5.50000
50%      55.230000     5.000000    12.088000   253.848000   241.76000             4.761905e+00     12.088000     7.00000
75%      77.935000     8.000000    22.445250   471.350250   448.90500             4.761905e+00     22.445250     8.50000
max      99.960000    10.000000    49.650000  1042.650000   993.00000             4.761905e+00     49.650000    10.00000
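The statistics suggest how the money columns relate: "Tax 5%" and "gross income" share identical summary values, and "gross margin percentage" is effectively constant (its standard deviation is about 6e-14). The notebook does not state the formulas, so the sketch below is only a check of one plausible reading, using the unit price and quantity from row 0 of the displayed data. The definitions in the comments are assumptions.

```python
import math

# Values copied from row 0 of the displayed DataFrame.
unit_price = 74.69
quantity = 7

# Assumed relationships between the columns (not stated in the notebook).
cogs = unit_price * quantity           # "cogs": cost of goods sold
tax = 0.05 * cogs                      # "Tax 5%"
total = cogs + tax                     # "Total"
gross_margin_pct = tax / total * 100   # "gross margin percentage"

print(round(cogs, 2), round(tax, 4), round(total, 4), round(gross_margin_pct, 6))
# prints: 522.83 26.1415 548.9715 4.761905
```

Under these assumptions the margin is always 5/105 of the total, which would explain why that column shows no variation.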

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Invoice ID 1000 non-null object
1 Branch 1000 non-null object
2 City 1000 non-null object
3 Customer type 1000 non-null object
4 Gender 1000 non-null object

5 Product line 1000 non-null object
6 Unit price 1000 non-null float64
7 Quantity 1000 non-null int64
8 Tax 5% 1000 non-null float64
9 Total 1000 non-null float64
10 Date 1000 non-null object
11 Time 1000 non-null object
12 Payment 1000 non-null object
13 cogs 1000 non-null float64
14 gross margin percentage 1000 non-null float64
15 gross income 1000 non-null float64
16 Rating 1000 non-null float64
dtypes: float64(7), int64(1), object(9)
memory usage: 132.9+ KB

Exploratory Data Analysis

Which is the Busiest Branch?

df['Branch'].value_counts()

Branch  count
A         340
B         332
C         328

The busiest branch is Branch A with 340 invoices, followed by Branch B with 332 and Branch C with 328.
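The same answer can be read off programmatically instead of by eye. The sketch below is one way to do it, assuming "busiest" means the branch with the most rows. The `branches` frame is a stand-in rebuilt from the counts shown above so that the snippet runs without the original CSV.

```python
import pandas as pd

# Stand-in data: a Branch column rebuilt from the counts in the output above.
branches = pd.DataFrame({"Branch": ["A"] * 340 + ["B"] * 332 + ["C"] * 328})

counts = branches["Branch"].value_counts()   # rows per branch, largest first
busiest = counts.idxmax()                    # label with the highest count
share = counts / counts.sum() * 100          # percentage of all rows per branch

print(busiest, counts[busiest], round(share[busiest], 1))
```

The three branches are close: each accounts for roughly a third of the rows, so the lead of Branch A is small.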

df['Gender'].value_counts()

Gender  count
Female    501
Male      499

df['Product line'].unique()

array(['Health and beauty', 'Electronic accessories',
       'Home and lifestyle', 'Sports and travel', 'Food and beverages',
       'Fashion accessories'], dtype=object)

import seaborn as sns


sns.countplot(data=df,x='Product line')


<Axes: xlabel='Product line', ylabel='count'>

print("The number of the Product line is :",df['Product line'].nunique())

The number of the Product line is : 6

Which Month Had the Highest Gross Income?

import datetime

df['Date'][0]

df['Date']= pd.to_datetime(df['Date'])

df['Date'][0]

Timestamp('2019-01-05 00:00:00')

df['Month']=df['Date'].dt.month
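The conversion above relies on pandas inferring the date layout. Since the raw strings look month-first (for example 1/27/2019), a slightly more defensive variant passes the format explicitly. This is a sketch on a small stand-in Series holding the first five dates from the data. The `%m/%d/%Y` format is an assumption based on those values.

```python
import pandas as pd

# Sample dates copied from the first five rows of the Date column.
dates = pd.Series(["1/5/2019", "3/8/2019", "3/3/2019", "1/27/2019", "2/8/2019"])

# An explicit format avoids ambiguity between month-first and day-first strings.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")
months = parsed.dt.month   # integer month number, 1 through 12

print(months.tolist())
```

With an explicit format, a string that does not match raises an error instead of being silently misread.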

df.head()

     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity   Tax 5%      Total  Date        Time   Payment        cogs  gross margin percentage  ...
  0  750-67-8428  A       Yangon     Member         Female  Health and beauty            74.69         7  26.1415   548.9715  2019-01-05  13:08  Ewallet      522.83                 4.761905  ...
  1  226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories       15.28         5   3.8200    80.2200  2019-03-08  10:29  Cash          76.40                 4.761905  ...
  2  631-41-3108  A       Yangon     Normal         Male    Home and lifestyle           46.33         7  16.2155   340.5255  2019-03-03  13:23  Credit card  324.31                 4.761905  ...
  3  123-19-1176  A       Yangon     Member         Male    Health and beauty            58.22         8  23.2880   489.0480  2019-01-27  20:33  Ewallet      465.76                 4.761905  ...
  4  373-73-7910  A       Yangon     Normal         Male    Sports and travel            86.31         7  30.2085   634.3785  2019-02-08  10:37  Ewallet      604.17                 4.761905  ...


df.groupby('Month')['gross income'].sum()

Month  gross income
1          5537.708
2          4629.494
3          5212.167
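Reading the totals above, month 1 (January 2019) has the highest gross income at 5537.708, ahead of March at 5212.167 and February at 4629.494. The sketch below shows one way to pull that answer out directly. `monthly_income` is a stand-in Series built from the three totals shown, not the result of rerunning the groupby.

```python
import pandas as pd

# Monthly totals copied from the groupby output above.
monthly_income = pd.Series({1: 5537.708, 2: 4629.494, 3: 5212.167}, name="gross income")
monthly_income.index.name = "Month"

best_month = monthly_income.idxmax()   # index label of the largest total
print(best_month, monthly_income.max())
```

On the real data the same call can be chained onto the groupby, for example `df.groupby('Month')['gross income'].sum().idxmax()`.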
