Masterclass Data Analysis.ipynb - Colab
1+1
2+3
Importing libraries
import pandas as pd
import numpy as np
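The printout does not include the cell that creates `df`. As a rough sketch only, assuming the data comes from a CSV file, a frame with the same kind of columns could be built as below. The name `sample_df`, the in-memory CSV text, and the choice of eight columns are illustrative assumptions; the two rows are copied from the output printed in this notebook.

```python
import io
import pandas as pd

# Two rows copied from the printed output; only a subset of the 17 columns.
csv_text = """Invoice ID,Branch,City,Customer type,Gender,Product line,Unit price,Quantity
226-31-3081,C,Naypyitaw,Normal,Female,Electronic accessories,15.28,5
849-09-3807,A,Yangon,Member,Female,Fashion accessories,88.34,7
"""

# With a real file, pass its path to pd.read_csv instead of the StringIO buffer.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.shape)
print(sample_df.dtypes)
```

Text columns load as `object`, while `Unit price` and `Quantity` are inferred as `float64` and `int64`, which matches the dtypes that `df.info()` reports for those columns.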
df
     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity  Tax 5%   Total     Date       Time   Payment  cogs    gross margin percentage
1    226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories  15.28       5         3.8200   80.2200   3/8/2019   10:29  Cash     76.40   4.76
...  ...          ...     ...        ...            ...     ...                     ...         ...       ...      ...       ...        ...    ...      ...     ...
999  849-09-3807  A       Yangon     Member         Female  Fashion accessories     88.34       7         30.9190  649.2990  2/18/2019  13:28  Cash     618.38  4.76
df.head()
https://colab.research.google.com/drive/1wsFn-RQqHucuqumJi6GDEyOs7DkOg1ZY?authuser=0#scrollTo=uJbzeViq8UI3&printMode=true 1/4
1/17/25, 7:39 PM Masterclass Data Analysis.ipynb - Colab
     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity  Tax 5%  Total    Date      Time   Payment  cogs   gross margin percentage
1    226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories  15.28       5         3.8200  80.2200  3/8/2019  10:29  Cash     76.40  4.76190
df.tail()
     Invoice ID   Branch  City    Customer type  Gender  Product line         Unit price  Quantity  Tax 5%   Total     Date       Time   Payment  cogs    gross margin percentage
999  849-09-3807  A       Yangon  Member         Female  Fashion accessories  88.34       7         30.9190  649.2990  2/18/2019  13:28  Cash     618.38  4.76
df.shape
(1000, 17)
df.describe()
     Unit price   Quantity     Tax 5%        Total       cogs  gross margin percentage  gross income    Rating
max   99.960000  10.000000  49.650000  1042.650000  993.00000             4.761905e+00     49.650000  10.00000
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Invoice ID 1000 non-null object
1 Branch 1000 non-null object
2 City 1000 non-null object
3 Customer type 1000 non-null object
4 Gender 1000 non-null object
5 Product line 1000 non-null object
6 Unit price 1000 non-null float64
7 Quantity 1000 non-null int64
8 Tax 5% 1000 non-null float64
9 Total 1000 non-null float64
10 Date 1000 non-null object
11 Time 1000 non-null object
12 Payment 1000 non-null object
13 cogs 1000 non-null float64
14 gross margin percentage 1000 non-null float64
15 gross income 1000 non-null float64
16 Rating 1000 non-null float64
dtypes: float64(7), int64(1), object(9)
memory usage: 132.9+ KB
df['Branch'].value_counts()
count
Branch
A 340
B 332
C 328
Branch A is the busiest with 340 transactions, followed by Branch B with 332 and Branch C with 328.
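To put the gap between branches in perspective, the counts can be turned into percentage shares. The sketch below rebuilds the counts by hand from the output above instead of reading the dataset; the names `branch_counts`, `branch_share`, and `spread` are illustrative.

```python
import pandas as pd

# Branch counts copied from the value_counts() output above.
branch_counts = pd.Series({"A": 340, "B": 332, "C": 328}, name="count")

# Share of all transactions per branch, in percent.
branch_share = (branch_counts / branch_counts.sum() * 100).round(1)
print(branch_share)

# Gap between the busiest and the quietest branch.
spread = branch_counts.max() - branch_counts.min()
print(spread)
```

On the full DataFrame, `df['Branch'].value_counts(normalize=True)` returns the same shares as fractions. The three branches differ by only 12 transactions out of 1000, so the ranking is narrow.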
df['Gender'].value_counts()
count
Gender
Female 501
Male 499
df['Product line'].unique()
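The output of this cell is not preserved in the printout. As a small illustration of what `unique()` returns, and of `nunique()` as a quick way to count categories, here is a sketch on a hand-made Series. It uses only the two product lines that appear in the printed rows, so it is not the full list from the dataset.

```python
import pandas as pd

# Hand-made sample; the values are the two product lines visible in the printed rows.
product_lines = pd.Series(
    ["Electronic accessories", "Fashion accessories", "Electronic accessories"],
    name="Product line",
)

# unique() returns the distinct values in order of first appearance.
distinct = product_lines.unique()
print(distinct)

# nunique() returns how many distinct values there are.
n_distinct = product_lines.nunique()
print(n_distinct)
```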
import datetime
df['Date'][0]
df['Date'] = pd.to_datetime(df['Date'])
df['Date'][0]
Timestamp('2019-01-05 00:00:00')
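The date strings in this dataset are written month first (for example 3/8/2019 and 2/18/2019). A possible variation, not what the notebook does, is to pass an explicit `format` so the parse does not depend on inference. The sketch below uses a small hand-made Series of date strings in that layout.

```python
import pandas as pd

# Sample date strings in the month/day/year layout seen in the Date column.
dates = pd.Series(["1/5/2019", "3/8/2019", "2/18/2019"])

# An explicit format removes any ambiguity between month-first and day-first.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")
print(parsed[0])

# The .dt accessor exposes date parts such as the month number.
months = parsed.dt.month.tolist()
print(months)
```

With an explicit format, a string that does not match raises an error instead of being silently parsed in a different layout.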
df['Month'] = df['Date'].dt.month
df.head()
Invoice ID  Branch  City  Customer type  Gender  Product line  Unit price  Quantity  Tax 5%  Total  Date  Time  Payment  cogs  gross margin percentage
df.groupby('Month')['gross income'].sum()
gross income
Month
1 5537.708
2 4629.494
3 5212.167
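The monthly totals can be read off directly, but a short sketch shows how to pick the strongest month and the overall total in code. The Series is rebuilt by hand from the output above; `monthly_income`, `best_month`, and `total_income` are illustrative names.

```python
import calendar
import pandas as pd

# Monthly gross income copied from the groupby output above.
monthly_income = pd.Series({1: 5537.708, 2: 4629.494, 3: 5212.167}, name="gross income")
monthly_income.index.name = "Month"

# Month number with the highest gross income, converted to a plain int.
best_month = int(monthly_income.idxmax())
best_month_name = calendar.month_name[best_month]
print(best_month_name, monthly_income.max())

# Total gross income across the three months.
total_income = monthly_income.sum()
print(round(total_income, 3))
```

January has the highest gross income and February the lowest of the three months.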