0% found this document useful (0 votes)

32 views50 pages

Data Analysis Project

Uploaded by

Vamshi Krishna reddy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

32 views50 pages

Data Analysis Project

Uploaded by

Vamshi Krishna reddy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

PROJECT NO - 01 : WORLD POPULATION ANALYSIS

In [50]: #Install and Import Necessary Libraries:

pip install plotly

Defaulting to user installation because normal site-packages is not writeable

Requirement already satisfied: plotly in c:\users\user\appdata\local\packages\[Link].3.12_qb
z5n2kfra8p0\localcache\local-packages\python312\site-packages (5.23.0)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\user\appdata\local\packages\[Link]
n.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from plotly) (9.0.0)
Requirement already satisfied: packaging in c:\users\user\appdata\local\packages\[Link].3.12
_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from plotly) (23.2)
Note: you may need to restart the kernel to use updated packages.

In [1]: import pandas as pd

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

In [2]: import [Link] as px

import [Link] as sp
import plotly.graph_objects as go

In [3]: #Data Collection

df=pd.read_csv(r'C:\Users\user\Desktop/world_population.csv')

In [4]: [Link]()
Out[4]:
2022 2020 2015 2010 2000 1990
Rank Country/Territory Continent
Population Population Population Population Population Population

0 1 China Asia 1.425887e+09 1.424930e+09 1.393715e+09 1.348191e+09 1.264099e+09 1.153704e+09 9

1 2 India Asia 1.417173e+09 1.396387e+09 1.322867e+09 1.240614e+09 1.059634e+09 8.704522e+08 6

North
2 3 United States 3.382899e+08 3.359420e+08 3.246078e+08 3.111828e+08 2.823986e+08 2.480837e+08 2
America

3 4 Indonesia Asia 2.755013e+08 2.718580e+08 2.590920e+08 2.440162e+08 2.140724e+08 1.821599e+08 1

4 5 Pakistan Asia 2.358249e+08 2.271967e+08 2.109693e+08 1.944545e+08 1.543699e+08 1.154141e+08

In [18]: #Data Preprocessing

[Link]().sum()
<class '[Link]'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rank 234 non-null int64
1 Country/Territory 234 non-null object
2 Continent 234 non-null object
3 2022 Population 234 non-null int64
4 2020 Population 234 non-null int64
5 2015 Population 234 non-null int64
6 2010 Population 234 non-null int64
7 2000 Population 234 non-null int64
8 1990 Population 234 non-null int64
9 1980 Population 234 non-null int64
10 1970 Population 234 non-null int64
11 Area (km?) 234 non-null int64
12 Density (per km?) 234 non-null float64
13 Growth Rate 234 non-null float64
14 World Population Percentage 234 non-null float64
dtypes: float64(3), int64(10), object(2)
memory usage: 27.6+ KB

In [51]: [Link]()
<class '[Link]'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rank 234 non-null int64
1 Country/Territory 234 non-null object
2 Continent 234 non-null object
3 2022 Population 234 non-null float64
4 2020 Population 234 non-null float64
5 2015 Population 234 non-null float64
6 2010 Population 234 non-null float64
7 2000 Population 234 non-null float64
8 1990 Population 234 non-null float64
9 1980 Population 234 non-null float64
10 1970 Population 234 non-null float64
11 Area (km?) 234 non-null int64
12 Density (per km?) 234 non-null float64
13 Growth Rate 234 non-null float64
14 World Population Percentage 234 non-null float64
dtypes: float64(11), int64(2), object(2)
memory usage: 27.6+ KB

In [52]: #Exploratory Data Analysis (EDA)

#check coorelation of dataset
[Link](numeric_only=True)
Out[52]:
2022 2020 2015 2010 2000 1990 1980 1970 Ar
Rank
Population Population Population Population Population Population Population Population (km

Rank 1.000000 -0.358361 -0.355854 -0.351222 -0.347461 -0.341057 -0.336152 -0.335246 -0.335379 -0.3837

2022
-0.358361 1.000000 0.999946 0.999490 0.998629 0.994605 0.987228 0.980285 0.973162 0.4534
Population

2020
-0.355854 0.999946 1.000000 0.999763 0.999105 0.995583 0.988724 0.982121 0.975254 0.4549
Population

2015
-0.351222 0.999490 0.999763 1.000000 0.999783 0.997340 0.991594 0.985724 0.979414 0.4582
Population

2010
-0.347461 0.998629 0.999105 0.999783 1.000000 0.998593 0.993929 0.988786 0.983042 0.4619
Population

2000
-0.341057 0.994605 0.995583 0.997340 0.998593 1.000000 0.998336 0.995160 0.990956 0.4739
Population

1990
-0.336152 0.987228 0.988724 0.991594 0.993929 0.998336 1.000000 0.999042 0.996602 0.4867
Population

1980
-0.335246 0.980285 0.982121 0.985724 0.988786 0.995160 0.999042 1.000000 0.999194 0.4981
Population

1970
-0.335379 0.973162 0.975254 0.979414 0.983042 0.990956 0.996602 0.999194 1.000000 0.5099
Population

Area (km?) -0.383774 0.453411 0.454993 0.458240 0.461936 0.473933 0.486764 0.498166 0.509940 1.0000

Density
0.129436 -0.027618 -0.027358 -0.026857 -0.026505 -0.026139 -0.026224 -0.026587 -0.026881 -0.0631
(per km?)
2022 2020 2015 2010 2000 1990 1980 1970 Ar
Rank
Population Population Population Population Population Population Population Population (km

Growth
-0.224561 -0.020863 -0.025116 -0.032154 -0.037983 -0.050515 -0.062397 -0.072349 -0.081313 -0.0139
Rate

World
Population -0.358464 0.999999 0.999944 0.999487 0.998626 0.994598 0.987218 0.980273 0.973150 0.4532
Percentage

In [53]: #plot a Heatmap Of coorelation matrix

[Link]([Link](numeric_only=True))

Out[53]: <Axes: >

In [7]: #Based on coorelations lets plot graphs
#change the data types of columns as per requirement
df[' 2022 Population ']=df[' 2022 Population '].astype(int)
df[' 2015 Population ']=df[' 2015 Population '].astype(int)
df[' 2010 Population ']=df[' 2010 Population '].astype(int)
df[' 2000 Population ']=df[' 2000 Population '].astype(int)
df[' 1990 Population ']=df[' 1990 Population '].astype(int)
df[' 1980 Population ']=df[' 1980 Population '].astype(int)
df[' 1970 Population ']=df[' 1970 Population '].astype(int)
df[' 2020 Population ']=df[' 2020 Population '].astype(int)

In [5]: #plot graph for countries over continents

continent_wise=[Link](by='Continent')['Country/Territory'].count()

In [9]: continent_wise

Out[9]: Continent
Africa 57
Asia 50
Europe 50
North America 40
Oceania 23
South America 14
Name: Country/Territory, dtype: int64

In [16]: continents=continent_wise.index
counts=continent_wise.values
sns.set_style('whitegrid')
[Link](counts,labels=continents,shadow=True,startangle=90,colors=sns.color_palette('viridis'),autopct=lambda p:f'{in
[Link]('Continents wise Countries',fontsize=20)

[Link](True)
[Link]()

In [8]: #plot graph for population percentage over the continents

continent_wise_pop=[Link](['Continent'])[' 2022 Population '].sum()

In [9]: continent_wise_pop

Out[9]: Continent
Africa 1.426731e+09
Asia 4.721383e+09
Europe 7.431475e+08
North America 6.002961e+08
Oceania 4.503855e+07
South America 4.368166e+08
Name: 2022 Population , dtype: float64

In [10]: continents=continent_wise_pop.index
population=continent_wise_pop.values
sns.set_style('whitegrid')
explode=[0,0,0,0,1,0]
[Link](population,labels=continents,explode=explode,autopct='%1.1f%%',shadow=True)
[Link]('continent wise population percentage',fontsize=20)
#[Link]()
[Link]()
In [17]: #lets plot world population over the years
Total_Population=df[[' 2022 Population ',' 2020 Population ',' 2015 Population ',' 2010 Population ',' 2000 Populatio
Total_Population
Out[17]: 2022 Population 7.973413e+09
2020 Population 7.839251e+09
2015 Population 7.424810e+09
2010 Population 6.983785e+09
2000 Population 6.147056e+09
1990 Population 5.314192e+09
1980 Population 4.442400e+09
1970 Population 3.694137e+09
dtype: float64

In [18]: [Link](figsize=(15,5))
sns.set_style('whitegrid')
x=Total_Population.values
[Link](Total_Population,marker="o",ms=7,label=x)
[Link](ls='dotted')
[Link]('Total Population over years',fontsize=20)
[Link]('In Billions')
[Link](title='Population',fancybox=False)
[Link]('Year')
[Link]()
In [19]: years_population_continent_wise=[Link](['Continent'])[[' 2022 Population ',' 2020 Population ',' 2015 Population

In [20]: dff=years_population_continent_wise.transpose()
dff
Out[20]: Continent Asia Africa Europe North America South America Oceania

2022 Population 4.721383e+09 1.426731e+09 743147538.0 600296136.0 436816608.0 45038554.0

2020 Population 4.663087e+09 1.360672e+09 745792196.0 594236593.0 431530043.0 43933426.0

2015 Population 4.458250e+09 1.201102e+09 741535608.0 570383850.0 413134396.0 40403283.0

2010 Population 4.220041e+09 1.055228e+09 735613934.0 542720651.0 393078250.0 37102764.0

2000 Population 3.735090e+09 8.189460e+08 726093423.0 486069584.0 349634282.0 31222778.0

1990 Population 3.210564e+09 6.381506e+08 720320797.0 421266425.0 297146415.0 26743822.0

1980 Population 2.635334e+09 4.815364e+08 692527159.0 368293361.0 241789006.0 22920240.0

1970 Population 2.144906e+09 3.654443e+08 655923991.0 315434606.0 192947156.0 19480270.0

In [21]: [Link](figsize=(15,6))
[Link](dff,marker='*',ms=10)
[Link]('Year')
[Link]('In Billions')
[Link](ls='dotted')
[Link]('Continental Population over years',fontsize=20)
[Link]()
In [87]: header_values=years_population_continent_wise.columns

In [88]: header_values

Out[88]: Index([' 2022 Population ', ' 2020 Population ', ' 2015 Population ',
' 2010 Population ', ' 2000 Population ', ' 1990 Population ',
' 1980 Population ', ' 1970 Population '],
dtype='object')
In [22]: #Melt the data as per requirement
df_melted=[Link](id_vars=['Continent'],value_vars=[' 2022 Population ',' 2020 Population ',' 2015 Population ',' 201

In [17]: df_melted

Out[17]: Continent Year Population

0 Asia 2022 Population 1425887337

1 Asia 2022 Population 1417173173

2 North America 2022 Population 338289857

3 Asia 2022 Population 275501339

4 Asia 2022 Population 235824862

... ... ... ...

1867 North America 1970 Population 11402

1868 South America 1970 Population 2274

1869 Oceania 1970 Population 5185

1870 Oceania 1970 Population 1714

1871 Europe 1970 Population 752

1872 rows × 3 columns

In [34]: #population_by_continent=df_melted.groupby(['Continent'])['Year'].sum().reset_index()
In [35]: #population_by_continent

Out[35]: Continent Year

0 Africa 2022 Population 2022 Population 2022 Popula...

1 Asia 2022 Population 2022 Population 2022 Popula...

2 Europe 2022 Population 2022 Population 2022 Popula...

3 North America 2022 Population 2022 Population 2022 Popula...

4 Oceania 2022 Population 2022 Population 2022 Popula...

5 South America 2022 Population 2022 Population 2022 Popula...

In [24]: fig=[Link](df_melted,x='Year',y='Population',color='Continent',title='Population by continent in years')

[Link]()
Population by continent in years

1.4B

1.2B

1B
Population

0.8B

0.6B

0.4B

0.2B
In [25]: selected_columns=[' 2022 Population ',' 2020 Population ',' 2015 Population ',' 2010 Population ',' 2000 Population
total_population=df[selected_columns].sum()

In [26]: a=total_population.[Link]('Population','')
b=total_population.values

In [27]: [Link](figsize=(15,5))
[Link](a,b,label=b,color=sns.color_palette('inferno'),width=0.5)
[Link](title='Population ')
[Link]('YEAR')
[Link]('POPULATION')
[Link]('YEAR WISE WORLD POPULATION',fontsize=20)
#[Link](0,7)
[Link]()
In [28]: Top_10_by_growthrate=df.sort_values(by='Growth Rate',ascending=False).head(10)
Low_Population_growth_countries=df.sort_values(by='Growth Rate',ascending=False).tail(10)

In [29]: Top_10_by_growthrate
Out[29]:
2022 2020 2015 2010 2000 1990 1980
Rank Country/Territory Continent
Population Population Population Population Population Population Population

134 135 Moldova Europe 3272996.0 3084847.0 3277388.0 3678186.0 4251573.0 4480199.0 4103240.0

36 37 Poland Europe 39857145.0 38428366.0 38553146.0 38597353.0 38504431.0 38064255.0 35521429.0

53 54 Niger Africa 26207977.0 24333639.0 20128124.0 16647543.0 11622665.0 8370647.0 6173177.0

59 60 Syria Asia 22125249.0 20772595.0 19205178.0 22337563.0 16307654.0 12408996.0 8898954.0

115 116 Slovakia Europe 5643453.0 5456681.0 5424444.0 5396424.0 5376690.0 5261305.0 4973883.0

14 15 DR Congo Africa 99010212.0 92853164.0 78656904.0 66391257.0 48616317.0 35987541.0 26708686.0

181 182 Mayotte Africa 326101.0 305587.0 249545.0 211786.0 159215.0 92659.0 52233.0

68 69 Chad Africa 17723315.0 16644701.0 14140274.0 11894727.0 8259137.0 5827069.0 4408230.0

41 42 Angola Africa 35588987.0 33428485.0 28127721.0 23364185.0 16394062.0 11828638.0 8330047.0

58 59 Mali Africa 22593590.0 21224040.0 18112907.0 15529181.0 11239101.0 8945026.0 7372581.0

In [30]: x=Top_10_by_growthrate['Country/Territory']
y=Top_10_by_growthrate['Growth Rate']
[Link](figsize=(12,5))
[Link](x,y,width=0.5,label=y,color=sns.color_palette('plasma'))
[Link](loc=1,title='Growth rate')
[Link]('HIGHESH GROWING COUNTRIES IN THE WORLD',fontsize=20)
[Link](1,1.10)
[Link]('GROWTH RATE')
[Link]('COUNTRIES')
[Link]()

In [34]: #Low_Population_growth_countries=[Link]('Country/Territory')['Growth Rate'].sum().sort_values().head(10)

Low_Population_growth_countries
Out[34]:
2022 2020 2015 2010 2000 1990 1980
Rank Country/Territory Continent
Population Population Population Population Population Population Population

129 130 Croatia Europe 4030358 4096868 4254815 4368682 4548434 4873707 4680144

104 105 Serbia Europe 7221365 7358005 7519496 7653748 7935022 7987529 7777010

214 215 Marshall Islands Oceania 41569 43413 49410 53416 54224 46047 31988

Bosnia and
136 137 Europe 3233526 3318407 3524324 3811088 4179350 4494310 4199820
Herzegovina

150 151 Latvia Europe 1850651 1897052 1991955 2101530 2392530 2689391 2572037

140 141 Lithuania Europe 2750055 2820267 2963765 3139019 3599637 3785847 3521206

107 108 Bulgaria Europe 6781953 6979175 7309253 7592273 8097691 8767778 8980606

212 213 American Samoa Oceania 44273 46189 51368 54849 58230 47818 32886

118 119 Lebanon Asia 5489739 5662923 6398940 4995800 4320642 3593700 2963702

37 38 Ukraine Europe 39701739 43909666 44982564 45683020 48879755 51589817 49973920

In [46]: x=Low_Population_growth_countries['Country/Territory']
y=Low_Population_growth_countries['Growth Rate']
[Link](figsize=(16,5))
[Link](x,y,label=y,color=sns.color_palette('magma'))
[Link]('SLOWEST GROWING COUNTRIES',fontsize=20)
[Link](title='Growth Rate')
[Link]('COUNTRY')
[Link]('GROWTH RATE')
[Link](0,1.2)
[Link](-0.5,10)
[Link]()

In [30]: [Link]

Out[30]: Index(['Rank', 'Country/Territory', 'Continent', ' 2022 Population ',

' 2020 Population ', ' 2015 Population ', ' 2010 Population ',
' 2000 Population ', ' 1990 Population ', ' 1980 Population ',
' 1970 Population ', 'Area (km?)', 'Density (per km?)', 'Growth Rate',
'World Population Percentage'],
dtype='object')
In [33]: Top_10_Populated_countries_in_1970=[Link]('Country/Territory')[' 1970 Population '].sum().sort_values(ascending=F

In [34]: Top_10_Populated_countries_in_1970

Out[34]: Country/Territory
China 822534450.0
India 557501301.0
United States 200328340.0
Russia 130093010.0
Indonesia 115228394.0
Japan 105416839.0
Brazil 96369875.0
Germany 78294583.0
Bangladesh 67541860.0
Pakistan 59290872.0
Name: 1970 Population , dtype: float64

In [41]: x=Top_10_Populated_countries_in_1970.index
y=Top_10_Populated_countries_in_1970.values
[Link](figsize=(12,5))
[Link](x,y,label=y,color=sns.color_palette('magma'))
[Link]('MOST POPULATED COUNTRIES in 1970',fontsize=20)
[Link](title='POPULATION')
[Link]('COUNTRY')
[Link]('In Billions')
[Link]()
In [38]: Top_10_Populated_countries_in_2022=[Link]('Country/Territory')[' 2022 Population '].sum().sort_values(ascending=F

In [47]: x=Top_10_Populated_countries_in_2022.index
y=Top_10_Populated_countries_in_2022.values
[Link](figsize=(12,6))
#c=sns.color_palette('cividis')
[Link](x,y,label=y,color=sns.color_palette('cividis'))
[Link]('MOST POPULATED COUNTRIES in 2022',fontsize=20)
[Link]('COUNTRY')
[Link]('In Billions')
[Link](title='Population')
[Link]()
In [69]: df[df['Country/Territory']=='Vatican City']
Out[69]:
2022 2020 2015 2010 2000 1990 1980
Rank Country/Territory Continent
Population Population Population Population Population Population Population

233 234 Vatican City Europe 510 520 564 596 651 700 733

In [44]: Area=[Link]('Country/Territory')['Area (km?)'].sum().sort_values(ascending=False).head(10)

Area

Out[44]: Country/Territory
Russia 17098242
Canada 9984670
China 9706961
United States 9372610
Brazil 8515767
Australia 7692024
India 3287590
Argentina 2780400
Kazakhstan 2724900
Algeria 2381741
Name: Area (km?), dtype: int64

In [49]: [Link](figsize=(10,6))
#[Link](Area)
x=[Link]
y=[Link]
[Link](x,y,label=y,color=sns.color_palette('Greens'))
[Link]('LARGEST COUNTRIES IN THE WORLD',fontsize=20)
[Link](title='AREA')
[Link]('COUNTRY')
[Link]('Square Kilometers')
#[Link](labels=[17098242, 9984670,9706961,9372610,8515767,7692024,3287590,2780400,2724900,2381741])

[Link]()
In [48]: [Link]

Out[48]: Index(['Rank', 'Country/Territory', 'Continent', ' 2022 Population ',

In [105… densely_Populated=[Link]('Country/Territory')['Density (per km?)'].sum().sort_values(ascending=False).head(10)

In [106… densely_Populated

Out[106… Country/Territory
Macau 23172.2667
Monaco 18234.5000
Singapore 8416.4634
Hong Kong 6783.3922
Gibraltar 5441.5000
Bahrain 1924.4876
Maldives 1745.9567
Malta 1687.6139
Sint Maarten 1299.2647
Bermuda 1188.5926
Name: Density (per km?), dtype: float64

In [145… [Link](figsize=(10,6))
b=densely_Populated.values
a=densely_Populated.index
[Link](a,b,label=b,color=sns.color_palette('husl'))
[Link](title='Density (per km?)')
[Link]('DENSITY')
[Link]('COUNTRY')
[Link]('MOST DENSELY POPULATED COUNTRIES')
[Link]()
In [155… b=densely_Populated.values
a=densely_Populated.index
fig=[Link](data=[[Link](x=a,y=b)])
fig.update_layout(title='bar')
bar

20k

15k

10k

5k
In [147… low_densely_Populated=[Link]('Country/Territory')['Density (per km?)'].sum().sort_values().head(10)

In [148… low_densely_Populated

Out[148… Country/Territory
Greenland 0.0261
Falkland Islands 0.3105
Western Sahara 2.1654
Mongolia 2.1727
Namibia 3.1092
Australia 3.4032
Iceland 3.6204
French Guiana 3.6459
Guyana 3.7621
Suriname 3.7727
Name: Density (per km?), dtype: float64

In [153… [Link](figsize=(14,4))
x=low_densely_Populated.index
y=low_densely_Populated.values
[Link](x,y,label=y,color=sns.color_palette('viridis'))
[Link]('LOW DENSELY POPULATED COUNTRIES')
[Link](title='Density (per km?)')
[Link]('DENSITY')
[Link]('COUNTRY')
#[Link](-1,10)
[Link]()
In [21]: features=[' 1970 Population ' ,' 2020 Population ']
for feature in features:
fig = [Link](df,
locations='Country/Territory',
locationmode='country names',
color=feature,
hover_name='Country/Territory',
template='plotly_white',
title = feature)
[Link]()
1970 Population
2020 Population
PROJECT NO - 02 : SUPERMART GROCERY SALES - RETAIL ANALYSIS

In [1]: #Install and Import Necessary Libraries:

import pandas as pd
import numpy as np
import seaborn as sns
import [Link] as px
from matplotlib import pyplot as plt

In [2]: #Data Collection

df=pd.read_csv(r'C:\Users\user\Desktop\Supermart Grocery Sal

In [7]: [Link]()

Out[7]: Order Customer Sub Order

Category City R
ID Name Category Date

Oil & 2017-

0 OD1 Harish Masalas Vellore
Masala 11-08

Health 2017-
1 OD2 Sudha Beverages Krishnagiri
Drinks 11-08

Food Atta & 2017-

2 OD3 Hussain Perambalur
Grains Flour 06-12

Fruits & Fresh 2016-

3 OD4 Jackson Dharmapuri
Veggies Vegetables 10-11

Food Organic 2016-

4 OD5 Ridhesh Ooty
Grains Staples 10-11

In [3]: #Data Preprocessing

[Link]()

<class '[Link]'>
RangeIndex: 9994 entries, 0 to 9993
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order ID 9994 non-null object
1 Customer Name 9994 non-null object
2 Category 9994 non-null object
3 Sub Category 9994 non-null object
4 City 9994 non-null object
5 Order Date 9994 non-null object
6 Region 9994 non-null object
7 Sales 9994 non-null int64
8 Discount 9994 non-null float64
9 Profit 9994 non-null float64
10 State 9994 non-null object
dtypes: float64(2), int64(1), object(8)
memory usage: 859.0+ KB

In [5]: [Link]().sum()
Out[5]: Order ID 0
Customer Name 0
Category 0
Sub Category 0
City 0
Order Date 0
Region 0
Sales 0
Discount 0
Profit 0
State 0
dtype: int64

In [7]: #Exploratory Data Analysis (EDA)

#check coorelation of dataset
[Link](numeric_only=True)

Out[7]: order order

Sales Discount Profit
day month

Sales 1.000000 -0.005512 0.605349 0.000179 -0.009518 0

Discount -0.005512 1.000000 0.000017 0.003022 0.002068 -0

Profit 0.605349 0.000017 1.000000 0.010742 0.003184 -0

order
0.000179 0.003022 0.010742 1.000000 -0.033562 -0
day

order
-0.009518 0.002068 0.003184 -0.033562 1.000000 -0
month

order
0.007542 -0.018778 -0.006401 -0.017458 -0.020183 1
year

In [12]: #plot a Heatmap Of coorelation matrix

[Link]([Link](numeric_only=True))

Out[12]: <Axes: >

In [12]: #Based on coorelations lets plot graphs
sales_by_category=[Link]('Category')['Sales'].sum().sort

In [7]: sales_by_category

Out[7]: Category
Eggs, Meat & Fish 2267401
Snacks 2237546
Food Grains 2115272
Bakery 2112281
Fruits & Veggies 2100727
Beverages 2085313
Oil & Masala 2038442
Name: Sales, dtype: int64

In [103… x=sales_by_category.index
y=sales_by_category.values
[Link](figsize=(15,6))
[Link](x,y,label=y,color=sns.color_palette('plasma'),width=
[Link](title='Sales',loc=0)
[Link]('CATEGORY WISE SALES',fontsize=20)
[Link](axis='y')
[Link]('CATEGORY',fontsize=14)
[Link](0,3390000)
[Link]("SALES",fontsize=14)
[Link]()
In [7]: city_wise_sales=[Link]('City')['Sales'].sum().sort_value

In [8]: city_wise_sales

Out[8]: City
Trichy 541403
Nagercoil 551435
Dharmapuri 571553
Dindigul 575631
Theni 579553
Viluppuram 581274
Namakkal 598530
Ooty 599292
Virudhunagar 606820
Madurai 617836
Name: Sales, dtype: int64

In [78]: x=[Link]
y=[Link]
c=sns.color_palette('inferno')
[Link](figsize=(15,6))
[Link](x,y,label=y,color=sns.color_palette('inferno'))
[Link]('Top cities with high Sales',fontsize=20)
[Link]('CITY',fontsize=14)
[Link]('SALES',fontsize=14)
[Link](axis='x')
[Link](0,900000)
[Link](title='Sales',loc=0)
#[Link](rotation=45)
[Link]()
In [5]: df['Order Date']=pd.to_datetime(df['Order Date'])

C:\Users\user\AppData\Local\Temp\ipykernel_6768\1722919639.p
y:1: UserWarning: Could not infer format, so each element wil
l be parsed individually, falling back to `dateutil`. To ensu
re parsing is consistent and as-expected, please specify a fo
rmat.
df['Order Date']=pd.to_datetime(df['Order Date'])

In [6]: df['order day']=df['Order Date'].[Link]

df['order month']=df['Order Date'].[Link]
df['order year']=df['Order Date'].[Link]

In [137… month_sales=[Link]('order month')['Sales'].sum()

month_sales

Out[137… order month

1 577972
2 456102
3 1053980
4 998453
5 1086920
6 1057808
7 1089385
8 1046807
9 2064266
10 1243289
11 2193924
12 2088076
Name: Sales, dtype: int64

In [146… x=month_sales.index
y=month_sales.values
explode=[0,0,0,0,0,0,0,0,0,0,0,0]
[Link](y,labels=x,explode=explode,shadow=True,autopct='%1.1
[Link]('MONTH WISE SALES')

[Link]()
In [ ]: x=month_sales.index
y=month_sales.values
[Link]

In [17]: year_sales=[Link]('order year')['Sales'].sum().sort_valu

year_sales

Out[17]: order year

2018 4977512
2017 3871912
2016 3131959
2015 2975599
Name: Sales, dtype: int64

In [17]: year_sales=[Link]('order year')['Sales'].sum().sort_valu

year_sales

Out[17]: order year

2018 4977512
2017 3871912
2016 3131959
2015 2975599
Name: Sales, dtype: int64

In [24]: x=year_sales.index
y=year_sales.values

[Link](y,labels=x,shadow=True,autopct='%1.1f%%',colors=sns.
[Link]('YEAR WISE SALES')
[Link]()

In [3]: sales_by_product=[Link]('Sub Category')['Sales'].sum().s

sales_by_product

In [4]: x=sales_by_product.index
y=sales_by_product.values
[Link](figsize=(15,6))
[Link](x,y,label=y,color=sns.color_palette('plasma'),width=
[Link](title='Sales',loc=0)
[Link]('sales by product',fontsize=20)
[Link](axis='y')
[Link]('PRODUCT',fontsize=14)
[Link](0,1750000)
[Link]("SALES",fontsize=14)
[Link]()
In [8]: [Link]('Customer Name')['Profit'].sum().sort_values(asce

Out[8]: Customer Name

Arutra 87572.40
Vidya 86725.64
Krithika 85633.03
Akash 82121.26
Surya 80996.85
Name: Profit, dtype: float64

In [14]: day_sales=[Link]('order day')['Sales'].sum().sort_values

day_sales

Out[14]: order day

21 591670
20 586485
2 571073
3 564417
5 555101
26 550222
8 535192
11 533791
23 531681
9 526866
Name: Sales, dtype: int64

In [9]: le=LabelEncoder()

In [45]: #df['Sub Category']=le.fit_transform(df['Sub Category'])

#df['City']=le.fit_transform(df['City'])
#df['Region']=le.fit_transform(df['Region'])
#df['order month']=le.fit_transform(df['order month'])

In [22]: target=df['Sales']

In [27]: y_pred=[Link](x_test)

In [24]: x_train,x_test,y_train,y_test=train_test_split(features,targ

In [31]: mse=mean_squared_error(y_test,y_pred)
r2=r2_score(y_test,y_pred)
In [25]: scaler=StandardScaler()
x_train=scaler.fit_transform(x_train)
x_test=scaler.fit_transform(x_test)

In [26]: lr=LinearRegression()
[Link](x_train,y_train)

Out[26]: ▾ LinearRegression i ?

LinearRegression()

In [32]: print(mse,r2)

2199.6894915749735 0.9933977767041029

In [21]: features=[Link](columns=['Order ID','Customer Name','Order

In [37]: [Link](figsize=(8, 6))

[Link](y_test, y_pred)
[Link]([min(y_test), max(y_test)], [min(y_test),
max(y_test)], color='red')
[Link]('Actual vs Predicted Sales')
[Link]('Actual Sales')
[Link]('Predicted Sales')
[Link]()

In [11]: [Link](numeric_only=True)
Out[11]: order order
Sales Discount Profit
day month

Sales 1.000000 -0.005512 0.605349 0.000179 -0.009518 0

Discount -0.005512 1.000000 0.000017 0.003022 0.002068 -0

Profit 0.605349 0.000017 1.000000 0.010742 0.003184 -0

order
0.000179 0.003022 0.010742 1.000000 -0.033562 -0
day

order
-0.009518 0.002068 0.003184 -0.033562 1.000000 -0
month

order
0.007542 -0.018778 -0.006401 -0.017458 -0.020183 1
year

In [36]: [Link](y_test,y_pred)
[Link]([min(y_test), max(y_test)], [min(y_test),
max(y_test)], color='red')

Out[36]: [<[Link].Line2D at 0x20d9835b350>]

In [8]: [Link]

Out[8]: Index(['Order ID', 'Customer Name', 'Category', 'Sub Catego

ry', 'City',
'Order Date', 'Region', 'Sales', 'Discount', 'Profi
t', 'State',
'order day', 'order month', 'order year'],
dtype='object')

In [ ]:

World Population Trends with Python
No ratings yet
World Population Trends with Python
14 pages
World Population Analysis
100% (1)
World Population Analysis
64 pages
Web Scraping Using Python - Assignment Solutions
No ratings yet
Web Scraping Using Python - Assignment Solutions
2 pages
2
No ratings yet
2
18 pages
WEBINTEL GUIDED LAB ACTIVITY Introduction To Pandas
No ratings yet
WEBINTEL GUIDED LAB ACTIVITY Introduction To Pandas
1 page
Matplot Question1
No ratings yet
Matplot Question1
2 pages
Basic Data Visualization with Matplotlib
No ratings yet
Basic Data Visualization with Matplotlib
37 pages
Data Visualization Lab Guide
No ratings yet
Data Visualization Lab Guide
41 pages
CovidData - Ipynb - Colaboratory
No ratings yet
CovidData - Ipynb - Colaboratory
4 pages
Lab 3
No ratings yet
Lab 3
3 pages
Matplotlib Library in Python
No ratings yet
Matplotlib Library in Python
85 pages
Five Year Dataset
No ratings yet
Five Year Dataset
15 pages
Chirayu (1) Merged Merged
No ratings yet
Chirayu (1) Merged Merged
76 pages
Immigration Data Visualization Techniques
No ratings yet
Immigration Data Visualization Techniques
9 pages
Extended - Case - 2 - Fellow: 1 The Adverse Health Effects of Air Pollution - Are We Making Any Progress?
No ratings yet
Extended - Case - 2 - Fellow: 1 The Adverse Health Effects of Air Pollution - Are We Making Any Progress?
61 pages
Suicide Analysis
No ratings yet
Suicide Analysis
18 pages
DV0101EN-2-2-1-Area-Plots-Histograms-and-Bar-Charts-py-v2.0: 1 Exploring Datasets With Pandas and Matplotlib
No ratings yet
DV0101EN-2-2-1-Area-Plots-Histograms-and-Bar-Charts-py-v2.0: 1 Exploring Datasets With Pandas and Matplotlib
29 pages
Data Analysis and Preparation Guide
No ratings yet
Data Analysis and Preparation Guide
16 pages
Pandas Complete Notes
No ratings yet
Pandas Complete Notes
105 pages
2 Tekrek M7 KNN - DGX 1
No ratings yet
2 Tekrek M7 KNN - DGX 1
15 pages
Pandas EDA for Data Science Students
No ratings yet
Pandas EDA for Data Science Students
20 pages
Eda - 1@3pm 8th Nov
No ratings yet
Eda - 1@3pm 8th Nov
2 pages
P Palaksha - Analyzing and Visualyzing Population Demographics Data
No ratings yet
P Palaksha - Analyzing and Visualyzing Population Demographics Data
10 pages
Course3 Notes
No ratings yet
Course3 Notes
44 pages
Chloropleth Population Growth - Py
No ratings yet
Chloropleth Population Growth - Py
1 page
Python Data Visualization Techniques
No ratings yet
Python Data Visualization Techniques
5 pages
SYNOPSIS
No ratings yet
SYNOPSIS
1 page
Pyhon Solution
No ratings yet
Pyhon Solution
45 pages
Paddy Diesease
No ratings yet
Paddy Diesease
20 pages
World Population Analysis Machine Learning Project (Data Analyst)
No ratings yet
World Population Analysis Machine Learning Project (Data Analyst)
27 pages
Line Plot (1) : Datacamp Courses-Jhu-Genomics-Demo
No ratings yet
Line Plot (1) : Datacamp Courses-Jhu-Genomics-Demo
22 pages
EDA Diwali Sale Analysis Project
No ratings yet
EDA Diwali Sale Analysis Project
11 pages
Pandas
No ratings yet
Pandas
25 pages
Exercise 1 Data Viz Histogram Bar Charts
No ratings yet
Exercise 1 Data Viz Histogram Bar Charts
7 pages
Projet Swift
No ratings yet
Projet Swift
12 pages
Cheat Python
No ratings yet
Cheat Python
8 pages
Kunal Assignment 3
No ratings yet
Kunal Assignment 3
19 pages
Panda 1
No ratings yet
Panda 1
18 pages
Practical D.V
No ratings yet
Practical D.V
13 pages
Draft 1 - Huan Heo
No ratings yet
Draft 1 - Huan Heo
31 pages
EDA Session-3 Categorical Data Analysis
No ratings yet
EDA Session-3 Categorical Data Analysis
16 pages
Exercises Part2
No ratings yet
Exercises Part2
7 pages
2,3. Introduction Pandas & Matplotlib
No ratings yet
2,3. Introduction Pandas & Matplotlib
32 pages
Exploratory Data Analysis and Preprocessing Pipeline
No ratings yet
Exploratory Data Analysis and Preprocessing Pipeline
18 pages
Population Growth Modeling Maps
No ratings yet
Population Growth Modeling Maps
1 page
Data Visualization With Python
No ratings yet
Data Visualization With Python
42 pages
Handling Missing Values, Outliers and Irregular Cardinalities
No ratings yet
Handling Missing Values, Outliers and Irregular Cardinalities
16 pages
Chapter2 PDF
No ratings yet
Chapter2 PDF
60 pages
Practical File Ip
No ratings yet
Practical File Ip
27 pages
BDA File
No ratings yet
BDA File
26 pages
Liberal Democrat
No ratings yet
Liberal Democrat
11 pages
EDA Python Code Cheatsheets
No ratings yet
EDA Python Code Cheatsheets
52 pages
Interactive Mapping in Python With UK Census Data
No ratings yet
Interactive Mapping in Python With UK Census Data
24 pages
Data Preprocessing
No ratings yet
Data Preprocessing
27 pages
ML Expt 1 Description
No ratings yet
ML Expt 1 Description
15 pages
Customer Retail Shopping Analysis 1686591558
No ratings yet
Customer Retail Shopping Analysis 1686591558
45 pages
Dictionaries, Part 1: Hugo Bowne-Anderson
No ratings yet
Dictionaries, Part 1: Hugo Bowne-Anderson
60 pages
Financial Performance Dashboard - (Tableau - Finance Analyst)
100% (1)
Financial Performance Dashboard - (Tableau - Finance Analyst)
9 pages
Banking Dataset - Marketing Targets
No ratings yet
Banking Dataset - Marketing Targets
19 pages
Machine Learning for Company Valuation
No ratings yet
Machine Learning for Company Valuation
41 pages
Personalized Healthcare Recommendations
No ratings yet
Personalized Healthcare Recommendations
6 pages
IBM HR Analytics Employee Attrition & Performance - (Data Analyst)
No ratings yet
IBM HR Analytics Employee Attrition & Performance - (Data Analyst)
21 pages
Regulatory Affairs of Road Accident Data 2020 India
No ratings yet
Regulatory Affairs of Road Accident Data 2020 India
23 pages
Tobacco Use and Mortality, 2004-2015
No ratings yet
Tobacco Use and Mortality, 2004-2015
12 pages
Climate Change Machine Learning Project
No ratings yet
Climate Change Machine Learning Project
10 pages
Odf2 Unit 9 Review
No ratings yet
Odf2 Unit 9 Review
7 pages
Systematic Qualitative Analysis of Anions
0% (1)
Systematic Qualitative Analysis of Anions
1 page
Grade 9 English Unified Learning Materials
No ratings yet
Grade 9 English Unified Learning Materials
10 pages
Drainage System Report for Engineering
No ratings yet
Drainage System Report for Engineering
4 pages
Geospatial Mapping and Analysis of The Distribution of Public Primary Healthcare Centers in Kaduna State, Nigeria.
No ratings yet
Geospatial Mapping and Analysis of The Distribution of Public Primary Healthcare Centers in Kaduna State, Nigeria.
15 pages
Cbse Ka and Tn Ipl Science General 086 Prefinals-2 Question Paper Set-2
No ratings yet
Cbse Ka and Tn Ipl Science General 086 Prefinals-2 Question Paper Set-2
18 pages
History Aqua
No ratings yet
History Aqua
11 pages
Cebeci 1972
No ratings yet
Cebeci 1972
7 pages
Nightingale's Environmental Theory Explained
No ratings yet
Nightingale's Environmental Theory Explained
25 pages
Water and Wastewater Treatment Exam
No ratings yet
Water and Wastewater Treatment Exam
10 pages
Vishay Micro Measurements CEA 06 062UWA 350 - C404310
No ratings yet
Vishay Micro Measurements CEA 06 062UWA 350 - C404310
7 pages
OPVL Analysis: A Critical Guide
0% (1)
OPVL Analysis: A Critical Guide
4 pages
LINA01 Introduction
No ratings yet
LINA01 Introduction
33 pages
5 Geography Farming Sba
No ratings yet
5 Geography Farming Sba
19 pages
Belts, Chain, Sprockets, Threaded Members (Lecture Notes) 1120
0% (1)
Belts, Chain, Sprockets, Threaded Members (Lecture Notes) 1120
1 page
Harold Et Al. 2014 BMC Ecology Winning Images
No ratings yet
Harold Et Al. 2014 BMC Ecology Winning Images
11 pages
PYP Exhibition Journal 2025
No ratings yet
PYP Exhibition Journal 2025
36 pages
Operation Management
No ratings yet
Operation Management
26 pages
The Calculating Machine
No ratings yet
The Calculating Machine
3 pages
Skripsi, Es Krim Rosella
No ratings yet
Skripsi, Es Krim Rosella
8 pages
ISO IEC 27003 ISMS Risk Management
No ratings yet
ISO IEC 27003 ISMS Risk Management
10 pages
Science 8 Quarter 3 Module 5 Colored
100% (5)
Science 8 Quarter 3 Module 5 Colored
16 pages
John Rawls
No ratings yet
John Rawls
4 pages
SEMIKRON DataSheet SK96GAB06UF 24923160
No ratings yet
SEMIKRON DataSheet SK96GAB06UF 24923160
7 pages
Unit 1
No ratings yet
Unit 1
10 pages
Orbital Rendezvous Tools Guide
No ratings yet
Orbital Rendezvous Tools Guide
138 pages
Niclal 38, Cumn10Ni4: (Shunt Grade)
No ratings yet
Niclal 38, Cumn10Ni4: (Shunt Grade)
2 pages
MSC Final
No ratings yet
MSC Final
8 pages
Savitribai Phule Pune University
No ratings yet
Savitribai Phule Pune University
2 pages
1 - Equilibrium of A Particle
No ratings yet
1 - Equilibrium of A Particle
25 pages