
Data Science

The document describes a data science internship project. It includes code to analyze a dataset containing transaction data from a retail store with 129 rows and 7 columns. Various visualizations are created using the code, including heatmaps, relational plots, distplots, scatterplots, bar charts, and countplots. Key observations are that paperclips have the highest sales, laptops have the highest price, and there are more female than male customers. The conclusion recommends selling more of the high-selling categories and increasing quantities of less common products.

Uploaded by

rupesh karanam
Copyright © All Rights Reserved

DATA SCIENCE INTERNSHIP PROJECT

Contributors:
Karanam Rupesh (Team lead)
Shivansh Srivastava

ACKNOWLEDGEMENT

It is with the utmost pleasure and excitement that we submit our project in partial
fulfilment of the requirements for the award of the Data Science internship. The project
is the result of the cumulative efforts, support, guidance, encouragement and
inspiration of many people, to whom we offer our sincere gratitude. We convey special
thanks to our Project Guide, who guided us and encouraged us to deepen our knowledge
while working on this project and to improve its quality. We also express our
appreciation to HARIOM SINGH, HR at PR’S Software Services, who provided a friendly
environment that helped us enhance our skills during the project.

INDEX
1) CODE
2) VISUALIZATIONS
3) OBSERVATION
4) CONCLUSION

DATASET
We prepared the dataset manually, recording the Transaction Id, product, category,
price, quantity, customer age and customer gender of each transaction.
The dataset contains 129 rows and 7 columns.

CODE
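import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the manually prepared dataset (reading .xlsx files requires openpyxl)
# Run each expression below in its own notebook cell to display its output
df = pd.read_excel('project data.xlsx')
df

# Basic structure: (rows, columns), column types, first and last rows
df.shape
df.dtypes
df.head()
df.tail()
df.head(15)
df.tail(15)
df.columns

# Individual columns, selected by attribute or by name
df.PRODUCT
df.CATEGORY
df['PRICE'].head()
df['CUSTOMER AGE'].head(10)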

df[['CUSTOMER AGE','CUSTOMER GENDER']]
OUTPUTS OF THE ABOVE CODE:

df[['CUSTOMER AGE','CUSTOMER GENDER']].head(20)
df.loc[0,:]
df.isnull()
df.isnull().sum()
OUTPUTS OF THE ABOVE CODE
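Because the output tables are not reproduced here, the following is a minimal sketch of what `isnull().sum()` reports. It uses a small hypothetical table (`sample`, with made-up values), not the project spreadsheet: `isnull()` marks every missing cell as True, and `sum()` counts those True values column by column.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-table with made-up values, including two gaps.
sample = pd.DataFrame({
    "PRODUCT": ["Pen", "Laptop", None, "Envelope"],
    "PRICE": [10.0, np.nan, 25.0, 5.0],
    "QUANTITY": [12, 1, 3, 8],
})

# isnull() returns a True/False table; sum() counts the True cells per column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value.
rows_with_gaps = sample[sample.isnull().any(axis=1)]
print(rows_with_gaps)
```

A column sum of zero for every column, as the project expects for a manually prepared dataset, means no cleaning step is needed before plotting.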

A heatmap replaces numbers with colors. Because people pick up visual
patterns faster than they read tables of numbers, coloring the
correlation matrix makes it easy to see which numeric columns rise and
fall together (values near 1 or -1) and which are unrelated (values
near 0).
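# Pairwise correlation of the numeric columns only; numeric_only=True is
# required in pandas >= 2.0, where text columns such as PRODUCT raise an error
correlation = df.corr(numeric_only=True)
sns.heatmap(correlation, xticklabels=correlation.columns,
            yticklabels=correlation.columns, annot=True)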
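To show what the numbers in each heatmap cell mean, here is a minimal sketch on a hypothetical table (`sample`, made-up values). `DataFrame.corr` uses Pearson's r by default, which is the covariance of two columns divided by the product of their standard deviations; the sketch computes one cell by hand and compares it with pandas.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; the real data comes from 'project data.xlsx'.
sample = pd.DataFrame({
    "PRODUCT": ["Pen", "Paperclip", "Laptop", "Monitor", "Envelope"],
    "PRICE": [10.0, 2.0, 900.0, 300.0, 5.0],
    "QUANTITY": [20, 50, 1, 2, 15],
    "CUSTOMER AGE": [25, 34, 41, 29, 52],
})

# The text column PRODUCT is skipped; only numeric columns are correlated.
corr = sample.corr(numeric_only=True)

# Pearson's r for PRICE vs QUANTITY, computed directly from its definition.
x, y = sample["PRICE"], sample["QUANTITY"]
manual_r = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)

print(corr.round(2))
print(round(manual_r, 4), round(corr.loc["PRICE", "QUANTITY"], 4))
```

In this made-up sample, expensive items sell in small quantities, so the PRICE/QUANTITY cell is negative; the heatmap would show that cell in the color for negative correlation.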

Relational plots visualize the statistical relationship between
variables. Plotting the data makes trends and patterns visible that are
hard to spot in a table. Working out how the variables in a dataset
relate to each other is called statistical analysis. The plot below
places each transaction by PRODUCT and CATEGORY and colors it by QUANTITY.
sns.relplot(x='PRODUCT', y='CATEGORY', hue='QUANTITY', data=df)

A distribution plot (distplot) is used for a univariate set of
observations and displays it as a histogram. Because it shows only one
variable, we choose a single column of the dataset, here PRICE.
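# distplot is deprecated since seaborn 0.11; histplot with a KDE curve replaces it
sns.histplot(df['PRICE'], kde=True, stat='density')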

# Same price histogram, with the price range split into 5 bins
sns.histplot(df['PRICE'], bins=5, kde=True, stat='density')
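The `bins=5` argument splits the price range into five equal-width intervals and counts how many prices fall into each. A minimal sketch of that binning with NumPy, assuming a hypothetical array of made-up prices (`prices`), not the project's data:

```python
import numpy as np

# Hypothetical prices for illustration only.
prices = np.array([2, 5, 10, 12, 25, 40, 300, 450, 900, 1000])

# bins=5: five equal-width intervals between min(prices) and max(prices).
counts, edges = np.histogram(prices, bins=5)
print(counts)  # number of prices in each interval
print(edges)   # six boundaries delimiting the five intervals
```

With prices this skewed, most values land in the first bin, which is why a small number of bins can hide detail at the cheap end of the range.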

sns.catplot(x='QUANTITY',kind = 'box', data = df)
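The box plot above summarizes QUANTITY with its quartiles. The following minimal sketch computes the same summary for a hypothetical series of made-up quantities (`quantity`): the box spans the first to third quartile, the line inside it is the median, and, with seaborn's default whisker setting, points more than 1.5 times the interquartile range (IQR) beyond the box are drawn as individual outliers.

```python
import pandas as pd

# Hypothetical QUANTITY values for illustration only.
quantity = pd.Series([1, 2, 2, 3, 4, 5, 6, 8, 9, 30])

# Quartiles with linear interpolation (pandas' default).
q1, median, q3 = quantity.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Values beyond 1.5 * IQR from the box edges are treated as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = quantity[(quantity < lower) | (quantity > upper)]
print(q1, median, q3, outliers.tolist())
```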

import plotly.graph_objects as go

# One marker per row: product on the x-axis, quantity on the y-axis
fig = go.Figure(data=go.Scatter(x=df.PRODUCT, y=df.QUANTITY, mode='lines+markers',
                                marker_color='orange', marker_size=20))
fig.update_layout(title='RELATION BETWEEN PRODUCT AND QUANTITY',
                  xaxis_title='PRODUCT', yaxis_title='QUANTITY')
fig.show()

The scatter plot above shows the relationship between product and
quantity for the items in the retail store.
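When several transactions share a product, the scatter plot draws one marker per row, so the markers for a product overlap. A minimal sketch of totalling the quantity per product first, assuming a hypothetical `sample` table with made-up rows; the resulting `totals` columns could then be passed to `go.Scatter` or `go.Bar` in place of `df.PRODUCT` and `df.QUANTITY`.

```python
import pandas as pd

# Hypothetical transactions; the real rows come from the project spreadsheet.
sample = pd.DataFrame({
    "PRODUCT": ["Paperclip", "Pen", "Paperclip", "Monitor", "Pen", "Envelope"],
    "QUANTITY": [40, 15, 35, 1, 10, 12],
})

# One row per product with its total quantity, sorted from highest to lowest.
totals = (
    sample.groupby("PRODUCT", as_index=False)["QUANTITY"]
    .sum()
    .sort_values("QUANTITY", ascending=False)
)
print(totals)
```

Sorting the totals also makes the best- and worst-selling products readable directly from the table, without inspecting the chart.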

import plotly.graph_objects as go

fig = go.Figure()

# Two traces sharing the PRODUCT x-axis: one for CATEGORY, one for PRICE
fig.add_trace(go.Scatter(x=df.PRODUCT, y=df.CATEGORY, name='CATEGORY'))
fig.add_trace(go.Scatter(x=df.PRODUCT, y=df.PRICE, name='PRICE'))

fig.update_traces(mode='lines+markers', line_width=5, marker_size=10)
fig.update_layout(title="RELATION BETWEEN PRODUCT CATEGORY AND PRICE",
                  xaxis_title="PRODUCT", yaxis_title="CATEGORY/PRICE",
                  width=1000, height=500,
                  paper_bgcolor="LightSteelBlue", plot_bgcolor="green",
                  showlegend=True)
fig.update_xaxes(showgrid=True)
fig.update_yaxes(showgrid=True)
fig.show()

The graph above shows the relationship between product and category,
and between product and price, for the items in the retail store.

import plotly.graph_objects as go

# Bar chart of price per product
fig = go.Figure(go.Bar(x=df.PRODUCT, y=df.PRICE))
fig.update_layout(title='BARCHART', xaxis_title='PRODUCT', yaxis_title='PRICE')
fig.show()


The bar chart above shows the relationship between product and price
for the items in the retail store.
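Reading the most expensive items off a bar chart works, but the same ranking can be computed directly. A minimal sketch, assuming a hypothetical `sample` table whose prices are made up for illustration: take the highest recorded price of each product, then ask for the largest values.

```python
import pandas as pd

# Hypothetical rows with made-up prices; not the project's data.
sample = pd.DataFrame({
    "PRODUCT": ["Laptop", "Ficus", "Monitor", "Pen", "Laptop"],
    "PRICE": [900.0, 350.0, 300.0, 10.0, 850.0],
})

# Highest recorded price for each product.
max_price = sample.groupby("PRODUCT")["PRICE"].max()

most_expensive = max_price.idxmax()  # product with the single highest price
top3 = max_price.nlargest(3)         # three highest-priced products, in order
print(most_expensive)
print(top3)
```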

import plotly.graph_objects as go

# Bar chart of quantity per product
fig = go.Figure(go.Bar(x=df.PRODUCT, y=df.QUANTITY))
fig.update_layout(title='BARCHART', xaxis_title='PRODUCT', yaxis_title='QUANTITY')
fig.show()

The bar chart above shows the relationship between product and quantity
for the items in the retail store.

import plotly.graph_objects as go

# Bar chart of price against quantity
fig = go.Figure(go.Bar(x=df.QUANTITY, y=df.PRICE))
fig.update_layout(title='BARCHART', xaxis_title='QUANTITY', yaxis_title='PRICE')
fig.show()

The bar chart above shows the relationship between quantity and price
for the items in the retail store.

The countplot() function shows the number of observations in each categorical
bin using bars.

sns.countplot(x='CUSTOMER GENDER',data=df)
plt.title('Distribution of GENDER')
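The bar heights in a count plot are simply the number of rows for each category. A minimal sketch of the same count as a table, using a hypothetical `sample` column with made-up values; `value_counts()` lists the categories from most to least frequent.

```python
import pandas as pd

# Hypothetical gender column; not the project's data.
sample = pd.DataFrame({"CUSTOMER GENDER": ["Female", "Male", "Female", "Female", "Male"]})

# Number of rows per category, most frequent first.
gender_counts = sample["CUSTOMER GENDER"].value_counts()
print(gender_counts)
```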

sns.countplot(x='CUSTOMER AGE', hue='CUSTOMER GENDER', data=df)
plt.title('DISTRIBUTION OF CUSTOMER AGE BY CUSTOMER GENDER')
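With `hue`, the count plot draws one bar per gender inside each age group. The numbers behind those bars form a two-way frequency table; a minimal sketch with `pd.crosstab`, assuming a hypothetical `sample` table with made-up ages and genders. Combinations that never occur appear as 0, matching a missing bar in the plot.

```python
import pandas as pd

# Hypothetical customers; not the project's data.
sample = pd.DataFrame({
    "CUSTOMER AGE": [25, 25, 34, 34, 34, 41],
    "CUSTOMER GENDER": ["Female", "Male", "Female", "Female", "Male", "Male"],
})

# Rows: ages, columns: genders, cells: number of customers.
table = pd.crosstab(sample["CUSTOMER AGE"], sample["CUSTOMER GENDER"])
print(table)
```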

OBSERVATIONS:
1) Paperclips are the highest-selling item, followed by pens and then
envelopes.
2) The lowest-selling items are monitors, large signs and small signs.
3) The laptop has the highest price of all the items in the retail
store, followed by the ficus and the monitor.
4) The retail store has more female customers than male customers.

Conclusion:
1) The owner of the retail store would benefit from stocking more
products from the public areas category.
2) The owner should increase the stock of jackets, smartphones, alarm
clocks and wall chairs, since there are fewer of them than of other
products.
