Saikat Dey Data Science Project
Roll No: 504122011057 | Subhasish Ghosh
1. Weather Data Visualization: Collect monthly average temperatures for your city for the past
year. Plot a line graph to visualize the temperature trend over the year. Bonus: Compare it
with another city and plot both on the same graph.
Code:-
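import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Monthly average temperatures (°C) for the two cities
data={'Month': ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],
'City1(Kolkata)Temperature':[19,22.7,27.1,29.9,30.6,29.5,28.1,27.9,27.6,26.3,23.3,20.1],
'City2(New Delhi)Temperature':[13.5,16.9,22.5,29.2,32.7,33,29.9,28.7,27.8,25.4,20.5,15.4]}
df=pd.DataFrame(data)
# Temperature trend for Kolkata over the year
plt.figure(figsize=(10,5))
sns.lineplot(data=df,x='Month',y='City1(Kolkata)Temperature',marker='o')
plt.show()
# Bonus: both cities on the same graph
months = df["Month"]
Temperature1 = df["City1(Kolkata)Temperature"]
Temperature2 = df["City2(New Delhi)Temperature"]
plt.figure(figsize=(10,5))
plt.plot(months, Temperature1, marker='o', label="Kolkata")
plt.plot(months, Temperature2, marker='o', label="New Delhi")
plt.xlabel("Month")
plt.ylabel("Temperature (°C)")
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()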
2. Car Sales Analysis: Gather data on the number of cars sold by different brands in your
country in the past year. Create a bar plot to visualize which brand sold the most cars. Bonus:
Add a pie chart to show the market share of each brand.
Code:-
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
car_data={'Car_Brand': ['Maruti','Hyundai','Tata','Mahindra','Kia','Toyota','Honda','Renault'],
'Car_sales':[1576.03,552.51,526.82,333.05,254.56,160.38,95.02,87.12]}
df=pd.DataFrame(car_data)
# Bar plot of sales per brand
plt.figure(figsize=(10,5))
sns.barplot(data=df,x='Car_Brand',y='Car_sales')
plt.show()
# Bonus: pie chart of each brand's market share
plt.figure(figsize=(10,7))
plt.pie(df['Car_sales'], labels=df['Car_Brand'],autopct='%1.1f%%',
shadow=True)
plt.show()
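The pie chart shows each brand's share graphically; the same percentages can also be computed directly. The following is a minimal sketch that reuses the sales figures from the listing above. The column name Market_share_pct and the variable top_brand are introduced here for illustration only.

```python
import pandas as pd

# Sales figures copied from the exercise above
df = pd.DataFrame({
    "Car_Brand": ["Maruti", "Hyundai", "Tata", "Mahindra",
                  "Kia", "Toyota", "Honda", "Renault"],
    "Car_sales": [1576.03, 552.51, 526.82, 333.05,
                  254.56, 160.38, 95.02, 87.12],
})

# Each brand's sales as a percentage of the total across the listed brands
df["Market_share_pct"] = df["Car_sales"] / df["Car_sales"].sum() * 100

# Brand in the row with the largest sales value
top_brand = df.loc[df["Car_sales"].idxmax(), "Car_Brand"]

print(df.sort_values("Car_sales", ascending=False).to_string(index=False))
print("Top brand:", top_brand)
```

The shares are relative to the eight brands listed, not to the whole market, which is also how plt.pie normalizes its input.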
3. Summer Analysis: Collect data on ice cream sales and drowning incidents for each month of
the summer. Plot a scatterplot to see if there's any correlation between the two. Bonus: Use a
regression line to predict the number of drowning incidents based on ice cream sales.
Code:-
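import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep','Oct', 'Nov', 'Dec']
ice_cream_sales = [100, 105, 110, 115, 120, 125, 130, 128, 126, 122, 118,115]
drowning_incidents = [5, 5, 6, 6, 7, 8, 8, 8, 7, 6, 5, 5]
# Scatterplot of drowning incidents against ice cream sales
plt.figure(figsize=(8, 6))
plt.scatter(ice_cream_sales, drowning_incidents)
plt.xlabel('Ice Cream Sales')
plt.ylabel('Drowning Incidents')
plt.show()
# Bonus: fit a linear regression of drowning incidents on ice cream sales
X = np.array(ice_cream_sales).reshape(-1, 1)
y = np.array(drowning_incidents)
model = LinearRegression()
model.fit(X, y)
slope = model.coef_[0]
intercept = model.intercept_
print("Slope:", slope, "Intercept:", intercept)
# Note: 900 lies far outside the observed sales range (100 to 130),
# so this prediction is an extrapolation.
new_ice_cream_sales = np.array([[900]])
predicted_drownings = model.predict(new_ice_cream_sales)
print("Predicted drowning incidents:", predicted_drownings[0])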
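The task asks whether the two series are correlated, which a scatterplot only shows visually. As a hedged sketch, the strength of the linear relationship can be quantified with the Pearson correlation coefficient, and the regression can be cross-checked with NumPy alone. This uses np.corrcoef and np.polyfit in place of scikit-learn; the variable names r, slope and intercept are chosen here for illustration.

```python
import numpy as np

# Data copied from the exercise above
ice_cream_sales = np.array([100, 105, 110, 115, 120, 125, 130, 128, 126, 122, 118, 115])
drowning_incidents = np.array([5, 5, 6, 6, 7, 8, 8, 8, 7, 6, 5, 5])

# Pearson correlation coefficient: the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(ice_cream_sales, drowning_incidents)[0, 1]

# Degree-1 least-squares fit; np.polyfit returns the coefficients as (slope, intercept)
slope, intercept = np.polyfit(ice_cream_sales, drowning_incidents, 1)

print(f"Pearson r = {r:.3f}")
print(f"drownings = {slope:.3f} * sales + {intercept:.3f}")
```

A positive coefficient here describes association only. Ice cream sales and drowning incidents are a standard example of two quantities that rise together because of a common factor (warm weather), not because one causes the other.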
4. Stock Market Visualization: Choose 5 sectors in the stock market. Collect data on the market
share of each sector. Create a pie chart to visualize the distribution of these sectors in the
market.
Code:-
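import pandas as pd
import matplotlib.pyplot as plt
# NOTE: the five sector names were lost from this listing; the placeholder
# labels and the 'Sector' key below are assumptions, not the original data.
Sector_data={'Sector': ['Sector 1','Sector 2','Sector 3','Sector 4','Sector 5'],
'Market Share':[30,20,15,10,5]}
df=pd.DataFrame(Sector_data)
plt.figure(figsize=(10,5))
plt.pie(df['Market Share'],labels=df['Sector'],autopct='%1.1f%%',
shadow=True,startangle=200)
plt.show()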
5. City Health Analysis: Gather data on the weight of residents in your city. Plot a histogram to
visualize the weight distribution. Bonus: Add bins to categorize the weights into
underweight, normal, overweight, and obese.
Code:-
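import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
num_residents = 1000
# NOTE: the data-generation line was lost from this listing; a normal
# distribution (mean 70 kg, standard deviation 15 kg) is assumed as a stand-in.
weights = np.random.normal(70, 15, num_residents)
plt.figure(figsize=(10, 6))
plt.hist(weights, bins=30, edgecolor="black", label="Residents")
plt.xlabel("Weight (kg)")
plt.ylabel("Frequency")
plt.grid(True)
plt.legend()
plt.show()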
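For the bonus, the weights can be sorted into the four named categories with pandas.cut. The sketch below rests on two loudly stated assumptions: the weights are synthetic (normal distribution, mean 70 kg, standard deviation 15 kg), and the category edges of 50, 75 and 90 kg are hypothetical values chosen for illustration. Such categories are normally defined by BMI, which also needs height, so the edges should be replaced with whatever definition the analysis actually uses.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
num_residents = 1000

# Assumption: synthetic weights in kg, not real survey data
weights = pd.Series(np.random.normal(loc=70, scale=15, size=num_residents))

# Hypothetical category edges in kg (illustrative only)
bin_edges = [-np.inf, 50, 75, 90, np.inf]
labels = ["Underweight", "Normal", "Overweight", "Obese"]

# Assign each weight to the interval it falls in: (-inf, 50], (50, 75], (75, 90], (90, inf)
categories = pd.cut(weights, bins=bin_edges, labels=labels)

# Count residents per category, keeping the category order rather than sorting by count
counts = categories.value_counts(sort=False)
print(counts)
```

The same bin_edges list, with the infinite ends replaced by the minimum and maximum weight, could be passed to plt.hist as its bins argument so that each bar corresponds to one category.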
6. Product Sales Analysis: For a retail store, gather monthly sales data for two different
products. Plot a line graph to compare the sales trend of these products over the year.
Code:-
import pandas as pd
import matplotlib.pyplot as plt
data = {
"Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
"Product A Sales": [500, 600, 700, 750, 800, 900, 950, 1000, 1100, 1200, 1300, 1400],
"Product B Sales": [350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900]}
df = pd.DataFrame(data)
months = df["Month"]
plt.figure(figsize=(10, 6))
# One line per product so the two trends can be compared directly
plt.plot(months, df["Product A Sales"], marker="o", label="Product A")
plt.plot(months, df["Product B Sales"], marker="o", label="Product B")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
7. Website Traffic Analysis: Collect data on monthly website visits and sales for an e-commerce
website. Plot a scatterplot to analyze if there's a correlation between website visits and
sales. Bonus: Use different colors or sizes for points to represent different months.
Code:-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
web_data={'Month': ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],
'Website_Visits':[1000,1050,1100,1150,1200,1250,1300,1280,1260,1220,1180,1150],
'Sales': [200,210,230,250,270,290,300,295,280,260,240,220]}
ecommerce_data = pd.DataFrame(web_data)
visits = ecommerce_data["Website_Visits"]
sales = ecommerce_data["Sales"]
months = ecommerce_data["Month"]
# Point size scales with sales; colour encodes the month index (0 = Jan, 11 = Dec)
point_sizes = np.sqrt(sales) * 5
plt.figure(figsize=(10, 6))
plt.scatter(visits, sales, c=np.arange(len(months)), s=point_sizes, cmap="viridis")
plt.xlabel("Website Visits")
plt.ylabel("Sales")
plt.colorbar(label="Month Index")
plt.grid(True)
plt.show()
8. Economic Growth Prediction: Collect GDP data for your country for the past 10 years. Plot a
line graph to visualize the economic growth. Bonus: Use regression to predict the GDP for
the next year.
Code:-
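import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
data={'Year': ['2013','2014','2015','2016','2017','2018','2019','2020','2021','2022'],
'GDP_Growth':[6.39,7.41,8.00,8.26,6.80,6.45,3.87,5.83,9.05,7.00]}
df=pd.DataFrame(data)
# Line graph of GDP growth over the ten years
plt.figure(figsize=(10,5))
sns.lineplot(data=df,x='Year',y='GDP_Growth',marker='o')
plt.show()
# Bonus: linear regression on the year to predict the next value
Years = np.array([2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022])
GDP = np.array([6.39,7.41,8.00,8.26,6.80,6.45,3.87,5.83,9.05,7.00])
years_reshape = Years.reshape(-1, 1)
model = LinearRegression()
model.fit(years_reshape, GDP)
next_year = 2023
predicted_gdp = model.predict([[next_year]])
print("Predicted value for", next_year, ":", predicted_gdp[0])
# Actual values, fitted line, and the predicted point on one graph
plt.plot(Years, GDP, marker='o', label='Actual')
plt.plot(Years, model.predict(years_reshape), linestyle='--', label='Regression line')
plt.scatter([next_year], predicted_gdp, color='red', label='Predicted')
plt.xlabel('Year')
plt.ylabel('GDP Growth')
plt.legend()
plt.show()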
9. Movie Genre Popularity: Gather data on the number of movies released in different genres
in the past year. Create a bar plot to visualize which genre is the most popular based on the
number of releases. Bonus: Add a pie chart to show the distribution of movies across genres.
Code:-
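import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# NOTE: the release counts were lost from this listing. Fill in one count per genre
# (nine values) in 'Movies_Released' before running; that column name is an assumption.
Movie_data={'Genres':['Drama','Documentary','Comedy','Action','Thriller','Horror','Adventure','Romantic Comedy','Musical'],
'Movies_Released':[]}
df=pd.DataFrame(Movie_data)
# Bar plot of releases per genre
plt.figure(figsize=(10,5))
sns.barplot(data=df,x='Genres',y='Movies_Released')
plt.xticks(rotation=75)
plt.show()
# Bonus: pie chart of the distribution across genres
plt.figure(figsize=(10,8))
plt.pie(df['Movies_Released'],labels=df['Genres'],autopct='%1.1f%%',
shadow=True)
plt.show()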
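The task asks which genre is the most popular, which can be read off by ranking the counts before plotting them. The sketch below is illustrative only: rank_genres is a hypothetical helper, and the three counts passed to it are made-up values, not data from this exercise.

```python
import pandas as pd


def rank_genres(release_counts):
    """Return the release counts as a Series sorted from most to fewest releases."""
    return pd.Series(release_counts, name="Releases").sort_values(ascending=False)


# Hypothetical counts for illustration only; replace with the collected data
example_counts = {"Comedy": 90, "Drama": 120, "Action": 75}

ranked = rank_genres(example_counts)

# After sorting, the first index label is the genre with the most releases
most_popular = ranked.index[0]

print(ranked)
print("Most popular genre:", most_popular)
```

Plotting ranked instead of the unsorted data puts the tallest bar first, which makes the answer visible at a glance.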
10. School Performance Analysis: Collect data on student grades for a particular subject in a
school. Plot a histogram to visualize the distribution of grades. Bonus: Use different colors to
represent different classes or sections.
Code:-
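import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
# NOTE: the data-generation lines were lost from this listing; two sections with
# normally distributed grades (clipped to 0-100) are assumed as a stand-in.
section_a = np.clip(np.random.normal(70, 10, 50), 0, 100)
section_b = np.clip(np.random.normal(60, 15, 50), 0, 100)
plt.figure(figsize=(10, 6))
# Bonus: one colour per section, semi-transparent so overlapping bars stay visible
plt.hist(section_a, bins=10, alpha=0.6, color="blue", label="Section A")
plt.hist(section_b, bins=10, alpha=0.6, color="orange", label="Section B")
plt.xlabel("Grades")
plt.ylabel("Frequency")
plt.legend()
plt.grid(True)
plt.show()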