Saikat Dey Data Science Project

The document contains 10 assignments analyzing different datasets using Python visualization tools like Matplotlib and Seaborn. Each assignment asks the student to collect a dataset, create a plot to visualize trends or relationships in the data, and sometimes add additional analysis steps. The assignments cover topics like weather data, car sales, stock markets, city health metrics, website traffic, and more. Code solutions are provided for each assignment question.

Roll No: 504122011057 | Subhasish Ghosh

INDEX

Sl. No.  Assignment No.  Question                        Remarks
1        1               Weather Data Visualization
2        2               Car Sales Analysis
3        3               Summer Analysis
4        4               Stock Market Visualization
5        5               City Health Analysis
6        6               Product Sales Analysis
7        7               Website Traffic Analysis
8        8               Economic Growth Prediction
9        9               Movie Genre Popularity
10       10              School Performance Analysis

1. Weather Data Visualization: Collect monthly average temperatures for your city for the past
year. Plot a line graph to visualize the temperature trend over the year. Bonus: Compare it
with another city and plot both on the same graph.
Code:-
import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

data={'Month': ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],

'City1(Kolkata)Temperature':[19,22.7,27.1,29.9,30.6,29.5,28.1,27.9,27.6,26.3,23.3,20.1],

'City2(New Delhi)Temperature':[13.5,16.9,22.5,29.2,32.7,33,29.9,28.7,27.8,25.4,20.5,15.4]}

df=pd.DataFrame(data)

plt.figure(figsize=(10,5))

sns.lineplot(data=df,x='Month',y='City1(Kolkata)Temperature',marker='o')

plt.title('Visualize the temperature trend over the year')

plt.show()

months = df["Month"]

temperature1 = df["City1(Kolkata)Temperature"]

temperature2 = df["City2(New Delhi)Temperature"]

plt.figure(figsize=(10,5))

plt.plot(months, temperature1, marker='o', label='City1(Kolkata)')

plt.plot(months, temperature2, marker='o', label='City2(New Delhi)')

plt.title("Monthly Average Temperatures of Two Cities(Kolkata & New Delhi)")

plt.xlabel("Month")

plt.ylabel("Temperature")

plt.legend()

plt.grid(True)

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()

2. Car Sales Analysis: Gather data on the number of cars sold by different brands in your
country in the past year. Create a bar plot to visualize which brand sold the most cars. Bonus:
Add a pie chart to show the market share of each brand.
Code:-
import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

car_data={'Car_Brand': ['Maruti','Hyundai','Tata','Mahindra','Kia','Toyota','Honda','Renault'],

'Car_sales':[1576.03,552.51,526.82,333.05,254.56,160.38,95.02,87.12]}

df=pd.DataFrame(car_data)

plt.figure(figsize=(10,5))

sns.barplot(data=df,x='Car_Brand',y='Car_sales')

plt.title('Number of Cars Sold by Different Brands')

plt.show()

plt.figure(figsize=(10,7))

plt.pie(df['Car_sales'], labels=df['Car_Brand'],autopct='%1.1f%%',

shadow=True)

plt.title('Market share of each brand Distributions')

plt.show()
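
The percentages that `autopct` prints on the pie chart can be cross-checked by hand. The following is a minimal sketch in plain Python that reuses the sales figures from the code above; the source does not state their unit, and the names `car_sales`, `market_share` and `top_brand` are illustrative rather than part of the original solution.

```python
# Sales figures copied from the car_data dictionary above (unit not stated in the source).
car_sales = {
    "Maruti": 1576.03,
    "Hyundai": 552.51,
    "Tata": 526.82,
    "Mahindra": 333.05,
    "Kia": 254.56,
    "Toyota": 160.38,
    "Honda": 95.02,
    "Renault": 87.12,
}

# Market share of a brand = its sales divided by the total, as a percentage.
total_sales = sum(car_sales.values())
market_share = {brand: 100 * sold / total_sales for brand, sold in car_sales.items()}

# The brand with the largest share is also the tallest bar in the bar plot.
top_brand = max(market_share, key=market_share.get)

for brand, share in market_share.items():
    print(f"{brand:<10}{share:6.1f}%")
print("Top brand:", top_brand)
```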

3. Summer Analysis: Collect data on ice cream sales and drowning incidents for each month of
the summer. Plot a scatterplot to see if there's any correlation between the two. Bonus: Use a
regression line to predict the number of drowning incidents based on ice cream sales.
Code:-
import numpy as np

import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep','Oct', 'Nov', 'Dec']

ice_cream_sales = [100, 105, 110, 115, 120, 125, 130, 128, 126, 122, 118,115]

drowning_incidents = [5, 5, 6, 6, 7, 8, 8, 8, 7, 6, 5, 5]

plt.figure(figsize=(8, 6))

plt.scatter(ice_cream_sales, drowning_incidents, color='blue')

plt.title('Ice Cream Sales vs Drowning Incidents')

plt.xlabel('Ice Cream Sales')

plt.ylabel('Drowning Incidents')

for i, month in enumerate(months):

    plt.annotate(month, (ice_cream_sales[i], drowning_incidents[i]))

plt.show()

from sklearn.linear_model import LinearRegression

X = np.array(ice_cream_sales).reshape(-1, 1)

y = np.array(drowning_incidents)

model = LinearRegression()

model.fit(X, y)

slope = model.coef_[0]

intercept = model.intercept_

print(f'Regression Line: y = {slope:.2f}x + {intercept:.2f}')

new_ice_cream_sales = np.array([[900]])

predicted_drownings = model.predict(new_ice_cream_sales)

print(f'Predicted Drowning Incidents for 900 Ice Cream Sales: {predicted_drownings[0]:.2f}')

Output:-

Regression Line: y = 0.11x + -6.38

Predicted Drowning Incidents for 900 Ice Cream Sales: 90.71
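
The scatterplot only suggests a relationship; a correlation coefficient puts a number on it. The sketch below is an illustrative addition, not part of the original solution. It computes Pearson's r for the same twelve data points with `np.corrcoef` and checks whether the query value of 900 lies inside the observed sales range.

```python
import numpy as np

# Same data as in the assignment code above.
ice_cream_sales = np.array([100, 105, 110, 115, 120, 125, 130, 128, 126, 122, 118, 115])
drowning_incidents = np.array([5, 5, 6, 6, 7, 8, 8, 8, 7, 6, 5, 5])

# Pearson correlation coefficient: off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(ice_cream_sales, drowning_incidents)[0, 1]
print(f"Pearson r = {r:.2f}")

# The fitted line is only supported by data between the smallest and largest observed sales.
query = 900
in_range = ice_cream_sales.min() <= query <= ice_cream_sales.max()
print(f"Query {query} inside observed range: {in_range}")
```

Because 900 lies far outside the observed range of 100 to 130, the prediction above is an extrapolation and should be read with caution. A strong correlation also does not show that ice cream sales cause drownings; both plausibly rise in hot weather.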

4. Stock Market Visualization: Choose 5 sectors in the stock market. Collect data on the market
share of each sector. Create a pie chart to visualize the distribution of these sectors in the
market.
Code:-

import pandas as pd

import matplotlib.pyplot as plt

Sector_data={'Sector': ['Technology','Healthcare','Finance','Consumer Goods','Energy'],

'Market Share':[30,20,15,10,5]}

df=pd.DataFrame(Sector_data)

plt.figure(figsize=(10,5))

plt.pie(df['Market Share'], labels=df['Sector'],autopct='%1.1f%%',

shadow=True,startangle=200)

plt.title('Stock Market Sector Distributions')

plt.show()
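
The five market-share values above sum to 80 rather than 100. `plt.pie` sizes each wedge by its fraction of the total, so the percentages printed by `autopct` differ from the raw numbers. A small sketch of that normalization in plain Python (the name `displayed` is illustrative):

```python
sectors = ["Technology", "Healthcare", "Finance", "Consumer Goods", "Energy"]
market_share = [30, 20, 15, 10, 5]

# The pie chart shows each value as a fraction of the total, not the raw value.
total = sum(market_share)
displayed = [100 * value / total for value in market_share]

print(f"Total of raw values: {total}")
for sector, raw, shown in zip(sectors, market_share, displayed):
    print(f"{sector:<15} raw {raw:>2}  shown on chart {shown:5.2f}%")
```

If the remaining 20 belongs to other sectors, adding an explicit "Other" slice would make the chart show the raw shares; that is an assumption, since the source does not say what the missing share represents.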

5. City Health Analysis: Gather data on the weight of residents in your city. Plot a histogram to
visualize the weight distribution. Bonus: Add bins to categorize the weights into
underweight, normal, overweight, and obese.
Code:-
import numpy as np

import matplotlib.pyplot as plt

np.random.seed(42)

num_residents = 1000

weights = np.random.normal(70, 10, num_residents)

plt.figure(figsize=(10, 6))

plt.hist(weights, bins=20, color='blue', edgecolor='black')

plt.title("Weight Distribution of Residents")

plt.xlabel("Weight (kg)")

plt.ylabel("Frequency")

plt.grid(True)

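# NOTE: 18.5, 24.9, 29.9 and 30 are BMI cut-offs (kg/m^2), not weights in kg, so these lines fall

# below the bulk of the simulated weights (mean 70 kg); convert with weight = BMI * height^2.

plt.axvline(x=18.5, color='red', linestyle='dashed', label='Underweight')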

plt.axvline(x=24.9, color='green', linestyle='dashed', label='Normal')

plt.axvline(x=29.9, color='orange', linestyle='dashed', label='Overweight')

plt.axvline(x=30, color='purple', linestyle='dashed', label='Obese')

plt.legend()

plt.show()
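
The cut-offs 18.5, 24.9, 29.9 and 30 used above are body mass index (BMI) thresholds, while the histogram axis is weight in kg. One way to bin the weights into the four categories is to convert the BMI boundaries to weights for a fixed height. The sketch below does this under a loud assumption: a single height of 1.70 m for every resident, which is not in the source and is chosen only for illustration. It uses 18.5, 25 and 30 as the category boundaries.

```python
import numpy as np

np.random.seed(42)
weights = np.random.normal(70, 10, 1000)  # same synthetic weights as in the code above

# ASSUMPTION: one illustrative height for all residents; the source gives no height data.
height_m = 1.70

# BMI boundaries between underweight | normal | overweight | obese.
bmi_cutoffs = np.array([18.5, 25.0, 30.0])
weight_cutoffs = bmi_cutoffs * height_m ** 2  # weight = BMI * height^2

labels = ["Underweight", "Normal", "Overweight", "Obese"]

# np.digitize returns 0..3: the index of the category each weight falls into.
category_index = np.digitize(weights, weight_cutoffs)
counts = np.bincount(category_index, minlength=len(labels))

print("Weight cut-offs (kg):", np.round(weight_cutoffs, 1))
for label, count in zip(labels, counts):
    print(f"{label:<12}{count:>5}")
```

The converted cut-offs could be passed to `plt.axvline` in place of the raw BMI values so that the dashed lines land inside the histogram.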

6. Product Sales Analysis: For a retail store, gather monthly sales data for two different
products. Plot a line graph to compare the sales trend of these products over the year.
Code:-
import pandas as pd

import matplotlib.pyplot as plt

data = {

"Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],

"Product A Sales": [500, 600, 700, 750, 800, 900, 950, 1000, 1100, 1200, 1300, 1400],

"Product B Sales": [350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900]}

df = pd.DataFrame(data)

months = df["Month"]

product_a_sales = df["Product A Sales"]

product_b_sales = df["Product B Sales"]

plt.figure(figsize=(10, 6))

plt.plot(months, product_a_sales, marker='o', label='Product A')

plt.plot(months, product_b_sales, marker='o', label='Product B')

plt.title("The sales trend of these products over the year")

plt.xlabel("Month")

plt.ylabel("Sales")

plt.legend()

plt.grid(True)

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()

7. Website Traffic Analysis: Collect data on monthly website visits and sales for an e-commerce
website. Plot a scatterplot to analyze if there's a correlation between website visits and
sales. Bonus: Use different colors or sizes for points to represent different months.
Code:-
import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

web_data = {'Month': ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],

'Website_Visits':[1000,1050,1100,1150,1200,1250,1300,1280,1260,1220,1180,1150],

'Sales': [200,210,230,250,270,290,300,295,280,260,240,220]}

ecommerce_data = pd.DataFrame(web_data)

visits = ecommerce_data["Website_Visits"]

sales = ecommerce_data["Sales"]

months = ecommerce_data["Month"]

point_sizes = np.sqrt(sales) * 5

plt.figure(figsize=(10, 6))

plt.scatter(visits, sales, c=range(len(months)), s=point_sizes, cmap='viridis', marker='o')

plt.title("Website Visits vs. Sales")

plt.xlabel("Website Visits")

plt.ylabel("Sales")

plt.colorbar(label="Month Index")

plt.grid(True)

plt.show()
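
The question asks whether visits and sales are correlated, which the scatterplot shows only visually. As a hedged complement, the sketch below rebuilds the same DataFrame and computes the Pearson correlation with pandas' `Series.corr`; it is an illustrative check, not part of the original solution.

```python
import pandas as pd

# Same monthly figures as in the assignment code above.
ecommerce_data = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    "Website_Visits": [1000, 1050, 1100, 1150, 1200, 1250,
                       1300, 1280, 1260, 1220, 1180, 1150],
    "Sales": [200, 210, 230, 250, 270, 290, 300, 295, 280, 260, 240, 220],
})

# Series.corr defaults to the Pearson correlation coefficient.
r = ecommerce_data["Website_Visits"].corr(ecommerce_data["Sales"])
print(f"Pearson r between visits and sales: {r:.2f}")
```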

8. Economic Growth Prediction: Collect GDP data for your country for the past 10 years. Plot a
line graph to visualize the economic growth. Bonus: Use regression to predict the GDP for
the next year.
Code:-
import pandas as pd

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

import matplotlib.pyplot as plt

import seaborn as sns

data={'Year': ['2013','2014','2015','2016','2017','2018','2019','2020','2021','2022'],

'GDP_Growth':[6.39,7.41,8.00,8.26,6.80,6.45,3.87,5.83,9.05,7.00]}

df=pd.DataFrame(data)

plt.figure(figsize=(10,5))

sns.lineplot(data=df,x='Year',y='GDP_Growth',marker='o')

plt.title('Visualize The Economic Growth')

plt.show()

Years = np.array([2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022])

GDP = np.array([6.39,7.41,8.00,8.26,6.80,6.45,3.87,5.83,9.05,7.00])

years_reshape = Years.reshape(-1, 1)

model = LinearRegression()

model.fit(years_reshape, GDP)

next_year = 2023

predicted_gdp = model.predict([[next_year]])

print(f"Predicted GDP for {next_year}: {predicted_gdp[0]}")

plt.scatter(Years, GDP, color='blue')

plt.plot(Years, model.predict(years_reshape), color='red')

plt.xlabel('Year')

plt.ylabel('GDP Growth')

plt.title('GDP Growth and Prediction')

plt.show()

Output:-

Predicted GDP for 2023: 6.65933333333335
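
The prediction can be cross-checked without scikit-learn. The sketch below fits the same straight line with `np.polyfit`, using years counted from 2013 to keep the numbers small. The variable names are illustrative, and the values are treated as growth rates, following the `GDP_Growth` column name in the code above.

```python
import numpy as np

years = np.arange(2013, 2023)  # 2013 .. 2022
gdp_growth = np.array([6.39, 7.41, 8.00, 8.26, 6.80, 6.45, 3.87, 5.83, 9.05, 7.00])

# Count years from the first one so the fit works with small x values.
offset = years - years[0]

# Degree-1 least-squares fit: growth ~ slope * offset + intercept.
slope, intercept = np.polyfit(offset, gdp_growth, 1)

prediction_2023 = slope * (2023 - years[0]) + intercept
print(f"Slope per year: {slope:.4f}")
print(f"Predicted value for 2023: {prediction_2023:.4f}")
```

The fitted slope is slightly negative and close to zero, so the line is nearly flat and the 2023 prediction sits near the ten-year mean. With only ten noisy points, the figure is a rough trend estimate rather than a forecast.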

9. Movie Genre Popularity: Gather data on the number of movies released in different genres
in the past year. Create a bar plot to visualize which genre is the most popular based on the
number of releases. Bonus: Add a pie chart to show the distribution of movies across genres.
Code:-
import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

Movie_data={'Genres':['Drama','Documentary','Comedy','Action','Thriller','Horror','Adventure','Romantic Comedy','Musical'],

'Movie Genre Popularity':[145,74,57,54,46,39,28,11,6]}

df=pd.DataFrame(Movie_data)

plt.figure(figsize=(10,5))

sns.barplot(data=df,x='Genres',y='Movie Genre Popularity')

plt.xticks(rotation=75)

plt.title('Number of Movies Released')

plt.show()

plt.figure(figsize=(10,8))

plt.pie(df['Movie Genre Popularity'], labels=df['Genres'],autopct='%1.1f%%',

shadow=True)

plt.title('Distribution of Movies Across Genres')

plt.show()

10. School Performance Analysis: Collect data on student grades for a particular subject in a
school. Plot a histogram to visualize the distribution of grades. Bonus: Use different colors to
represent different classes or sections.
Code:-
import numpy as np

import matplotlib.pyplot as plt

np.random.seed(42)

class_a_grades = np.random.normal(70, 10, 200)

class_b_grades = np.random.normal(85, 8, 180)

plt.figure(figsize=(10, 6))

plt.hist([class_a_grades, class_b_grades], bins=15, color=['blue', 'green'], label=['Class A', 'Class B'])

plt.title("Grade Distribution by Class")

plt.xlabel("Grades")

plt.ylabel("Frequency")

plt.legend()

plt.grid(True)

plt.show()
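
The grades above are simulated from normal distributions, which can produce values outside a realistic grading scale. The sketch below assumes a 0 to 100 scale (an assumption; the source does not state the scale), counts how many simulated Class B grades exceed 100, and clips both classes to the valid range with `np.clip` before plotting.

```python
import numpy as np

np.random.seed(42)
class_a_grades = np.random.normal(70, 10, 200)  # same draws as in the code above
class_b_grades = np.random.normal(85, 8, 180)

# ASSUMPTION: grades are on a 0-100 scale; the source does not specify this.
min_grade, max_grade = 0, 100

# Count simulated Class B grades that fall above the assumed maximum.
above_max = int((class_b_grades > max_grade).sum())
print(f"Class B grades above {max_grade}: {above_max}")

# Clip both classes to the valid range; values inside the range are unchanged.
class_a_clipped = np.clip(class_a_grades, min_grade, max_grade)
class_b_clipped = np.clip(class_b_grades, min_grade, max_grade)
print(f"Class B maximum after clipping: {class_b_clipped.max():.1f}")
```

The clipped arrays can be passed to `plt.hist` in place of the raw ones; clipping piles any out-of-range values onto the boundary, so redrawing them is an alternative if that distortion matters.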
