Saikat Dey Data Science Project

The document contains 10 assignments analyzing different datasets using Python visualization tools like Matplotlib and Seaborn. Each assignment asks the student to collect a dataset, create a plot to visualize trends or relationships in the data, and sometimes add additional analysis steps. The assignments cover topics like weather data, car sales, stock markets, city health metrics, website traffic, and more. Code solutions are provided for each assignment question.

Uploaded by

mdluffyyy300


Roll No: 504122011057 | Subhasish Ghosh

INDEX

Sl. No.   Assignment No.   Question
1         1                Weather Data Visualization
2         2                Car Sales Analysis
3         3                Summer Analysis
4         4                Stock Market Visualization
5         5                City Health Analysis
6         6                Product Sales Analysis
7         7                Website Traffic Analysis
8         8                Economic Growth Prediction
9         9                Movie Genre Popularity
10        10               School Performance Analysis

1. Weather Data Visualization: Collect monthly average temperatures for your city for the past
year. Plot a line graph to visualize the temperature trend over the year. Bonus: Compare it
with another city and plot both on the same graph.
Code:-
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Monthly average temperatures for the two cities
data = {'Month': ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],
        'City1(Kolkata)Temperature': [19,22.7,27.1,29.9,30.6,29.5,28.1,27.9,27.6,26.3,23.3,20.1],
        'City2(New Delhi)Temperature': [13.5,16.9,22.5,29.2,32.7,33,29.9,28.7,27.8,25.4,20.5,15.4]}
df = pd.DataFrame(data)

# Line graph of the temperature trend for the first city
plt.figure(figsize=(10, 5))
sns.lineplot(data=df, x='Month', y='City1(Kolkata)Temperature', marker='o')
plt.title('Temperature Trend Over the Year (Kolkata)')
plt.show()

# Bonus: both cities on the same graph
months = df["Month"]
temperature1 = df["City1(Kolkata)Temperature"]
temperature2 = df["City2(New Delhi)Temperature"]

plt.figure(figsize=(10, 5))
plt.plot(months, temperature1, marker='o', label='City1(Kolkata)')
plt.plot(months, temperature2, marker='o', label='City2(New Delhi)')
plt.title("Monthly Average Temperatures of Two Cities (Kolkata & New Delhi)")
plt.xlabel("Month")
plt.ylabel("Temperature (°C)")
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


2. Car Sales Analysis: Gather data on the number of cars sold by different brands in your
country in the past year. Create a bar plot to visualize which brand sold the most cars. Bonus:
Add a pie chart to show the market share of each brand.
Code:-
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Cars sold per brand (figures as recorded by the author)
car_data = {'Car_Brand': ['Maruti','Hyundai','Tata','Mahindra','Kia','Toyota','Honda','Renault'],
            'Car_sales': [1576.03,552.51,526.82,333.05,254.56,160.38,95.02,87.12]}
df = pd.DataFrame(car_data)

# Bar plot: which brand sold the most cars
plt.figure(figsize=(10, 5))
sns.barplot(data=df, x='Car_Brand', y='Car_sales')
plt.title('Number of Cars Sold by Different Brands')
plt.show()

# Bonus: pie chart of each brand's market share
plt.figure(figsize=(10, 7))
plt.pie(df['Car_sales'], labels=df['Car_Brand'], autopct='%1.1f%%', shadow=True)
plt.title('Market Share of Each Brand')
plt.show()

3. Summer Analysis: Collect data on ice cream sales and drowning incidents for each month of
the summer. Plot a scatterplot to see if there's any correlation between the two. Bonus: Use a
regression line to predict the number of drowning incidents based on ice cream sales.
Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ice_cream_sales = [100, 105, 110, 115, 120, 125, 130, 128, 126, 122, 118, 115]
drowning_incidents = [5, 5, 6, 6, 7, 8, 8, 8, 7, 6, 5, 5]

# Scatterplot of the two series, one point per month
plt.figure(figsize=(8, 6))
plt.scatter(ice_cream_sales, drowning_incidents, color='blue')
plt.title('Ice Cream Sales vs Drowning Incidents')
plt.xlabel('Ice Cream Sales')
plt.ylabel('Drowning Incidents')

# Label every point with its month
for i, month in enumerate(months):
    plt.annotate(month, (ice_cream_sales[i], drowning_incidents[i]))
plt.show()

# Bonus: fit a regression line (scikit-learn expects a 2-D feature array)
X = np.array(ice_cream_sales).reshape(-1, 1)
y = np.array(drowning_incidents)
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
print(f'Regression Line: y = {slope:.2f}x + {intercept:.2f}')

# Predict for a new sales figure (900 lies far outside the observed 100-130 range)
new_ice_cream_sales = np.array([[900]])
predicted_drownings = model.predict(new_ice_cream_sales)
print(f'Predicted Drowning Incidents for 900 Ice Cream Sales: {predicted_drownings[0]:.2f}')

Output:-
Regression Line: y = 0.11x + -6.38
Predicted Drowning Incidents for 900 Ice Cream Sales: 90.71
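
The question asks whether the two series are correlated, which a scatterplot only suggests visually. A minimal sketch, using the same twelve data points and NumPy only, that puts a number on it with the Pearson coefficient and refits the same straight line with np.polyfit as a cross-check. A high coefficient would not mean ice cream sales cause drownings; both plausibly rise in warm weather.

```python
import numpy as np

# Same data as in the assignment code
ice_cream_sales = np.array([100, 105, 110, 115, 120, 125, 130, 128, 126, 122, 118, 115], dtype=float)
drowning_incidents = np.array([5, 5, 6, 6, 7, 8, 8, 8, 7, 6, 5, 5], dtype=float)

# Pearson correlation coefficient: off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(ice_cream_sales, drowning_incidents)[0, 1]

# Least-squares straight line; polyfit returns the slope first, then the intercept
slope, intercept = np.polyfit(ice_cream_sales, drowning_incidents, 1)

print(f"Pearson r = {r:.2f}")
print(f"Fitted line: y = {slope:.2f}x {intercept:+.2f}")
```

The fitted slope and intercept should agree with the LinearRegression result printed in the output.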

4. Stock Market Visualization: Choose 5 sectors in the stock market. Collect data on the market
share of each sector. Create a pie chart to visualize the distribution of these sectors in the
market.
Code:-
import pandas as pd
import matplotlib.pyplot as plt

# Market share (%) of the five chosen sectors
Sector_data = {'Sector': ['Technology','Healthcare','Finance','Consumer Goods','Energy'],
               'Market Share': [30,20,15,10,5]}
df = pd.DataFrame(Sector_data)

# Pie chart of the sector distribution
plt.figure(figsize=(10, 5))
plt.pie(df['Market Share'], labels=df['Sector'], autopct='%1.1f%%',
        shadow=True, startangle=200)
plt.title('Stock Market Sector Distribution')
plt.show()
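
The five shares above add up to 80, not 100. plt.pie sizes each wedge as x/sum(x), so the percentages printed by autopct are rescaled and differ from the input values. A small sketch in plain Python of what the chart will display; the "Other" slice at the end is a hypothetical remedy, not part of the original data.

```python
sectors = ["Technology", "Healthcare", "Finance", "Consumer Goods", "Energy"]
market_share = [30, 20, 15, 10, 5]  # values from the assignment; they sum to 80

total = sum(market_share)
# Percentages a normalised pie chart shows: value / sum(values)
displayed = [100 * value / total for value in market_share]

for sector, raw, shown in zip(sectors, market_share, displayed):
    print(f"{sector}: input {raw}%, shown as {shown:.2f}%")

# Hypothetical remedy: add an explicit slice for the unaccounted remainder
sectors_with_other = sectors + ["Other"]
shares_with_other = market_share + [100 - total]
```

Passing sectors_with_other and shares_with_other to plt.pie would keep the labelled percentages equal to the collected figures.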


5. City Health Analysis: Gather data on the weight of residents in your city. Plot a histogram to
visualize the weight distribution. Bonus: Add bins to categorize the weights into
underweight, normal, overweight, and obese.
Code:-
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: 1000 weights drawn from a normal distribution (mean 70 kg, sd 10 kg)
np.random.seed(42)
num_residents = 1000
weights = np.random.normal(70, 10, num_residents)

plt.figure(figsize=(10, 6))
plt.hist(weights, bins=20, color='blue', edgecolor='black')
plt.title("Weight Distribution of Residents")
plt.xlabel("Weight (kg)")
plt.ylabel("Frequency")
plt.grid(True)

# The cut-offs 18.5, 25 and 30 are BMI values (kg/m^2), not weights in kg.
# ASSUMPTION (not in the original): one reference height of 1.70 m converts them to kg.
height_m = 1.70
boundaries = [(18.5, 'red', 'Underweight / Normal'),
              (25, 'green', 'Normal / Overweight'),
              (30, 'orange', 'Overweight / Obese')]
for bmi, color, label in boundaries:
    plt.axvline(x=bmi * height_m ** 2, color=color, linestyle='dashed', label=label)
plt.legend()
plt.show()
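
The bonus asks for the weights to be sorted into four categories, which takes actual bin edges rather than marker lines. The usual cut-offs (18.5, 25, 30) are BMI values in kg/m^2, so they have to be converted to kilograms first. A rough sketch, assuming a single reference height of 1.70 m for every resident (an illustrative assumption, since the dataset has no heights), that counts residents per category with np.digitize:

```python
import numpy as np

np.random.seed(42)
weights = np.random.normal(70, 10, 1000)  # same synthetic weights as in the assignment

ASSUMED_HEIGHT_M = 1.70  # assumption: one reference height for everyone
bmi_cutoffs = np.array([18.5, 25.0, 30.0])  # standard BMI category boundaries
# BMI = kg / m^2, so the matching weight boundary is BMI * m^2
weight_edges = bmi_cutoffs * ASSUMED_HEIGHT_M ** 2

labels = ["Underweight", "Normal", "Overweight", "Obese"]
# np.digitize returns 0..3: the index of the category each weight falls into
category_index = np.digitize(weights, weight_edges)
counts = np.bincount(category_index, minlength=len(labels))

for label, count in zip(labels, counts):
    print(f"{label}: {count}")
```

With real data, per-person heights would replace the single assumed height and the categories would be computed from each resident's own BMI.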


6. Product Sales Analysis: For a retail store, gather monthly sales data for two different
products. Plot a line graph to compare the sales trend of these products over the year.
Code:-
import pandas as pd
import matplotlib.pyplot as plt

# Monthly sales of the two products
data = {
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    "Product A Sales": [500, 600, 700, 750, 800, 900, 950, 1000, 1100, 1200, 1300, 1400],
    "Product B Sales": [350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900]}
df = pd.DataFrame(data)

months = df["Month"]
product_a_sales = df["Product A Sales"]
product_b_sales = df["Product B Sales"]

# One line per product on shared axes
plt.figure(figsize=(10, 6))
plt.plot(months, product_a_sales, marker='o', label='Product A')
plt.plot(months, product_b_sales, marker='o', label='Product B')
plt.title("Sales Trend of Product A and Product B Over the Year")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


7. Website Traffic Analysis: Collect data on monthly website visits and sales for an e-commerce
website. Plot a scatterplot to analyze if there's a correlation between website visits and
sales. Bonus: Use different colors or sizes for points to represent different months.
Code:-
import numpy as np  # needed for np.sqrt below
import pandas as pd
import matplotlib.pyplot as plt

web_data = {'Month': ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],
            'Website_Visits': [1000,1050,1100,1150,1200,1250,1300,1280,1260,1220,1180,1150],
            'Sales': [200,210,230,250,270,290,300,295,280,260,240,220]}
ecommerce_data = pd.DataFrame(web_data)

visits = ecommerce_data["Website_Visits"]
sales = ecommerce_data["Sales"]
months = ecommerce_data["Month"]

# Point size grows with sales; point color encodes the month index (0 = Jan)
point_sizes = np.sqrt(sales) * 5

plt.figure(figsize=(10, 6))
plt.scatter(visits, sales, c=range(len(months)), s=point_sizes, cmap='viridis', marker='o')
plt.title("Website Visits vs. Sales")
plt.xlabel("Website Visits")
plt.ylabel("Sales")
plt.colorbar(label="Month Index")
plt.grid(True)
plt.show()

8. Economic Growth Prediction: Collect GDP data for your country for the past 10 years. Plot a
line graph to visualize the economic growth. Bonus: Use regression to predict the GDP for
the next year.
Code:-
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import seaborn as sns

# Yearly GDP growth figures
data = {'Year': ['2013','2014','2015','2016','2017','2018','2019','2020','2021','2022'],
        'GDP_Growth': [6.39,7.41,8.00,8.26,6.80,6.45,3.87,5.83,9.05,7.00]}
df = pd.DataFrame(data)

# Line graph of the growth figures
plt.figure(figsize=(10, 5))
sns.lineplot(data=df, x='Year', y='GDP_Growth', marker='o')
plt.title('Visualize The Economic Growth')
plt.show()

# Bonus: linear regression on numeric years (scikit-learn expects a 2-D feature array)
Years = np.array([2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022])
GDP = np.array([6.39,7.41,8.00,8.26,6.80,6.45,3.87,5.83,9.05,7.00])
years_reshape = Years.reshape(-1, 1)
model = LinearRegression()
model.fit(years_reshape, GDP)

next_year = 2023
predicted_gdp = model.predict([[next_year]])
print(f"Predicted GDP for {next_year}: {predicted_gdp[0]}")

# Observed values (blue) with the fitted regression line (red)
plt.scatter(Years, GDP, color='blue')
plt.plot(Years, model.predict(years_reshape), color='red')
plt.xlabel('Year')
plt.ylabel('GDP Growth')
plt.title('GDP Growth and Prediction')
plt.show()

Output:-
Predicted GDP for 2023: 6.65933333333335
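
A single predicted number says nothing about how well a straight line describes these ten points. A minimal sketch, using np.polyfit in place of scikit-learn, that reproduces the 2023 prediction and also reports the coefficient of determination (R^2). The years are shifted to start at zero purely to keep the numbers small; that shift is a choice made here, not something in the original code.

```python
import numpy as np

years = np.array([2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022])
gdp_growth = np.array([6.39, 7.41, 8.00, 8.26, 6.80, 6.45, 3.87, 5.83, 9.05, 7.00])

# Shift the years so the first one is 0; the fitted line is unchanged
x = years - years[0]
slope, intercept = np.polyfit(x, gdp_growth, 1)

fitted = slope * x + intercept
ss_res = np.sum((gdp_growth - fitted) ** 2)             # residual sum of squares
ss_tot = np.sum((gdp_growth - gdp_growth.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot

prediction_2023 = slope * (2023 - years[0]) + intercept
print(f"R^2 = {r_squared:.3f}")
print(f"Predicted value for 2023: {prediction_2023:.2f}")
```

An R^2 close to zero would indicate that the year explains very little of the variation, in which case the prediction is little more than the average of the past values.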


9. Movie Genre Popularity: Gather data on the number of movies released in different genres
in the past year. Create a bar plot to visualize which genre is the most popular based on the
number of releases. Bonus: Add a pie chart to show the distribution of movies across genres.
Code:-
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Number of releases per genre
Movie_data = {'Genres': ['Drama','Documentary','Comedy','Action','Thriller','Horror',
                         'Adventure','Romantic Comedy','Musical'],
              'Movie Genre Popularity': [145,74,57,54,46,39,28,11,6]}
df = pd.DataFrame(Movie_data)

# Bar plot: releases per genre
plt.figure(figsize=(10, 5))
sns.barplot(data=df, x='Genres', y='Movie Genre Popularity')
plt.xticks(rotation=75)
plt.title('Number of Movies Released')
plt.show()

# Bonus: pie chart of the distribution across genres
plt.figure(figsize=(10, 8))
plt.pie(df['Movie Genre Popularity'], labels=df['Genres'], autopct='%1.1f%%', shadow=True)
plt.title('Distribution of Movies Across Genres')
plt.show()

10. School Performance Analysis: Collect data on student grades for a particular subject in a
school. Plot a histogram to visualize the distribution of grades. Bonus: Use different colors to
represent different classes or sections.
Code:-
import numpy as np
import matplotlib.pyplot as plt

# Synthetic grades: two classes drawn from normal distributions
np.random.seed(42)
class_a_grades = np.random.normal(70, 10, 200)
class_b_grades = np.random.normal(85, 8, 180)

# Side-by-side histogram bars, one color per class
plt.figure(figsize=(10, 6))
plt.hist([class_a_grades, class_b_grades], bins=15, color=['blue', 'green'], label=['Class A', 'Class B'])
plt.title("Grade Distribution by Class")
plt.xlabel("Grades")
plt.ylabel("Frequency")
plt.legend()
plt.grid(True)
plt.show()
