
1.

Here is a simplified Python script to address the given problem. The code assumes you have a dataset (e.g., a CSV file) with student names and their scores in various subjects (e.g., Math, Science, English).

### Python Code

```python
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset into a Pandas DataFrame
# Replace 'your_file.csv' with the path to your dataset
df = pd.read_csv('your_file.csv')

# Handle missing values by replacing them with the mean of the respective numeric column
df.fillna(df.mean(numeric_only=True), inplace=True)

# Calculate the average score for each student
df['Average_Score'] = df.iloc[:, 1:].mean(axis=1)  # Assuming the first column is student names

# Categorize students into performance levels
def categorize_performance(avg_score):
    if avg_score >= 80:
        return 'High'
    elif avg_score >= 50:
        return 'Medium'
    else:
        return 'Low'

df['Performance_Category'] = df['Average_Score'].apply(categorize_performance)

# Identify the subject with the highest average score across students
subject_avg_scores = df.iloc[:, 1:-2].mean()  # Excludes the name column and the two derived columns
highest_avg_subject = subject_avg_scores.idxmax()

# Determine the number of students in each performance category
category_counts = df['Performance_Category'].value_counts()

# Visualization: Bar chart for average score per subject
subject_avg_scores.plot(kind='bar', title='Average Score Per Subject', ylabel='Average Score',
                        xlabel='Subjects', color='skyblue')
plt.show()

# Visualization: Pie chart for performance category distribution
category_counts.plot(kind='pie', autopct='%1.1f%%', title='Performance Category Distribution',
                     ylabel='')
plt.show()

# Print key results
print(f"Subject with the highest average score: {highest_avg_subject}")
print("Performance category counts:")
print(category_counts)
```

### Explanation

1. **Data Loading and Cleaning:**

- Loads a CSV file into a Pandas DataFrame.

- Handles missing values by replacing them with the column mean.

2. **Data Manipulation:**

- Calculates the average score for each student.

- Categorizes students based on their average score into "High," "Medium," or "Low" (an equivalent vectorized version is sketched after this list).

3. **Analysis:**

- Finds the subject with the highest average score across all students.

- Counts the number of students in each performance category.

4. **Visualization:**

- Creates a bar chart showing the average scores for each subject.

- Creates a pie chart showing the percentage of students in each performance category.
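
The same High/Medium/Low labels can also be produced without an explicit function by using `pd.cut` with the thresholds above. This is only an alternative sketch and assumes the `df` and `Average_Score` column from the script:

```python
import pandas as pd

# Vectorized categorization equivalent to categorize_performance
# Bins: [-inf, 50) -> Low, [50, 80) -> Medium, [80, inf) -> High
df['Performance_Category'] = pd.cut(
    df['Average_Score'],
    bins=[-float('inf'), 50, 80, float('inf')],
    labels=['Low', 'Medium', 'High'],
    right=False  # left-closed intervals so a score of exactly 80 is 'High'
)
```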

### Instructions to Run

1. Upload your dataset (e.g., `your_file.csv`) to Google Colab.

2. Replace `'your_file.csv'` in the code with the actual file path.

3. Run the code cells step-by-step in Google Colab.
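
If you don't have a scores file handy, you can generate a small test CSV first. This is a minimal sketch; the column names `Name`, `Math`, `Science`, `English` and the values are illustrative and should match whatever your dataset actually uses:

```python
import pandas as pd

# Create a small sample dataset for testing (illustrative values only)
sample = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Chitra', 'Dev'],
    'Math': [92, 55, 40, 78],
    'Science': [88, 61, 45, 82],
    'English': [75, 58, None, 69],  # one missing value to exercise fillna
})
sample.to_csv('your_file.csv', index=False)
```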

### Sample Output (Assuming Example Dataset)


**Bar Chart:**

Displays a bar chart with average scores for Math, Science, and English.

**Pie Chart:**

Shows a pie chart with categories like "High" (30%), "Medium" (50%), and "Low" (20%).

**Console Output:**

- Subject with the highest average score: `Science`

- Performance category counts:

```

Medium 5

High 3

Low 2

Name: Performance_Category, dtype: int64

```
Here’s a concise Python script that you can run in Google Colab to analyze a COVID-19 dataset as described in the question.

### Python Code

```python
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
# Replace 'covid_data.csv' with the path to your dataset
df = pd.read_csv('covid_data.csv')

# Handle missing values and duplicates
df.fillna(0, inplace=True)
df.drop_duplicates(inplace=True)

# Parse the 'Date' column and sort so per-country differences are meaningful
df['Date'] = pd.to_datetime(df['Date'])
df.sort_values(['Country', 'Date'], inplace=True)

# Add a new column for daily new cases (difference of each country's cumulative total)
df['New_Cases'] = df.groupby('Country')['Total_Cases'].diff().fillna(0)

# Extract 'Date' into separate columns for Year, Month, and Day
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

# Calculate total cases and deaths globally
# (Total_Cases/Total_Deaths are cumulative, so take each country's latest figure and sum)
total_cases = df.groupby('Country')['Total_Cases'].max().sum()
total_deaths = df.groupby('Country')['Total_Deaths'].max().sum()

# Identify the country with the highest number of cases and deaths
country_cases = df.groupby('Country')['Total_Cases'].max()
country_deaths = df.groupby('Country')['Total_Deaths'].max()
highest_cases_country = country_cases.idxmax()
highest_deaths_country = country_deaths.idxmax()

# Analyze daily new cases trend (last 30 days)
last_30_days = df[df['Date'] >= (df['Date'].max() - pd.Timedelta(days=30))]

# Visualization: Line chart for total cases trend
df.groupby('Date')['Total_Cases'].sum().plot(kind='line', title='Trend of Total COVID-19 Cases Over Time',
                                             ylabel='Total Cases', xlabel='Date')
plt.show()

# Bar chart for top 5 countries with the highest cases
top_5_countries = country_cases.nlargest(5)
top_5_countries.plot(kind='bar', title='Top 5 Countries with Highest Cases', ylabel='Total Cases',
                     xlabel='Countries', color='orange')
plt.show()

# Pie chart for proportion of cases by continent (latest figure per country, summed by continent)
continent_cases = df.groupby(['Continent', 'Country'])['Total_Cases'].max().groupby(level='Continent').sum()
continent_cases.plot(kind='pie', autopct='%1.1f%%', title='Proportion of Cases by Continent', ylabel='')
plt.show()

# Print key results
print(f"Total cases globally: {total_cases}")
print(f"Total deaths globally: {total_deaths}")
print(f"Country with highest cases: {highest_cases_country}")
print(f"Country with highest deaths: {highest_deaths_country}")
```
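
The `last_30_days` frame above is prepared but not plotted. A minimal sketch of one way to visualize it, assuming the same column names and running in the same notebook:

```python
# Line chart of global daily new cases over the last 30 days
last_30_days.groupby('Date')['New_Cases'].sum().plot(
    kind='line', title='Daily New Cases (Last 30 Days)',
    ylabel='New Cases', xlabel='Date')
plt.show()
```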

---

### Explanation

1. **Data Loading and Cleaning:**

- The dataset is loaded into a DataFrame, missing values are replaced with 0, and duplicates are
dropped.

2. **Data Manipulation:**

- Calculates daily new cases (`New_Cases`) as the day-over-day difference of each country's cumulative total.

- Extracts `Year`, `Month`, and `Day` from the `Date` column for analysis.

3. **Analysis:**

- Computes total global cases and deaths from each country's latest cumulative figures.

- Identifies the countries with the highest cases and deaths.

- Filters data for the last 30 days to analyze trends.

4. **Visualization:**

- **Line Chart:** Shows the trend of total cases over time.

- **Bar Chart:** Displays the top 5 countries with the highest cases.

- **Pie Chart:** Shows the proportion of cases by continent.

---
### Instructions to Run

1. Upload your dataset (e.g., `covid_data.csv`) to Google Colab.

2. Replace `'covid_data.csv'` in the code with the file name.

3. Run each code cell step-by-step to load, analyze, and visualize the data.
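
If you don't have a COVID-19 dataset available, a small synthetic file can be generated for a dry run. This is only a sketch; the columns `Date`, `Country`, `Continent`, `Total_Cases`, `Total_Deaths` are assumptions matching the code above, and the numbers are random:

```python
import pandas as pd
import numpy as np

# Build a tiny synthetic dataset: 60 days of cumulative counts for two countries
dates = pd.date_range('2021-01-01', periods=60, freq='D')
rows = []
for country, continent, scale in [('USA', 'North America', 1000), ('Brazil', 'South America', 600)]:
    cases = np.cumsum(np.random.randint(0, scale, size=len(dates)))
    deaths = (cases * 0.02).astype(int)
    rows.append(pd.DataFrame({'Date': dates, 'Country': country, 'Continent': continent,
                              'Total_Cases': cases, 'Total_Deaths': deaths}))
pd.concat(rows).to_csv('covid_data.csv', index=False)
```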

---

### Sample Output (Assuming Example Dataset)

**Console Output:**

```

Total cases globally: 500,000,000

Total deaths globally: 5,000,000

Country with highest cases: USA

Country with highest deaths: Brazil

```

**Visualizations:**

1. Line chart showing the rising trend of total cases globally.

2. Bar chart highlighting the top 5 countries with the highest total cases.

3. Pie chart dividing the proportion of cases by continent.


Here’s a simple Python script that you can run in Google Colab to analyze a sales dataset as described in the question.

### Python Code

```python
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
# Replace 'sales_data.csv' with the path to your dataset
df = pd.read_csv('sales_data.csv')

# Handle missing values and duplicates
df.fillna(0, inplace=True)
df.drop_duplicates(inplace=True)

# Add a new column for total revenue
df['Total_Revenue'] = df['Quantity'] * df['Price']

# Group by product category to calculate total revenue and number of items sold
category_summary = df.groupby('Product_Category').agg(
    Total_Revenue=('Total_Revenue', 'sum'),
    Total_Quantity=('Quantity', 'sum')
)

# Identify the top 3 products generating the highest revenue
top_products = df.groupby('Product').agg(Total_Revenue=('Total_Revenue', 'sum')).nlargest(3, 'Total_Revenue')

# Determine the month with the highest total sales
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.to_period('M')
monthly_sales = df.groupby('Month').agg(Total_Revenue=('Total_Revenue', 'sum'))
highest_sales_month = monthly_sales['Total_Revenue'].idxmax()

# Visualization: Bar chart for total revenue by product category
category_summary['Total_Revenue'].plot(kind='bar', title='Total Revenue by Product Category',
                                       ylabel='Total Revenue', xlabel='Product Category', color='green')
plt.show()

# Visualization: Line graph for monthly sales trends
monthly_sales.plot(kind='line', title='Monthly Sales Trends', ylabel='Total Revenue', xlabel='Month',
                   marker='o', color='blue')
plt.show()

# Print key results
print("Top 3 products generating highest revenue:")
print(top_products)
print(f"Month with highest total sales: {highest_sales_month}")
```

---

### Explanation

1. **Data Loading and Cleaning:**

- Loads the sales dataset into a Pandas DataFrame.

- Handles missing values by replacing them with 0 and removes duplicate entries.

2. **Data Manipulation:**

- Calculates `Total_Revenue` for each transaction as `Quantity × Price`.

- Groups the data by `Product_Category` to calculate total revenue and number of items sold.

3. **Analysis:**

- Identifies the top 3 products generating the highest revenue.

- Determines the month with the highest total sales.

4. **Visualization:**

- **Bar Chart:** Displays total revenue by product category.

- **Line Graph:** Shows monthly sales trends.

---
### Instructions to Run

1. Upload your dataset (e.g., `sales_data.csv`) to Google Colab.

2. Replace `'sales_data.csv'` in the code with your dataset's filename.

3. Run each code cell step-by-step to analyze and visualize the data.
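
As with the other examples, a small synthetic sales file can be created for testing. This is only a sketch; the columns `Date`, `Product`, `Product_Category`, `Quantity`, `Price` are assumptions matching the code above:

```python
import pandas as pd

# Tiny illustrative sales dataset (made-up values)
sample = pd.DataFrame({
    'Date': ['2024-04-03', '2024-05-12', '2024-05-20', '2024-06-01'],
    'Product': ['Product_A', 'Product_B', 'Product_A', 'Product_C'],
    'Product_Category': ['Electronics', 'Furniture', 'Electronics', 'Furniture'],
    'Quantity': [3, 1, 5, 2],
    'Price': [250.0, 800.0, 250.0, 150.0],
})
sample.to_csv('sales_data.csv', index=False)
```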

---

### Sample Output (Assuming Example Dataset)

**Console Output:**

```

Top 3 products generating highest revenue:

Total_Revenue

Product

Product_A 100000.00

Product_B 80000.00

Product_C 75000.00

Month with highest total sales: 2024-05

```

**Visualizations:**

1. **Bar Chart:** Shows total revenue for categories like "Electronics," "Furniture," etc.

2. **Line Graph:** Displays sales trends over months with peaks and valleys.
Below is a Python code template to solve the tourism data analysis problem described. You'll need a tourism dataset in CSV format to run it. The code includes the required steps, explanations, and instructions to execute it in Google Colab.

### Code

```python
# Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt

# Step 2: Load the Dataset
# Replace 'tourism_data.csv' with your actual file name
from google.colab import files
uploaded = files.upload()  # Upload the dataset
data = pd.read_csv(list(uploaded.keys())[0])

# Step 3: Data Cleaning
data.drop_duplicates(inplace=True)  # Remove duplicate rows
data.dropna(inplace=True)           # Drop rows with missing values

# Step 4: Data Manipulation
# Add Total Visitors column
data['Total_Visitors'] = data['Domestic_Visitors'] + data['International_Visitors']

# Extract year and month from the 'Date' column
data['Date'] = pd.to_datetime(data['Date'])
data['Year'] = data['Date'].dt.year
data['Month'] = data['Date'].dt.month

# Step 5: Analysis
# Identify the record (month) with the highest total visitors
highest_month = data.loc[data['Total_Visitors'].idxmax()]

# Calculate the average number of visitors per year
average_visitors_per_year = data.groupby('Year')['Total_Visitors'].mean()

# Proportion of domestic vs international visitors by year
proportion = data.groupby('Year')[['Domestic_Visitors', 'International_Visitors']].sum()
total_by_year = proportion['Domestic_Visitors'] + proportion['International_Visitors']
proportion['Domestic_Proportion'] = proportion['Domestic_Visitors'] / total_by_year
proportion['International_Proportion'] = proportion['International_Visitors'] / total_by_year

# Step 6: Visualization
# Bar Chart - Total Visitors per Month
monthly_totals = data.groupby('Month')['Total_Visitors'].sum()
monthly_totals.plot(kind='bar', title='Total Visitors Per Month', ylabel='Visitors', xlabel='Month')
plt.show()

# Pie Chart - Proportion of Domestic vs International Visitors (latest year)
latest_year = data['Year'].max()
latest_data = proportion.loc[latest_year]
latest_data[['Domestic_Proportion', 'International_Proportion']].plot(
    kind='pie', autopct='%1.1f%%',
    title=f'Domestic vs International Visitors ({latest_year})', ylabel='')
plt.show()

# Line Graph - Trend of Total Visitors Over the Years
yearly_totals = data.groupby('Year')['Total_Visitors'].sum()
yearly_totals.plot(kind='line', title='Total Visitors Over the Years', ylabel='Visitors', xlabel='Year')
plt.show()

# Step 7: Output Results
print("Month with Highest Total Visitors:")
print(highest_month)
print("\nAverage Visitors Per Year:")
print(average_visitors_per_year)
print("\nProportion of Domestic vs International Visitors by Year:")
print(proportion)
```

---

### Instructions to Run in Google Colab

1. **Upload the Dataset**: Replace the placeholder `tourism_data.csv` with your dataset. When you
run the `files.upload()` block, it will prompt you to upload your file.

2. **Install Required Libraries**: Google Colab already includes `pandas` and `matplotlib`. No
additional installations are necessary.
3. **Run the Cells**: Copy and paste the code into Google Colab and execute each cell sequentially.

---

### Explanation of the Code

1. **Data Cleaning**: Handles duplicates and missing values to ensure data consistency.

2. **Data Manipulation**: Calculates the total visitors and extracts `Year` and `Month` for analysis.

3. **Analysis**:

- Finds the month with the highest visitors.

- Calculates average yearly visitors.

- Analyzes proportions of domestic and international visitors.

4. **Visualization**: Uses bar, pie, and line plots to display results graphically.

---

### Output

The output includes:

1. **Text Outputs**:

- The month with the highest visitors.

- The average number of visitors per year.

- Proportion data.

2. **Graphs**:

- A bar chart for total visitors per month.

- A pie chart for domestic vs international visitor proportions.

- A line graph showing the trend of total visitors over the years.

Run the code to view the exact outputs based on your dataset. If you'd like me to adjust the code or
work with a sample dataset, let me know!
Here's an example of what the output might look like with a fictional tourism dataset. This will give you an idea of the expected results:

### **Sample Text Output**

#### **Month with Highest Total Visitors**

```

Date 2023-07-01

Domestic_Visitors 500,000

International_Visitors 300,000

Total_Visitors 800,000

Year 2023

Month 7

Name: 189, dtype: object

```

#### **Average Visitors Per Year**

```

Year

2019 450,000.0

2020 200,000.0

2021 350,000.0

2022 500,000.0

2023 600,000.0

Name: Total_Visitors, dtype: float64

```

#### **Proportion of Domestic vs International Visitors by Year**

```
      Domestic_Visitors  International_Visitors  Domestic_Proportion  International_Proportion
Year
2019          2,000,000                 700,000                0.740                     0.260
2020          1,200,000                 500,000                0.706                     0.294
2021          1,500,000                 700,000                0.682                     0.318
2022          2,000,000               1,000,000                0.667                     0.333
2023          2,500,000               1,300,000                0.658                     0.342

```

---

### **Sample Visualizations**

1. **Bar Chart: Total Visitors Per Month**

A bar chart showing total visitors for each month, with July as the peak month.

2. **Pie Chart: Proportion of Domestic vs International Visitors (2023)**

A pie chart for 2023 might show:

- **65.8% Domestic Visitors**

- **34.2% International Visitors**

3. **Line Graph: Total Visitors Over the Years**

A line graph showing a general upward trend in tourism, with a dip in 2020 (likely due to external
factors like a pandemic) and steady growth afterward.

---

### **Key Notes**

- The outputs will vary depending on your dataset.


- If you don't have real tourism data, you can simulate it by creating a CSV file with columns like `Date`, `Domestic_Visitors`, and `International_Visitors` (see the sketch after these notes).

- Let me know if you’d like help generating sample data for testing!
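
A minimal sketch for simulating such a file (monthly rows with `Date`, `Domestic_Visitors`, and `International_Visitors`; the value ranges are made up):

```python
import pandas as pd
import numpy as np

# Simulate monthly tourism data for 2019-2023
dates = pd.date_range('2019-01-01', '2023-12-01', freq='MS')  # month-start dates
rng = np.random.default_rng(42)
sample = pd.DataFrame({
    'Date': dates,
    'Domestic_Visitors': rng.integers(100_000, 500_000, size=len(dates)),
    'International_Visitors': rng.integers(30_000, 300_000, size=len(dates)),
})
sample.to_csv('tourism_data.csv', index=False)
```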

Here is how you can run the code on **Google Colab**, step-by-step:

---

### **Step 1: Open Google Colab**

1. Go to [Google Colab](https://colab.research.google.com/).

2. Create a new notebook by clicking on **"File > New Notebook"**.

---

### **Step 2: Upload the Dataset**

1. Save your dataset (e.g., `bank_campaign_data.csv`) on your local machine.


2. In Google Colab, click on the folder icon in the left sidebar.

3. Click the upload icon and upload your dataset.

---

### **Step 3: Run the Code**

1. Copy and paste the following Python code into a code cell in Colab:

```python
# Import necessary libraries (pre-installed in Google Colab)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Load the dataset
from google.colab import files
uploaded = files.upload()  # Upload the dataset here
file_path = list(uploaded.keys())[0]  # Get the uploaded file name
data = pd.read_csv(file_path)

# Step 2: Data Cleaning
# Handle missing values (forward fill)
data.ffill(inplace=True)

# Drop duplicate entries
data.drop_duplicates(inplace=True)

# Step 3: Data Manipulation
# Add a column for Contacted_Last_Month
data['Contacted_Last_Month'] = data['campaign'].apply(lambda x: 'Yes' if x > 0 else 'No')

# Convert categorical variables to numeric using one-hot encoding
# (kept in a separate DataFrame so the original text columns stay available for the analysis below)
categorical_cols = ['job', 'marital', 'education']
data_encoded = pd.get_dummies(data, columns=categorical_cols, drop_first=True)

# Step 4: Analysis
# Average age of customers who subscribed
avg_age = data[data['y'] == 'yes']['age'].mean()

# Most common job category for subscribed customers
most_common_job = data[data['y'] == 'yes']['job'].mode()[0]

# Proportion of subscribed customers
subscribed_proportion = len(data[data['y'] == 'yes']) / len(data)

# Step 5: Visualization
# Bar chart showing subscription rate by job
sns.countplot(x='job', hue='y', data=data)
plt.title('Subscription Rate by Job')
plt.xticks(rotation=45)
plt.show()

# Pie chart showing subscription proportion
# (labels assume 'no' is the majority class, so it appears first in value_counts)
data['y'].value_counts().plot.pie(autopct='%1.1f%%', labels=['Not Subscribed', 'Subscribed'])
plt.title('Subscription Proportion')
plt.ylabel('')
plt.show()

# Histogram for age distribution
data['age'].plot.hist(bins=10)
plt.title('Distribution of Customer Ages')
plt.xlabel('Age')
plt.show()

# Print analysis results
print(f"Average Age of Subscribed Customers: {avg_age:.2f}")
print(f"Most Common Job for Subscribed Customers: {most_common_job}")
print(f"Proportion of Subscribed Customers: {subscribed_proportion:.2%}")
```

2. Run the cell.

3. When prompted, upload your dataset (e.g., `bank_campaign_data.csv`).

---

### **Sample Output**:

1. The console will display:

```

Average Age of Subscribed Customers: 41.20

Most Common Job for Subscribed Customers: admin

Proportion of Subscribed Customers: 12.50%

```

2. Visualizations:

- **Bar Chart**: Subscription rate by job category.

- **Pie Chart**: Proportion of subscribed vs. not subscribed customers.


- **Histogram**: Age distribution of customers.

---

### **Note**:

Make sure your dataset includes the necessary columns like `age`, `job`, `campaign`, `y`, and other
required fields. Adjust column names in the code if they differ in your dataset.
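
If you want to try the code without the real campaign file, a minimal synthetic `bank_campaign_data.csv` can be generated. This is only a sketch containing just the columns the script touches (`age`, `job`, `marital`, `education`, `campaign`, `y`), with made-up values:

```python
import pandas as pd
import numpy as np

# Synthetic stand-in for the bank marketing dataset (illustrative values only)
rng = np.random.default_rng(0)
n = 200
sample = pd.DataFrame({
    'age': rng.integers(18, 70, size=n),
    'job': rng.choice(['admin', 'technician', 'blue-collar', 'services'], size=n),
    'marital': rng.choice(['single', 'married', 'divorced'], size=n),
    'education': rng.choice(['primary', 'secondary', 'tertiary'], size=n),
    'campaign': rng.integers(1, 6, size=n),
    'y': rng.choice(['yes', 'no'], size=n, p=[0.12, 0.88]),
})
sample.to_csv('bank_campaign_data.csv', index=False)
```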
