Python Class 6 Assignment Solution
1- Create a Pandas DataFrame from the given data and create a new column “Voter” based on
the voter's age, i.e., if Age is greater than 18, the Voter column should be “Yes”; otherwise it
should be “No”.
Solution:
import pandas as pd
import numpy as np
print(df)
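The given data for this question is not reproduced above, so here is a minimal sketch on hypothetical names and ages (assuming the data has an Age column). np.where applies the condition to the whole column at once instead of looping over rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the assignment's actual data is not shown
raw_data = {'Name': ['Asha', 'Ben', 'Chen', 'Dev'],
            'Age': [25, 17, 18, 40]}
df = pd.DataFrame(raw_data)

# Vectorized rule: "Yes" where Age > 18, otherwise "No"
df['Voter'] = np.where(df['Age'] > 18, 'Yes', 'No')
print(df)
```

If an age of exactly 18 should count as a voter, change `>` to `>=`.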
2- Create a Pandas DataFrame from the given data and combine the First and Last columns into
one column, Full Name, so that the output contains Full Name and Age; then set the Age column
as the index.
Solution:
raw_Data = {'First': ['Manan', 'Raghav', 'Sunny'],
df = pd.DataFrame(raw_Data)
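df['Full Name'] = df['First'] + ' ' + df['Last']
# Drop the original name columns so only Full Name and Age remain
df.drop(columns=['First', 'Last'], inplace=True)
df.set_index('Age', inplace=True)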
print(df)
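The source data above is cut off after the First column, so this sketch keeps the three first names and uses hypothetical placeholder values for Last and Age. It shows an equivalent approach: Series.str.cat for the join, then selecting the two columns and calling set_index without inplace:

```python
import pandas as pd

# 'First' comes from the assignment; 'Last' and 'Age' are hypothetical placeholders
raw_data = {'First': ['Manan', 'Raghav', 'Sunny'],
            'Last': ['X', 'Y', 'Z'],
            'Age': [23, 25, 21]}
df = pd.DataFrame(raw_data)

# Join first and last names with a single space
df['Full Name'] = df['First'].str.cat(df['Last'], sep=' ')

# Keep only Full Name and Age, then move Age into the index
result = df[['Full Name', 'Age']].set_index('Age')
print(result)
```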
3- Create a Pandas DataFrame from the given data:
raw_Data = {'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
'Product':['Umbrella', 'Matress', 'Badminton','Shuttle'],
b- Find the index labels of all items whose ‘Price’ is greater than 1000.
c- Replace products using map() with their respective codes: Umbrella: ‘U’, Matress: ‘M’, Badminton
e- Create a new column called ‘Discounted_Price’ by applying a 10% discount to the existing
‘Price’ column (try using a lambda function).
g- Create a column ‘Rank’ that ranks the products by price (the product with the highest price
gets rank 1).
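The code list in task c is cut off after Badminton, so this sketch maps only the two codes that are given. It also shows a behavior of Series.map() worth knowing: with a dict, any value missing from the dict becomes NaN. Filling those gaps from the original Series is one option, not necessarily what the assignment expects:

```python
import pandas as pd

products = pd.Series(['Umbrella', 'Matress', 'Badminton', 'Shuttle'])

# Only the codes visible in the task text; the other codes are not shown
codes = {'Umbrella': 'U', 'Matress': 'M'}

mapped = products.map(codes)
print(mapped)  # Badminton and Shuttle become NaN

# Keep unmapped products unchanged instead of turning them into NaN
mapped_keep = products.map(codes).fillna(products)
print(mapped_keep)
```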
Solution:
raw_Data = {'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
df = pd.DataFrame(raw_Data)
# Task a: Add Index as Item1, Item2, Item3, Item4
df.index = ['Item1', 'Item2', 'Item3', 'Item4']
# Task b: Find index labels with Price > 1000
print(df[df['Price'] > 1000].index.tolist())
# Task c: Replace products using map()
df['Expense'] = df['Expense'].round(2)
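# Dates are day/month/year ('13/2/2011'), so give the format explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
# Highest price gets rank 1; method='min' keeps tied prices on the same integer rank
df['Rank'] = df['Price'].rank(ascending=False, method='min').astype(int)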
print(df)
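Since the Price values are missing from the data above, here is a sketch of tasks a, b, e and g on hypothetical prices (Date and Product are taken from the task). Note the date format: '13/2/2011' only makes sense as day/month/year, and without an explicit format recent pandas versions may guess month-first from the first date and then fail on the 13th:

```python
import pandas as pd

# Date and Product come from the task; Price values are hypothetical
raw_data = {'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
            'Product': ['Umbrella', 'Matress', 'Badminton', 'Shuttle'],
            'Price': [1250, 5500, 700, 1500]}
df = pd.DataFrame(raw_data)

# Task a: custom index labels
df.index = ['Item1', 'Item2', 'Item3', 'Item4']

# Task b: index labels of items priced above 1000
expensive = df[df['Price'] > 1000].index.tolist()
print(expensive)

# Task e: 10% discount applied with a lambda
df['Discounted_Price'] = df['Price'].apply(lambda p: p * 0.9)

# Task g: highest price gets rank 1
df['Rank'] = df['Price'].rank(ascending=False, method='min').astype(int)

# Parse day/month/year dates explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
print(df)
```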
Assignment: Exploring NBA Player Data
Download the nba.csv file containing NBA player data. Complete the following tasks using
Python, Pandas, and data visualization libraries:
1. Load Data:
● Display basic information about the DataFrame.
2. Data Cleaning:
● Remove duplicate rows.
3. Data Transformation:
● Create a new column 'BMI' (Body Mass Index) using the formula BMI = (weight in
pounds / (height in inches)^2) * 703, assuming a fixed height of 70 inches (5 feet
10 inches).
● Calculate the average age, weight, and salary of players in each 'position' category.
5. Data Visualization:
● Plot a scatter plot of 'age' vs. 'salary' with a different color for each 'position'.
6. Top Players:
7. College Analysis:
8. Position Distribution:
● Plot a pie chart to show the distribution of players across different 'positions'.
9. Team Analysis:
10. Extras
Guidelines:
1. Write Python code to complete each task.
4. Include necessary library imports.
Solution:
1. Load Data:
import pandas as pd
df = pd.read_csv('nba.csv')
# info() prints its summary directly and returns None, so it is not wrapped in print()
df.info()
print(df.head())
2. Data Cleaning:
# Drop every row that has a missing value in any column
df.dropna(inplace=True)
# Remove duplicate rows
df.drop_duplicates(inplace=True)
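dropna() with no arguments removes a row if any column is missing, so a player with, say, no College listed disappears even when Salary is known. This sketch on a tiny hypothetical frame compares that with dropna(subset=...), and shows drop_duplicates() removing an exact repeat:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one missing College, one missing Salary, one exact duplicate
df = pd.DataFrame({
    'Name': ['P1', 'P2', 'P3', 'P3'],
    'College': ['Duke', np.nan, 'UCLA', 'UCLA'],
    'Salary': [5000000.0, 2000000.0, np.nan, np.nan],
})

# Drops any row with a NaN in any column: only P1 survives
all_cols = df.dropna()

# Drops only rows where Salary is missing: P1 and P2 survive
salary_only = df.dropna(subset=['Salary'])

# Removes the repeated P3 row (NaN values compare as equal here)
deduped = df.drop_duplicates()
print(len(all_cols), len(salary_only), len(deduped))
```

Which version fits depends on which columns the later analysis actually needs.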
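3. Data Transformation:
# Fixed height in inches (5 feet 10 inches), as the task assumes
fixed_height = 70
# Create 'BMI' column: weight in pounds / height in inches squared * 703
df['BMI'] = df['Weight'] / fixed_height ** 2 * 703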
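A quick check of the arithmetic: with the fixed height, 70² = 4900, so a 245 lb player gets 245 / 4900 × 703 = 35.15. The sketch below applies the same formula to a hypothetical Weight column; pandas computes it for every row in one expression:

```python
import pandas as pd

FIXED_HEIGHT_IN = 70  # inches (5 feet 10 inches), as the task assumes

# Hypothetical weights in pounds
df = pd.DataFrame({'Weight': [180.0, 245.0]})

# BMI = weight_lb / height_in**2 * 703, vectorized over the column
df['BMI'] = df['Weight'] / FIXED_HEIGHT_IN ** 2 * 703
print(df['BMI'].round(2).tolist())
```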
4. Exploratory Data Analysis (EDA):
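# Summary statistics
print(df.describe())
# Average age, weight, and salary per position
avg_by_position = df.groupby('Position')[['Age', 'Weight', 'Salary']].mean()
print(avg_by_position)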
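To see what the grouped average produces, here is a sketch on four hypothetical players that reuse the column names from this solution. The result has one row per Position and one column per averaged metric:

```python
import pandas as pd

# Hypothetical rows using the column names from this solution
df = pd.DataFrame({
    'Position': ['PG', 'PG', 'C', 'C'],
    'Age': [24, 30, 27, 33],
    'Weight': [185, 195, 250, 260],
    'Salary': [2_000_000, 4_000_000, 10_000_000, 6_000_000],
})

# Group rows by Position, then average each selected column within the group
avg_by_position = df.groupby('Position')[['Age', 'Weight', 'Salary']].mean()
print(avg_by_position)
```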
5. Data Visualization:
import matplotlib.pyplot as plt
# Histogram of player ages
plt.hist(df['Age'], bins=20)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
df.boxplot(column='Salary', by='Position')
plt.ylabel('Salary')
plt.suptitle('')
plt.xticks(rotation=45)
plt.show()
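# Scatter plot of 'age' vs. 'salary' by position
plt.figure(figsize=(10, 6))
colors = {'PG': 'red', 'SG': 'blue', 'SF': 'green', 'PF': 'purple', 'C': 'orange'}
# Draw each position as its own series so it gets its own color and legend entry
for position, color in colors.items():
    subset = df[df['Position'] == position]
    plt.scatter(subset['Age'], subset['Salary'], color=color, label=position)
plt.xlabel('Age')
plt.ylabel('Salary')
plt.title('Age vs. Salary by Position')
plt.legend()
plt.show()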
6. Top Players:
print(top_players)
7. College Analysis:
top_colleges = df['College'].value_counts().nlargest(5)
print(top_colleges)
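value_counts() skips missing values by default, so players with no college never appear in the ranking. This sketch on hypothetical college values shows the default next to dropna=False, which counts the missing entries as their own group:

```python
import numpy as np
import pandas as pd

# Hypothetical college values with some missing entries
colleges = pd.Series(['Kentucky', 'Duke', np.nan, 'Kentucky',
                      np.nan, np.nan, 'Duke', 'Kentucky'])

# Default: NaN is excluded from the counts
top = colleges.value_counts().nlargest(2)
print(top)

# dropna=False keeps a count for the missing entries
with_missing = colleges.value_counts(dropna=False)
print(with_missing)
```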
8. Position Distribution:
position_counts = df['Position'].value_counts()
plt.pie(position_counts, labels=position_counts.index, autopct='%1.1f%%', startangle=140)
plt.axis('equal')
plt.show()
9. Team Analysis:
avg_salary_by_team = df.groupby('Team')['Salary'].mean()
print(avg_salary_by_team)
plt.figure(figsize=(10, 6))
avg_salary_by_team.plot(kind='bar')
plt.xlabel('Team')
plt.ylabel('Average Salary')
plt.title('Average Salary of Players by Team')
plt.xticks(rotation=45)
plt.show()
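One optional tweak for the team bar chart: sorting the averages before calling plot(kind='bar') orders the bars from highest to lowest, which is easier to read with many teams. A sketch on hypothetical teams and salaries:

```python
import pandas as pd

# Hypothetical teams and salaries
df = pd.DataFrame({
    'Team': ['Boston Celtics', 'Boston Celtics', 'Utah Jazz', 'Utah Jazz', 'Miami Heat'],
    'Salary': [1_000_000, 3_000_000, 5_000_000, 7_000_000, 4_000_000],
})

# Average salary per team, highest first
avg_salary_by_team = df.groupby('Team')['Salary'].mean().sort_values(ascending=False)
print(avg_salary_by_team)
```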
10. Extras:
min_weight_index = df['Weight'].idxmin()
print(df_sorted)
name_series = df['Name']
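Note that idxmin() returns the index label of the minimum, not its position, so the result should be used with df.loc. After dropna() and drop_duplicates() the index can have gaps, which is when label and position differ. A sketch with a hypothetical non-default index:

```python
import pandas as pd

# Hypothetical players with index labels that do not start at 0
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Weight': [220, 161, 240]},
                  index=[10, 11, 12])

# Label of the lightest player, then look the row up by that label
min_weight_index = df['Weight'].idxmin()
lightest = df.loc[min_weight_index, 'Name']
print(min_weight_index, lightest)
```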