0% found this document useful (0 votes)
76 views

Python Class 6 Assignment Solution

Uploaded by

Arpit Dubey
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
76 views

Python Class 6 Assignment Solution

Uploaded by

Arpit Dubey
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

PANDAS:

1- Create a Pandas Data frame from the given data and create a new column “Voter” based on
voter age, i.e., if age >18 then voter column should be “Yes” otherwise if age <18 then voter
column should be “No”

raw_Data = {'Voter_name': ['Geek1', 'Geek2', 'Geek3', 'Geek4',


'Geek5', 'Geek6', 'Geek7', 'Geek8'],
'Voter_age': [15, 23, 25, 9, 67, 54, 42, np.NaN]}

ills
Solution:

import pandas as pd

Sk
import numpy as np

raw_Data = {'Voter_name': ['Geek1', 'Geek2', 'Geek3', 'Geek4', 'Geek5', 'Geek6', 'Geek7',


'Geek8'], a
'Voter_age': [15, 23, 25, 9, 67, 54, 42, np.NaN]}
at
df = pd.DataFrame(raw_Data)
D

# Create a new column "Voter" based on voter age

df['Voter'] = np.where(df['Voter_age'] > 18, 'Yes', 'No')


w

print(df)
ro

2 – Create a Pandas Data frame from the given data and collapse First and Last column into
G

one column as Full Name, so the output contains Full Name and Age, then convert column age
to index

raw_Data = {'First': ['Manan ', 'Raghav ', 'Sunny '],


'Last': ['Goel', 'Sharma', 'Chawla'],
'Age' : [12, 24, 56]}

Solution:
raw_Data = {'First': ['Manan', 'Raghav', 'Sunny'],

'Last': ['Goel', 'Sharma', 'Chawla'],

'Age': [12, 24, 56]}

df = pd.DataFrame(raw_Data)

# Combine First and Last columns into Full Name

ills
df['Full Name'] = df['First'] + ' ' + df['Last']

# Set Age as index

Sk
df.set_index('Age', inplace=True)

print(df)

a
3- Create a Pandas Data frame from the given data -
at
raw_Data = {'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
'Product':['Umbrella', 'Matress', 'Badminton','Shuttle'],
D

'Price':[1250, 1450, 1550, 400],


'Expense': [ 21525220.653, 31125840.875, 23135428.768, 56245263.942]}

a- Add Index as Item1, Item2, Item3, Item4


w

b- Find the index labels of all items whose ‘Price’ is greater than 1000.

c- Replace products using Map() with respective codes- Umbrella : ‘U’, Matress : 'M', Badminton
ro

: 'B', Shuttle: 'S'

d- Round off the Expense column values to two decimal places.


G

e- Create a new column called ‘Discounted_Price’ after applying a 10% discount on the existing
‘price’ column.(try using lambda function)

f- Convert the column type of “Date” to datetime format

g- Create a column rank which ranks the products based on the price (one with highest price will
be rank 1).

Solution:
raw_Data = {'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],

'Product': ['Umbrella', 'Matress', 'Badminton', 'Shuttle'],

'Price': [1250, 1450, 1550, 400],

'Expense': [21525220.653, 31125840.875, 23135428.768, 56245263.942]}

df = pd.DataFrame(raw_Data)

ills
# Task a: Add Index as Item1, Item2, Item3, Item4

df.index = ['Item1', 'Item2', 'Item3', 'Item4']

Sk
# Task b: Find index labels with Price > 1000

indexes_with_price_gt_1000 = df[df['Price'] > 1000].index.tolist()

a
# Task c: Replace products using Map()

product_map = {'Umbrella': 'U', 'Matress': 'M', 'Badminton': 'B', 'Shuttle': 'S'}


at
df['Product'] = df['Product'].map(product_map)
D

# Task d: Round off Expense column to two decimal places

df['Expense'] = df['Expense'].round(2)
w

# Task e: Create 'Discounted_Price' column with 10% discount


ro

df['Discounted_Price'] = df['Price'] * 0.9


G

# Task f: Convert 'Date' column to datetime format

df['Date'] = pd.to_datetime(df['Date'])

# Task g: Create 'Rank' column based on price

df['Rank'] = df['Price'].rank(ascending=False).astype(int)

print(df)
Assignment: Exploring NBA Player Data

Download the nba.csv file containing NBA player data Complete the following tasks using
Python, Pandas, and data visualization libraries:

1. Load Data:

● Load the nba.csv data into a Pandas DataFrame.

ills
● Display basic information about the DataFrame.

2. Data Cleaning:

● Handle missing values by either removing or imputing them.

Sk
● Remove duplicate rows.

3. Data Transformation:

● Create a new column 'BMI' (Body Mass Index) using the formula: BMI = (weight in
pounds / (height in inches)^2) * 703.(Assuming a fixed height value of 70 inches (5 feet
a
10 inches)

4. Exploratory Data Analysis (EDA):


at
● Display summary statistics of the 'age', 'weight', and 'salary' columns.

● Calculate the average age, weight, and salary of players in each 'position' category.
D

5. Data Visualization:

● Create a histogram of player ages.


w

● Create a box plot of player salaries for each 'position'.

● Plot a scatter plot of 'age' vs. 'salary' with a different color for each 'position'.
ro

6. Top Players:

● Display the top 10 players with the highest salaries.


G

7. College Analysis:

● Determine the top 5 colleges with the most represented players.

8. Position Distribution:

● Plot a pie chart to show the distribution of players across different 'positions'.

9. Team Analysis:

● Display the average salary of players for each 'team'.


● Plot a bar chart to visualize the average salary of players for each 'team'.

10. Extras

● Get the index at which minimum weight value is present.


● Sort values based on name in alphabetical order for the rows (the original Dataframe
sorting should not change)
● Create a series from given dataframe on “name” column and display top and last 10

Guidelines:

ills
1. Write Python code to complete each task.

2. Provide comments explaining your code.

3. Use meaningful variable names.

Sk
4. Include necessary library imports.

5. Present your findings in a clear and organized manner.

6. Feel free to use additional code cells for each task.

Solution:
a
at
1. Load Data:

import pandas as pd
D

# Load the data into a Pandas DataFrame


w

df = pd.read_csv('nba.csv')
ro

# Display basic information about the DataFrame

print(df.info())
G

print(df.head())

2. Data Cleaning:

# Handle missing values

df.dropna(inplace=True)
# Remove duplicate rows

df.drop_duplicates(inplace=True)

3. Data Transformation: Create 'BMI' column using a fixed height value

# Assuming a fixed height value of 70 inches (5 feet 10 inches)

fixed_height = 70

ills
# Create 'BMI' column

df['BMI'] = (df['Weight'] / (fixed_height ** 2)) * 703

Sk
4. Exploratory Data Analysis (EDA):

# Summary statistics

print(df[['Age', 'Weight', 'Salary']].describe())


a
# Average age, weight, and salary by position
at
avg_by_position = df.groupby('Position')[['Age', 'Weight', 'Salary']].mean()

print(avg_by_position)
D

5. Data Visualization:
w

import matplotlib.pyplot as plt


ro

# Histogram of player ages

plt.hist(df['Age'], bins=20)
G

plt.xlabel('Age')

plt.ylabel('Frequency')

plt.title('Distribution of Player Ages')

plt.show()

# Box plot of player salaries by position


plt.figure(figsize=(10, 6))

df.boxplot(column='Salary', by='Position')

plt.ylabel('Salary')

plt.title('Box Plot of Player Salaries by Position')

plt.suptitle('')

plt.xticks(rotation=45)

plt.show()

ills
# Scatter plot of 'age' vs. 'salary' by position

plt.figure(figsize=(10, 6))

Sk
colors = {'PG': 'red', 'SG': 'blue', 'SF': 'green', 'PF': 'purple', 'C': 'orange'}

plt.scatter(df['Age'], df['Salary'], c=df['Position'].map(colors), alpha=0.5)

plt.xlabel('Age')

plt.ylabel('Salary')
a
plt.title('Age vs. Salary by Position')
at
plt.legend(colors)

plt.show()
D

6. Top Players:
w

top_players = df.nlargest(10, 'Salary')

print(top_players)
ro

7. College Analysis:
G

top_colleges = df['College'].value_counts().nlargest(5)

print(top_colleges)

8. Position Distribution:

position_counts = df['Position'].value_counts()
plt.pie(position_counts, labels=position_counts.index, autopct='%1.1f%%', startangle=140)

plt.title('Position Distribution of Players')

plt.axis('equal')

plt.show()

9. Team Analysis:

ills
avg_salary_by_team = df.groupby('Team')['Salary'].mean()

print(avg_salary_by_team)

Sk
plt.figure(figsize=(10, 6))

avg_salary_by_team.plot(kind='bar')

plt.xlabel('Team')

plt.ylabel('Average Salary')
a
plt.title('Average Salary of Players by Team')
at
plt.xticks(rotation=45)

plt.show()
D

10.Extras:
w

min_weight_index = df['Weight'].idxmin()

print("Index with minimum weight value:", min_weight_index)


ro

df_sorted = df.sort_values(by='Name', ignore_index=True)


G

print(df_sorted)

name_series = df['Name']

print("Top 10 names:\n", name_series.head(10))

print("\nLast 10 names:\n", name_series.tail(10))


G
ro
w
D
at
a
Sk
ills

You might also like