Assignment 2
Objective:
Manipulating the data: updating values, adding new columns, removing duplicates, filtering
rows, etc.
Saving or exporting the modified data to a new file (CSV, Excel, etc.) for reporting, sharing,
or backup.
Both steps are essential in data analysis and test data preparation.
Topics to Cover:
👉 Save to CSV
df.to_csv('output.csv', index=False)
👉 Save to Excel
df.to_excel('output.xlsx', index=False)
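The save step can be checked by writing a file and reading it back. This is a minimal sketch, assuming a small made-up DataFrame; the names, values, and temporary folder are illustrative and not part of the assignment data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the columns are assumptions for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [28, 35]})

# Write into a temporary folder so the sketch does not clutter the working directory.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "output.csv")

# index=False leaves the row labels (0, 1, ...) out of the file.
df.to_csv(csv_path, index=False)

# Read the file back to confirm what was written.
df_back = pd.read_csv(csv_path)
print(df_back.columns.tolist())
```

Without index=False, the row labels are written as an extra unnamed first column. Note that df.to_excel also needs an Excel engine such as openpyxl to be installed.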
✅ 2. Data Manipulation with Pandas
🔹 b. Deleting a column
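A column is usually deleted with DataFrame.drop. A minimal sketch, assuming a small made-up DataFrame (the names and values are illustrative only):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"Name": ["Asha", "Ravi"], "Age": [28, 35], "City": ["Pune", "Mumbai"]}
)

# drop returns a new DataFrame; axis=1 says the label refers to a column.
without_city = df.drop("City", axis=1)

# Equivalent call using the columns keyword.
also_without_city = df.drop(columns=["City"])

print(without_city.columns.tolist())
```

The original df still has the City column, because the result was assigned to a new variable.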
df[df['Age'] > 25]
This shows only those rows where Age is more than 25.
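The Age filter described above can be sketched in two steps, so the intermediate True/False mask is visible. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi", "Meera"], "Age": [28, 22, 41]})

# The comparison builds a True/False Series, one value per row.
mask = df["Age"] > 25

# Indexing with the mask keeps only the rows where it is True.
older = df[mask]
print(older)
```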
df.fillna(0)
df.dropna()
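Both calls return a new DataFrame and leave the original untouched, which is easy to miss. A minimal sketch, assuming a made-up DataFrame with one missing Age:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; np.nan marks the missing Age.
df = pd.DataFrame({"Name": ["Asha", "Ravi", "Meera"], "Age": [28, np.nan, 41]})

filled = df.fillna(0)   # the missing Age is replaced with 0
dropped = df.dropna()   # the row with the missing Age is removed

# df itself is unchanged; assign the result to keep it.
print(df["Age"].isna().sum())
```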
🔹 e. Renaming columns
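Columns are commonly renamed with DataFrame.rename and a mapping from old name to new name. A minimal sketch; the new names Full_Name and Age_Years are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [28, 35]})

# Map each old column name to its new name; unlisted columns keep their names.
renamed = df.rename(columns={"Name": "Full_Name", "Age": "Age_Years"})

print(renamed.columns.tolist())
```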
🔹 f. Sorting data
df.sort_values(by='Age', ascending=False)
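A short sketch of the sort above on made-up data. It also shows that the sorted rows keep their original index labels unless the index is reset.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi", "Meera"], "Age": [28, 41, 22]})

# ascending=False puts the largest Age first; a new DataFrame is returned.
sorted_df = df.sort_values(by="Age", ascending=False)

# The rows keep their old index labels; reset_index(drop=True) renumbers them from 0.
renumbered = sorted_df.reset_index(drop=True)

print(renumbered)
```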
df['Age'] = df['Age'].astype(float)
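The conversion above can be verified by checking the column's dtype before and after. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data; Age starts as whole numbers.
df = pd.DataFrame({"Age": [28, 35]})
dtype_before = df["Age"].dtype

# astype returns a converted copy, so it is assigned back to the column.
df["Age"] = df["Age"].astype(float)
dtype_after = df["Age"].dtype

print(dtype_before, dtype_after)
```

Converting in the other direction with astype(int) raises an error if the column still contains missing values, so fill or drop them first.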
🔸 Import NumPy
import numpy as np
Else → 'Junior'
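An if/else rule like this is often written with np.where. The sketch below is only an illustration: the condition (Age greater than 30) and the 'Senior' label are assumptions, since only the 'Junior' else-branch appears in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [28, 35]})

# ASSUMPTION: the threshold 30 and the 'Senior' label are made up;
# only the 'Junior' else-value comes from the text.
# np.where(condition, value_if_true, value_if_false) works row by row.
df["Level"] = np.where(df["Age"] > 30, "Senior", "Junior")

print(df)
```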
np.mean(df['Salary'])
np.median(df['Salary'])
np.std(df['Salary'])
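One detail worth knowing about the functions above: np.std computes the population standard deviation (ddof=0) by default, while the pandas method Series.std computes the sample standard deviation (ddof=1). A minimal sketch with made-up salaries:

```python
import numpy as np
import pandas as pd

# Hypothetical salaries for illustration.
df = pd.DataFrame({"Salary": [30000, 40000, 50000]})

mean_salary = np.mean(df["Salary"])
median_salary = np.median(df["Salary"])

# np.std defaults to ddof=0 (population standard deviation).
std_population = np.std(df["Salary"])

# Series.std defaults to ddof=1 (sample standard deviation).
std_sample = df["Salary"].std()

print(mean_salary, median_salary, std_population, std_sample)
```

The two standard deviations differ on the same data, so it helps to state which one a report uses.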
📋 Summary Table:
✅ Real-Life Example:
import pandas as pd
df = pd.read_csv('customers.csv')
# Filter
df = df[df['City'] == 'Pune']
# New column
df['Age_in_Months'] = df['Age'] * 12
# Fill missing values (after the filter above no City is missing, so this is only a safeguard)
df['City'] = df['City'].fillna('Unknown')
# Save new file
df.to_csv('pune_customers_cleaned.csv', index=False)
Questions & Answers
Q1. How do you save a DataFrame to a CSV file without the index column?
Ans:
df.to_csv('filename.csv', index=False)
Q2. How do you add a new column Age_in_Months by multiplying the Age column by 12?
Ans:
df['Age_in_Months'] = df['Age'] * 12
Ans:
Q4. Write the code to replace missing values in the City column with 'NaN'.
Ans:
df['City'] = df['City'].fillna('NaN')
Ans:
df.dropna()
Ans:
df['Age'] = df['Age'].astype(float)
Ans:
df.to_excel('data.xlsx', index=False)
Q9. Create a NumPy array from list [10, 20, 30, 40]
Ans:
import numpy as np
arr = np.array([10, 20, 30, 40])
Ans:
inplace=True means the change is applied directly to the original DataFrame instead of returning
a new, modified DataFrame. The method then returns None.
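The effect of inplace=True can be seen by capturing the return value. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Asha"], "Age": [28], "City": ["Pune"]})

# With inplace=True the method modifies df and returns None.
result = df.drop("City", axis=1, inplace=True)

print(result)
print(df.columns.tolist())
```

Because the return value is None, writing df = df.drop(..., inplace=True) would replace the DataFrame with None, which is a common mistake.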
Ans:
axis=1 means we are dropping a column (not a row).
axis=0 → row
axis=1 → column
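The two axis values can be compared side by side with drop. A minimal sketch on made-up data, where the row is identified by its index label:

```python
import pandas as pd

# Hypothetical sample data; the default index labels are 0 and 1.
df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [28, 35]})

# axis=0: the label 0 is looked up in the index, so a row is dropped.
without_first_row = df.drop(0, axis=0)

# axis=1: the label 'Age' is looked up in the columns, so a column is dropped.
without_age = df.drop("Age", axis=1)

print(without_first_row)
print(without_age)
```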