Assignment 2

The document outlines the process of exporting and manipulating data in Python using Pandas and NumPy. It covers key topics such as saving data to CSV and Excel, various data manipulation techniques like adding or dropping columns, filtering rows, handling missing values, and basic NumPy operations. Additionally, it provides code examples and a real-life scenario for practical application.

Uploaded by

themanhector24

Assignment 2: Exporting and Manipulating Data in Python using Pandas & NumPy

Objective:

After loading and exploring data, the next steps are:

• Making changes to the data (called manipulation), such as updating values, adding new columns, removing duplicates, and filtering rows.

• Saving or exporting the modified data into a new file (CSV, Excel, etc.) for reporting, sharing, or backup.

This step is super important in data analysis and test data preparation.

Topics to Cover:

1. Exporting Data (Save to CSV, Excel)

2. Data Manipulation with Pandas


a. Adding Columns
b. Deleting Columns
c. Filtering Rows
d. Handling Missing Values
e. Renaming Columns
f. Sorting Data
g. Changing Data Types

3. Basic NumPy Operations with Data

✅ 1. Exporting Data (Saving data to files)

👉 Save to CSV

df.to_csv('output.csv', index=False)

🔹 index=False leaves the DataFrame's row index out of the file. Without it, pandas writes the index as an extra first column.
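To see what index=False changes, here is a minimal sketch that writes the same DataFrame twice and compares the header lines. The sample data is made up for illustration, and the output goes to in-memory buffers instead of real files.

```python
import io
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [28, 35]})

# Write to in-memory text buffers so nothing is saved to disk.
with_index = io.StringIO()
without_index = io.StringIO()
df.to_csv(with_index)                  # default behaviour: index=True
df.to_csv(without_index, index=False)  # row index left out

header_with_index = with_index.getvalue().splitlines()[0]
header_without_index = without_index.getvalue().splitlines()[0]

print(header_with_index)     # ,Name,Age  (the empty first field is the index column)
print(header_without_index)  # Name,Age
```

The empty first field in the first header is the unnamed index column; reading that file back would produce an extra "Unnamed: 0" column unless the index is handled explicitly.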

👉 Save to Excel

df.to_excel('output.xlsx', index=False)

🔹 Writing .xlsx files needs an Excel engine such as openpyxl to be installed alongside pandas.

✅ 2. Data Manipulation with Pandas

🔹 a. Adding a new column

df['Age in Months'] = df['Age'] * 12

This creates a new column computed from an existing one. The original Age column is not changed.

🔹 b. Deleting a column

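df.drop('Age in Months', axis=1, inplace=True)

• axis=1 → drop a column (axis=0 would drop a row)

• inplace=True → the original DataFrame is modified directly and the call returns None; without it, drop() returns a new DataFrame and leaves the original as it was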
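The difference between dropping with and without inplace=True can be sketched as follows. The DataFrame and the variable names (trimmed, result) are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [28, 35]})
df["Age in Months"] = df["Age"] * 12

# Without inplace=True, drop() returns a new DataFrame and df keeps the column.
trimmed = df.drop("Age in Months", axis=1)
cols_after_plain_drop = list(df.columns)

# With inplace=True, df itself is modified and the call returns None.
result = df.drop("Age in Months", axis=1, inplace=True)
cols_after_inplace_drop = list(df.columns)

print(cols_after_plain_drop)    # ['Name', 'Age', 'Age in Months']
print(cols_after_inplace_drop)  # ['Name', 'Age']
print(result)                   # None
```

Because the inplace call returns None, writing df = df.drop(..., inplace=True) would replace the DataFrame with None, which is a common mistake.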

🔹 c. Filtering rows using condition

df[df['Age'] > 25]

This returns only the rows where Age is greater than 25. The original DataFrame is unchanged, so assign the result to a variable if you want to keep it.
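Filters can also combine several conditions. The sketch below assumes a small made-up DataFrame with a City column; each condition goes in its own parentheses and is joined with & (and) or | (or).

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera", "John"],
    "Age": [28, 35, 22, 41],
    "City": ["Pune", "Mumbai", "Pune", "Pune"],
})

# Single condition: rows where Age is greater than 25.
older = df[df["Age"] > 25]

# Two conditions joined with & (both must be true).
pune_older = df[(df["Age"] > 25) & (df["City"] == "Pune")]

# Two conditions joined with | (either may be true).
either = df[(df["City"] == "Mumbai") | (df["Age"] < 25)]

print(older["Name"].tolist())       # ['Asha', 'Ravi', 'John']
print(pune_older["Name"].tolist())  # ['Asha', 'John']
print(either["Name"].tolist())      # ['Ravi', 'Meera']
```

Python's plain and / or keywords do not work on whole columns, which is why & and | are used here.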

🔹 d. Handling missing values

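Fill missing values with 0 (fillna() returns a new DataFrame, so assign the result):

df = df.fillna(0)

Drop rows with missing values (dropna() also returns a new DataFrame):

df = df.dropna()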
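Filling every gap with 0 is not always what you want. As a hedged sketch on made-up data, the example below counts missing values per column, fills each column with its own replacement, and drops rows only when a chosen column is missing. Using the column mean for Age is an assumption for illustration, not a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps, used only for illustration.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera"],
    "Age": [28, np.nan, 22],
    "City": ["Pune", "Mumbai", None],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Fill each column with its own replacement value.
filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})

# Drop a row only when Age is missing; other gaps are kept.
dropped = df.dropna(subset=["Age"])

print(filled["Age"].tolist())    # [28.0, 25.0, 22.0]
print(filled["City"].tolist())   # ['Pune', 'Mumbai', 'Unknown']
print(dropped["Name"].tolist())  # ['Asha', 'Meera']
```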

🔹 e. Renaming columns

df.rename(columns={'Age': 'Customer Age'}, inplace=True)

🔹 f. Sorting data

df.sort_values(by='Age', ascending=False)

This returns a new DataFrame sorted by Age in descending order. df itself stays in its original order unless you assign the result back to it.
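A short sketch of sorting on made-up data follows. It shows sorting by one column, resetting the row numbers afterwards, and sorting by two columns with a different direction for each.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera", "John"],
    "City": ["Pune", "Mumbai", "Pune", "Mumbai"],
    "Age": [28, 35, 22, 41],
})

# Oldest first; reset_index(drop=True) renumbers the rows 0, 1, 2, ...
by_age_desc = df.sort_values(by="Age", ascending=False).reset_index(drop=True)

# City A to Z, and within each city the oldest first.
by_city_then_age = df.sort_values(by=["City", "Age"], ascending=[True, False])

print(by_age_desc["Name"].tolist())       # ['John', 'Ravi', 'Asha', 'Meera']
print(by_city_then_age["Name"].tolist())  # ['John', 'Ravi', 'Asha', 'Meera']
print(df["Name"].tolist())                # ['Asha', 'Ravi', 'Meera', 'John'] (unchanged)
```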


🔹 g. Changing data types

df['Age'] = df['Age'].astype(float)

This changes the data type of the Age column to float.
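astype(float) only works when every value can be converted. For messy data, one possible approach is pd.to_numeric with errors="coerce", which turns unreadable values into NaN instead of stopping with an error. The sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical column read as text, with one value that is not a number.
ages = pd.Series(["28", "35", "unknown"])

# astype(float) raises an error because "unknown" cannot be converted.
try:
    ages.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# errors="coerce" converts what it can and marks the rest as NaN.
converted = pd.to_numeric(ages, errors="coerce")

print(astype_failed)              # True
print(converted.isna().tolist())  # [False, False, True]
```

The NaN values produced this way can then be handled with fillna() or dropna().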

✅ 3. NumPy Operations with Data

Sometimes, we use NumPy (Numerical Python) for mathematical operations on data.

🔸 Import NumPy

import numpy as np

🔸 Replace values using NumPy

df['Age'] = np.where(df['Age'] > 30, 'Senior', 'Junior')

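This replaces the values in 'Age':

• If Age > 30 → 'Senior'

• Else → 'Junior'

Note that this overwrites the numeric Age column with text labels, so later calculations on Age (such as df['Age'] * 12) no longer work as arithmetic. To keep the original ages, assign the result to a new column instead.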
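A variant that keeps the numeric ages can be sketched like this. The column name Age_Group and the sample data are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi", "Meera"], "Age": [28, 35, 30]})

# Write the labels to a new column (Age_Group is a made-up name),
# so the numeric Age column stays available for calculations.
df["Age_Group"] = np.where(df["Age"] > 30, "Senior", "Junior")

print(df["Age_Group"].tolist())  # ['Junior', 'Senior', 'Junior']
print(df["Age"].tolist())        # [28, 35, 30]
```

An age of exactly 30 is labelled 'Junior' here because the condition is strictly greater than 30.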

🔸 Creating Arrays from List

arr = np.array([1, 2, 3, 4])

🔸 Get Mean, Median, Std

np.mean(df['Salary'])

np.median(df['Salary'])

np.std(df['Salary'])
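Two details are worth checking when using these functions: missing values and the kind of standard deviation. The sketch below, on made-up salary data, converts the column to a plain NumPy array first and then uses the NaN-aware functions np.nanmean, np.nanmedian and np.nanstd.

```python
import numpy as np
import pandas as pd

# Hypothetical salary data with one missing value.
df = pd.DataFrame({"Salary": [30000.0, 40000.0, 50000.0, np.nan]})
salary = df["Salary"].to_numpy()

# On a plain array, a single NaN makes the ordinary mean NaN.
plain_mean = np.mean(salary)

# The nan* functions ignore missing values.
nan_mean = np.nanmean(salary)
nan_median = np.nanmedian(salary)

# NumPy defaults to the population standard deviation (ddof=0).
population_std = np.nanstd(salary)

# ddof=1 gives the sample standard deviation, which is what Series.std() uses.
sample_std = np.nanstd(salary, ddof=1)

print(nan_mean, nan_median)  # 40000.0 40000.0
print(sample_std)            # 10000.0
```

If a NumPy result and a pandas result for the standard deviation disagree, the ddof setting is usually the reason.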

📋 Summary Table:

Action | Code Example
------ | ------------
Save as CSV | df.to_csv('file.csv', index=False)
Save as Excel | df.to_excel('file.xlsx', index=False)
Add new column | df['new'] = df['old'] * 2
Delete column | df.drop('column', axis=1)
Filter rows | df[df['Age'] > 25]
Fill missing values | df.fillna(0)
Drop missing values | df.dropna()
Rename column | df.rename(columns={'A': 'B'})
Sort by column | df.sort_values(by='Age')
Change datatype | df['Age'] = df['Age'].astype(float)
Replace using NumPy | np.where(condition, value_if_true, value_if_false)
Mean / Median / Std with NumPy | np.mean(df['col']), np.median(...) etc.

✅ Real-Life Example:

Imagine you're working on a file containing customer data:

• You want to filter only customers from Pune

• Add a column showing Age in months

• Replace missing cities with 'Unknown'

• And save the cleaned data into a new file

Here’s how you’d do it:

import pandas as pd

df = pd.read_csv('customers.csv')

# Filter (copy so the new column is added to an independent DataFrame)

df = df[df['City'] == 'Pune'].copy()

# New column

df['Age_in_Months'] = df['Age'] * 12

# Handle missing values (after the Pune filter no City value is missing, so this
# has no effect here; run it before the filter to clean the full dataset)

df['City'] = df['City'].fillna('Unknown')

# Save new file

df.to_csv('pune_customers_cleaned.csv', index=False)
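The same workflow can be tried without a real file. The sketch below uses a made-up in-memory CSV in place of customers.csv, and it fills the missing cities before filtering so that the step has a visible effect on the full table. The names raw_csv, customers and pune are assumptions for this example.

```python
import io
import pandas as pd

# Hypothetical stand-in for customers.csv (Ravi has no city).
raw_csv = io.StringIO(
    "Name,Age,City\n"
    "Asha,28,Pune\n"
    "Ravi,35,\n"
    "Meera,22,Mumbai\n"
    "John,41,Pune\n"
)
customers = pd.read_csv(raw_csv)

# Clean the full table first: missing cities become 'Unknown'.
customers["City"] = customers["City"].fillna("Unknown")

# New column computed from Age.
customers["Age_in_Months"] = customers["Age"] * 12

# Keep only the Pune customers.
pune = customers[customers["City"] == "Pune"]

# Save to an in-memory buffer instead of a file on disk.
out = io.StringIO()
pune.to_csv(out, index=False)
csv_text = out.getvalue()

print(customers["City"].tolist())  # ['Pune', 'Unknown', 'Mumbai', 'Pune']
print(pune["Name"].tolist())       # ['Asha', 'John']
```

To write a real file, pass a filename such as 'pune_customers_cleaned.csv' to to_csv() instead of the buffer.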

Questions & Answers

Q1. How do you save a DataFrame to a CSV file without the index column?

Ans:

df.to_csv('filename.csv', index=False)

Q2. How to add a new column Age_in_Months by multiplying Age column by 12?

Ans:

df['Age_in_Months'] = df['Age'] * 12

Q3. Write the code to delete a column named Age_in_Months.

Ans:

df.drop('Age_in_Months', axis=1, inplace=True)

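Q4. Write the code to replace missing values in the City column with 'NaN'.

Ans:

df['City'] = df['City'].fillna('NaN')

Note that this inserts the literal text 'NaN', which pandas treats as an ordinary string and not as a missing value, so isna() will no longer flag these rows. A label such as 'Unknown' is usually clearer.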

Q5. How do you remove rows that contain missing values?

Ans:

df = df.dropna()  # dropna() returns a new DataFrame, so assign the result

Q6. Rename column Age to Customer_Age.

Ans:

df.rename(columns={'Age': 'Customer_Age'}, inplace=True)

Q7. Convert the datatype of the Age column to float.


Ans:

df['Age'] = df['Age'].astype(float)

Q8. Write the command to save DataFrame into Excel file.

Ans:

df.to_excel('data.xlsx', index=False)

Q9. Create a NumPy array from list [10, 20, 30, 40]

Ans:

arr = np.array([10, 20, 30, 40])

Q10. What is the use of inplace=True in Pandas?

Ans:
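inplace=True means the change is applied directly to the original DataFrame and the method returns None, so the result should not be assigned back. Without it, the method returns a new DataFrame and leaves the original unchanged.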

Q11. What does axis=1 mean in drop() function?

Ans:
axis=1 means we are dropping a column (not a row).

• axis=0 → row

• axis=1 → column
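Both axis values can be compared side by side in a small sketch on made-up data. It also shows the columns= keyword, which does the same job as axis=1 and can be easier to read.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi", "Meera"], "Age": [28, 35, 22]})

# axis=0: drop the row whose index label is 0.
without_first_row = df.drop(0, axis=0)

# axis=1: drop the column named "Age".
without_age = df.drop("Age", axis=1)

# Same result as axis=1, written with the columns= keyword.
without_age_kw = df.drop(columns="Age")

print(without_first_row["Name"].tolist())  # ['Ravi', 'Meera']
print(list(without_age.columns))           # ['Name']
```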
