Practice Questions 2
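import numpy as np
import pandas as pd
# Create DataFrame (data: dict of customer records with Customer_ID, Age, City, Salary and Purchase_Amount columns; not shown here)
df = pd.DataFrame(data)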
print(df)
# Remove duplicates
df = df.drop_duplicates()
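Called with no arguments, `drop_duplicates()` removes a row only when every column matches an earlier row. The sketch below uses hypothetical toy data (the `toy` frame and its values are made up for illustration) to compare that default with deduplicating on `Customer_ID` alone:

```python
import pandas as pd

# Hypothetical toy data: row 1 repeats row 0 exactly; row 2 is a second, different purchase
toy = pd.DataFrame({
    "Customer_ID": [101, 101, 101, 102],
    "Purchase_Amount": [250, 250, 90, 400],
})

# Default: compare all columns, keep the first occurrence -> rows 0, 2, 3 remain
deduped = toy.drop_duplicates()
print(deduped)

# subset= compares only the listed columns -> one row per customer (rows 0 and 3)
one_per_customer = toy.drop_duplicates(subset=["Customer_ID"])
print(one_per_customer)
```

If you deduplicate on `Customer_ID` alone, you discard repeat purchases and understate the per-customer totals used in question 1. For this dataset the all-column default is probably the safer choice.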
Find the customer who has made the highest total purchase amount (after removing duplicates)
In [ ]: # 1. Find the customer who has made the highest total purchase amount
highest_purchase_customer = df.groupby('Customer_ID')['Purchase_Amount'].sum().idxmax()
print("1. Customer with highest total purchase amount:", highest_purchase_customer)
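Summing per customer and picking the largest single row can give different answers. A minimal sketch on hypothetical toy data (made-up IDs and amounts) shows the difference:

```python
import pandas as pd

# Hypothetical toy data: customer 102 has the largest single purchase,
# but customer 101 has the largest total
toy = pd.DataFrame({
    "Customer_ID": [101, 101, 102],
    "Purchase_Amount": [300, 350, 500],
})

# Sum per customer, then take the label (Customer_ID) of the largest total
totals = toy.groupby("Customer_ID")["Purchase_Amount"].sum()
top_customer = totals.idxmax()  # 101 (total 650 vs 500)

# All rows belonging to that customer
top_rows = toy[toy["Customer_ID"] == top_customer]

# Contrast: the single row with the largest purchase belongs to customer 102
largest_single_row = toy.loc[toy["Purchase_Amount"].idxmax()]
print(top_customer, largest_single_row["Customer_ID"])
```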
Identify customers whose salary is within the top 10% of all salaries
In [ ]: # 2. Identify customers whose salary is within the top 10% of all salaries
salary_threshold = df['Salary'].quantile(0.9)
top_salary_customers = df[df['Salary'] >= salary_threshold]
print("2. Customers with top 10% salaries:\n", top_salary_customers)
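`quantile(0.9)` interpolates between data points by default, so the threshold may not be an actual salary in the data. Here is a small sketch with hypothetical salaries of 10 through 100:

```python
import pandas as pd

# Hypothetical salaries: 10, 20, ..., 100
salaries = pd.Series(range(10, 101, 10))

# Default interpolation="linear": position 0.9 * (10 - 1) = 8.1 lies between 90 and 100,
# giving a threshold of about 91
threshold = salaries.quantile(0.9)

# >= keeps values at or above the threshold; here only 100 qualifies
top = salaries[salaries >= threshold]
print(threshold, top.tolist())
```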
Find the city with the most customers and calculate its average purchase amount
In [ ]: # 3. Find the city with the most customers and calculate its average purchase amount
most_common_city = df['City'].mode()[0]
avg_purchase_in_city = df[df['City'] == most_common_city]['Purchase_Amount'].mean()
print("3. City with most customers:", most_common_city, "Average Purchase Amount:", avg_purchase_in_city)
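When two cities tie for the most customers, `mode()` returns all of them in sorted order. Taking `[0]` then quietly picks the alphabetically first city. The sketch below uses hypothetical city names to show this:

```python
import pandas as pd

# Hypothetical cities with a tie: "Austin" and "Boston" both appear twice
cities = pd.Series(["Boston", "Austin", "Boston", "Austin", "Chicago"])

# mode() returns every most frequent value, sorted, so [0] silently picks "Austin"
modes = cities.mode()
first_mode = modes[0]

# value_counts() shows the counts behind the choice, which makes ties visible
counts = cities.value_counts()
print(modes.tolist(), counts.to_dict())
```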
Create a new column indicating if the customer is a high spender (if purchase amount > median)
In [ ]: # 4. Create a new column indicating if the customer is a high spender (if purchase amount > median)
purchase_median = df['Purchase_Amount'].median()
df['High_Spender'] = df['Purchase_Amount'] > purchase_median
print("4. DataFrame with High Spender column:\n", df)
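The comparison is strict (`>`), so a purchase exactly equal to the median is not flagged. A quick check with hypothetical amounts:

```python
import pandas as pd

# Hypothetical purchase amounts; with an odd count the median is one of the values
amounts = pd.Series([100, 200, 300, 400, 500])
median = amounts.median()  # 300.0

# Strict ">" means the row exactly at the median is not a high spender
high_spender = amounts > median
print(high_spender.tolist())  # [False, False, False, True, True]
```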
Find the age group (bins) with the highest average salary
In [ ]: # 5. Find the age group (bins) with the highest average salary
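bins = [20, 30, 40, 50]
labels = ['20-30', '30-40', '40-50']
# right=False gives intervals [20, 30), [30, 40), [40, 50); ages outside 20-49 become NaN
df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
# observed=True groups only by age groups that actually occur in the data
highest_avg_salary_group = df.groupby('Age_Group', observed=True)['Salary'].mean().idxmax()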
print("5. Age group with highest average salary:", highest_avg_salary_group)
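With `right=False`, each bin includes its left edge and excludes its right edge. The sketch below uses hypothetical ages placed on the bin edges to make the boundaries visible:

```python
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges
ages = pd.Series([20, 29, 30, 49, 50])

# right=False -> intervals [20, 30), [30, 40), [40, 50)
groups = pd.cut(ages, bins=[20, 30, 40, 50], labels=["20-30", "30-40", "40-50"], right=False)
print(groups.tolist())
# 30 falls in "30-40", and 50 falls outside every bin, so it becomes NaN
```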
Replace missing salary values using the median salary for that customer’s city
In [ ]: # 6. Replace missing salary values using the median salary for that customer’s city
# transform returns one city median per row, aligned with df's index
df['Salary'] = df['Salary'].fillna(df.groupby('City')['Salary'].transform('median'))
print("6. DataFrame after filling missing salaries:\n", df)
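`groupby(...).apply(...)` can return a result indexed by the group keys, which may not line up with the original rows. `groupby(...).transform("median")` keeps the original index. Here is a minimal check on hypothetical toy data (made-up cities and salaries):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing salary per city
toy = pd.DataFrame({
    "City": ["NY", "NY", "NY", "LA", "LA"],
    "Salary": [50.0, 70.0, np.nan, 40.0, np.nan],
})

# transform("median") returns a Series aligned to toy's index: each row gets its city's median
city_median = toy.groupby("City")["Salary"].transform("median")

# fillna only touches the NaN positions: NY -> 60.0, LA -> 40.0
toy["Salary"] = toy["Salary"].fillna(city_median)
print(toy)
```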
Problem: solve the following system of linear equations
x + 2y + 3z = 14
4x + 5y + 6z = 32
7x + 8y + 10z = 50
In [ ]: # 7. Solve a system of equations with multiple variables (3x3 system)
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
B = np.array([14, 32, 50])
solution = np.linalg.solve(A, B)
print("7. Solution to the system of equations:", solution)
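`np.linalg.solve` requires a square, non-singular coefficient matrix. You can check both the matrix and the answer. By hand, x = -2, y = 8, z = 0 satisfies every equation: -2 + 16 + 0 = 14, -8 + 40 + 0 = 32, and -14 + 64 + 0 = 50. A short verification sketch:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
B = np.array([14, 32, 50])

# A nonzero determinant (about -3 here) means the system has exactly one solution
det_A = np.linalg.det(A)

solution = np.linalg.solve(A, B)  # approximately [-2, 8, 0]

# Substitute back: A @ solution should reproduce B up to floating-point error
print(det_A, solution, np.allclose(A @ solution, B))
```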
You are given a dataset containing historical house price data. The price of a house depends on its square footage, number of bedrooms, and age. Your task is to predict house prices from these features using linear algebra techniques.
In [ ]: import numpy as np
import pandas as pd
# Given dataset
data = {
"Square_Feet": [1500, 1800, 2400, 3000],
"Bedrooms": [3, 4, 4, 5],
"Age_Years": [10, 5, 2, 1],
"Price": [300000, 400000, 500000, 600000]
}
df = pd.DataFrame(data)
# Display dataset
print(df)
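# Design matrix: a column of ones for the intercept, followed by the three features
X = np.column_stack([np.ones(len(df)), df[["Square_Feet", "Bedrooms", "Age_Years"]].to_numpy()])
y = df["Price"].to_numpy(dtype=float)
# Least-squares fit of X @ theta ≈ y
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
# Hypothetical new house (not in the original task): intercept term, 2000 sq ft, 3 bedrooms, 10 years old
new_house = np.array([1.0, 2000, 3, 10])
# Predict price
predicted_price = np.matmul(new_house, theta)
print("Predicted price:", predicted_price)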
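One classic linear-algebra route to the coefficients is the normal equation, θ = (XᵀX)⁻¹Xᵀy. The sketch below solves the system (XᵀX)θ = Xᵀy with `np.linalg.solve` rather than forming an explicit inverse. Several choices here are assumptions, not part of the original task: the intercept column, the use of the normal equation, and the `new_house` values (2000 sq ft, 3 bedrooms, 10 years old).

```python
import numpy as np
import pandas as pd

data = {
    "Square_Feet": [1500, 1800, 2400, 3000],
    "Bedrooms": [3, 4, 4, 5],
    "Age_Years": [10, 5, 2, 1],
    "Price": [300000, 400000, 500000, 600000],
}
df = pd.DataFrame(data)

# Design matrix: a column of ones (intercept) followed by the three features
X = np.column_stack([np.ones(len(df)), df[["Square_Feet", "Bedrooms", "Age_Years"]].to_numpy()])
y = df["Price"].to_numpy(dtype=float)

# Normal equation: solve (X^T X) theta = X^T y instead of inverting X^T X
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical new house: intercept term, 2000 sq ft, 3 bedrooms, 10 years old
new_house = np.array([1.0, 2000, 3, 10])
predicted_price = np.matmul(new_house, theta)  # about 360,606 for this hypothetical input
print(theta, predicted_price)
```

With four houses and four unknowns, X is square and invertible for this data, so the fit reproduces the four training prices exactly. Treat the prediction as an exercise in the mechanics rather than a reliable estimate.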
Fill NaN values using linear interpolation, time-based interpolation, and index-based interpolation on the IDs
In [29]: import pandas as pd
import numpy as np
# data: dict with ID, Date and Sales columns (not shown here)
df = pd.DataFrame(data)
In [ ]: df_linear = df.copy()
df_linear["Sales"] = df_linear["Sales"].interpolate(method="linear")
In [ ]: df_time = df.copy()
df_time["Date"] = pd.to_datetime(df_time["Date"])  # method="time" needs a DatetimeIndex
df_time.set_index("Date", inplace=True) # Set Date as index
df_time["Sales"] = df_time["Sales"].interpolate(method="time") # Time-based interpolation
df_time.reset_index(inplace=True)
In [ ]: df_index = df.copy()
df_index.set_index("ID", inplace=True) # Set ID as index
df_index["Sales"] = df_index["Sales"].interpolate(method="index") # Index-based interpolation
df_index.reset_index(inplace=True)
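The three methods give different results only when the gaps between rows are uneven. The comparison below uses hypothetical toy data, where the `toy` frame and its IDs, dates, and sales values are all made up. It shows how each method weights the neighboring values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: uneven gaps in both Date (1 day, then 2 days) and ID (2, then 1)
toy = pd.DataFrame({
    "ID": [1, 3, 4],
    "Date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
    "Sales": [10.0, np.nan, 40.0],
})

# Linear: treats rows as equally spaced, ignoring Date and ID -> 25
linear = toy["Sales"].interpolate(method="linear")

# Time: weights by elapsed time on a DatetimeIndex -> 10 + 30 * (1/3), about 20
time_based = toy.set_index("Date")["Sales"].interpolate(method="time")

# Index: weights by the numeric index values (the IDs) -> 10 + 30 * (2/3) = 30
index_based = toy.set_index("ID")["Sales"].interpolate(method="index")

print(linear.iloc[1], time_based.iloc[1], index_based.iloc[1])
```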
You have a dataset containing employee details with the following columns:
Employee_ID: A unique identifier for each employee.
Department: The department where the employee works (e.g., HR, IT, Sales).
Work_Location: The office location of the employee (e.g., New York, San Francisco, Chicago).
Find the count of employees in each department across the different work locations using crosstab
In [31]: import pandas as pd
# Sample dataset
data = {
"Employee_ID": range(1, 11),
"Department": ["HR", "IT", "Sales", "IT", "HR", "Sales", "IT", "HR", "Sales", "IT"],
"Work_Location": ["New York", "San Francisco", "Chicago", "New York", "Chicago",
"San Francisco", "Chicago", "New York", "San Francisco", "Chicago"]
}
df = pd.DataFrame(data)
print("Employee Dataset:")
print(df)
Employee Dataset:
Employee_ID Department Work_Location
0 1 HR New York
1 2 IT San Francisco
2 3 Sales Chicago
3 4 IT New York
4 5 HR Chicago
5 6 Sales San Francisco
6 7 IT Chicago
7 8 HR New York
8 9 Sales San Francisco
9 10 IT Chicago
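The dataset cell above stops before the crosstab step. Here is a minimal sketch of that step using `pd.crosstab` on the same sample data:

```python
import pandas as pd

data = {
    "Employee_ID": range(1, 11),
    "Department": ["HR", "IT", "Sales", "IT", "HR", "Sales", "IT", "HR", "Sales", "IT"],
    "Work_Location": ["New York", "San Francisco", "Chicago", "New York", "Chicago",
                      "San Francisco", "Chicago", "New York", "San Francisco", "Chicago"],
}
df = pd.DataFrame(data)

# Rows = departments, columns = work locations, cells = number of employees
counts = pd.crosstab(df["Department"], df["Work_Location"])
print(counts)

# margins=True appends an "All" row and column holding the totals
counts_with_totals = pd.crosstab(df["Department"], df["Work_Location"], margins=True)
print(counts_with_totals)
```

The counts for this sample are:

- HR: 2 employees in New York and 1 in Chicago.
- IT: 2 in Chicago and 1 each in New York and San Francisco.
- Sales: 2 in San Francisco and 1 in Chicago.

The grand total in the margins is 10.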