
Practice Questions 2

The document contains Python code that demonstrates data manipulation using pandas and numpy, including handling missing values, removing duplicates, and performing group operations. It also covers solving a system of equations, predicting house prices using linear regression, and filling missing values through various interpolation methods. Additionally, it includes creating a crosstab to count employees in different departments across work locations.

Uploaded by

Rishit Gandha

In [32]: import pandas as pd

import numpy as np

# Create a dataset with missing values, duplicates, and categorical data
data = {
    'Customer_ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 101, 105],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ivy', 'Jack', 'Alice', 'Eva'],
    'Age': [25, 34, 29, 40, 29, np.nan, 32, np.nan, 28, 45, 25, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Chicago', 'San Francisco',
             'San Francisco', 'Los Angeles', 'New York', 'Chicago', 'New York', 'Chicago'],
    'Salary': [50000, 70000, 80000, np.nan, 65000, 72000, 81000, 62000, 77000, 50000, 50000, 65000],
    'Purchase_Amount': [200, 150, np.nan, 300, 250, 400, 350, np.nan, 500, 450, 200, 250]
}

# Create DataFrame
df = pd.DataFrame(data)

print(df)

# Remove duplicates
df = df.drop_duplicates()

Customer_ID Name Age City Salary Purchase_Amount


0 101 Alice 25.0 New York 50000.0 200.0
1 102 Bob 34.0 Los Angeles 70000.0 150.0
2 103 Charlie 29.0 Chicago 80000.0 NaN
3 104 David 40.0 New York NaN 300.0
4 105 Eva 29.0 Chicago 65000.0 250.0
5 106 Frank NaN San Francisco 72000.0 400.0
6 107 Grace 32.0 San Francisco 81000.0 350.0
7 108 Hannah NaN Los Angeles 62000.0 NaN
8 109 Ivy 28.0 New York 77000.0 500.0
9 110 Jack 45.0 Chicago 50000.0 450.0
10 101 Alice 25.0 New York 50000.0 200.0
11 105 Eva 29.0 Chicago 65000.0 250.0

Find the customer who has made the highest total purchase amount (after removing duplicates)
In [ ]: # 1. Find the customer who has made the highest total purchase amount
highest_purchase_customer = df.groupby('Customer_ID')['Purchase_Amount'].sum().idxmax()
print("1. Customer with highest total purchase amount:", highest_purchase_customer)
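Note that `idxmax` above returns only the winning `Customer_ID`. Since the prompt asks for the customer's row, here is a small sketch of recovering the full row(s) by boolean indexing, using a toy three-customer frame rather than the full dataset above:

```python
import numpy as np
import pandas as pd

# Toy deduplicated frame (a subset of the columns used above)
df = pd.DataFrame({
    "Customer_ID": [101, 102, 103],
    "Name": ["Alice", "Bob", "Charlie"],
    "Purchase_Amount": [200.0, 150.0, np.nan],
})

# idxmax on the grouped sums yields the ID; boolean indexing recovers the row(s)
top_id = df.groupby("Customer_ID")["Purchase_Amount"].sum().idxmax()
top_rows = df[df["Customer_ID"] == top_id]
print(top_rows)  # Alice's row: her 200.0 is the largest total (NaN sums to 0)
```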

Identify customers whose salary is within the top 10% of all salaries
In [ ]: # 2. Identify customers whose salary is within the top 10% of all salaries
salary_threshold = df['Salary'].quantile(0.9)
top_salary_customers = df[df['Salary'] >= salary_threshold]
print("2. Customers with top 10% salaries:\n", top_salary_customers)

Find the city with the most customers and calculate its average purchase amount
In [ ]: # 3. Find the city with the most customers and calculate its average purchase amount
most_common_city = df['City'].mode()[0]
avg_purchase_in_city = df[df['City'] == most_common_city]['Purchase_Amount'].mean()
print("3. City with most customers:", most_common_city, "Average Purchase Amount:", avg_purchase_in_city)

Create a new column indicating if the customer is a high spender (if purchase amount > median)
In [ ]: # 4. Create a new column indicating if the customer is a high spender (if purchase amount > median)
purchase_median = df['Purchase_Amount'].median()
df['High_Spender'] = df['Purchase_Amount'] > purchase_median
print("4. DataFrame with High Spender column:\n", df)

Find the age group (bins) with the highest average salary
In [ ]: # 5. Find the age group (bins) with the highest average salary
bins = [20, 30, 40, 50]
labels = ['20-30', '30-40', '40-50']
df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
highest_avg_salary_group = df.groupby('Age_Group', observed=False)['Salary'].mean().idxmax()
print("5. Age group with highest average salary:", highest_avg_salary_group)

Replace missing salary values using the median salary for that customer’s city
In [ ]: # 6. Replace missing salary values using the median salary for that customer’s city
df['Salary'] = df['Salary'].fillna(df.groupby('City')['Salary'].transform('median'))  # transform keeps the original index, so the assignment aligns
print("6. DataFrame after filling missing salaries:\n", df)

Solve a system of equations with multiple variables (3x3 system)

Problem: solve the following system using NumPy:

x + 2y + 3z = 14
4x + 5y + 6z = 32
7x + 8y + 10z = 50
In [ ]: # 7. Solve a system of equations with multiple variables (3x3 system)
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
B = np.array([14, 32, 50])
solution = np.linalg.solve(A, B)
print("7. Solution to the system of equations:", solution)
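A quick sanity check on the solver's answer: multiplying `A` back by the solution should reproduce `B`. A minimal verification sketch, reusing the same system:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
B = np.array([14, 32, 50])
solution = np.linalg.solve(A, B)

# The residual A @ solution - B should be zero up to floating-point error
assert np.allclose(A @ solution, B)
print(solution)  # x = -2, y = 8, z = 0
```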


You are given a dataset containing historical house price data. The price of a house depends on square footage, number of bedrooms, and age of the house. Your task is to predict house prices from these features using linear algebra techniques (np.linalg.solve and np.matmul).

Predict the price of a new house with 2200 sq ft, 3 bedrooms, and 5 years of age.

In [27]: data = {
    "Square_Feet": [1500, 1800, 2400, 3000],
    "Bedrooms": [3, 4, 4, 5],
    "Age_Years": [10, 5, 2, 1],
    "Price": [300000, 400000, 500000, 600000]
}
df = pd.DataFrame(data)
# Display the DataFrame
print(df)

Square_Feet Bedrooms Age_Years Price


0 1500 3 10 300000
1 1800 4 5 400000
2 2400 4 2 500000
3 3000 5 1 600000

In [ ]: import numpy as np
import pandas as pd

# Given dataset
data = {
    "Square_Feet": [1500, 1800, 2400, 3000],
    "Bedrooms": [3, 4, 4, 5],
    "Age_Years": [10, 5, 2, 1],
    "Price": [300000, 400000, 500000, 600000]
}

df = pd.DataFrame(data)

# Extract features (X) and target variable (y)
X = df[["Square_Feet", "Bedrooms", "Age_Years"]].values
y = df["Price"].values

# Display dataset
print(df)

# Add a bias column (intercept term) to X
X_b = np.c_[np.ones((X.shape[0], 1)), X]  # column of ones for the intercept

# Compute the coefficients using the normal equation
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

print("Coefficients (theta):", theta)

# Define new house data (first entry is the bias term)
new_house = np.array([1, 2200, 3, 5])

# Predict price
predicted_price = np.matmul(new_house, theta)

print("Predicted price for the new house:", predicted_price)
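The normal equation works here because the 4x4 system is exactly determined, but forming `X_b.T @ X_b` squares the condition number. As an alternative sketch, `np.linalg.lstsq` fits the same coefficients directly and also handles over-determined data (more houses than parameters):

```python
import numpy as np

X = np.array([[1500, 3, 10], [1800, 4, 5], [2400, 4, 2], [3000, 5, 1]], dtype=float)
y = np.array([300000, 400000, 500000, 600000], dtype=float)
X_b = np.c_[np.ones((X.shape[0], 1)), X]  # bias column for the intercept

# lstsq minimizes ||X_b @ theta - y||^2 without explicitly forming X_b.T @ X_b
theta, residuals, rank, sv = np.linalg.lstsq(X_b, y, rcond=None)

new_house = np.array([1.0, 2200, 3, 5])
predicted = new_house @ theta
print(predicted)
```

With only four houses the fit is exact, so this reproduces the normal-equation prediction; the difference would matter on larger, noisier datasets.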


Fill NaN values using linear interpolation, time-based interpolation, and index-based interpolation (on IDs)
In [29]: import pandas as pd
import numpy as np

# Create dataset with missing values
data = {
    "ID": [101, 102, 103, 104, 105, 106, 107, 108],
    "Date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04",
                            "2024-01-05", "2024-01-07", "2024-01-08",
                            "2024-01-10", "2024-01-11"]),
    "Sales": [500, np.nan, 700, np.nan, 850, np.nan, 920, 980],
    "Category": ["Electronics", "Furniture", "Electronics", "Clothing",
                 "Furniture", "Electronics", "Clothing", "Furniture"]
}

df = pd.DataFrame(data)

print("Original Dataset with Missing Values:")
df

Original Dataset with Missing Values:


Out[29]:    ID       Date  Sales     Category
0  101 2024-01-01  500.0  Electronics
1  102 2024-01-02    NaN    Furniture
2  103 2024-01-04  700.0  Electronics
3  104 2024-01-05    NaN     Clothing
4  105 2024-01-07  850.0    Furniture
5  106 2024-01-08    NaN  Electronics
6  107 2024-01-10  920.0     Clothing
7  108 2024-01-11  980.0    Furniture

In [ ]: df_linear = df.copy()
df_linear["Sales"] = df_linear["Sales"].interpolate(method="linear")

print("\nDataset after Linear Interpolation:")
print(df_linear)

In [ ]: df_time = df.copy()
df_time.set_index("Date", inplace=True) # Set Date as index
df_time["Sales"] = df_time["Sales"].interpolate(method="time") # Time-based interpolation
df_time.reset_index(inplace=True)

print("\nDataset after Time-Based Interpolation:")
print(df_time)

In [ ]: df_index = df.copy()
df_index.set_index("ID", inplace=True) # Set ID as index
df_index["Sales"] = df_index["Sales"].interpolate(method="index") # Index-based interpolation
df_index.reset_index(inplace=True)

print("\nDataset after Index-Based Interpolation:")
print(df_index)
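To see how the three methods actually differ, here is a side-by-side sketch on just the `Sales` series (same toy values as above). Linear and index-based fills agree here because the IDs are consecutive, while time-based interpolation weights by elapsed days:

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05",
                        "2024-01-07", "2024-01-08", "2024-01-10", "2024-01-11"])
ids = [101, 102, 103, 104, 105, 106, 107, 108]
sales = pd.Series([500, np.nan, 700, np.nan, 850, np.nan, 920, 980])

linear = sales.interpolate(method="linear")
time_based = sales.set_axis(dates).interpolate(method="time")
index_based = sales.set_axis(pd.Index(ids)).interpolate(method="index")

# Linear treats rows as equally spaced, so the 500 -> 700 gap fills to 600.
# Time-based sees 2024-01-02 as one third of the way from 01-01 to 01-04,
# so the same gap fills to ~566.67 instead.
print(linear.iloc[1], time_based.iloc[1], index_based.iloc[1])
```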

In [ ]:

In [ ]:

You have a dataset containing employee details with the following columns:
Employee_ID: A unique identifier for each employee.

Department: The department where the employee works (e.g., HR, IT, Sales).

Work_Location: The office location of the employee (e.g., New York, San Francisco, Chicago).

Find the count of employees in each department across different work locations using a crosstab
In [31]: import pandas as pd

# Sample dataset
data = {
    "Employee_ID": range(1, 11),
    "Department": ["HR", "IT", "Sales", "IT", "HR", "Sales", "IT", "HR", "Sales", "IT"],
    "Work_Location": ["New York", "San Francisco", "Chicago", "New York", "Chicago",
                      "San Francisco", "Chicago", "New York", "San Francisco", "Chicago"]
}

df = pd.DataFrame(data)

print("Employee Dataset:")
print(df)

Employee Dataset:
Employee_ID Department Work_Location
0 1 HR New York
1 2 IT San Francisco
2 3 Sales Chicago
3 4 IT New York
4 5 HR Chicago
5 6 Sales San Francisco
6 7 IT Chicago
7 8 HR New York
8 9 Sales San Francisco
9 10 IT Chicago

In [ ]: # Create a crosstab of Department vs Work_Location


crosstab_result = pd.crosstab(df["Department"], df["Work_Location"])

print("\nCount of Employees in Each Department Across Work Locations:")


print(crosstab_result)
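A small extension sketch: passing `margins=True` to `pd.crosstab` appends an `All` row and column with totals, which is handy for checking that the counts sum to the number of employees (same sample data as above):

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "IT", "Sales", "IT", "HR", "Sales", "IT", "HR", "Sales", "IT"],
    "Work_Location": ["New York", "San Francisco", "Chicago", "New York", "Chicago",
                      "San Francisco", "Chicago", "New York", "San Francisco", "Chicago"],
})

# margins=True adds row/column totals; normalize="index" would instead give
# within-department proportions rather than raw counts
ct = pd.crosstab(df["Department"], df["Work_Location"], margins=True)
print(ct)
```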
