Python practice questions (1)

The document outlines a series of programming tasks using Python, focusing on data manipulation and analysis with NumPy and Pandas. It includes exercises on reshaping arrays, handling missing values, calculating statistics, and visualizing data through plots. Additionally, it covers advanced topics such as feature selection using ANOVA, correlation analysis, and building regression and classification models with employee performance data.

Uploaded by

S.Y Suprabhath

Additional programs

Note: Practice these questions.

Q1. Write a Python program to:

1. Create a NumPy array with the product sales below and reshape it to a 2x4 matrix.
2. Calculate the sum and mean of sales.
3. Convert the data into a Pandas DataFrame with columns "Product" and "Sales" and
export it to a CSV file.
4. Load the CSV file and print the DataFrame.

Dataset:
Product Sales
A 120
B 200
C 150
D 180
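
One possible approach to Q1 is sketched below. It rests on an assumption: the four sales values alone cannot fill a 2x4 matrix, which needs eight elements, so the sketch puts the product labels in the first row and the sales in the second. The file name product_sales.csv is hypothetical.

```python
import numpy as np
import pandas as pd

# Product labels and sales figures from the dataset above.
products = ["A", "B", "C", "D"]
sales_values = [120, 200, 150, 180]

# Assumption: a 2x4 matrix needs eight elements, so labels and sales are
# combined into one flat array of strings and reshaped to 2 rows x 4 columns.
flat = np.array(products + [str(v) for v in sales_values])
matrix = flat.reshape(2, 4)
print(matrix)

# Sum and mean are computed on the numeric sales only.
sales = np.array(sales_values)
total_sales = sales.sum()
mean_sales = sales.mean()
print("Sum:", total_sales, "Mean:", mean_sales)

# Build the DataFrame and export it without the index column.
df = pd.DataFrame({"Product": products, "Sales": sales})
csv_path = "product_sales.csv"  # hypothetical file name
df.to_csv(csv_path, index=False)

# Load the file back and print it.
loaded = pd.read_csv(csv_path)
print(loaded)
```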

Q2. Given the data of student scores, write a Python program that:

1. Detects missing values and fills them with the average score of the available data in each
subject.
2. Creates a dictionary with students as keys and their average scores as values.
3. Converts the dictionary values to a NumPy array, calculates mean and standard deviation,
and plots a bar chart of scores.

Dataset:
Student Math Science English
Alex 85 78 92
Ben 74 - 81
Cara 90 87 -
Dana 88 75 83
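
A minimal sketch of Q2 follows, assuming the "-" entries in the dataset mark missing scores and that the bar chart shows each student's average. The output file name student_averages.png is hypothetical, and the non-interactive Agg backend is chosen only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; the plot is saved to a file
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# "-" in the dataset is treated as a missing value (NaN).
scores = pd.DataFrame(
    {
        "Math": [85, 74, 90, 88],
        "Science": [78, np.nan, 87, 75],
        "English": [92, 81, np.nan, 83],
    },
    index=["Alex", "Ben", "Cara", "Dana"],
)

# 1. Detect missing values and fill each with its subject (column) mean.
print(scores.isna())
filled = scores.fillna(scores.mean())

# 2. Dictionary: student -> average score across the three subjects.
student_avg = filled.mean(axis=1).to_dict()
print(student_avg)

# 3. NumPy array of the averages, with mean and standard deviation.
#    np.std uses the population formula (ddof=0) by default.
avg_array = np.array(list(student_avg.values()))
print("Mean:", avg_array.mean(), "Std:", avg_array.std())

# Bar chart of the average score per student.
fig, ax = plt.subplots()
ax.bar(list(student_avg.keys()), avg_array)
ax.set_xlabel("Student")
ax.set_ylabel("Average score")
ax.set_title("Average score per student")
fig.savefig("student_averages.png")  # hypothetical file name
```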

Q3. Given a dataset on car prices and mileages, write a Python program that:

1. Creates a Pandas DataFrame from the data below.
2. Finds the sum, mean, and standard deviation of the "Price" column.
3. Normalizes the "Mileage" and "Price" columns to a 0-1 range and adds them as new
columns.

Dataset:
Car Model Price Mileage (in 1000s)
Sedan 20000 35
SUV 30000 45
Truck 25000 60
Coupe 27000 40
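
Q3 can be sketched with min-max scaling, where each value x becomes (x - min) / (max - min). The new column names Price_norm and Mileage_norm are illustrative choices, not part of the question.

```python
import pandas as pd

# 1. DataFrame from the dataset above (Mileage is in 1000s).
cars = pd.DataFrame(
    {
        "Car Model": ["Sedan", "SUV", "Truck", "Coupe"],
        "Price": [20000, 30000, 25000, 27000],
        "Mileage": [35, 45, 60, 40],
    }
)

# 2. Sum, mean, and standard deviation of Price.
#    pandas' .std() is the sample standard deviation (ddof=1).
price_sum = cars["Price"].sum()
price_mean = cars["Price"].mean()
price_std = cars["Price"].std()
print(price_sum, price_mean, price_std)

# 3. Min-max normalization to the 0-1 range, stored in new columns.
for col in ["Price", "Mileage"]:
    col_min = cars[col].min()
    col_max = cars[col].max()
    cars[col + "_norm"] = (cars[col] - col_min) / (col_max - col_min)

print(cars)
```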

Q4. Data Cleaning, Standardization, and DataFrame Manipulation with Plotting

Using the dataset of test scores, write a Python program that:

1. Identifies and replaces missing values with the median score for each subject.
2. Calculates the total and average scores for each student.
3. Standardizes the scores in each subject (mean=0, std=1) and stores them in a new
DataFrame.
4. Plots a line graph showing standardized scores across subjects for each student.

Dataset:
Student Math Science English
Tom 78 85 -
Lucy 92 88 80
Max 85 - 75
Zoe 70 78 82
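
The four steps of Q4 might look like the sketch below. It assumes "-" marks a missing score, standardizes with the population standard deviation (ddof=0) so each subject ends up with a standard deviation of exactly 1, and saves the plot under the hypothetical name standardized_scores.png.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; the plot is saved to a file
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

subjects = ["Math", "Science", "English"]
scores = pd.DataFrame(
    {
        "Math": [78, 92, 85, 70],
        "Science": [85, 88, np.nan, 78],
        "English": [np.nan, 80, 75, 82],
    },
    index=["Tom", "Lucy", "Max", "Zoe"],
)

# 1. Replace missing values with the median of each subject.
print(scores.isna())
filled = scores.fillna(scores.median())

# 2. Total and average score per student.
summary = pd.DataFrame(
    {"Total": filled[subjects].sum(axis=1), "Average": filled[subjects].mean(axis=1)}
)
print(summary)

# 3. Standardize each subject (z-score) into a new DataFrame.
standardized = (filled - filled.mean()) / filled.std(ddof=0)
print(standardized)

# 4. Line graph: subjects on the x-axis, one line per student.
fig, ax = plt.subplots()
standardized.T.plot(ax=ax, marker="o")
ax.set_xlabel("Subject")
ax.set_ylabel("Standardized score")
ax.set_title("Standardized scores by subject")
fig.savefig("standardized_scores.png")  # hypothetical file name
```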

Q5. Given the data on house listings, write a Python program that:

1. Creates a Pandas DataFrame from the data below.
2. Calculates the sum, mean, and standard deviation of the "Price" and "Area" columns.
3. Standardizes the "Price" and "Area" columns (mean=0, std=1) and adds them as new
columns.

Dataset:
House Type Price (in 1000s) Area (sqft)
Apartment 150 900
Villa 300 2200
Townhouse 200 1500
Cottage 180 1200
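
A short sketch of Q5, assuming z-score standardization with the population standard deviation (ddof=0). The column names Price_z and Area_z are illustrative.

```python
import numpy as np
import pandas as pd

# 1. DataFrame from the dataset above (Price in 1000s, Area in sqft).
houses = pd.DataFrame(
    {
        "House Type": ["Apartment", "Villa", "Townhouse", "Cottage"],
        "Price": [150, 300, 200, 180],
        "Area": [900, 2200, 1500, 1200],
    }
)

# 2. Sum, mean, and (sample) standard deviation of both columns at once.
stats = houses[["Price", "Area"]].agg(["sum", "mean", "std"])
print(stats)

# 3. Standardize to mean 0 and std 1, stored in new columns.
for col in ["Price", "Area"]:
    mean = houses[col].mean()
    std = houses[col].std(ddof=0)  # population std, so the result has std exactly 1
    houses[col + "_z"] = (houses[col] - mean) / std

print(houses)
```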

Additional Programs

Dataset: Employee Performance and Salary Dataset

Employee_ID Age Department Years_of_Experience Salary (in ₹) Performance_Score Promotions_Last_5_Years Education_Level
1 25 HR 2 45000 75 0 Bachelors
2 28 Finance 5 65000 82 1 Masters
3 35 IT 10 90000 88 1 PhD
4 40 HR 15 75000 70 0 Masters
5 29 Marketing 6 72000 85 1 Bachelors
6 50 IT 25 120000 92 1 PhD
7 32 Finance 8 84000 78 1 Masters
8 42 Marketing 18 97000 90 0 PhD
9 23 HR 1 40000 65 0 Bachelors
10 37 IT 12 88000 83 1 Masters
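
The questions that follow all start from this table, so it helps to load it into a DataFrame first. The sketch below reads the rows from an in-memory string; the column names are an assumption about how the wrapped header is meant to read, with the currency suffix dropped from Salary.

```python
import io

import pandas as pd

# Whitespace-separated copy of the table above. Column names are assumed
# from the wrapped header; Salary is in rupees.
RAW = """\
Employee_ID Age Department Years_of_Experience Salary Performance_Score Promotions_Last_5_Years Education_Level
1 25 HR 2 45000 75 0 Bachelors
2 28 Finance 5 65000 82 1 Masters
3 35 IT 10 90000 88 1 PhD
4 40 HR 15 75000 70 0 Masters
5 29 Marketing 6 72000 85 1 Bachelors
6 50 IT 25 120000 92 1 PhD
7 32 Finance 8 84000 78 1 Masters
8 42 Marketing 18 97000 90 0 PhD
9 23 HR 1 40000 65 0 Bachelors
10 37 IT 12 88000 83 1 Masters
"""

# sep=r"\s+" splits on any run of whitespace.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")
print(df)
print(df.dtypes)
```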

Question 1: Basic Data Preprocessing

Using the given Employee Performance and Salary Dataset, write a Python program to
implement the following basic data preprocessing steps:

1. Check for and handle any missing values in the dataset.
2. Perform Outlier Detection on the following numeric columns:

• Age
• Salary
• Performance_Score
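
A possible sketch of these preprocessing steps is given below. The question does not name a method, so two assumptions are made: missing values are filled with the median (numeric columns) or the mode (other columns), and outliers are flagged with the 1.5 x IQR rule. The helper iqr_outliers and the column names are illustrative.

```python
import io

import pandas as pd

# Dataset from the table above; column names are assumed from the wrapped header.
RAW = """\
Employee_ID Age Department Years_of_Experience Salary Performance_Score Promotions_Last_5_Years Education_Level
1 25 HR 2 45000 75 0 Bachelors
2 28 Finance 5 65000 82 1 Masters
3 35 IT 10 90000 88 1 PhD
4 40 HR 15 75000 70 0 Masters
5 29 Marketing 6 72000 85 1 Bachelors
6 50 IT 25 120000 92 1 PhD
7 32 Finance 8 84000 78 1 Masters
8 42 Marketing 18 97000 90 0 PhD
9 23 HR 1 40000 65 0 Bachelors
10 37 IT 12 88000 83 1 Masters
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

# 1. Count missing values per column, then fill any that exist.
missing = df.isna().sum()
print(missing)
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
for col in df.columns.difference(numeric_cols):
    df[col] = df[col].fillna(df[col].mode().iloc[0])


def iqr_outliers(series, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# 2. Outlier detection on the three requested columns.
outliers = {}
for col in ["Age", "Salary", "Performance_Score"]:
    outliers[col] = iqr_outliers(df[col])
    print(col, "outliers:", outliers[col].tolist())
```

A z-score threshold (for example, |z| > 3) is a common alternative to the IQR rule; with only ten rows, either method should be read with caution.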

Question 2: Skewness and Kurtosis with Box Plot

Using the given Employee Performance and Salary Dataset, write a Python program to:

1. Calculate the Skewness and Kurtosis for the following columns:

• Age
• Salary
• Performance_Score

2. Plot a Box Plot for the Salary column to analyze its distribution.
3. Interpret the Skewness and Kurtosis results and the Box Plot to describe the nature of
the Salary distribution.
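
The sketch below shows one way to approach this question with pandas and Matplotlib. pandas' skew() and kurt() return the sample skewness and the excess kurtosis (0 for a normal distribution). The 0.5 cut-offs in the helper functions are a rule-of-thumb assumption, and salary_boxplot.png is a hypothetical file name.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; the plot is saved to a file
import matplotlib.pyplot as plt
import pandas as pd

# Dataset from the table above; column names are assumed from the wrapped header.
RAW = """\
Employee_ID Age Department Years_of_Experience Salary Performance_Score Promotions_Last_5_Years Education_Level
1 25 HR 2 45000 75 0 Bachelors
2 28 Finance 5 65000 82 1 Masters
3 35 IT 10 90000 88 1 PhD
4 40 HR 15 75000 70 0 Masters
5 29 Marketing 6 72000 85 1 Bachelors
6 50 IT 25 120000 92 1 PhD
7 32 Finance 8 84000 78 1 Masters
8 42 Marketing 18 97000 90 0 PhD
9 23 HR 1 40000 65 0 Bachelors
10 37 IT 12 88000 83 1 Masters
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")
cols = ["Age", "Salary", "Performance_Score"]

# 1. Sample skewness and excess kurtosis for each column.
summary = pd.DataFrame({"skewness": df[cols].skew(), "kurtosis": df[cols].kurt()})
print(summary)

# 2. Box plot of Salary.
fig, ax = plt.subplots()
ax.boxplot(df["Salary"])
ax.set_ylabel("Salary")
ax.set_title("Salary distribution")
fig.savefig("salary_boxplot.png")  # hypothetical file name


# 3. Rule-of-thumb interpretation (the thresholds are an assumption).
def describe_skew(value):
    if value > 0.5:
        return "right-skewed (long tail of high values)"
    if value < -0.5:
        return "left-skewed (long tail of low values)"
    return "roughly symmetric"


def describe_kurtosis(value):
    if value > 0.5:
        return "heavier tails than a normal distribution"
    if value < -0.5:
        return "lighter tails than a normal distribution"
    return "tails close to a normal distribution"


salary = summary.loc["Salary"]
print("Salary is", describe_skew(salary["skewness"]))
print("Salary has", describe_kurtosis(salary["kurtosis"]))
```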

Question 3: Feature Selection using ANOVA

Task:
Using the given Employee Performance and Salary Dataset, write a Python program to perform
feature selection using the ANOVA F-test.

1. Use the column Performance_Score as the target variable.
2. Identify the significance of the independent features (Age, Years_of_Experience, Salary,
etc.) in predicting Performance_Score.
3. Display the F-values and p-values for each feature.
4. Interpret the results to identify which features are statistically significant.
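
A classical ANOVA F-test compares a numeric variable across groups, but the target here, Performance_Score, is itself numeric. The sketch below therefore makes two labeled substitutions: numeric features are scored with scikit-learn's f_regression (the F-test of a univariate linear regression), and categorical features are tested with SciPy's f_oneway, a one-way ANOVA of Performance_Score across the categories. The 0.05 significance level is an assumption.

```python
import io

import pandas as pd
from scipy.stats import f_oneway
from sklearn.feature_selection import f_regression

# Dataset from the table above; column names are assumed from the wrapped header.
RAW = """\
Employee_ID Age Department Years_of_Experience Salary Performance_Score Promotions_Last_5_Years Education_Level
1 25 HR 2 45000 75 0 Bachelors
2 28 Finance 5 65000 82 1 Masters
3 35 IT 10 90000 88 1 PhD
4 40 HR 15 75000 70 0 Masters
5 29 Marketing 6 72000 85 1 Bachelors
6 50 IT 25 120000 92 1 PhD
7 32 Finance 8 84000 78 1 Masters
8 42 Marketing 18 97000 90 0 PhD
9 23 HR 1 40000 65 0 Bachelors
10 37 IT 12 88000 83 1 Masters
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

target = "Performance_Score"
numeric_features = ["Age", "Years_of_Experience", "Salary", "Promotions_Last_5_Years"]
categorical_features = ["Department", "Education_Level"]
ALPHA = 0.05  # assumed significance level

rows = []

# Numeric features: F-test of a univariate linear regression on the target.
f_vals, p_vals = f_regression(df[numeric_features], df[target])
for name, f_val, p_val in zip(numeric_features, f_vals, p_vals):
    rows.append({"Feature": name, "Test": "f_regression", "F_value": f_val, "p_value": p_val})

# Categorical features: one-way ANOVA of the target across the categories.
for name in categorical_features:
    groups = [group[target].to_numpy() for _, group in df.groupby(name)]
    f_val, p_val = f_oneway(*groups)
    rows.append({"Feature": name, "Test": "f_oneway", "F_value": f_val, "p_value": p_val})

# Display F-values and p-values, smallest p-value first.
results = pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)
results["Significant"] = results["p_value"] < ALPHA
print(results)
```

Features with a p-value below the chosen level are the ones the test treats as statistically significant. With ten rows, the p-values are only a rough guide.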

Question 4: Heatmap for Correlation

Using the given Employee Performance and Salary Dataset, write a Python program to:

1. Calculate the correlation matrix for all numeric columns.
2. Generate a Heatmap using Seaborn to visualize the correlation between the numeric
features.
3. Identify and print:

• The pair of features with the highest positive correlation.
• The pair of features with the highest negative correlation.
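
These steps could be sketched as follows. Employee_ID is dropped on the assumption that an identifier carries no meaningful correlation, and the file name correlation_heatmap.png is hypothetical. Only the upper triangle of the matrix is scanned, so each pair is considered once and the diagonal is skipped.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; the plot is saved to a file
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Dataset from the table above; column names are assumed from the wrapped header.
RAW = """\
Employee_ID Age Department Years_of_Experience Salary Performance_Score Promotions_Last_5_Years Education_Level
1 25 HR 2 45000 75 0 Bachelors
2 28 Finance 5 65000 82 1 Masters
3 35 IT 10 90000 88 1 PhD
4 40 HR 15 75000 70 0 Masters
5 29 Marketing 6 72000 85 1 Bachelors
6 50 IT 25 120000 92 1 PhD
7 32 Finance 8 84000 78 1 Masters
8 42 Marketing 18 97000 90 0 PhD
9 23 HR 1 40000 65 0 Bachelors
10 37 IT 12 88000 83 1 Masters
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

# 1. Pearson correlation matrix of the numeric columns (identifier excluded).
numeric = df.select_dtypes(include="number").drop(columns="Employee_ID")
corr = numeric.corr()
print(corr)

# 2. Heatmap of the correlation matrix.
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heatmap")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")  # hypothetical file name

# 3. Keep the upper triangle only, then look up the extreme pairs.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

top_pos = pairs.idxmax()
print("Highest positive correlation:", top_pos, round(pairs.loc[top_pos], 3))

top_neg = pairs.idxmin()
if pairs.loc[top_neg] < 0:
    print("Highest negative correlation:", top_neg, round(pairs.loc[top_neg], 3))
else:
    print("No negatively correlated pair; the weakest pair is:", top_neg)
```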

Question 5: Regression and Classification Models

Task:
Using the given Employee Performance and Salary Dataset, write Python programs for the
following tasks:

a) Regression Task:

• Build a Linear Regression model to predict Salary using the following input features:
Age, Years_of_Experience, and Performance_Score.

b) Classification Task:

• Develop a Logistic Regression model to classify whether an employee has been
promoted in the last 5 years (Promotions_Last_5_Years column) based on all other
features.
• Evaluate the model using accuracy and a confusion matrix.
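
Both tasks can be sketched with scikit-learn as shown below. Several choices here are assumptions rather than requirements of the question: the regression is fitted on all ten rows because the dataset is too small for a meaningful hold-out, the classifier uses a stratified 70/30 split with a fixed random_state, Employee_ID is excluded as an identifier, and categorical columns are one-hot encoded.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Dataset from the table above; column names are assumed from the wrapped header.
RAW = """\
Employee_ID Age Department Years_of_Experience Salary Performance_Score Promotions_Last_5_Years Education_Level
1 25 HR 2 45000 75 0 Bachelors
2 28 Finance 5 65000 82 1 Masters
3 35 IT 10 90000 88 1 PhD
4 40 HR 15 75000 70 0 Masters
5 29 Marketing 6 72000 85 1 Bachelors
6 50 IT 25 120000 92 1 PhD
7 32 Finance 8 84000 78 1 Masters
8 42 Marketing 18 97000 90 0 PhD
9 23 HR 1 40000 65 0 Bachelors
10 37 IT 12 88000 83 1 Masters
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

# a) Linear Regression: predict Salary from three numeric features.
reg_features = ["Age", "Years_of_Experience", "Performance_Score"]
reg = LinearRegression().fit(df[reg_features], df["Salary"])
r2_train = reg.score(df[reg_features], df["Salary"])  # R^2 on the training data
print("Coefficients:", dict(zip(reg_features, reg.coef_)))
print("Intercept:", reg.intercept_, "Training R^2:", r2_train)

# b) Logistic Regression: predict Promotions_Last_5_Years from the other features.
target = "Promotions_Last_5_Years"
num_cols = ["Age", "Years_of_Experience", "Salary", "Performance_Score"]
cat_cols = ["Department", "Education_Level"]
X = df[num_cols + cat_cols]  # Employee_ID is left out as an identifier
y = df[target]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scale numeric columns and one-hot encode categorical ones before the classifier.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ]
)
clf = Pipeline([("prep", preprocess), ("model", LogisticRegression(max_iter=1000))])
clf.fit(X_train, y_train)

# Evaluate with accuracy and a 2x2 confusion matrix (rows: actual, columns: predicted).
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("Accuracy:", accuracy)
print("Confusion matrix:\n", cm)
```

With only three test rows, the accuracy can change sharply with the split, so cross-validation may be worth trying on a larger dataset.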
