
PAMLSET1 New

The document outlines an examination paper for Predictive Analytics and Machine Learning, consisting of various questions on data manipulation, visualization, and machine learning techniques. Candidates must attempt mandatory questions and choose between two options for question 3. The exam includes tasks involving datasets such as Youtuber.csv, FIFA19.csv, and HR_3A.csv, focusing on data analysis, logistic regression, clustering, and model evaluation.

Uploaded by

nishthathakkar21

PUSASQF602

PREDICTIVE ANALYTICS & MACHINE LEARNING

Time: 2 Hours
Total Marks: 60 Marks
Note:
- The candidate may attempt either question 3A or question 3B. All other questions are mandatory.
- Numbers to the right indicate full marks.
- The candidates will be provided with the formula sheet and graphs (if required) for the examination.
- Use of an approved scientific calculator is allowed.

Q1. Attempt the following​ ​

A.    5 Marks
Perform the operations mentioned below on the Youtuber dataset.
i. Read the data “Youtuber.csv” using Pandas (1)
ii. Generate a bar plot of the Top 5 YouTube Channels by subscribers.
The graph should have titles as mentioned below (2)
    Title: Top 5 YouTube Channels by Subscribers
    X Axis Title: Channel Name
    Y Axis Title: Subscribers (in millions)
iii. Generate a plot for the Distribution of Channels by Country.
The graph should have titles as mentioned below (2)
    Title: Distribution of Channels by Country
    X Axis Title: Country
    Y Axis Title: Number of Channels



B.    5 Marks
Load the dataset FIFA19.csv
i. Filter the data to include only the 'Name', 'Age', 'Nationality', 'Club', 'Value', 'Wage', and 'Overall' columns (1)
ii. Drop any rows with missing values (1)
iii. Derive any 2 insights from the data (3)
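
The filtering and missing-value steps in part B can be sketched as follows. This is only an illustration under stated assumptions: the small `players` frame is a hypothetical stand-in for `pd.read_csv("FIFA19.csv")`, its values are invented, and the group-by at the end is just one possible kind of insight.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv("FIFA19.csv"); values are invented.
players = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4"],
    "Age": [31, 33, 26, 27],
    "Nationality": ["X", "Y", "X", "Z"],
    "Club": ["Club A", "Club B", None, "Club C"],  # one missing club
    "Value": [110.5, 77.0, 118.5, 72.0],
    "Wage": [565, 405, 290, 260],
    "Overall": [94, 94, 92, 91],
    "Potential": [94, 94, 93, 93],  # extra column that the filter drops
})

# i. Keep only the requested columns.
columns = ["Name", "Age", "Nationality", "Club", "Value", "Wage", "Overall"]
subset = players[columns]

# ii. Drop rows that contain any missing value.
subset = subset.dropna()

# iii. One possible insight: mean overall rating per nationality.
mean_overall_by_nationality = subset.groupby("Nationality")["Overall"].mean()
```

If the real file stores 'Value' and 'Wage' as text, those columns would need converting to numbers before any numeric summary.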

C.    5 Marks
i. Run a logistic regression on the dataframe given below (1)
df = pd.DataFrame({
    'Cust_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    'Salary': [1000, 1100, 10000, 1000, 11000, 1110, 21000, 30000, 2100, 33000, 21000, 21000, 25000, 21000, 45000],
    'EMI': [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
})
The data frame consists of 15 customers along with their monthly salaries, used to check their eligibility for No Cost EMI
    Cust_ID: Customer ID for the inquiry
    Salary: Customer's monthly take-home salary
    EMI: Eligibility for the EMI
ii. Predict whether the customer is EMI worthy or not (2)
iii. Provide the confusion matrix and score (2)
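
A minimal sketch of part C using scikit-learn is shown below. Two choices here are assumptions rather than requirements of the question: the salary is standardised before fitting, and the model is evaluated on the same 15 rows it was trained on, because no separate test set is given.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Cust_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    "Salary": [1000, 1100, 10000, 1000, 11000, 1110, 21000, 30000,
               2100, 33000, 21000, 21000, 25000, 21000, 45000],
    "EMI": [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

# Cust_ID is only an identifier, so Salary is the single feature.
X = df[["Salary"]]
y = df["EMI"]

# i. Fit a logistic regression (scaling is an assumption to help convergence).
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# ii. Predict EMI worthiness for every customer.
predictions = model.predict(X)

# iii. Confusion matrix (rows: actual, columns: predicted) and accuracy score.
cm = confusion_matrix(y, predictions)
score = model.score(X, y)
```

Because the score is computed on the training rows, it is likely to be optimistic compared with performance on unseen customers.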
​ ​ ​

Q2. Answer the following ​​ ​ ​ ​ ​ ​ ​ 15 Marks

A.    5 Marks
Generate a random dataset using the below code:
i. X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0) (1)
ii. Plot the dataset. (1)
iii. Apply K Means clustering with suitable number of clusters (3)
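
One way to approach part A is sketched below. The choice of four clusters is an assumption taken from `centers=4` in the generating call; the inertia values computed for several `k` are included only as a possible way to justify that choice (the elbow method).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# i. Generate the dataset exactly as given in the question.
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Within-cluster sum of squares for several k, to look for an elbow.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(1, 8)
}

# iii. Fit K-Means with the assumed number of clusters.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
centers = kmeans.cluster_centers_
```

For step ii, a scatter plot such as `plt.scatter(X[:, 0], X[:, 1], c=labels)` with matplotlib would show the points coloured by assigned cluster.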

​ ​ ​
B.    5 Marks
Apply Principal Component Analysis on “diamonds.csv” to derive 3 principal components.
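
A sketch of deriving three principal components follows. The random matrix is a hypothetical stand-in for the numeric columns of `diamonds.csv`; if the real file contains non-numeric columns, they would need to be encoded or excluded first. Standardising before PCA is an assumption, made because PCA is sensitive to feature scale.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the numeric columns of diamonds.csv.
rng = np.random.default_rng(0)
numeric_features = rng.normal(size=(200, 6))

# Standardise so every feature contributes on the same scale.
scaled = StandardScaler().fit_transform(numeric_features)

# Project onto the first three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(scaled)

# Share of total variance captured by each component.
explained = pca.explained_variance_ratio_
```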

​ ​ ​
C.    5 Marks
Load the covid_19_india dataset in Python and perform the steps mentioned below
i. Provide the summarised view of "Cured", "Deaths", "Confirmed" cases per state (3)
ii. Show the number of covid cases with respect to YYYYMM (Year-Month) on the x-axis (2)
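
The two steps in part C can be sketched with pandas as below. The `cases` frame is a hypothetical stand-in for the loaded dataset, and the column names `Date` and `State` are assumptions. The sketch also assumes the counts are cumulative per state, so it takes the maximum per state and the last value per month; if the counts were daily figures, `sum` would be the appropriate aggregation instead.

```python
import pandas as pd

# Hypothetical stand-in for the covid_19_india data; names and values are assumed.
cases = pd.DataFrame({
    "Date": ["2020-03-30", "2020-03-31", "2020-04-01", "2020-03-31", "2020-04-01"],
    "State": ["S1", "S1", "S1", "S2", "S2"],
    "Cured": [1, 2, 4, 0, 1],
    "Deaths": [0, 0, 1, 0, 0],
    "Confirmed": [10, 15, 22, 3, 7],
})
cases["Date"] = pd.to_datetime(cases["Date"])

# i. Summary per state (max, because the counts are assumed cumulative).
per_state = cases.groupby("State")[["Cured", "Deaths", "Confirmed"]].max()

# ii. Year-month key, then the last cumulative value per state in each month,
#     summed across states.
cases["YYYYMM"] = cases["Date"].dt.strftime("%Y%m")
monthly = (
    cases.sort_values("Date")
    .groupby(["YYYYMM", "State"])["Confirmed"]
    .last()
    .groupby(level="YYYYMM")
    .sum()
)
```

Calling `monthly.plot(kind="bar")` would then place YYYYMM on the x-axis.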

Q3. Answer the Following ​ ​ ​ ​ ​ ​ ​ 30 Marks​
​ ​ ​ ​ ​ ​ ​
A.
Predict “left” using the “HR_3A.csv”.

Below is the data dictionary:

Satisfaction level: Satisfaction level of the employee
Last Evaluation: Last evaluation (rating given by the manager)
Number Project: Number of projects done by the employee
Average Monthly Hours: Average number of hours the employee worked per month
Time Spend Company: Number of years the employee worked in the organisation
Promotion Last 5 Years: 1 = Promoted in last 5 years, 0 = Did not get promoted in last 5 years
Department: Department of the employee
Salary: Salary scale of the employee (Low/Medium/High)
Left: (1 = yes, 0 = no)

i. Load the dataset (1)
ii. Get the insights & correlation for each column vs the output column (5)
iii. Do the outlier treatment & null imputation if required. (2)
iv. Shortlist the most important features for predicting “Left” (3)
v. Split the data into features and target (2)
vi. Perform train test split with a ratio of 20% (2)
vii. Define any 3 classifier models, train each model on the train dataset and predict on the test dataset (5)
viii. Calculate the accuracy of the model (2)
ix. Generate the classification report (2)
x. Generate the confusion matrix (2)
xi. Which model is the most suitable one in predicting the output column? (4)
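
Steps v to xi can be sketched as follows. Several things here are assumptions: the data come from `make_classification` as a hypothetical stand-in for the prepared HR_3A features, the 20% ratio is read as the test-set share, the three classifiers are one possible selection, and accuracy is used as the comparison criterion.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the cleaned features (X) and the "left" target (y).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)

# vi. Hold out 20% of the rows for testing (assumed reading of "ratio 20%").
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# vii. Three example classifiers.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),                 # viii
        "report": classification_report(y_test, y_pred),            # ix
        "confusion_matrix": confusion_matrix(y_test, y_pred),       # x
    }

# xi. Pick the model with the highest test accuracy (one possible criterion).
best_model = max(results, key=lambda name: results[name]["accuracy"])
```

If the classes in “left” are imbalanced, recall or F1 from the classification report may be a better basis for the comparison than accuracy alone.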
​ ​ ​
​ ​ ​
OR

B.    30 Marks
Predict “Car Purchase Amount” using the “Car_Purchasing_Data.csv”.

i. Load the dataset (1)
ii. Get the insights & correlation for each column vs the output column (5)
iii. Do the outlier treatment & null imputation if required. (2)
iv. Shortlist the most important features for predicting the “Car Purchase Amount” (5)
v. Split the data into features and target (2)
vi. Perform train test split with a ratio of 20% (2)
vii. Define any 3 regression models, train each model on the train dataset and predict on the test dataset (5)
viii. Calculate the accuracy of the model (2)
ix. Which model is the most suitable one in predicting the output column? (6)
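
A comparable sketch for the regression option is given below. The data come from `make_regression` as a hypothetical stand-in for the prepared Car_Purchasing_Data features, the 20% ratio is read as the test-set share, and "accuracy" is interpreted as R² with mean absolute error alongside it, since classification accuracy does not apply to a continuous target. These are assumptions, not the paper's stated marking scheme.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for the cleaned features (X) and purchase amount (y).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# vi. Hold out 20% of the rows for testing (assumed reading of "ratio 20%").
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# vii. Three example regression models.
models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, y_pred),              # viii, as R-squared
        "mae": mean_absolute_error(y_test, y_pred),
    }

# ix. Pick the model with the highest test R-squared (one possible criterion).
best_model = max(scores, key=lambda name: scores[name]["r2"])
```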