Assess Your Project Knowledge

Uploaded by

Saadaoui Mayssa

Question 1

Which of the following are techniques used in data preprocessing? (Select all that apply)

- Filling in the missing values
- Data exploration
- Model training
- Removing outliers

Question 2

What is the purpose of exploratory data analysis?

- To build a machine learning model
- To transform the data
- To identify patterns and trends in the data
- To clean the data

Question 3

Which method in PySpark DataFrames is used to print column data types in a DataFrame?

- show()
- printSchema()
- describe()
- head()

Question 4

What is the purpose of the Train-Test split method for training a machine learning model? (Select all that apply)

- To evaluate the performance of a model on unseen data
- To preprocess the data before training a model
- To handle missing values in the data
- To create a model using only a subset of the available data
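
For background on the question above, here is a minimal plain-Python sketch of a train-test split. The helper name `train_test_split`, its parameters, and the sample rows are assumptions made for illustration; it does not use the PySpark API (PySpark DataFrames offer `randomSplit` for a similar purpose).

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and return (train, test) lists.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    shuffled = list(rows)              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))                 # stand-in for ten data records
train, test = train_test_split(rows, test_fraction=0.3)
```

The held-out `test` rows are never shown to the model during fitting, so a score computed on them estimates how the model behaves on data it has not seen.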

Question 5

What is Overfitting in machine learning?

- The model performs poorly on both training and test data
- The model is too simple and doesn't fit the data well
- The model is too simple and fits the noise in the data
- The model performs well on the training data but poorly on the test data

Question 6

What is Feature Importance in machine learning?

- A measure of how complex a model is
- A measure of how many features are in a dataset
- A measure of how much each feature contributes to the target variable
- A measure of how well a model performs on a test dataset

Question 7

What is the purpose of the area under the ROC curve?

- It's a metric to measure the accuracy of a Linear Regression model
- It's a metric to measure the accuracy of a Binary Classification model
- It's a metric to measure the error of a Binary Classification model
- It's a hyperparameter for a decision tree classifier
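
For background on the question above, the area under the ROC curve can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python. The function name `roc_auc` and the sample labels and scores are assumptions made for illustration, and the pairwise loop is a simple formulation, not how a library would necessarily implement it.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 0/1 class labels, `scores` the predicted scores.
    Hypothetical helper for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0            # positive ranked above negative
            elif p == n:
                wins += 0.5            # ties count as half
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)          # 3 of the 4 pairs are ordered correctly
```

A value of 1.0 means every positive outranks every negative, while 0.5 corresponds to random ranking.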

Question 8

What is the purpose of StringIndexer?

- To convert string values in categorical features to feature vectors
- To convert string values in categorical features to lowercase
- To convert string values in categorical features to unique numerical values
- To convert string values in categorical features to uppercase
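
For background on the question above, this plain-Python sketch illustrates the general idea of indexing string categories. The helper `fit_string_index` and the sample values are assumptions made for illustration. Ordering labels by descending frequency, with ties broken alphabetically, is also an assumption chosen here; this is not the PySpark implementation.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to an integer index.

    Hypothetical helper: the most frequent label gets index 0,
    and ties are broken alphabetically.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


colors = ["red", "blue", "red", "green", "red", "blue"]
mapping = fit_string_index(colors)
encoded = [mapping[c] for c in colors]   # one integer per input string
```

Each distinct category ends up with exactly one index, so the encoded column can be passed on to later numeric steps.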

Question 9

What steps need to be taken to prepare the numerical features for a PySpark machine learning model?

- Finding the outliers
- Vector Assembling
- String Indexing
- Standard Scaling

Question 10

How do you find the count of each unique value in a categorical column in a dataframe called df?

- df.groupby(column_name).count()
- df.uniquecount(column_name)
- df.uniques(column_name)
- df.count(column_name)
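
For background on the question above, counting each unique value in a column amounts to grouping the rows by that column and counting the rows in each group. The sketch below shows the idea in plain Python. The helper `count_by`, the `rows` list of dictionaries, and the `segment` column are assumptions made for illustration; it does not use the PySpark API.

```python
from collections import Counter


def count_by(rows, column_name):
    """Return a Counter of how often each value of `column_name` occurs.

    Hypothetical helper; `rows` is a list of dicts standing in for a table.
    """
    return Counter(row[column_name] for row in rows)


rows = [
    {"segment": "retail"},
    {"segment": "corporate"},
    {"segment": "retail"},
    {"segment": "retail"},
]
counts = count_by(rows, "segment")
```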
