
Data Science MCQs

The document contains a set of multiple-choice questions (MCQs) related to data science concepts, focusing on data preprocessing, feature scaling, handling missing values, and techniques for dealing with class imbalance. Each question presents four options, from which the correct answer must be selected. The total time for completing the MCQs is 30 minutes, and the total marks available are 30.

Uploaded by ganesh
Copyright © All Rights Reserved

Data Science MCQs

Time: 30 Minutes Marks: 30


* Indicates required question


Which of the following is a method for dealing with high-cardinality categorical variables? *

A) One-hot encoding

B) Min-Max Scaling

C) Frequency encoding

D) Imputation
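Frequency encoding replaces each category with how often it appears, so a column with thousands of distinct values still becomes a single numeric column instead of thousands of one-hot columns. The sketch below is a minimal plain-Python illustration; the `cities` data and the choice of relative frequency (rather than raw counts) are assumptions made for the example.

```python
from collections import Counter

def frequency_encode(values):
    """Map each category to its relative frequency within `values`."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]

# Hypothetical high-cardinality column (illustrative data only)
cities = ["Pune", "Nagpur", "Pune", "Delhi", "Pune", "Nagpur"]
encoded = frequency_encode(cities)
print(encoded)  # a single numeric column, however many distinct cities exist
```

In a real pipeline the frequencies would be computed on the training split only and reused for validation and test data; how to encode categories never seen during training is a separate design choice.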

What is the purpose of standardization in data preprocessing? *

A) To scale data to a range of 0 to 1

B) To remove outliers from the dataset

C) To ensure the data has a mean of 0 and a standard deviation of 1

D) To handle missing values
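Standardization (z-score scaling) computes z = (x - mean) / std for each feature. A minimal NumPy sketch; the toy matrix is made up, and the zero-variance guard is an added assumption. scikit-learn's `StandardScaler` applies the same formula; in practice the mean and standard deviation should be estimated on the training split and reused on the test split.

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)                  # population std (ddof=0)
    std = np.where(std == 0, 1.0, std)   # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std

# Illustrative data: two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])
Z = standardize(X)
print(Z.mean(axis=0), Z.std(axis=0))
```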

How does feature scaling help in machine learning models like k-NN or SVM? *

A) It reduces the dataset size

B) It prevents overfitting

C) It ensures that features contribute equally to distance calculations

D) It improves model interpretability
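Distance-based models such as k-NN compare raw Euclidean distances, so a feature measured in large units can dominate the result. The toy sketch below (made-up age and income values, with Min-Max scaling chosen as the scaler) shows the nearest neighbor of a query changing once both features are on a comparable scale.

```python
import numpy as np

# Illustrative data: column 0 is age in years, column 1 is income (a much larger scale)
X = np.array([[25.0, 50_000.0],
              [60.0, 51_000.0],
              [40.0, 80_000.0],
              [30.0, 20_000.0]])
query = np.array([26.0, 52_000.0])

def nearest(X, q):
    """Index of the row of X closest to q under Euclidean distance."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))

# Without scaling, the income column dominates the distance
raw_nn = nearest(X, query)

# Min-Max scale both features using statistics from X only
lo, hi = X.min(axis=0), X.max(axis=0)
scaled_nn = nearest((X - lo) / (hi - lo), (query - lo) / (hi - lo))

print(raw_nn, scaled_nn)  # 1 0
```

On the raw data, row 1 wins only because its income is close, even though its age differs by 34 years; after scaling, row 0 is the nearest neighbor.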


Which of the following is NOT a common technique to handle class imbalance in a dataset? *

A) Oversampling

B) Undersampling

C) One-hot encoding

D) Synthetic data generation (SMOTE)
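Random oversampling, the simplest resampling technique, duplicates minority-class rows until the classes are balanced. A minimal NumPy sketch on made-up data; the function name `random_oversample` is hypothetical (the `imbalanced-learn` package provides a `RandomOverSampler` class for the same purpose).

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        if n < target:
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - n, replace=True)  # sample with replacement
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Illustrative 8:2 imbalanced toy data
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # [8 8]
```

Resampling should be applied to the training split only, after splitting, so that duplicated rows cannot leak into the test set.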

Why is it important to shuffle data before splitting it into training and testing sets? *

A) To remove outliers

B) To avoid any bias due to the order of the data

C) To improve model accuracy

D) To reduce dimensionality
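If a file is sorted (for example by label or by collection date), taking the last rows as the test set gives a biased split. The sketch below shuffles with a single permutation so features and labels stay aligned; `shuffled_split` is a hypothetical helper, and the 25% test fraction and seed are arbitrary choices. scikit-learn's `train_test_split` shuffles by default.

```python
import numpy as np

def shuffled_split(X, y, test_size=0.25, seed=42):
    """Shuffle rows with one shared permutation, then split into train and test parts."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Data sorted by label: an unshuffled 75/25 split would put only class 1 in the test set
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)
X_tr, X_te, y_tr, y_te = shuffled_split(X, y)
print(len(y_tr), len(y_te), np.bincount(y_te, minlength=2))
```

Time-series data is the usual exception: shuffling there would let the model train on the future, so a chronological split is preferred.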

What type of scaling method would you use for features that follow a normal distribution? *

A) Min-Max scaling

B) Z-score normalization

C) One-hot encoding

D) Log scaling

Which data preprocessing step is essential when dealing with categorical variables in a linear regression model? *

A) One-hot encoding

B) Min-Max Scaling

C) Log transformation

D) Imputation
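Linear regression needs numeric inputs, and coding categories as 0, 1, 2 would impose an order and spacing that usually does not exist. One-hot encoding instead creates one indicator column per category. The minimal sketch below also drops one category as a baseline, because with an intercept the full set of indicators always sums to 1 and is perfectly collinear with it (the "dummy variable trap"). The helper `one_hot` is hypothetical; `pandas.get_dummies(..., drop_first=True)` and scikit-learn's `OneHotEncoder(drop="first")` offer the same behavior.

```python
import numpy as np

def one_hot(values, drop_first=True):
    """One-hot encode a list of category labels.

    With drop_first=True the first (alphabetically sorted) category becomes the
    baseline and gets no column of its own.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    matrix = np.array([[1.0 if v == c else 0.0 for c in kept] for v in values])
    return matrix, kept

colors = ["red", "green", "blue", "green"]
M, cols = one_hot(colors)
print(cols)  # ['green', 'red']
print(M)     # "blue" is the baseline row of zeros
```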

Which technique is commonly used to handle missing data? *

A) One-hot encoding

B) Imputation

C) Dimensionality reduction

D) PCA

Which of the following can cause a machine learning model to perform poorly? *

A) Feature scaling

B) Feature engineering

C) Irrelevant or redundant features

D) Data splitting

Which of the following is NOT a data preprocessing step? *

A) Data normalization

B) Data augmentation

C) Model evaluation

D) Missing value imputation

Which of the following methods can be used to detect outliers in a dataset? *

A) Min-Max Scaling

B) Z-Score Method

C) One-hot encoding

D) Imputation
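The Z-score method flags a point as an outlier when it lies more than a chosen number of standard deviations from the mean. A minimal sketch, assuming a threshold of 3 (a common convention, not a fixed rule) and synthetic normal data with one injected extreme value.

```python
import numpy as np

def zscore_outliers(x, threshold=3.0):
    """Boolean mask of points whose absolute z-score exceeds `threshold`."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(50, 5, size=200), [120.0]])  # one injected outlier at the end
mask = zscore_outliers(x)
print(np.flatnonzero(mask))
```

Because the outlier itself inflates the mean and standard deviation, this method can miss outliers in small samples; median- and IQR-based rules are more robust alternatives.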

One-hot encoding is typically applied to which type of data? *

A) Numerical data

B) Ordinal data

C) Categorical data

D) Continuous data

In which situation would you apply dimensionality reduction techniques like PCA? *

A) When the dataset contains missing values

B) When the dataset contains a large number of correlated features

C) When you want to remove outliers

D) When the dataset has no categorical variables

Which of the following is a method to reduce overfitting in decision trees? *

A) Feature scaling

B) Pruning

C) Z-Score normalization

D) One-hot encoding

Which of the following is used to deal with multicollinearity in regression problems? *

A) Standardization

B) L2 Regularization

C) One-hot encoding

D) Min-Max scaling
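When two predictors are nearly identical, ordinary least squares can split their effect into large, opposite-signed, unstable coefficients. An L2 (ridge) penalty adds alpha*I to X^T X, which stabilizes the solution and shrinks the coefficients. A minimal NumPy sketch using the closed form; the synthetic data, alpha = 1 and the omission of an intercept are assumptions for illustration (scikit-learn's `Ridge` exposes the same penalty through its `alpha` parameter).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly a copy of x1: strong multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

def ridge(X, y, alpha):
    """Closed-form L2-regularized least squares without intercept: (X^T X + alpha*I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

w_ols = ridge(X, y, alpha=0.0)    # alpha = 0 reduces to ordinary least squares
w_ridge = ridge(X, y, alpha=1.0)  # L2 penalty shares the effect between x1 and x2
print("OLS:  ", w_ols)
print("Ridge:", w_ridge)
```

Because the penalty depends on coefficient size, features are usually standardized before fitting a ridge model.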

What is the result of applying Principal Component Analysis (PCA) on a dataset? *

A) Reduced number of features while retaining as much variance as possible

B) Elimination of duplicate rows in the dataset

C) Removal of outliers

D) Increase in the number of features
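PCA finds orthogonal directions of maximum variance and keeps only the first few, so several correlated features can be summarized by fewer new ones. A minimal sketch via the SVD of the centered data; the synthetic three-feature dataset driven by one latent factor is an assumption. scikit-learn's `PCA` reports the same quantity as `explained_variance_ratio_`.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components using the SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = S**2 / (len(X) - 1)          # variance along each component
    ratio = explained_var / explained_var.sum()  # fraction of total variance per component
    scores = Xc @ Vt[:n_components].T            # coordinates in the reduced space
    return scores, ratio[:n_components]

rng = np.random.default_rng(0)
t = rng.normal(size=300)
# Three correlated features driven mostly by one latent factor t
X = np.column_stack([t,
                     2 * t + 0.1 * rng.normal(size=300),
                     -t + 0.1 * rng.normal(size=300)])
scores, ratio = pca(X, n_components=1)
print(scores.shape, ratio)  # one column keeps most of the variance
```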

Which of the following is an example of feature extraction? *

A) Scaling numeric features

B) Transforming categorical data into numerical format

C) Using PCA to reduce feature dimensionality

D) Filling missing values in the dataset

Which of the following can be used to fill missing numerical values? *

A) Mean, median, or mode

B) One-hot encoding

C) PCA

D) Z-Score
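Mean or median imputation fills each missing numeric entry with a statistic of its column, computed from the observed values. A minimal NumPy sketch; the helper `impute` and the toy matrix are hypothetical. The 100.0 in the first column shows why the median is often preferred for skewed data: it pulls the column mean up to about 34.7, while the median stays at 3.0.

```python
import numpy as np

def impute(X, strategy="median"):
    """Replace NaNs in each column with that column's mean or median (ignoring NaNs)."""
    stat = np.nanmedian(X, axis=0) if strategy == "median" else np.nanmean(X, axis=0)
    X_filled = X.copy()
    rows, cols = np.where(np.isnan(X_filled))
    X_filled[rows, cols] = stat[cols]  # each missing cell gets its own column's statistic
    return X_filled

X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan],
              [100.0, 40.0]])
print(impute(X, "mean"))
print(impute(X, "median"))
```

For categorical columns the mode plays the same role; scikit-learn's `SimpleImputer` supports `strategy="mean"`, `"median"` and `"most_frequent"`, and, as with scaling, the statistics should come from the training split.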

What does it mean if a dataset is said to be “highly imbalanced”? *

A) The dataset contains a large number of features

B) One or more classes occur much more frequently than others

C) The dataset has many missing values

D) The dataset contains outliers


Which technique reduces the dimensionality of a dataset by creating new features based on the old ones? *

A) Feature scaling

B) Feature selection

C) Feature extraction

D) Data augmentation

What is the main reason for splitting a dataset into training and testing sets? *

A) To improve the performance of the model

B) To prevent overfitting

C) To assess the model’s generalization ability

D) To generate more data

By default, Min-Max scaling transforms the data so that all values lie between: *

A) 0 and 10

B) -1 and 1

C) 0 and 1

D) -10 and 10
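Min-Max scaling computes x' = (x - min) / (max - min), which maps each feature to [0, 1]; other target ranges such as [-1, 1] come from a further linear rescale. A minimal NumPy sketch; `min_max_scale` is a hypothetical helper that mirrors the `feature_range` parameter of scikit-learn's `MinMaxScaler`, whose default is (0, 1).

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Rescale each column linearly so its min maps to feature_range[0] and its max to feature_range[1]."""
    lo, hi = feature_range
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid division by zero for constant columns
    return lo + (X - x_min) / span * (hi - lo)

X = np.array([[10.0], [20.0], [50.0]])
print(min_max_scale(X).ravel())               # default range 0 to 1
print(min_max_scale(X, (-1.0, 1.0)).ravel())  # an explicitly chosen range
```

Because min and max come from the data seen at fit time, a single outlier compresses every other value, and new data outside that range will fall outside the target interval.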

How does SMOTE (Synthetic Minority Over-sampling Technique) handle imbalanced datasets? *

A) By undersampling the majority class

B) By oversampling the majority class

C) By generating synthetic examples for the minority class

D) By removing outliers in the dataset
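The core idea of SMOTE is interpolation: pick a minority sample, pick one of its k nearest minority-class neighbors, and create a new point at a random position on the line segment between them. The NumPy sketch below is a simplified illustration (base points are drawn at random, and the toy minority data are made up), not the reference algorithm; the `imbalanced-learn` package provides a full implementation as `SMOTE` in `imblearn.over_sampling`.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a minority
    point and one of its k nearest minority-class neighbors (simplified SMOTE)."""
    rng = np.random.default_rng(seed)
    # Pairwise distances within the minority class; exclude each point from its own neighbors
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)                 # random base points
    picked = neighbors[base, rng.integers(0, k, size=n_new)]       # one random neighbor each
    gap = rng.random((n_new, 1))                                   # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote(X_minority, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic point is a convex combination of two real minority points, it stays within the region the minority class already occupies.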


What is the primary goal of feature selection? *

A) To remove noise from the data

B) To select features that have the most impact on the target variable

C) To standardize features

D) To impute missing data
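One simple family of feature selection methods, filter methods, scores each feature against the target and keeps the top k. A minimal sketch, assuming absolute Pearson correlation as the score (which only captures linear, one-feature-at-a-time relationships) and synthetic data in which only columns 0 and 2 affect the target. scikit-learn's `SelectKBest` with `f_regression` follows the same filter idea.

```python
import numpy as np

def select_k_by_correlation(X, y, k):
    """Filter-style selection: keep the k columns with the largest |Pearson r| against y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (np.sqrt((Xc**2).sum(axis=0)) * np.sqrt((yc**2).sum()))
    keep = np.argsort(-np.abs(r))[:k]
    return np.sort(keep), r

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.5, size=200)  # only columns 0 and 2 matter
keep, r = select_k_by_correlation(X, y, k=2)
print(keep, np.round(r, 2))
```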

Which of the following is NOT a common strategy for dealing with missing data? *

A) Deleting rows with missing values

B) Filling missing values with zeros

C) Filling missing values using a machine learning model

D) Ignoring the missing values during training

What is the main purpose of data preprocessing in machine learning? *

A) To create new features

B) To improve the quality of the data

C) To discard irrelevant data

D) To balance the dataset

What is the primary function of the Box-Cox transformation? *

A) To reduce the number of features

B) To normalize a distribution to make it more Gaussian

C) To handle missing values

D) To encode categorical variables
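For strictly positive x, the Box-Cox transform is y = (x^λ - 1) / λ when λ ≠ 0 and y = log x when λ = 0, with λ usually chosen by maximum likelihood so that y looks as Gaussian as possible. The sketch below finds λ with a simple grid search over the profile log-likelihood; the grid, the log-normal test data and the helper names are assumptions. `scipy.stats.boxcox` performs the same fit with a numerical optimizer.

```python
import numpy as np

def boxcox(x, lam):
    """Box-Cox transform of strictly positive x for a given lambda."""
    if abs(lam) < 1e-12:
        return np.log(x)
    return (x**lam - 1.0) / lam

def boxcox_llf(x, lam):
    """Profile log-likelihood of lambda, assuming the transformed data are normal."""
    y = boxcox(x, lam)
    return (lam - 1.0) * np.log(x).sum() - len(x) / 2.0 * np.log(y.var())

def skewness(v):
    """Sample skewness (third standardized moment)."""
    v = v - v.mean()
    return (v**3).mean() / (v**2).mean() ** 1.5

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.8, size=1000)  # right-skewed, strictly positive
grid = np.linspace(-2.0, 2.0, 401)                  # candidate lambdas, step 0.01
lam = grid[np.argmax([boxcox_llf(x, g) for g in grid])]
x_bc = boxcox(x, lam)
print(round(float(lam), 2), round(skewness(x), 2), round(skewness(x_bc), 2))
```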


When would you apply log transformation to a feature in a dataset? *

A) When the feature contains negative values

B) When the feature has a normal distribution

C) When the feature has a skewed distribution

D) When the feature is categorical
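A log transform compresses a long right tail: large values are pulled in much more than small ones. It is only defined for positive values, so `np.log1p` (log(1 + x)) is a common choice when zeros occur, and negative values need a shift or a different transform such as Yeo-Johnson. A minimal sketch on synthetic log-normal "income" data, an assumption chosen because its logarithm is exactly normal.

```python
import numpy as np

def skewness(v):
    """Sample skewness (third standardized moment)."""
    v = v - v.mean()
    return (v**3).mean() / (v**2).mean() ** 1.5

rng = np.random.default_rng(1)
income = rng.lognormal(mean=10.0, sigma=1.0, size=5_000)  # synthetic, strongly right-skewed
log_income = np.log1p(income)                              # log(1 + x) is also defined at x = 0
print(round(skewness(income), 2), round(skewness(log_income), 2))
```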

Which of the following is NOT a characteristic of robust scaling? *

A) It uses the median for centering the data

B) It scales the data based on percentiles

C) It is highly sensitive to outliers

D) It handles data with outliers better than Min-Max scaling
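Robust scaling centers each feature on its median and divides by the interquartile range (75th minus 25th percentile), so a few extreme values barely affect the scale applied to the rest. The sketch below compares it with Min-Max scaling on a made-up column containing one extreme outlier; the helper names are hypothetical, and scikit-learn's `RobustScaler` uses the same (25, 75) percentile range by default.

```python
import numpy as np

def robust_scale(x):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return (x - median) / (q3 - q1)

def min_max(x):
    """Plain Min-Max scaling to [0, 1], for comparison."""
    return (x - x.min()) / (x.max() - x.min())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1000.0])  # one extreme outlier
print(np.round(robust_scale(x), 2))
print(np.round(min_max(x), 4))
```

With Min-Max scaling the five ordinary values are squeezed into roughly 0 to 0.004, while robust scaling keeps them spread between -1 and 0.6.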

What is the main purpose of data normalization? *

A) To reduce the number of features

B) To encode categorical variables

C) To scale numeric features to a common range

D) To remove missing values


This form was created inside the Indian Institute of Information Technology, Nagpur.
