Exam 1

1. Which technique converts categorical variables into a form that can be provided to ML models?
A) Normalization
B) One-hot encoding
C) PCA
D) Correlation analysis

2. What is the first step typically taken when beginning to clean a new dataset?
A) Apply normalization
B) Remove duplicate entries
C) Conduct an initial data assessment
D) Encode categorical variables

3. Why might you use target encoding for categorical variables?
A) To preserve the ordinal relationship between categories
B) To reduce the computational cost of the model
C) To incorporate the mean of the target variable into the feature
D) To increase the number of features

4. Which method is suggested for handling missing data in a dataset?
A) Deleting all rows with any missing data
B) Filling missing values with the mean or median
C) Using a predictive model to fill in missing values
D) Ignoring the presence of missing data during model training

5. What is the benefit of synthesizing date-related features from timestamp data?
A) Reduces the overall dataset size
B) Helps in identifying trends over time
C) Simplifies the model’s architecture
D) Reduces training time significantly

6. What role does one-hot encoding play in feature preparation?
A) It scales the data to a fixed range
B) It transforms categorical data into binary variables
C) It combines multiple features into one
D) It reduces the number of categories

7. How does injecting external information into a dataset typically affect model performance?
A) Decreases accuracy due to increased complexity
B) No impact, as external data is usually irrelevant
C) Can increase accuracy by adding relevant contextual details
D) Always leads to overfitting

8. What is a common approach to establishing initial performance on a new dataset?
A) Building a complex model
B) Setting up a baseline model
C) Performing deep learning techniques
D) Implementing reinforcement learning

9. How does handling ordinal categorical data differ from handling nominal data?
A) It is encoded using binary encoding
B) It requires the categories to maintain an order
C) It is always treated as continuous data
D) It does not require encoding

10. What is an effective strategy to handle large datasets with many missing values?
A) Use only complete cases for analysis
B) Create synthetic data to fill gaps
C) Apply a robust scaler
D) Use imputation techniques based on the data distribution

11. What impact does feature engineering have on machine learning models?
A) It generally decreases model accuracy
B) It has no impact on model performance
C) It can significantly improve model performance
D) It increases the computational load without benefits

12. What does synthesizing numeric features from categorical data involve?
A) Directly mapping numbers to categories based on their frequency
B) Extracting numerical values from categorical descriptions
C) Assigning random numbers to categories
D) Generating statistical summaries for each category

13. Which encoding technique might lead to a dimensionality explosion in large datasets?
A) Binary encoding
B) Label encoding
C) One-hot encoding
D) Hashing

14. Why might log transformation be used on a continuous variable?
A) To normalize the data
B) To encode categorical attributes
C) To handle underflows in computations
D) To correct multicollinearity

15. What is the primary goal of creating a baseline model?
A) To win machine learning competitions
B) To serve as a reference point for model improvement
C) To deploy the model into production
D) To test different types of neural networks

16. What does splitting apart complex descriptions into usable features typically involve?
A) Separating text into different data types
B) Extracting key phrases using NLP techniques
C) Creating features based on the length of descriptions
D) Parsing structured data from unstructured text

17. How does feature engineering affect the training time of a model?
A) Increases significantly due to more data processing
B) Decreases as models become more efficient
C) No change, as models are independent of features
D) Varies depending on the type of features created

18. Which technique is used to prevent model overfitting?
A) Increasing the number of features
B) Using simpler baseline models
C) Applying regularization methods
D) Enhancing model training time

Short Answer Questions


1. Explain the concept of a 'baseline model' and its importance in the model development process.

2. Describe how external neighborhood information can be used in feature engineering to enhance a model's predictive accuracy.

Multiple-Choice Answers:

1. B) One-hot encoding
2. C) Conduct an initial data assessment
3. C) To incorporate the mean of the target variable into the feature
4. B) Filling missing values with the mean or median
5. B) Helps in identifying trends over time
6. B) It transforms categorical data into binary variables
7. C) Can increase accuracy by adding relevant contextual details
8. B) Setting up a baseline model
9. B) It requires the categories to maintain an order
10. D) Use imputation techniques based on the data distribution
11. C) It can significantly improve model performance
12. B) Extracting numerical values from categorical descriptions
13. C) One-hot encoding
14. A) To normalize the data
15. B) To serve as a reference point for model improvement
16. D) Parsing structured data from unstructured text
17. A) Increases significantly due to more data processing
18. C) Applying regularization methods

Answers to Short Answer Questions

1. Concept of a 'Baseline Model':

A baseline model is a simple model set up at the beginning of the machine learning workflow to serve as a reference point for all subsequent modeling efforts. Its primary purpose is to provide a benchmark against which the performance of more complex models can be compared. By establishing a baseline, data scientists can measure the incremental value added by more sophisticated algorithms and feature engineering, ensuring that any increase in model complexity is justified by a substantial improvement in performance.
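
As a minimal illustration (assuming scikit-learn is available; the synthetic data and the choice of mean absolute error are stand-ins for the example, not part of the exam), a mean-predicting baseline can be set up like this:

    # Python: a simple mean-predicting baseline with scikit-learn
    from sklearn.datasets import make_regression
    from sklearn.dummy import DummyRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    # Synthetic stand-in data; in practice X and y come from the cleaned dataset.
    X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A baseline that always predicts the training-set mean of the target.
    baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
    print("Baseline MAE:", mean_absolute_error(y_test, baseline.predict(X_test)))

Any candidate model should beat this score before its extra complexity is considered worthwhile.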

2. Using External Neighborhood Information in Feature Engineering:

External neighborhood information can significantly enhance a model’s predictive
accuracy by adding context that is not available from the internal dataset alone.
For instance, in real estate pricing models, incorporating neighborhood crime
rates, school district quality, and public transportation accessibility can provide
more accurate predictions of housing prices. This additional information helps the
model capture variations in prices due to external factors, leading to a more
nuanced understanding of the data. Feature engineering with external data involves
creating new features or modifying existing ones based on this external
information, which can be integrated during the data preprocessing stage.
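
A minimal sketch of this kind of join (the tables, column names, and values below are hypothetical, chosen only to illustrate the pattern):

    # Python: merging external neighborhood statistics into listing-level features
    import pandas as pd

    listings = pd.DataFrame({
        "listing_id": [1, 2, 3],
        "neighborhood": ["Elm Park", "Riverside", "Elm Park"],
        "sqft": [850, 1200, 990],
    })
    external = pd.DataFrame({
        "neighborhood": ["Elm Park", "Riverside"],
        "crime_rate_per_1k": [12.4, 5.1],
        "school_rating": [6.5, 8.9],
    })

    # Each listing inherits the statistics of its neighborhood as new features.
    features = listings.merge(external, on="neighborhood", how="left")
    print(features)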
