Exam 1
ML models?
A) Normalization
B) One-hot encoding
C) PCA
D) Correlation analysis
What is the first step typically taken when beginning to clean a new dataset?
A) Apply normalization
B) Remove duplicate entries
C) Conduct an initial data assessment
D) Encode categorical variables
How does injecting external information into a dataset typically affect model performance?
A) Decreases accuracy due to increased complexity
B) No impact, as external data is usually irrelevant
C) Can increase accuracy by adding relevant contextual details
D) Always leads to overfitting
What is an effective strategy to handle large datasets with many missing values?
A) Use only complete cases for analysis
B) Create synthetic data to fill gaps
C) Apply a robust scaler
D) Use imputation techniques based on the data distribution
What does splitting apart complex descriptions into usable features typically involve?
A) Separating text into different data types
B) Extracting key phrases using NLP techniques
C) Creating features based on the length of descriptions
D) Parsing structured data from unstructured text
Answers:
B) One-hot encoding
C) Conduct an initial data assessment
C) To incorporate the mean of the target variable into the feature
B) Filling missing values with the mean or median
B) Helps in identifying trends over time
B) It transforms categorical data into binary variables
C) Can increase accuracy by adding relevant contextual details
B) Setting up a baseline model
B) It requires the categories to maintain an order
D) Use imputation techniques based on the data distribution
C) It can significantly improve model performance
B) Extracting numerical value from categorical descriptions
C) One-hot encoding
A) To normalize the data
B) To serve as a reference point for model improvement
D) Parsing structured data from unstructured text
A) Increases significantly due to more data processing
C) Applying regularization methods
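
Two of the techniques named in the answers above, one-hot encoding and imputation based on the data distribution, can be illustrated with a minimal sketch. It uses only the Python standard library; the helper names (one_hot, impute_median) and the sample values are hypothetical, chosen purely for illustration, and median imputation is just one of several distribution-based choices.

```python
from statistics import median


def one_hot(values):
    """Turn categorical values into 0/1 indicator rows, one column per category."""
    categories = sorted(set(values))  # fixed column order
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample data, for illustration only.
colors = ["red", "green", "red", "blue"]
ages = [30, None, 50, 40, None]

categories, encoded = one_hot(colors)
filled = impute_median(ages)

print(categories)  # column order of the indicator rows
print(encoded)
print(filled)
```

The median is used here rather than the mean because it is less sensitive to outliers in skewed data; for roughly symmetric data the mean would be an equally reasonable fill value.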