AI ML Unit 1

The document consists of multiple-choice questions (MCQs) on data and feature engineering, covering data vs. information, types of data, data labeling, feature selection, and feature extraction. It addresses essential topics such as the role of labeled data in supervised learning, common feature selection techniques, and dimensionality reduction in machine learning. The questions test understanding of key principles and methods in data science and machine learning.


MCQs on Data & Feature Engineering

1. Data vs. Information

1. What is the key difference between data and information?

o a) Data is processed, whereas information is raw

o b) Data is raw facts, whereas information is processed data ✅

o c) Data is always numerical, whereas information is textual

o d) There is no difference

2. Which of the following best describes information?

o a) Unorganized raw numbers

o b) Processed data that is meaningful ✅

o c) A collection of random symbols

o d) All of the above

3. In AI, what is an essential step before converting data into information?

o a) Preprocessing and cleaning ✅

o b) Ignoring missing values

o c) Using only numerical data

o d) None of the above

2. Types of Data

4. Which of the following is an example of numerical data?

o a) Customer reviews

o b) Temperature readings ✅

o c) Social media posts

o d) Gender

5. Discrete numerical data represents:

o a) Continuous values
o b) Countable values ✅

o c) Infinite possible values

o d) All of the above

6. What type of data is "Age in years"?

o a) Discrete numerical ✅

o b) Continuous numerical

o c) Ordinal

o d) Nominal

7. Which of the following is an example of continuous numerical data?

o a) Number of students in a class

o b) Height of a person ✅

o c) Zip code

o d) Gender

8. What type of data is "Education level (Primary, Secondary, Tertiary)"?

o a) Nominal

o b) Ordinal ✅

o c) Discrete

o d) Continuous

9. Which of the following is an example of categorical data?

o a) Blood pressure readings

o b) Eye color ✅

o c) Number of books sold

o d) Temperature in Celsius

10. Which category does time series data fall into?

 a) Discrete numerical

 b) Continuous numerical ✅
 c) Nominal

 d) Unstructured

3. Data Labeling

11. What is data labeling?

 a) Process of assigning categories to unstructured data

 b) Process of manually tagging data for supervised learning ✅

 c) A method to remove outliers

 d) None of the above

12. Labeled data is essential for:

 a) Supervised learning ✅

 b) Unsupervised learning

 c) Reinforcement learning

 d) Clustering

13. Which of the following is NOT a challenge in data labeling?

 a) Cost and time

 b) Labeling errors

 c) Data duplication ✅

 d) Subjectivity in labeling

4. Feature and Feature Selection

14. In machine learning, what is a feature?

 a) An attribute or characteristic of data ✅

 b) A type of model

 c) A type of supervised learning algorithm

 d) None of the above


15. Why is feature selection important?

 a) Reduces overfitting

 b) Improves model accuracy

 c) Enhances model interpretability

 d) All of the above ✅

16. Which of the following is NOT a feature selection method?

 a) Sequential forward selection

 b) Data augmentation ✅

 c) Bidirectional feature selection

 d) Sequential backward selection

5. Feature Selection Algorithms

17. What is the goal of feature selection?

 a) To increase model complexity

 b) To select the most relevant features while reducing redundancy ✅

 c) To add more irrelevant features

 d) To increase the number of features

18. Which algorithm is designed specifically for feature selection?

 a) Recursive Feature Elimination (RFE) ✅

 b) K-Means

 c) Decision Trees

 d) Random Forest
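The answer to question 18, Recursive Feature Elimination, can be illustrated with a minimal scikit-learn sketch; the toy dataset and parameter values here are illustrative, not taken from the source:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Toy dataset: 10 features, of which only 3 are informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# RFE fits the estimator, drops the weakest feature (smallest
# coefficient), and repeats until 3 features remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

print(selector.support_)  # boolean mask over the 10 original features
```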

19. Which feature selection method starts with an empty feature set and adds features one by one?

 a) Sequential Forward Selection (SFS) ✅

 b) Sequential Backward Selection


 c) Random Selection

 d) Genetic Algorithm

20. In Sequential Backward Selection (SBS), how are features removed?

 a) One by one starting with the most important feature

 b) One by one starting with the least important feature ✅

 c) Randomly

 d) Based on feature correlation

21. What is bidirectional feature selection?

 a) Selecting features randomly

 b) A combination of forward and backward selection methods ✅

 c) A supervised learning algorithm

 d) None of the above
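Questions 19–21 describe sequential selection; as a sketch, scikit-learn's `SequentialFeatureSelector` supports both directions on the iris dataset (floating/bidirectional variants are found in third-party libraries such as mlxtend):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # 4 features

# Forward (SFS): start from an empty set, greedily add features.
sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=2,
                                direction="forward").fit(X, y)

# Backward (SBS): start from all features, greedily remove the weakest.
sbs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=2,
                                direction="backward").fit(X, y)

print(sfs.get_support(), sbs.get_support())
```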

6. Feature Extraction

22. What is feature extraction?

 a) Transforming raw data into a set of meaningful features ✅

 b) Randomly selecting features

 c) Removing missing values

 d) None of the above

23. Principal Component Analysis (PCA) is used for:

 a) Feature extraction ✅

 b) Feature selection

 c) Data labeling

 d) Data augmentation

24. What is the main advantage of feature extraction?

 a) Reduces dimensionality while preserving important information ✅


 b) Increases computational complexity

 c) Adds irrelevant features

 d) None of the above

25. Which of the following is NOT a feature extraction technique?

 a) Principal Component Analysis

 b) Linear Discriminant Analysis

 c) K-Means Clustering ✅

 d) Autoencoders
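PCA as a feature-extraction step (questions 22–25) in a short scikit-learn sketch, using the iris data purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

# Project the data onto the 2 directions of maximum variance,
# producing 2 new features that are combinations of the originals.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (150, 2)
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept
```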

More MCQs

26. Which type of data is commonly used in time-series analysis?

 a) Continuous numerical ✅

 b) Discrete numerical

 c) Ordinal

 d) Nominal

27. Which machine learning approach benefits the most from labeled data?

 a) Supervised Learning ✅

 b) Unsupervised Learning

 c) Reinforcement Learning

 d) Semi-supervised Learning

28. What is a major challenge in feature selection?

 a) Computational cost for high-dimensional data ✅

 b) Low memory usage

 c) Lack of machine learning models

 d) Removing missing values

29. What is the role of correlation in feature selection?


 a) Helps identify redundant features ✅

 b) Increases computational complexity

 c) Reduces model accuracy

 d) None of the above

30. Feature engineering is important because:

 a) It improves model performance and interpretability ✅

 b) It reduces the need for training

 c) It increases data redundancy

 d) It is not necessary in AI

31. Which of the following is an example of ordinal data?

 a) Customer satisfaction ratings (e.g., Poor, Average, Good, Excellent) ✅

 b) Eye color

 c) Credit card number

 d) Phone number

32. In data preprocessing, what is the purpose of normalization?

 a) To scale data within a specific range (e.g., 0 to 1 or -1 to 1) ✅

 b) To increase the dimensionality of data

 c) To remove missing values

 d) To replace categorical variables

33. One-hot encoding is commonly used for:

 a) Converting categorical data into numerical format ✅

 b) Normalization

 c) Data cleaning

 d) Handling missing values

34. Which method is commonly used to handle missing data?

 a) Imputation with mean, median, or mode ✅


 b) Removing all missing values

 c) Ignoring missing values

 d) Feature scaling
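The three preprocessing steps from questions 32–34 (normalization, one-hot encoding, and imputation) can be combined in one pandas/scikit-learn sketch; the column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"age": [25.0, 32.0, np.nan, 51.0],
                   "city": ["Pune", "Delhi", "Pune", "Mumbai"]})

# Imputation: fill the missing age with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Normalization: rescale age into the [0, 1] range.
df["age_scaled"] = MinMaxScaler().fit_transform(df[["age"]]).ravel()

# One-hot encoding: one binary column per city category.
df = pd.get_dummies(df, columns=["city"])

print(df.columns.tolist())
```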

7. Data Labeling & Supervised Learning

35. Data labeling is crucial for which type of machine learning?

 a) Supervised learning ✅

 b) Unsupervised learning

 c) Reinforcement learning

 d) Self-learning

36. What is weakly labeled data?

 a) Data that has labels generated automatically with some errors ✅

 b) Data with no labels

 c) Data with perfect manual labeling

 d) Completely irrelevant data

37. Which AI field benefits the most from semi-supervised learning?

 a) Medical image classification (few labeled images available) ✅

 b) Clustering algorithms

 c) Reinforcement learning in games

 d) Search engine indexing

8. Feature Engineering & Selection

38. What is the main purpose of feature selection?

 a) To remove redundant or irrelevant features to improve model performance ✅

 b) To add more features to improve accuracy

 c) To reduce training time by increasing dimensionality


 d) To make the dataset more complex

39. Recursive Feature Elimination (RFE) works by:

 a) Removing the least important feature iteratively ✅

 b) Adding new features iteratively

 c) Clustering the most important features

 d) Selecting features randomly

40. What is feature extraction mainly used for?

 a) Transforming raw data into meaningful features ✅

 b) Increasing model complexity

 c) Ignoring missing data

 d) Converting numerical data to categorical

41. Which of the following is an advantage of feature extraction?

 a) Reduces dimensionality and computational cost ✅

 b) Increases the number of features

 c) Increases overfitting

 d) Makes the dataset more complex

9. Feature Selection Techniques

42. What is a filter-based feature selection method?

 a) A technique that selects features based on statistical tests ✅

 b) A technique that selects features based on model performance

 c) A method that selects features randomly

 d) A method that removes missing values

43. Information Gain is commonly used in:

 a) Feature selection for decision trees ✅

 b) Feature extraction
 c) Clustering algorithms

 d) Neural networks

44. Which feature selection method ranks features based on correlation?

 a) Pearson’s correlation coefficient ✅

 b) Principal Component Analysis

 c) K-Means Clustering

 d) Autoencoders

45. The curse of dimensionality refers to:

 a) A situation where too many features reduce model performance ✅

 b) When data preprocessing fails

 c) When supervised learning is not applicable

 d) A type of machine learning model

10. Feature Engineering in AI

46. Which technique is used for feature engineering in deep learning?

 a) Autoencoders ✅

 b) Decision trees

 c) One-hot encoding

 d) Pearson correlation

47. Why is dimensionality reduction important in machine learning?

 a) To remove noise and improve computational efficiency ✅

 b) To increase feature space

 c) To slow down training speed

 d) To reduce model accuracy

48. In text-based AI models, what is the most commonly used feature extraction technique?

 a) TF-IDF (Term Frequency-Inverse Document Frequency) ✅


 b) Mean normalization

 c) Data augmentation

 d) Variance thresholding
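TF-IDF (question 48) in a minimal scikit-learn sketch, with a made-up three-document corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "dogs and cats are pets"]

# Terms frequent in one document but rare across the corpus
# receive the highest weights; ubiquitous terms are down-weighted.
vec = TfidfVectorizer()
X = vec.fit_transform(docs)  # sparse (n_docs, vocabulary) matrix

print(X.shape)
```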

49. Principal Component Analysis (PCA) works by:

 a) Finding new axes that maximize variance while reducing dimensions ✅

 b) Removing missing values

 c) Sorting features alphabetically

 d) Selecting the most correlated features

50. Which machine learning models automatically perform feature selection?

 a) Decision Trees & Random Forest ✅

 b) K-Means Clustering

 c) Neural Networks

 d) Principal Component Analysis
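Question 50's answer can be seen directly in scikit-learn: tree ensembles expose `feature_importances_` after fitting. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

# Tree ensembles rank features as a by-product of training:
# features that yield the best splits accumulate high importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(rf.feature_importances_.round(3))  # importances sum to 1
```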

51. Which of the following is an example of unstructured data?

 a) Social media posts ✅

 b) Temperature readings

 c) Customer age data

 d) ZIP codes

52. Which technique is used to handle outliers in a dataset?

 a) Winsorization (Capping extreme values) ✅

 b) Min-Max Scaling

 c) One-hot encoding

 d) Removing missing values
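Winsorization (question 52) can be sketched with NumPy percentiles and `clip`; the data and the 5th/95th-percentile caps are illustrative choices:

```python
import numpy as np

data = np.array([3, 5, 4, 6, 5, 4, 100, 5, 4, -50], dtype=float)

# Cap values below the 5th and above the 95th percentile
# instead of deleting the offending rows.
low, high = np.percentile(data, [5, 95])
capped = np.clip(data, low, high)

print(capped.min(), capped.max())
```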

53. Which of the following statements is true about categorical data?

 a) Categorical data can be nominal or ordinal ✅

 b) Categorical data must always be numerical


 c) One-hot encoding is not applicable to categorical data

 d) Ordinal data has no meaningful order

54. What is the main problem when working with imbalanced datasets?

 a) The model may be biased toward the majority class ✅

 b) The dataset contains too many features

 c) Missing values increase

 d) Feature selection becomes impossible

11. Data Labeling & Feature Engineering

55. In supervised learning, the quality of data labels directly affects:

 a) Model performance and accuracy ✅

 b) Feature extraction only

 c) The number of missing values

 d) Unsupervised learning models

56. A dataset with multiple dependent variables is known as:

 a) Multivariate data ✅

 b) Bivariate data

 c) Univariate data

 d) Structured data

57. Which of the following is a disadvantage of manual data labeling?

 a) Time-consuming and expensive ✅

 b) Not applicable to supervised learning

 c) Requires high computational power

 d) Unnecessary for high-dimensional data

58. Which term describes the process of generating new labeled training data from existing data?
 a) Data augmentation ✅

 b) Feature selection

 c) Principal Component Analysis

 d) Clustering

12. Feature Selection & Feature Extraction

59. The primary goal of feature selection is to:

 a) Identify the most relevant features while removing redundant ones ✅

 b) Increase the number of features in a dataset

 c) Reduce the training set size

 d) Convert numerical data into categorical data

60. What is the difference between feature selection and feature extraction?

 a) Feature selection removes redundant features, whereas feature extraction transforms data into new features ✅

 b) Feature selection is only for categorical data

 c) Feature extraction is used in classification but not regression

 d) Both are the same

61. Which algorithm can perform automatic feature selection?

 a) LASSO Regression ✅

 b) K-Means Clustering

 c) K-Nearest Neighbors

 d) Principal Component Analysis
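LASSO's automatic feature selection (question 61) can be demonstrated on synthetic data; the noise scale and `alpha` below are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only features 0 and 1 actually drive the target.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty shrinks irrelevant coefficients exactly to zero,
# so the surviving features are an implicit selection.
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.nonzero(lasso.coef_)[0])  # indices of surviving features
```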

62. Which of the following is NOT a feature selection method?

 a) Cross-validation ✅

 b) Recursive Feature Elimination

 c) Information Gain
 d) Mutual Information

13. Feature Engineering in Machine Learning

63. What is one of the key benefits of feature engineering?

 a) Improves model performance by creating meaningful features ✅

 b) Increases overfitting risk

 c) Reduces dataset size

 d) Prevents data cleaning

64. When should feature scaling be applied?

 a) When using distance-based algorithms like KNN and SVM ✅

 b) When working with categorical data

 c) Only when dealing with large datasets

 d) Feature scaling is not necessary in machine learning

65. What is an important step when performing text feature engineering?

 a) Tokenization and vectorization ✅

 b) One-hot encoding

 c) Feature scaling

 d) Min-Max Normalization

14. Advanced Feature Engineering Techniques

66. Principal Component Analysis (PCA) is most useful when:

 a) There are many correlated features in a dataset ✅

 b) The dataset is already well-structured

 c) We want to perform data labeling

 d) The dataset has missing values

67. What is the main purpose of t-SNE (t-Distributed Stochastic Neighbor Embedding)?
 a) Dimensionality reduction for data visualization ✅

 b) Feature selection

 c) Text classification

 d) Clustering

68. Which feature selection method is commonly used for handling high-dimensional data?

 a) LASSO Regression ✅

 b) K-Means

 c) Decision Trees

 d) SVM

69. In time series feature engineering, which of the following is a commonly used feature?

 a) Rolling mean ✅

 b) One-hot encoding

 c) Word embeddings

 d) Data augmentation
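The rolling mean from question 69 as a one-liner in pandas, on a made-up daily series:

```python
import pandas as pd

ts = pd.Series([10, 12, 11, 15, 14, 18, 17],
               index=pd.date_range("2024-01-01", periods=7))

# A 3-day rolling mean smooths short-term noise; the first two
# entries are NaN because the window is not yet full.
rolling = ts.rolling(window=3).mean()

print(rolling.tolist())
```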

15. Feature Engineering in Deep Learning & AI

70. How does deep learning handle feature engineering?

 a) It learns features automatically from raw data ✅

 b) Requires manual feature selection

 c) Uses predefined statistical tests

 d) Ignores feature extraction
