AI ML Unit 1

The document consists of multiple-choice questions (MCQs) on data and feature engineering, covering data vs. information, types of data, data labeling, feature selection, and feature extraction. It addresses essential topics such as the role of labeled data in supervised learning, common feature selection techniques, and dimensionality reduction in machine learning. The questions test understanding of key principles and methods in data science and machine learning.


MCQs on Data & Feature Engineering

1. Data vs. Information

1. What is the key difference between data and information?

o a) Data is processed, whereas information is raw

o b) Data is raw facts, whereas information is processed data ✅

o c) Data is always numerical, whereas information is textual

o d) There is no difference

2. Which of the following best describes information?

o a) Unorganized raw numbers

o b) Processed data that is meaningful ✅

o c) A collection of random symbols

o d) All of the above

3. In AI, what is an essential step before converting data into information?

o a) Preprocessing and cleaning ✅

o b) Ignoring missing values

o c) Using only numerical data

o d) None of the above

2. Types of Data

4. Which of the following is an example of numerical data?

o a) Customer reviews

o b) Temperature readings ✅

o c) Social media posts

o d) Gender

5. Discrete numerical data represents:

o a) Continuous values
o b) Countable values ✅

o c) Infinite possible values

o d) All of the above

6. What type of data is "Age in years"?

o a) Discrete numerical ✅

o b) Continuous numerical

o c) Ordinal

o d) Nominal

7. Which of the following is an example of continuous numerical data?

o a) Number of students in a class

o b) Height of a person ✅

o c) Zip code

o d) Gender

8. What type of data is "Education level (Primary, Secondary, Tertiary)"?

o a) Nominal

o b) Ordinal ✅

o c) Discrete

o d) Continuous

9. Which of the following is an example of categorical data?

o a) Blood pressure readings

o b) Eye color ✅

o c) Number of books sold

o d) Temperature in Celsius

10. Which category does time series data fall into?

 a) Discrete numerical

 b) Continuous numerical ✅
 c) Nominal

 d) Unstructured

3. Data Labeling

11. What is data labeling?

 a) Process of assigning categories to unstructured data

 b) Process of manually tagging data for supervised learning ✅

 c) A method to remove outliers

 d) None of the above

12. Labeled data is essential for:

 a) Supervised learning ✅

 b) Unsupervised learning

 c) Reinforcement learning

 d) Clustering

13. Which of the following is NOT a challenge in data labeling?

 a) Cost and time

 b) Labeling errors

 c) Data duplication ✅

 d) Subjectivity in labeling

4. Feature and Feature Selection

14. In machine learning, what is a feature?

 a) An attribute or characteristic of data ✅

 b) A type of model

 c) A type of supervised learning algorithm

 d) None of the above


15. Why is feature selection important?

 a) Reduces overfitting

 b) Improves model accuracy

 c) Enhances model interpretability

 d) All of the above ✅

16. Which of the following is NOT a feature selection method?

 a) Sequential forward selection

 b) Data augmentation ✅

 c) Bidirectional feature selection

 d) Sequential backward selection

5. Feature Selection Algorithms

17. What is the goal of feature selection?

 a) To increase model complexity

 b) To select the most relevant features while reducing redundancy ✅

 c) To add more irrelevant features

 d) To increase the number of features

18. Which algorithm is designed specifically for feature selection?

 a) Recursive Feature Elimination (RFE) ✅

 b) K-Means

 c) Decision Trees

 d) Random Forest
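The answer to question 18, Recursive Feature Elimination, can be illustrated with a minimal scikit-learn sketch; the toy dataset and parameter values here are illustrative, not taken from the source:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Toy dataset: 10 features, of which only 3 are informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# RFE fits the estimator, drops the weakest feature (smallest
# coefficient), and repeats until 3 features remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

print(selector.support_)  # boolean mask over the 10 original features
```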

19. Which feature selection method starts with an empty feature set and adds features one by one?

 a) Sequential Forward Selection (SFS) ✅

 b) Sequential Backward Selection


 c) Random Selection

 d) Genetic Algorithm

20. In Sequential Backward Selection (SBS), how are features removed?

 a) One by one starting with the most important feature

 b) One by one starting with the least important feature ✅

 c) Randomly

 d) Based on feature correlation

21. What is bidirectional feature selection?

 a) Selecting features randomly

 b) A combination of forward and backward selection methods ✅

 c) A supervised learning algorithm

 d) None of the above
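Questions 19–21 describe sequential selection; as a sketch, scikit-learn's `SequentialFeatureSelector` supports both directions on the iris dataset (floating/bidirectional variants are found in third-party libraries such as mlxtend):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # 4 features

# Forward (SFS): start from an empty set, greedily add features.
sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=2,
                                direction="forward").fit(X, y)

# Backward (SBS): start from all features, greedily remove the weakest.
sbs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=2,
                                direction="backward").fit(X, y)

print(sfs.get_support(), sbs.get_support())
```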

6. Feature Extraction

22. What is feature extraction?

 a) Transforming raw data into a set of meaningful features ✅

 b) Randomly selecting features

 c) Removing missing values

 d) None of the above

23. Principal Component Analysis (PCA) is used for:

 a) Feature extraction ✅

 b) Feature selection

 c) Data labeling

 d) Data augmentation

24. What is the main advantage of feature extraction?

 a) Reduces dimensionality while preserving important information ✅


 b) Increases computational complexity

 c) Adds irrelevant features

 d) None of the above

25. Which of the following is NOT a feature extraction technique?

 a) Principal Component Analysis

 b) Linear Discriminant Analysis

 c) K-Means Clustering ✅

 d) Autoencoders
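PCA as a feature-extraction step (questions 22–25) in a short scikit-learn sketch, using the iris data purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

# Project the data onto the 2 directions of maximum variance,
# producing 2 new features that are combinations of the originals.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (150, 2)
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept
```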

More MCQs

26. Which type of data is commonly used in time-series analysis?

 a) Continuous numerical ✅

 b) Discrete numerical

 c) Ordinal

 d) Nominal

27. Which machine learning approach benefits the most from labeled data?

 a) Supervised Learning ✅

 b) Unsupervised Learning

 c) Reinforcement Learning

 d) Semi-supervised Learning

28. What is a major challenge in feature selection?

 a) Computational cost for high-dimensional data ✅

 b) Low memory usage

 c) Lack of machine learning models

 d) Removing missing values

29. What is the role of correlation in feature selection?


 a) Helps identify redundant features ✅

 b) Increases computational complexity

 c) Reduces model accuracy

 d) None of the above

30. Feature engineering is important because:

 a) It improves model performance and interpretability ✅

 b) It reduces the need for training

 c) It increases data redundancy

 d) It is not necessary in AI

31. Which of the following is an example of ordinal data?

 a) Customer satisfaction ratings (e.g., Poor, Average, Good, Excellent) ✅

 b) Eye color

 c) Credit card number

 d) Phone number

32. In data preprocessing, what is the purpose of normalization?

 a) To scale data within a specific range (e.g., 0 to 1 or -1 to 1) ✅

 b) To increase the dimensionality of data

 c) To remove missing values

 d) To replace categorical variables

33. One-hot encoding is commonly used for:

 a) Converting categorical data into numerical format ✅

 b) Normalization

 c) Data cleaning

 d) Handling missing values

34. Which method is commonly used to handle missing data?

 a) Imputation with mean, median, or mode ✅


 b) Removing all missing values

 c) Ignoring missing values

 d) Feature scaling
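The three preprocessing steps from questions 32–34 (normalization, one-hot encoding, and imputation) can be combined in one pandas/scikit-learn sketch; the column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"age": [25.0, 32.0, np.nan, 51.0],
                   "city": ["Pune", "Delhi", "Pune", "Mumbai"]})

# Imputation: fill the missing age with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Normalization: rescale age into the [0, 1] range.
df["age_scaled"] = MinMaxScaler().fit_transform(df[["age"]]).ravel()

# One-hot encoding: one binary column per city category.
df = pd.get_dummies(df, columns=["city"])

print(df.columns.tolist())
```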

7. Data Labeling & Supervised Learning

35. Data labeling is crucial for which type of machine learning?

 a) Supervised learning ✅

 b) Unsupervised learning

 c) Reinforcement learning

 d) Self-learning

36. What is weakly labeled data?

 a) Data that has labels generated automatically with some errors ✅

 b) Data with no labels

 c) Data with perfect manual labeling

 d) Completely irrelevant data

37. Which AI field benefits the most from semi-supervised learning?

 a) Medical image classification (few labeled images available) ✅

 b) Clustering algorithms

 c) Reinforcement learning in games

 d) Search engine indexing

8. Feature Engineering & Selection

38. What is the main purpose of feature selection?

 a) To remove redundant or irrelevant features to improve model performance ✅

 b) To add more features to improve accuracy

 c) To reduce training time by increasing dimensionality


 d) To make the dataset more complex

39. Recursive Feature Elimination (RFE) works by:

 a) Removing the least important feature iteratively ✅

 b) Adding new features iteratively

 c) Clustering the most important features

 d) Selecting features randomly

40. What is feature extraction mainly used for?

 a) Transforming raw data into meaningful features ✅

 b) Increasing model complexity

 c) Ignoring missing data

 d) Converting numerical data to categorical

41. Which of the following is an advantage of feature extraction?

 a) Reduces dimensionality and computational cost ✅

 b) Increases the number of features

 c) Increases overfitting

 d) Makes the dataset more complex

9. Feature Selection Techniques

42. What is a filter-based feature selection method?

 a) A technique that selects features based on statistical tests ✅

 b) A technique that selects features based on model performance

 c) A method that selects features randomly

 d) A method that removes missing values

43. Information Gain is commonly used in:

 a) Feature selection for decision trees ✅

 b) Feature extraction
 c) Clustering algorithms

 d) Neural networks

44. Which feature selection method ranks features based on correlation?

 a) Pearson’s correlation coefficient ✅

 b) Principal Component Analysis

 c) K-Means Clustering

 d) Autoencoders

45. The curse of dimensionality refers to:

 a) A situation where too many features reduce model performance ✅

 b) When data preprocessing fails

 c) When supervised learning is not applicable

 d) A type of machine learning model

10. Feature Engineering in AI

46. Which technique is used for feature engineering in deep learning?

 a) Autoencoders ✅

 b) Decision trees

 c) One-hot encoding

 d) Pearson correlation

47. Why is dimensionality reduction important in machine learning?

 a) To remove noise and improve computational efficiency ✅

 b) To increase feature space

 c) To slow down training speed

 d) To reduce model accuracy

48. In text-based AI models, what is the most commonly used feature extraction technique?

 a) TF-IDF (Term Frequency-Inverse Document Frequency) ✅


 b) Mean normalization

 c) Data augmentation

 d) Variance thresholding
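TF-IDF (question 48) in a minimal scikit-learn sketch, with a made-up three-document corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "dogs and cats are pets"]

# Terms frequent in one document but rare across the corpus
# receive the highest weights; ubiquitous terms are down-weighted.
vec = TfidfVectorizer()
X = vec.fit_transform(docs)  # sparse (n_docs, vocabulary) matrix

print(X.shape)
```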

49. Principal Component Analysis (PCA) works by:

 a) Finding new axes that maximize variance while reducing dimensions ✅

 b) Removing missing values

 c) Sorting features alphabetically

 d) Selecting the most correlated features

50. Which machine learning models automatically perform feature selection?

 a) Decision Trees & Random Forest ✅

 b) K-Means Clustering

 c) Neural Networks

 d) Principal Component Analysis
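Question 50's answer can be seen directly in scikit-learn: tree ensembles expose `feature_importances_` after fitting. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

# Tree ensembles rank features as a by-product of training:
# features that yield the best splits accumulate high importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(rf.feature_importances_.round(3))  # importances sum to 1
```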

51. Which of the following is an example of unstructured data?

 a) Social media posts ✅

 b) Temperature readings

 c) Customer age data

 d) ZIP codes

52. Which technique is used to handle outliers in a dataset?

 a) Winsorization (Capping extreme values) ✅

 b) Min-Max Scaling

 c) One-hot encoding

 d) Removing missing values
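Winsorization (question 52) can be sketched with NumPy percentiles and `clip`; the data and the 5th/95th-percentile caps are illustrative choices:

```python
import numpy as np

data = np.array([3, 5, 4, 6, 5, 4, 100, 5, 4, -50], dtype=float)

# Cap values below the 5th and above the 95th percentile
# instead of deleting the offending rows.
low, high = np.percentile(data, [5, 95])
capped = np.clip(data, low, high)

print(capped.min(), capped.max())
```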

53. Which of the following statements is true about categorical data?

 a) Categorical data can be nominal or ordinal ✅

 b) Categorical data must always be numerical


 c) One-hot encoding is not applicable to categorical data

 d) Ordinal data has no meaningful order

54. What is the main problem when working with imbalanced datasets?

 a) The model may be biased toward the majority class ✅

 b) The dataset contains too many features

 c) Missing values increase

 d) Feature selection becomes impossible

11. Data Labeling & Feature Engineering

55. In supervised learning, the quality of data labels directly affects:

 a) Model performance and accuracy ✅

 b) Feature extraction only

 c) The number of missing values

 d) Unsupervised learning models

56. A dataset with multiple dependent variables is known as:

 a) Multivariate data ✅

 b) Bivariate data

 c) Univariate data

 d) Structured data

57. Which of the following is a disadvantage of manual data labeling?

 a) Time-consuming and expensive ✅

 b) Not applicable to supervised learning

 c) Requires high computational power

 d) Unnecessary for high-dimensional data

58. Which term describes the process of generating new labeled training data from existing data?
 a) Data augmentation ✅

 b) Feature selection

 c) Principal Component Analysis

 d) Clustering

12. Feature Selection & Feature Extraction

59. The primary goal of feature selection is to:

 a) Identify the most relevant features while removing redundant ones ✅

 b) Increase the number of features in a dataset

 c) Reduce the training set size

 d) Convert numerical data into categorical data

60. What is the difference between feature selection and feature extraction?

 a) Feature selection removes redundant features, whereas feature extraction transforms data into new features ✅

 b) Feature selection is only for categorical data

 c) Feature extraction is used in classification but not regression

 d) Both are the same

61. Which algorithm can perform automatic feature selection?

 a) LASSO Regression ✅

 b) K-Means Clustering

 c) K-Nearest Neighbors

 d) Principal Component Analysis
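LASSO's automatic feature selection (question 61) can be demonstrated on synthetic data; the noise scale and `alpha` below are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only features 0 and 1 actually drive the target.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty shrinks irrelevant coefficients exactly to zero,
# so the surviving features are an implicit selection.
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.nonzero(lasso.coef_)[0])  # indices of surviving features
```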

62. Which of the following is NOT a feature selection method?

 a) Cross-validation ✅

 b) Recursive Feature Elimination

 c) Information Gain
 d) Mutual Information

13. Feature Engineering in Machine Learning

63. What is one of the key benefits of feature engineering?

 a) Improves model performance by creating meaningful features ✅

 b) Increases overfitting risk

 c) Reduces dataset size

 d) Prevents data cleaning

64. When should feature scaling be applied?

 a) When using distance-based algorithms like KNN and SVM ✅

 b) When working with categorical data

 c) Only when dealing with large datasets

 d) Feature scaling is not necessary in machine learning

65. What is an important step when performing text feature engineering?

 a) Tokenization and vectorization ✅

 b) One-hot encoding

 c) Feature scaling

 d) Min-Max Normalization

14. Advanced Feature Engineering Techniques

66. Principal Component Analysis (PCA) is most useful when:

 a) There are many correlated features in a dataset ✅

 b) The dataset is already well-structured

 c) We want to perform data labeling

 d) The dataset has missing values

67. What is the main purpose of t-SNE (t-Distributed Stochastic Neighbor Embedding)?
 a) Dimensionality reduction for data visualization ✅

 b) Feature selection

 c) Text classification

 d) Clustering

68. Which feature selection method is commonly used for handling high-dimensional data?

 a) LASSO Regression ✅

 b) K-Means

 c) Decision Trees

 d) SVM

69. In time series feature engineering, which of the following is a commonly used feature?

 a) Rolling mean ✅

 b) One-hot encoding

 c) Word embeddings

 d) Data augmentation
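The rolling mean from question 69 as a one-liner in pandas, on a made-up daily series:

```python
import pandas as pd

ts = pd.Series([10, 12, 11, 15, 14, 18, 17],
               index=pd.date_range("2024-01-01", periods=7))

# A 3-day rolling mean smooths short-term noise; the first two
# entries are NaN because the window is not yet full.
rolling = ts.rolling(window=3).mean()

print(rolling.tolist())
```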

15. Feature Engineering in Deep Learning & AI

70. How does deep learning handle feature engineering?

 a) It learns features automatically from raw data ✅

 b) Requires manual feature selection

 c) Uses predefined statistical tests

 d) Ignores feature extraction
