1. What is the purpose of a scatter plot in data visualization?
a) Displaying the distribution of categorical variables
b) Showing the relationship between two numerical variables
c) Highlighting the correlation between features
d) Visualizing time series data

2. In machine learning, what is the main goal of dimensionality reduction?
a) Increasing the number of features
b) Improving model complexity
c) Reducing the size of the dataset
d) Capturing relevant information while reducing noise

3. Which technique is commonly used to address the issue of overfitting in machine learning?
a) Regularization
b) Data augmentation
c) Feature engineering
d) Ensemble methods

4. What does the AUC-ROC curve measure in binary classification?
a) Model accuracy
b) Precision
c) Recall
d) True positive rate vs. false positive rate

5. Which algorithm is suitable for clustering data when the number of clusters is known in advance?
a) K-means
b) Hierarchical clustering
c) DBSCAN
d) Random Forest

6. Which statistical test is used to determine if there is a significant difference between the means of two groups?
a) ANOVA
b) Chi-squared test
c) T-test
d) Pearson correlation

7. What is the purpose of the Levenshtein distance metric in natural language processing?
a) Measuring document similarity
b) Evaluating sentiment analysis
c) Calculating word embeddings
d) Quantifying the difference between two strings

8. Which machine learning technique can handle both classification and regression tasks?
a) Linear regression
b) Decision trees
c) Naive Bayes
d) Support Vector Machines

9. What is the "bias-variance trade-off" in machine learning?
a) Balancing the trade-off between bias and fairness in models
b) Balancing the trade-off between underfitting and overfitting
c) Balancing the trade-off between feature selection and feature extraction
d) Balancing the trade-off between model accuracy and interpretability

10. Which method is used for handling class imbalance in a binary classification problem?
a) Data augmentation
b) Feature scaling
c) Regularization
d) Principal Component Analysis (PCA)

11. Which technique is used to assess the significance of variables in a linear regression model?
a) p-value
b) R-squared
c) F-statistic
d) Mean squared error

12. What does the term "bagging" refer to in ensemble learning?
a) Training multiple models sequentially
b) Training multiple models in parallel and averaging their predictions
c) Reducing the number of features in a dataset
d) Combining models using a weighted average

13. What is the purpose of the sigmoid activation function in a neural network?
a) Introducing non-linearity
b) Regularizing model parameters
c) Calculating the mean squared error
d) Scaling input features

14. In a decision tree, what is the "Gini impurity" used for?
a) Measuring the variance of the target variable
b) Calculating the entropy of the target variable
c) Quantifying the purity of a node's class distribution
d) Assessing the correlation between features

15. Which algorithm is used for optimizing hyperparameters in machine learning models?
a) Gradient descent
b) K-means
c) Grid search
d) Hierarchical clustering

16. What is the purpose of the L1 regularization term in linear regression?
a) Reducing bias in the model
b) Penalizing large coefficients
c) Increasing model complexity
d) Improving convergence of optimization algorithms

17. Which technique is used to prevent the "curse of dimensionality" in machine learning?
a) Regularization
b) Feature scaling
c) Dimensionality reduction
d) Ensemble learning

18. What is the Kullback-Leibler (KL) divergence used for in probability theory?
a) Measuring the similarity between two probability distributions
b) Calculating the variance of a dataset
c) Evaluating the goodness of fit of a model
d) Assessing the linearity of a regression model

19. Which method is commonly used for imputing missing values in a dataset?
a) Removing rows with missing values
b) Filling missing values with the mean of the feature
c) Ignoring missing values during analysis
d) Replacing missing values with the mode of the feature

20. What is the purpose of the Viterbi algorithm in Hidden Markov Models (HMM)?
a) Calculating the likelihood of an observation sequence
b) Estimating the parameters of the model
c) Decoding the most likely sequence of hidden states
d) Smoothing noisy observations

21. Which technique is used to prevent overfitting in decision trees?
a) Pruning
b) Bagging
c) Boosting
d) Feature scaling

22. What is the goal of natural language processing (NLP)?
a) Simulating human intelligence
b) Generating random text
c) Reducing the dimensionality of text data
d) Extracting and understanding information from text

23. Which evaluation metric is appropriate for imbalanced multi-class classification problems?
a) Accuracy
b) Precision-recall curve
c) F1-score
d) Mean squared error

24. What does the term "one-hot encoding" refer to in data preprocessing?
a) Converting categorical variables into numerical values
b) Combining multiple features into a single feature
c) Reducing the dimensionality of data
d) Transforming continuous variables into binary vectors

25. Which algorithm is used for reducing the dimensionality of high-dimensional data?
a) Naive Bayes
b) K-means clustering
c) Principal Component Analysis (PCA)
d) Random Forest

26. What is the purpose of the Jensen-Shannon divergence in probability theory?
a) Measuring the similarity between two probability distributions
b) Calculating the mean of a dataset
c) Estimating the variance of a distribution
d) Assessing the linearity of a regression model

27. Which method is used for text data preprocessing to remove unnecessary words and reduce dimensionality?
a) One-hot encoding
b) Word embedding
c) Stopword removal
d) Lemmatization

28. What is the primary purpose of cross-validation in machine learning?
a) Training a model on all available data
b) Evaluating a model's performance on a separate dataset
c) Dividing data into training and testing sets
d) Visualizing the distribution of data

29. Which technique is used for reducing variance and improving the generalization of an ensemble model?
a) Bagging
b) Boosting
c) Pruning
d) Regularization

30. In a support vector machine (SVM), what is the "kernel trick" used for?
a) Reducing model complexity
b) Adding new features to the dataset
c) Transforming data into a higher-dimensional space
d) Improving convergence of the optimization algorithm

31. What does the term "precision" refer to in binary classification?
a) The ratio of true positives to true negatives
b) The ratio of true positives to the sum of true positives and false positives
c) The ratio of true positives to the sum of true positives and false negatives
d) The ratio of true negatives to the sum of true negatives and false negatives

32. Which technique is used for generating new data samples using a trained model?
a) Clustering
b) Dimensionality reduction
c) Data augmentation
d) Regularization

33. What is the goal of feature scaling in machine learning?
a) Converting categorical features into numerical values
b) Balancing class distribution
c) Scaling numerical features to a similar range
d) Increasing the complexity of the model

34. What is the primary purpose of a confusion matrix in binary classification?
a) Evaluating the model's performance
b) Calculating the mean squared error
c) Identifying the number of features
d) Visualizing the data distribution

35. Which algorithm is used for extracting important features from text data?
a) Principal Component Analysis (PCA)
b) Linear Discriminant Analysis (LDA)
c) K-means clustering
d) Gradient Boosting

36. What does the term "bag of words" represent in natural language processing?
a) A technique for analyzing sentence structure
b) A method for encoding categorical variables
c) A model for sequence generation
d) A representation of text as a collection of word occurrences

37. Which method is used to mitigate the issue of multicollinearity in linear regression?
a) Feature scaling
b) L1 regularization
c) L2 regularization
d) Removing one of the correlated features

38. What is the primary purpose of a learning rate in gradient descent optimization?
a) Balancing the trade-off between bias and variance
b) Adjusting the number of iterations in training
c) Controlling the step size during parameter updates
d) Calculating the regularization term

39. Which technique is used for evaluating the importance of features in a random forest model?
a) Gini impurity
b) Area Under the Curve (AUC)
c) Recursive Feature Elimination (RFE)
d) Mean squared error

40. What is the purpose of the log loss (binary cross-entropy) loss function in classification?
a) Calculating the mean squared error
b) Minimizing the difference between predicted and actual values
c) Penalizing large model coefficients
d) Encouraging confident predictions and penalizing uncertainty

41. In time series forecasting, what is the role of the "lag" parameter?
a) Balancing class distribution
b) Specifying the number of clusters
c) Defining the number of previous time steps to consider
d) Determining the learning rate

42. Which technique is used for representing text data in a continuous vector space?
a) One-hot encoding
b) Word embedding
c) TF-IDF
d) Bag of words

43. What is the purpose of the Hessian matrix in optimization algorithms?
a) Calculating the gradient of the loss function
b) Regularizing model parameters
c) Determining the step size during optimization
d) Improving the convergence of gradient descent

44. Which algorithm is commonly used for sentiment analysis in text data?
a) Linear regression
b) Support Vector Machines (SVM)
c) Naive Bayes
d) Decision trees

45. What is the goal of the Expectation-Maximization (EM) algorithm?
a) Calculating the mean squared error
b) Training deep neural networks
c) Clustering data into groups
d) Optimizing hyperparameters

46. Which method is used for reducing variance in a model by averaging multiple instances of it?
a) Regularization
b) Ensemble learning
c) Feature scaling
d) Dimensionality reduction

47. What is the purpose of the inverted dropout technique in neural networks?
a) Preventing overfitting by dropping out neurons during training
b) Scaling the input features to a similar range
c) Increasing model complexity by adding more layers
d) Introducing non-linearity

48. Which technique is used for finding the optimal number of clusters in K-means clustering?
a) The Elbow method
b) Principal Component Analysis (PCA)
c) The Silhouette score
d) Regularization

49. What is the goal of gradient boosting in ensemble learning?
a) Increasing the variance of individual models
b) Training multiple models in parallel
c) Combining weak learners to create a strong model
d) Reducing the bias of the model

50. Which method is used for reducing the dimensionality of high-dimensional data while preserving its variance?
a) Principal Component Analysis (PCA)
b) K-means clustering
c) Support Vector Machines (SVM)
d) Bagging

Answers:
1. b) Showing the relationship between two numerical variables
2. d) Capturing relevant information while reducing noise
3. a) Regularization
4. d) True positive rate vs. false positive rate
5. a) K-means
6. c) T-test
7. d) Quantifying the difference between two strings
8. d) Support Vector Machines
9. b) Balancing the trade-off between underfitting and overfitting
10. a) Data augmentation
11. a) p-value
12. b) Training multiple models in parallel and averaging their predictions
13. a) Introducing non-linearity
14. c) Quantifying the purity of a node's class distribution
15. c) Grid search
16. b) Penalizing large coefficients
17. c) Dimensionality reduction
18. a) Measuring the similarity between two probability distributions
19. b) Filling missing values with the mean of the feature
20. c) Decoding the most likely sequence of hidden states
21. a) Pruning
22. d) Extracting and understanding information from text
23. c) F1-score
24. a) Converting categorical variables into numerical values
25. c) Principal Component Analysis (PCA)
26. a) Measuring the similarity between two probability distributions
27. c) Stopword removal
28. b) Evaluating a model's performance on a separate dataset
29. a) Bagging
30. c) Transforming data into a higher-dimensional space
31. b) The ratio of true positives to the sum of true positives and false positives
32. c) Data augmentation
33. c) Scaling numerical features to a similar range
34. a) Evaluating the model's performance
35. b) Linear Discriminant Analysis (LDA)
36. d) A representation of text as a collection of word occurrences
37. b) L1 regularization
38. c) Controlling the step size during parameter updates
39. a) Gini impurity
40. d) Encouraging confident predictions and penalizing uncertainty
41. c) Defining the number of previous time steps to consider
42. b) Word embedding
43. c) Determining the step size during optimization
44. c) Naive Bayes
45. c) Clustering data into groups
46. b) Ensemble learning
47. a) Preventing overfitting by dropping out neurons during training
48. a) The Elbow method
49. c) Combining weak learners to create a strong model
50. a) Principal Component Analysis (PCA)
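
Worked check for questions 7 and 31: the short Python sketch below is an illustration only and is not part of the original quiz. It computes the Levenshtein distance (the number of single-character insertions, deletions, or substitutions needed to turn one string into another) and precision (true positives divided by the sum of true positives and false positives). The function names and the sample inputs are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using a row-by-row dynamic program."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def precision(true_positives: int, false_positives: int) -> float:
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return 0.0
    return true_positives / predicted_positive


if __name__ == "__main__":
    # Hypothetical sample inputs, chosen only for illustration.
    print(levenshtein("kitten", "sitting"))
    print(precision(true_positives=8, false_positives=2))
```

With these sample inputs, the distance between "kitten" and "sitting" is 3 (two substitutions and one insertion), and 8 true positives against 2 false positives give a precision of 0.8.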