Supervised Machine Learning Regression
This document consists of 40 multiple-choice questions, with an answer key, covering supervised learning, regression tasks, linear regression, cost functions, gradient descent, and regularization techniques. It also touches on overfitting, underfitting, feature scaling, and the bias-variance tradeoff, and each question tests knowledge of fundamental principles and practices in machine learning and regression analysis.
1. Which of the following best describes supervised learning?
   a) Finding patterns in unlabeled data
   b) Using labeled data to make predictions
   c) Clustering unlabeled samples into groups
   d) Reducing dimensionality of data without labels

2. In a regression task, the target variable is:
   a) Continuous
   b) Categorical
   c) Ordinal only
   d) Binary

3. The hypothesis function for linear regression with one feature is typically expressed as:
   a) $h(x) = \theta_1 x$
   b) $h(x) = \theta_0 + \theta_1 x$
   c) $h(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
   d) $h(x) = x^T \theta$ with no intercept term

4. The cost function often used in linear regression is:
   a) Mean Absolute Error (MAE)
   b) Mean Squared Error (MSE)
   c) Cross-Entropy Loss
   d) Hinge Loss

5. Minimizing the cost function in linear regression typically involves:
   a) Maximizing the number of features
   b) Reducing the training set size
   c) Adjusting model parameters to minimize errors
   d) Decreasing the number of training iterations intentionally

6. Gradient descent updates parameters by moving in the direction of:
   a) Increasing gradient of the cost function
   b) Decreasing gradient of the cost function
   c) Zero gradient at all steps
   d) Random directions to explore the parameter space

7. If the learning rate in gradient descent is too large, the algorithm may:
   a) Converge too slowly but still reach the minimum
   b) Converge exactly to the global minimum
   c) Fail to converge and instead diverge
   d) Not update parameters at all

8. Feature scaling (e.g., normalization) can speed up gradient descent by:
   a) Making the cost function vanish
   b) Ensuring all features have similar ranges
   c) Removing the need for an intercept term
   d) Guaranteeing convergence in one step

9. Which of the following would likely indicate overfitting in a regression model?
   a) High training error and low test error
   b) Low training error and low test error
   c) Low training error and high test error
   d) High training error and high test error

10. High bias in a model typically leads to:
    a) Underfitting
    b) Overfitting
    c) Perfect fitting of training data
    d) Higher variance

11. When adding polynomial features, we are:
    a) Increasing model complexity by transforming inputs
    b) Reducing model complexity by removing features
    c) Increasing the number of training examples
    d) Automatically ensuring better generalization

12. The normal equation is a closed-form solution for:
    a) Finding learning rates
    b) Performing gradient descent updates
    c) Computing the parameters $\theta$ that minimize the cost function
    d) Selecting the best model hyperparameters

13. Which matrix operation is commonly used in the normal equation approach?
    a) Matrix inversion
    b) Element-wise multiplication only
    c) Eigenvalue decomposition only
    d) Convolution

14. Regularization techniques like Ridge (L2) regression:
    a) Add a penalty proportional to the absolute values of coefficients
    b) Add a penalty proportional to the square of coefficients
    c) Remove the bias term from the model
    d) Are never used in linear models

15. Lasso (L1) regularization tends to produce models that are:
    a) More likely to have many small coefficients
    b) More likely to zero out some coefficients
    c) Identical to Ridge regression solutions
    d) Always outperforming Ridge in all scenarios

16. The purpose of a validation set is to:
    a) Estimate the model’s performance on unseen data and tune hyperparameters
    b) Train the model parameters
    c) Replace the test set
    d) Collect more training examples
17. Reducing variance in a regression model could be achieved by:
    a) Increasing the model’s complexity
    b) Using fewer training examples
    c) Implementing regularization
    d) Removing regularization terms

18. Suppose we have a dataset with features in widely different numerical ranges. Without feature scaling, gradient descent may:
    a) Converge more quickly
    b) Converge more slowly
    c) Be unaffected by feature scales
    d) Always find a global minimum instantly

19. Overfitting can often be addressed by:
    a) Using a more complex model
    b) Increasing the number of features arbitrarily
    c) Using regularization or collecting more data
    d) Decreasing regularization penalties

20. The mean squared error (MSE) between predictions $\hat{y}$ and true values $y$ is calculated as:
    a) $\frac{1}{m}\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})$
    b) $\frac{1}{m}\sum_{i=1}^m |\hat{y}^{(i)} - y^{(i)}|$
    c) $\frac{1}{2m}\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})^2$
    d) $\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})^2$

21. A model that is too simple and fails to capture the underlying data pattern is likely experiencing:
    a) Low bias, low variance
    b) High bias, low variance
    c) High bias, high variance
    d) Low bias, high variance

22. A polynomial regression model that fits training data perfectly but performs poorly on test data is an example of:
    a) Underfitting
    b) Overfitting
    c) Just right model complexity
    d) No variance

23. The term “hypothesis” in linear regression typically refers to:
    a) The theoretical assumption that data is perfectly linear
    b) The function mapping inputs to predicted outputs
    c) A guess about the number of features to use
    d) A hypothesis test for statistical significance

24. Convergence in gradient descent is often checked by:
    a) Monitoring the training set size
    b) Checking if parameters exceed a certain range
    c) Observing if the cost function stops decreasing significantly
    d) Ensuring the gradient updates are random

25. To combat underfitting, one might:
    a) Increase model complexity (e.g., add features or polynomial terms)
    b) Use more aggressive regularization
    c) Reduce the size of the training data
    d) Decrease the number of iterations in gradient descent

26. If the learning rate is too small, gradient descent will:
    a) Never update parameters
    b) Converge very quickly
    c) Converge very slowly but still eventually approach the minimum
    d) Oscillate around the minimum

27. The bias-variance tradeoff refers to the balance between:
    a) Underfitting and overfitting
    b) Training size and feature size
    c) Linear and polynomial models
    d) Regularized and unregularized models

28. The parameter $\theta_0$ in linear regression is:
    a) The slope of the regression line
    b) The regularization parameter
    c) The intercept term
    d) Always equal to zero

29. Which approach would NOT typically help with high variance?
    a) Adding regularization
    b) Adding more training examples
    c) Simplifying the model
    d) Increasing the complexity of the hypothesis

30. In multiple linear regression, the model is:
    a) $h(\mathbf{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
    b) $h(\mathbf{x}) = x_1 + x_2 + \cdots + x_n$ with no parameters
    c) $h(\mathbf{x}) = \theta_0 x_0$ only
    d) $h(\mathbf{x}) = \max(\mathbf{x})$
31. When performing feature scaling by normalization, values are typically rescaled to:
    a) Mean 0, standard deviation 1
    b) Range [0, 1]
    c) Both a and b are common approaches
    d) Range [-10, 10]

32. One advantage of using the normal equation over gradient descent is:
    a) It does not require choosing a learning rate
    b) It scales better for very large feature sets
    c) It always runs faster
    d) It prevents overfitting automatically

33. The “cost function” in linear regression measures:
    a) The model’s complexity in terms of number of parameters
    b) The discrepancy between predictions and actual values
    c) The number of iterations required
    d) The computational expense of training

34. If adding new features significantly reduces training error but not test error, this typically indicates:
    a) Underfitting
    b) Proper generalization
    c) Overfitting
    d) No change in performance

35. Ridge regression’s penalty term is added to the cost function as:
    a) $\lambda \sum |\theta_j|$
    b) $\lambda \sum \theta_j^2$
    c) $\lambda \sum \log(\theta_j)$
    d) $\lambda \sum \sqrt{\theta_j}$

36. A key distinction between regression and classification tasks is that regression outputs:
    a) Discrete categories
    b) Probabilities of classes
    c) Continuous numeric values
    d) Binary (0/1) predictions only

37. Before running gradient descent, we often initialize parameters $\theta$ to:
    a) Arbitrary small random values or zeros
    b) Their closed-form solution
    c) Exactly the final solution
    d) Infinity

38. The term "regularization parameter" ($\lambda$) controls:
    a) The number of features in the dataset
    b) The relative importance of the penalty term on the parameters
    c) The step size in gradient descent
    d) The intercept term

39. Which evaluation metric best measures how close predicted values are to the actual values in a regression problem?
    a) Accuracy
    b) Mean squared error (MSE)
    c) AUC (Area Under the Curve)
    d) Gini impurity

40. When training a polynomial regression model, an appropriate approach to avoid overly large polynomial coefficients might be:
    a) Add more training data only
    b) Use a lower learning rate only
    c) Implement L2 regularization to control coefficients
    d) Randomly shuffle the features
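For readers who want to connect the questions to running code, here is a minimal sketch, in Python with NumPy (a language choice the quiz itself does not make), of batch gradient descent for univariate linear regression with the $\frac{1}{2m}$-scaled squared-error cost. It ties together the hypothesis of question 3, the update direction of question 6, the cost of question 20, the convergence check of question 24, and the zero initialization of question 37. The function and parameter names (gradient_descent_1d, learning_rate, n_iters, tol) are illustrative, not drawn from any particular course or library.

```python
import numpy as np

def gradient_descent_1d(x, y, learning_rate=0.5, n_iters=1000, tol=1e-10):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent on
    J = (1 / (2m)) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0                  # start from zeros (question 37)
    prev_cost = np.inf
    for _ in range(n_iters):
        preds = theta0 + theta1 * x            # hypothesis (question 3)
        errors = preds - y
        cost = (errors ** 2).sum() / (2 * m)   # cost function (question 20)
        if prev_cost - cost < tol:             # stop once J stops decreasing (question 24)
            break
        prev_cost = cost
        # step against the gradient of J (question 6)
        theta0 -= learning_rate * errors.sum() / m
        theta1 -= learning_rate * (errors * x).sum() / m
    return theta0, theta1

# Tiny synthetic example: y is roughly 2 + 3x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=50)
print(gradient_descent_1d(x, y))               # parameters close to (2, 3)
```

Making learning_rate much larger causes the cost to grow instead of shrink (question 7), while a very small value still converges, only slowly (question 26).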
Answer Key

1. b) Using labeled data to make predictions
2. a) Continuous
3. b) $h(x) = \theta_0 + \theta_1 x$
4. b) Mean Squared Error (MSE)
5. c) Adjusting model parameters to minimize errors
6. b) Decreasing gradient of the cost function
7. c) Fail to converge and instead diverge
8. b) Ensuring all features have similar ranges
9. c) Low training error and high test error
10. a) Underfitting (high bias means underfitting)
11. a) Increasing model complexity by transforming inputs
12. c) Computing the parameters $\theta$ that minimize the cost function
13. a) Matrix inversion
14. b) Add a penalty proportional to the square of coefficients
15. b) More likely to zero out some coefficients
16. a) Estimate the model’s performance on unseen data and tune hyperparameters
17. c) Implementing regularization
18. b) Converge more slowly
19. c) Using regularization or collecting more data
20. c) $\frac{1}{2m}\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})^2$
21. b) High bias, low variance
22. b) Overfitting
23. b) The function mapping inputs to predicted outputs
24. c) Observing if the cost function stops decreasing significantly
25. a) Increase model complexity (e.g., add features or polynomial terms)
26. c) Converge very slowly but still eventually approach the minimum
27. a) Underfitting and overfitting (bias and variance)
28. c) The intercept term
29. d) Increasing the complexity of the hypothesis
30. a) $h(\mathbf{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
31. c) Both a and b are common approaches
32. a) It does not require choosing a learning rate
33. b) The discrepancy between predictions and actual values
34. c) Overfitting
35. b) $\lambda \sum \theta_j^2$
36. c) Continuous numeric values
37. a) Arbitrary small random values or zeros
38. b) The relative importance of the penalty term on the parameters
39. b) Mean squared error (MSE)
40. c) Implement L2 regularization to control coefficients
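As a companion sketch, again only an illustration under assumed conventions rather than a prescribed method, the snippet below standardizes features to mean 0 and standard deviation 1 (question 31) and then fits ridge (L2) regression in closed form with a regularized normal equation (questions 12-14, 32, 35, 40). The helper names standardize and ridge_normal_equation are made up for this example, and the exact scaling of the penalty $\lambda$ varies between textbooks.

```python
import numpy as np

def standardize(X):
    """Rescale every column to mean 0 and standard deviation 1 (question 31a)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

def ridge_normal_equation(X, y, lam=0.1):
    """Solve (X_b^T X_b + lam * I) theta = X_b^T y, leaving the intercept unpenalized."""
    m, n = X.shape
    X_b = np.hstack([np.ones((m, 1)), X])   # prepend a column of ones for theta_0
    penalty = lam * np.eye(n + 1)
    penalty[0, 0] = 0.0                     # do not shrink the intercept (question 28)
    # solve() is the numerically safer stand-in for an explicit matrix inverse
    return np.linalg.solve(X_b.T @ X_b + penalty, X_b.T @ y)

# Toy data with two features on very different numeric ranges (question 18).
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(0.0, 1.0, 100), rng.uniform(0.0, 1000.0, 100)])
y = 5.0 + 2.0 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(scale=0.1, size=100)

X_scaled, mu, sigma = standardize(X)
theta = ridge_normal_equation(X_scaled, y, lam=0.1)
print(theta)   # intercept first, then one coefficient per (scaled) feature
```

Using np.linalg.solve instead of forming an explicit inverse applies the matrix-inversion idea behind question 13 in a numerically safer way, and zeroing the first diagonal entry of the penalty matrix keeps the intercept $\theta_0$ out of the regularization term.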