AsCEnD Machine Learning Course Answers
Question 1 :- Drithi has formal badminton lessons every Tuesday afternoon, and plays with
whoever is available on Saturdays. Which type of learning is she using?
ANS :- semi-supervised
Question 2 :- In the learning process, what would you typically do after implementing a rule?
Question 4 :- Instead of explicit, stepwise instructions, what are the fundamental inputs to a
machine-learning process?
ANS :- data
Question 1 :- What distinguishes supervised machine learning from other types of machine
learning?
ANS :- Transduction uses more information and produces more specific rules
Question 3 :- A company has a variety of efforts that it uses to find new antiviral medications.
Which effort best exemplifies unsupervised learning?
Question 4 :- Why would you choose to not use semi-supervised learning all the time?
Question 3 :- Why are dog weight and height not necessarily good choices for predictors in
naïve Bayes methods?
Question 4 :- Why are regression methods not considered good examples of machine
learning?
Question 5 :- What should you do if your decision tree has too much entropy?
Question 6 :- Which one of the following do you use when you look for trends instead of
trying to classify data into different groups?
Question 7 :- Salim wants to predict the best harvest date for his orchard based on prior
weather reports and harvest histories. Which type of tool does he require?
ANS :- predictions that are almost always 7.5 pounds too high
Question 2 :- Hugo's modeling strategy uses k-nearest neighbor, followed by regression
analysis. What does his strategy exemplify?
Question 3 :- Lydia has produced a model with 100 predictors that is capable of almost
perfectly fitting the 200 observations in the training data. How would you improve Lydia's
model?
Question 1 :- Most of Atul's experience is with decision trees and regression. Which strategy
should he embrace as manager of a new project?
Question 2 :- You finished fine-tuning your model with training data, and are eager to show
the results to your business team. What should you do instead?
Exam :-
Question 1 :- In _____, you are using _____ to help your program find patterns in _____.
Question 2 :- What does the training data, which is a smaller chunk than the test data, help
you find?
ANS :- patterns
Question 3 :- If you are training the machine to predict how long it takes you to drive home,
which one of the following is the dependent variable - weather, time of day, holiday, or
commute time?
ANS :- unsupervised
ANS :- states
Question 7 :- Joaquim needs a photograph analysis tool to identify the plant species
corresponding to individual seeds. Which type of problem does he have?
Question 9 :- Decision trees can be used for _____ classification challenges with _____ machine
learning.
Question 13 :- Which of the following is the best fit when you have lots of unlabeled data?
Question 14 :- What is the total score for a hound corresponding to the following weights
and individual scores?
Multiplier Hound
ANS :- 4.6
2 - Applied Machine Learning: Algorithms
(A) Review of Foundations
ANS :- TRUE
Question 2 :- Which is NOT a reason why we had to determine whether Age was missing at
random?
ANS :- FALSE
ANS :- FALSE
Question 2 :- Logistic Regression is a good choice when you have a massive amount of data
or you're trying to solve a state of the art problem.
ANS :- FALSE
Question 4 :- How many individual Logistic Regression models were fit using GridSearchCV?
ANS :- 35
(C) Support Vector Machines
Question 1 :- The kernel trick is used every time you fit SVM to data.
ANS :- FALSE
ANS :- TRUE
Question 3 :- Which of the following does NOT happen with a very low value of C?
ANS :- TRUE
Question 4 :- The activation function controls how many hidden layer nodes are used in
training.
ANS :- FALSE
ANS :- TRUE
Question 3 :- Increasing n_estimators or max_depth will ALWAYS decrease the training error.
ANS :- TRUE
Question 4 :- If I fit two Random Forest models with the exact same hyperparameter settings
on the exact same data, I should get exactly the same model and performance.
ANS :- FALSE
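Why the answer is FALSE: a Random Forest draws a bootstrap sample of rows and a random subset of features as it builds each tree, so two fits differ unless the random seed is fixed. The sketch below uses only the standard library and is a simplification, not a real forest: `forest_plan` and its helpers are hypothetical names, and features are sampled once per tree here, whereas real implementations typically sample them at each split.

```python
import random

def bootstrap_sample(n_rows, rng):
    # Draw n_rows row indices with replacement (a bootstrap sample).
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_feature_subset(n_features, k, rng):
    # Pick k distinct feature indices at random.
    return sorted(rng.sample(range(n_features), k))

def forest_plan(n_trees, n_rows, n_features, k, seed=None):
    # Hypothetical helper: records which rows and features each tree would see.
    rng = random.Random(seed)
    return [
        (bootstrap_sample(n_rows, rng), random_feature_subset(n_features, k, rng))
        for _ in range(n_trees)
    ]

same_a = forest_plan(5, 20, 8, 3, seed=42)
same_b = forest_plan(5, 20, 8, 3, seed=42)
other = forest_plan(5, 20, 8, 3, seed=7)

print(same_a == same_b)  # identical seed, identical plan
print(same_a == other)   # different seed, different rows and features
```

In scikit-learn the equivalent control is the `random_state` argument; leaving it unset is what makes repeated fits differ.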
(F) Boosting
Question 1 :- The trees built in Gradient Boosted Trees can be fit in parallel.
ANS :- FALSE
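Why the answer is FALSE: each boosted tree is fit to the errors left by the trees before it, so tree t cannot be built until tree t-1 is finished. The following is a minimal sketch under stated assumptions: one-dimensional made-up data, squared-error loss, and depth-1 trees (stumps). `fit_stump` and `boost` are hypothetical names, not a library API.

```python
def fit_stump(xs, residuals):
    # Try every threshold and keep the split with the lowest squared error.
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    # Start from the mean, then add one stump at a time.
    predictions = [sum(ys) / len(ys)] * len(ys)
    losses = []
    for _ in range(n_trees):
        # The residuals depend on every earlier stump, hence the sequential loop.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return losses

# Illustrative data only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
losses = boost(xs, ys)
print(losses[0], losses[-1])  # training loss before and after later stages
```

The loop body cannot be parallelised across trees because `residuals` is recomputed from the running `predictions` at every stage.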
Question 2 :- Gradient Boosted Trees' ability to learn from its own mistakes also drives it to
overfit to outliers.
ANS :- TRUE
Question 3 :- A small learning rate will ensure you always find the optimal model.
ANS :- FALSE
Question 4 :- Which statement is true regarding Random Forest and Gradient Boosted Trees?
ANS :- Random Forest does better with few, deep trees, and Gradient Boosted Trees does
better with many, shallow trees.
(G) Summary
ANS :- FALSE
Question 2 :- Which is NOT a reason why we use Python for machine learning?
ANS :- FALSE
Question 2 :- Which is NOT a reason why we had to determine whether Age was missing at
random?
Question 3 :- Looking at mean values overstates the impact of Fare on whether somebody
survived or not.
ANS :- TRUE
ANS :- TRUE
ANS :- FALSE
Question 7 :- Which is NOT a required argument for the where method from numpy?
Question 1 :- Model performance on the training set matters more than performance on the
validation set or test set.
ANS :- FALSE
ANS :- FALSE
Question 3 :- How many individual models will be built in standard 10-fold Cross-Validation?
ANS :- 10
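Why the answer is 10: k-fold Cross-Validation holds out each fold once and fits one model on the remaining folds. A minimal standard-library sketch, assuming 100 samples and contiguous folds without shuffling; `k_fold_indices` and `cross_validation_rounds` are hypothetical helper names.

```python
def k_fold_indices(n_samples, k):
    # Split range(n_samples) into k contiguous folds of near-equal size.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validation_rounds(n_samples, k):
    # Yield (train_indices, held_out_indices) once per fold.
    folds = k_fold_indices(n_samples, k)
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out

rounds = list(cross_validation_rounds(n_samples=100, k=10))
print(len(rounds))  # one model is fit per round
```

Every sample lands in the held-out fold exactly once, so each of the 10 models is evaluated on data it did not train on.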
ANS :- cross-validation
Question 1 :- The goal of model optimization is to tune model complexity to minimize total
error by reducing variance and bias.
ANS :- TRUE
Question 1 :- We fill missing values in the Age column with the average to ensure the model
isn't determining whether somebody survived or not based on Age alone.
ANS :- TRUE
Question 2 :- Keeping the Embarked feature in our data likely would have generated better
results on unseen data.
ANS :- FALSE
Question 3 :- Index=False should still be used even if you have a meaningful index column.
ANS :- FALSE
ANS :- faster than just fitting on the training set and evaluating on a validation set
Question 5 :- How many individual models are built using 5-fold Cross-Validation and 12
total hyperparameter combinations?
ANS :- 60
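Why the answer is 60: a grid search fits one model for every (hyperparameter combination, fold) pair, so 12 combinations x 5 folds = 60 fits. The sketch below simply enumerates those pairs; the grid values are made up for illustration and are not from the course.

```python
from itertools import product

# Hypothetical grid: 4 values x 3 values = 12 combinations.
param_grid = {"C": [0.01, 0.1, 1, 10], "max_iter": [100, 200, 300]}
n_folds = 5

combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
# One fit per (combination, fold) pair.
fits = [(params, fold) for params in combinations for fold in range(n_folds)]

print(len(combinations), len(fits))  # 12 60
```

Note that scikit-learn's GridSearchCV, with its default `refit=True`, performs one additional fit of the best combination on the full training data; the quiz answer counts only the cross-validation fits.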
Question 6 :- If the best model based on Cross-Validation does NOT perform best on the
validation set, then you did something wrong.
ANS :- FALSE
Question 7 :- What is the most likely reason for test set performance to deviate in a
significant way from the validation set performance?
ANS :- It is a gateway to all kinds of machine learning topics and broad data science.
Question 2 :- In Tom Khabaza's 9 Laws of Data Mining, what is the fourth law?
ANS :- The fourth law states, "the right model for a given application can only be
discovered by experiment."
ANS :- Logistic regression is where all of the inputs can be ranked from most to least
important.
ANS :- You divide the data or partition the data into 50% training data and 50% testing
data.
ANS :- Lift charts assess models and all kinds of variations on a theme.
ANS :- The model makes errors. It's a technique for assessing the model's performance.
Question 6 :- There are three common problems: interactions, missing data, and overfitting.
What is overfitting?
Question 7 :- When using heterogeneous ensemble, which three models should you use?
ANS :- You analyze the data when the criterion, or dependent variable, is categorical
and the independent variable is interval in nature.
Question 9 :- Neural Networks use multilayer perceptrons. What are multilayer perceptrons?
ANS :- They consist of three layers of nodes: an input layer, a hidden layer, and an output
layer.