20AIPC302 FMLT Question Bank
Unit I
Part – A
1. What is Machine Learning? What is the need for it?
2. List four examples of machine learning.
3. What is meant by concept learning?
4. What are the types of machine learning?
5. List out any 5 real-time applications of Machine Learning.
6. What is ML Programming?
7. Distinguish between Traditional programming and ML programming.
8. How does machine learning work?
9. List the phases of Machine Learning.
10. Define the steps involved in the basic machine learning process.
11. Compare Supervised, Unsupervised and Reinforcement Learning.
12. What is Reinforcement learning?
13. How does Human Learning vary from Machine Learning?
14. State how Tom M. Mitchell defined Machine Learning.
15. List some of the typical applications of Regression.
Part – B
1. Explain in detail about the types of Machine learning with necessary diagrams.
2. Discuss how Machine Learning works in the field of Banking.
3. Brief about the applications of Machine learning in the Healthcare domain.
4. Discuss how Machine Learning works in the field of Insurance.
5. Examine the Issues in Machine Learning.
6. Explain in detail about Reinforcement Learning with relevant Examples.
7. Compare and Contrast Supervised, Unsupervised and Reinforcement Learning types.
8. Briefly explain the following with an example:
a) Structured data
b) Semi-structured data
c) Unstructured data
9. Explain in detail about Supervised learning and its two types, classification and
regression.
10. How do Machine Learning algorithms help in detecting fraudulent activities in Banking?
Also state some of the real-time software used.
11. Examine how Human Learning happens:
i) Under direct guidance of Experts
ii) Under Indirect Guidance of Experts
iii) Self-Learning
12. Identify the Machine Learning Process Involved in Railway Ticket Booking Process
and discuss what type of Learning can be applied in grouping the types of Customers.
13. Classify the Learning types involved in Supervised Learning Model.
Unit II
Part – A
1. Distinguish between Supervised and Unsupervised Machine Learning.
2. How does Classification differ from Regression?
3. Distinguish between Predictive Analytics and Descriptive Analytics.
4. What is a Confusion Matrix?
5. How to handle Missing or Corrupted Data in a Dataset?
6. What is ROC Curve and what does it represent?
7. State the formula to calculate accuracy.
8. What is ‘Overfitting’ in Machine learning?
9. What is Inductive Logic Programming in Machine Learning?
10. Define Silhouette Width.
11. Differentiate between model underfitting and overfitting.
12. Why is the Kappa value used in Classification?
13. List the step-wise process of Machine Learning with a diagram.
14. Give the Hierarchical structure of data types available.
Part – B
1. Explain in detail the basic types of data in Machine Learning.
2. How is numerical data explored? Explain in detail.
3. Explain in detail about histograms, with standard deviation and variance.
4. Explain in detail about box plots and their types.
5. How will you evaluate the Performance of the Predictive model?
6. Explain in detail about different methods to improve the performance of models.
7. Discuss the data preprocessing steps in Machine Learning. Explain the steps involved
with a neat sketch.
8. Explain, in detail, the different components of a box plot. When will the lower whisker
be longer than the upper whisker? How can outliers be detected using box plots?
9. Describe the different methods involved in training the Predictive Model.
i) How is the performance of Regression calculated?
ii) Distinguish between Lazy Learner and Eager Learner.
10. How will you remediate Data? Why is Data Quality Important?
11. Brief about the following:
i) Dimensionality Reduction
ii) Feature Subset Selection
12. Write the difference between the following, with relevant illustrations:
i) Nominal and ordinal data
ii) Box plot and histogram
iii) Mean and median
13. Let’s assume the confusion matrix of the win/loss prediction of a cricket match problem
to be as below:
                 Actual Win   Actual Loss
Predicted Win        85            4
Predicted Loss        2            9
Show the parameters involved in calculating the performance of the model.
14. Explain Qualitative and Quantitative data in detail. How are missing values handled?
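Worked sketch for the cricket win/loss confusion matrix given above. This is one possible way to check a hand-computed answer, not a model solution. It assumes "Win" is the class of interest, so 85 = TP, 4 = FP, 2 = FN and 9 = TN; the variable names and the `metrics` dictionary are illustrative only.

```python
# Cell counts read from the confusion matrix (rows = predicted, columns = actual).
# Assumption: "Win" is treated as the positive class.
tp, fp = 85, 4   # predicted Win:  actual Win, actual Loss
fn, tn = 2, 9    # predicted Loss: actual Win, actual Loss
total = tp + fp + fn + tn

accuracy = (tp + tn) / total
error_rate = (fp + fn) / total
sensitivity = tp / (tp + fn)      # also called recall or true positive rate
specificity = tn / (tn + fp)      # true negative rate
precision = tp / (tp + fp)
f_measure = 2 * precision * sensitivity / (precision + sensitivity)

# Cohen's kappa: observed agreement adjusted for agreement expected by chance.
p_observed = accuracy
p_expected = ((tp + fp) / total) * ((tp + fn) / total) \
           + ((fn + tn) / total) * ((fp + tn) / total)
kappa = (p_observed - p_expected) / (1 - p_expected)

metrics = {
    "accuracy": accuracy,
    "error_rate": error_rate,
    "sensitivity": sensitivity,
    "specificity": specificity,
    "precision": precision,
    "f_measure": f_measure,
    "kappa": kappa,
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

If "Loss" were taken as the class of interest instead, the roles of TP and TN (and of FP and FN) would swap, and sensitivity, specificity, precision and F-measure would change accordingly.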
Unit III
Part – A
1. Compare Cross Validation and Bootstrap Sampling Methods.
2. What are Principal Components? Why are they used?
3. State TP, FP, FN, and TN for disease prediction into Benign and Malignant tumors, assuming
‘Benign’ as the class of interest (the ‘win’ class).
4. Compare model underfitting and overfitting in machine learning.
5. What are the multiple factors that lead to data quality issues in a Machine Learning
model?
6. How does Classification differ from Regression?
7. What is a Confusion Matrix?
8. What is ROC Curve and what does it represent?
9. List any three weaknesses of the decision tree method.
10. What is Data Pre-processing?
11. What are the strengths and weaknesses of KNN Algorithms?
12. What are the usual stopping criteria for a Decision Tree?
13. How is Exhaustive search carried out?
14. Give short notes on Entropy & Information Gain.
15. What are Pre-Pruning and Post Pruning?
Part – B
1. Explain in detail about KNN with the Algorithm.
2. Explain classification with examples. What are the steps available in classification
Learning?
3. Describe the different approaches involved in feature subset selection.
4. Discuss Random Forest model in detail. What are the strengths and weaknesses of it?
5. Explain in detail about Support Vector Machines with algorithms.
6. Consider the training dataset given in the following table. Use Weighted k-NN and
determine the class. Test instance (7.6, 60, 8) and K=3.
7. Describe in detail about Decision tree algorithms and discuss the Entropy and
Information Gain.
8. Write short notes on
i) Target Function
ii) Error Function
iii) Objective Function
iv) Loss Function
9. Explain about the model representation and interpretability with overfitting and
Underfitting, bias variance concepts.
10. Apply a suitable algorithm to find out if Robert would get a job with the following
parameter values:
Communication – Bad; Aptitude – High; Programming skills – Bad; CGPA – High
11. Consider the training dataset shown in the Table and construct a decision tree using the ID3
algorithm. Find the Entropy and Information Gain for the attributes Knowledge and
Communication Skills, and also find the Entropy for CGPA >= 9.
12. Apply the Support Vector Machine algorithm and explain the role of the Hyperplane involved
in the construction of the algorithm.
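Illustrative sketch of the Entropy and Information Gain calculations that ID3 relies on. The training table referred to in the questions above is not reproduced in this question bank, so the four rows below are made-up toy data; the attribute names, values and function names are hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder

# Hypothetical toy data, NOT the table from the question.
toy_rows = [
    {"communication": "good", "aptitude": "high", "job": "yes"},
    {"communication": "good", "aptitude": "low",  "job": "yes"},
    {"communication": "bad",  "aptitude": "high", "job": "no"},
    {"communication": "bad",  "aptitude": "low",  "job": "no"},
]

gain_communication = information_gain(toy_rows, "communication", "job")
gain_aptitude = information_gain(toy_rows, "aptitude", "job")
print("Gain(communication):", gain_communication)
print("Gain(aptitude):", gain_aptitude)
```

In this toy data, splitting on communication separates the classes perfectly while aptitude tells nothing about the outcome, so ID3 would pick communication as the root. With the real table, the same two functions can be applied to each candidate attribute.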
Unit IV
Part – A
1. How will you handle the Outliers in a Regression Model?
2. How will you Evaluate the performance of the Regression Model?
3. Write short notes on the conditions for a negative slope in linear regression.
4. List some of the common regression Algorithms.
5. What is the slope of the simple linear regression model?
6. What are Rise and Run with respect to slope?
7. Write short notes on the conditions for a positive slope in linear regression.
8. What are Partial regression Coefficients?
9. State Gauss Markov Theorem.
10. Define Multicollinearity.
11. Define Heteroskedasticity.
12. What is Variance?
13. What is Bias?
14. How can the accuracy of Simple Linear regression be improved?
15. How to get rid of Multicollinearity?
Part – B
1. Define simple linear regression using a graph, explaining slope and intercept. Also
explain rise, run, and slope in a graph.
2. Explain slope, linear positive slope, and linear negative slope in a graph, along with
the various conditions leading to the slope, including curvilinear negative slope and
curvilinear positive slope.
3. Explain maximum and minimum point of curves through a graph.
4. Explain the OLS algorithm with steps.
5. What is the standard error of the regression? Draw a graph to represent the same.
6. Explain multiple linear regression with an example.
7. Explain the assumptions in regression analysis and the BLUE concept.
8. Explain two main problems in regression analysis. Also state how to improve the accuracy
of the linear regression model.
9. Explain polynomial regression model in detail with an example.
10. Explain logistic regression in detail.
11. Brief about the following
i) Elastic Net Regression
ii) Stepwise Regression
12. Brief about the following
i) Ridge Regression
ii) Lasso Regression
13. Explain in detail about logistic regression and draw different scenarios for slopes.
14. Explain Simple Regression in detail, with the types of slopes, and use a scatter plot
to explore the relationship between the independent variable (internal marks), mapped to
the x-axis, and the dependent variable (external marks), mapped to the y-axis.
Internal Exam: 15 23 18 23 24 22 22 19 19 16 24 11 24 16 23
External Exam: 49 63 58 60 58 61 60 63 60 52 62 30 59 49 68
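Worked sketch for the internal/external marks data above. It fits a simple linear regression line by ordinary least squares in plain Python; the function name `fit_simple_linear_regression` is illustrative. Plotting is left out so the block runs without extra libraries, but the two lists can be passed directly to any scatter-plot routine.

```python
# Marks taken from the question: internal marks are the independent variable (x),
# external marks are the dependent variable (y).
internal = [15, 23, 18, 23, 24, 22, 22, 19, 19, 16, 24, 11, 24, 16, 23]
external = [49, 63, 58, 60, 58, 61, 60, 63, 60, 52, 62, 30, 59, 49, 68]

def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_simple_linear_regression(internal, external)
print(f"external = {intercept:.2f} + {slope:.2f} * internal")
```

The fitted slope is positive, so on this data external marks tend to rise with internal marks, which is the linear positive slope case.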
Unit V
Part – A
1. What is Unsupervised learning?
2. List some of the applications of Unsupervised Learning.
3. What is Clustering?
4. What is K-Means Clustering?
5. List out the Strengths and Weaknesses of K-Means Clustering.
6. What is an Association Rule?
7. What is SSE? What is its use in the context of the k-means algorithm?
8. What is a dendrogram?
9. State the main difference in the approach of k-means and k-medoids algorithms.
10. Define Instance based Learning.
11. What is Active Learning? Give an example.
12. What are the Partitioning methods that are involved in the Clustering Process?
13. What is the elbow Method?
14. What is DBSCAN?
15. How will you find patterns in association learning?
16. What is the main objective of Representation Learning?
17. What is Uncertainty sampling in Active learning?
18. What are the Pros and Cons of Instance-Based Learning (IBL) Method?
Part – B
1. What are the broad three categories of clustering techniques? Explain the characteristics
of each briefly.
2. Explain how the Market Basket Analysis uses the concepts of association analysis.
3. Explain the Apriori algorithm for association rule learning with an example.
4. How is the distance between clusters measured in hierarchical clustering? Explain the
use of this measure in making decisions on when to stop the iteration.
5. How are the cluster centroids recomputed in the k-means algorithm?
6. Discuss one technique to choose the appropriate number of clusters at the beginning of
a clustering exercise.
7. Discuss the strengths and weaknesses of the k-means algorithm by implementing it on
any dataset.
8. Explain the concept of clustering with a neat diagram.
9. How is unsupervised learning different from supervised learning? Explain with some
examples.
10. Describe the concept of single link and complete link in the context of hierarchical
clustering.
11. Explain in detail about Representation Learning.
12. What is active learning? Explain its heuristics.
13. Discuss Instance-based Learning (Memory-based learning).
14. Discuss various Active learning Query Strategies.
15. What is an Ensemble Learning Algorithm? Discuss various types.
16. How does the Apriori principle help in reducing the calculation overhead for a market
basket analysis? Provide an example to explain.
17. You are given a set of one-dimensional data points: {5, 10, 15, 20, 25, 30, 35}. Assume
that k = 2 and the first set of random centroids is selected as {15, 32} and then it is
refined with {12, 30}.
i) Create two clusters with each set of centroids mentioned above following the k-means
approach.
ii) Calculate the SSE for each set of centroids.
18. During a research work, you found 7 observations as described with the data points
below. You want to create 3 clusters from these observations using the K-means
algorithm. After the first iteration, the clusters C1, C2, C3 have the following observations:
C1: {(2,2), (4,4), (6,6)}
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}
If you want to run a second iteration, what will be the cluster centroids? What will
be the SSE of this clustering?
19. Apply the Apriori Principle for Association rule learning on any Supermarket Dataset.
20. Illustrate how Bagging and Gradient Boosting work with Ensemble Learning.
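Worked sketch for the two numerical k-means questions above (the one-dimensional points {5, ..., 35} and the seven two-dimensional observations in C1, C2, C3). It is a way to check hand calculations, under two stated assumptions: in the one-dimensional case the SSE is measured against the centroids given in the question, and in the two-dimensional case the SSE is measured against the recomputed centroids of the clusters as listed, before any reassignment. All function names are illustrative.

```python
# --- One-dimensional question: assign points to the nearest given centroid ---
def assign_1d(points, centroids):
    """Map each centroid to the list of points that are closest to it."""
    clusters = {c: [] for c in centroids}
    for p in points:
        nearest = min(centroids, key=lambda c: abs(p - c))
        clusters[nearest].append(p)
    return clusters

def sse_1d(clusters):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum((p - c) ** 2 for c, members in clusters.items() for p in members)

points = [5, 10, 15, 20, 25, 30, 35]
first = assign_1d(points, [15, 32])
refined = assign_1d(points, [12, 30])
sse_first = sse_1d(first)
sse_refined = sse_1d(refined)
print("Centroids {15, 32}:", first, "SSE =", sse_first)
print("Centroids {12, 30}:", refined, "SSE =", sse_refined)

# --- Two-dimensional question: recompute centroids of the listed clusters ---
def centroid(cluster):
    """Mean of the x and y coordinates of the points in one cluster."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

def sse_2d(clusters):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        cx, cy = centroid(cluster)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in cluster)
    return total

c1 = [(2, 2), (4, 4), (6, 6)]
c2 = [(0, 4), (4, 0)]
c3 = [(5, 5), (9, 9)]
new_centroids = [centroid(c) for c in (c1, c2, c3)]
sse_clusters = sse_2d([c1, c2, c3])
print("Recomputed centroids:", new_centroids, "SSE =", sse_clusters)
```

Both centroid sets in the one-dimensional case produce the same two clusters, so the drop in SSE comes only from the centroids moving closer to the cluster means.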