Unit I
1. Define Machine Learning. Applications of Machine Learning.
Machine Learning is a branch of Artificial Intelligence that enables systems to learn from data and improve their performance on a task with experience, without being explicitly programmed. Applications of ML:
1. Image recognition is one of the most common uses of machine learning. There are many situations where we need to classify an object in a digital image. For example, in a black-and-white image, the intensity of each pixel serves as one of the measurements; in a coloured image, each pixel provides three intensity measurements, one for each of the colours red, green, and blue (RGB).
Machine learning can also be used for face detection in an image. A separate category is created for each person in a database of several people. Machine learning is also used for character recognition, to discern handwritten as well as printed letters: a piece of writing can be segmented into smaller images, each containing a single character.
e.g. the auto friend-tagging suggestion on Facebook, which is built with the DeepFace deep-learning face-recognition project.
2. Speech Recognition
Speech recognition is the translation of spoken words into text. It is also known as computer speech recognition or automatic speech recognition. Here, a software application recognizes the words spoken in an audio clip or file and then converts the audio into a text file. The measurement in this application can be a set of numbers that represent the speech signal; we can also segment the speech signal by intensities in different time-frequency bands.
Speech recognition is used in applications such as voice user interfaces and voice search. Voice user interfaces include voice dialing, call routing, and appliance control. It can also be used for simple data entry and the preparation of structured documents.
3. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a significant role here: Tesla, a popular car manufacturer, is working on self-driving cars and uses machine learning to train its car models to detect people and objects while driving.
Learning associations is a related application: the process of developing insights into the associations between products, for example how seemingly unrelated products can be associated with one another. One application of machine learning is studying the associations between the products that people buy: if a person buys a product, they are shown similar products because there is a relation between the two. When new products are launched in the market, they are associated with existing ones to increase their sales.
4. Medical diagnosis
Machine learning is used for the analysis of clinical parameters and their combinations for prognosis, e.g., prediction of disease progression, extraction of medical knowledge for outcomes research, therapy planning, and patient monitoring. These are successful implementations of machine learning methods and can help in the integration of computer-based systems in the healthcare sector.
In medical science, machine learning is used for disease diagnosis. With it, medical technology is advancing quickly and is able to build 3D models that can predict the exact position of lesions in the brain, which helps in finding brain tumours and other brain-related diseases more easily.
5. Financial Services
Machine learning has a lot of potential in the financial and banking sector and is a driving force behind many modern financial services. It can help banks and financial institutions make smarter decisions, for example by spotting a likely account closure before it occurs or by tracking the spending patterns of customers. Machine learning can also perform market analysis: smart systems can be trained to track spending patterns, and the algorithms can identify trends easily and react in real time.
6. Product Recommendations
You shopped for a product online a few days back and then you keep receiving emails with shopping suggestions. If not this, then you may have noticed that the shopping website or app recommends items that somehow match your taste. Certainly, this refines the shopping experience, and it is machine learning doing the magic: product recommendations are made on the basis of your behaviour on the website/app, past purchases, items liked or added to the cart, brand preferences, etc.
7. Email Spam and Malware Filtering
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important mail arrives in our inbox with the important symbol, while spam emails land in our spam box; the technology behind this is machine learning. Below are some spam filters used by Gmail:
Content Filter
Header filter
General blacklists filter
Rules-based filters
Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve Bayes classifier are used for email
spam filtering and malware detection.
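As an illustration of the idea above, here is a minimal sketch (not Gmail's actual filters) that trains a Naïve Bayes classifier on a tiny, made-up set of emails; the messages and labels are purely hypothetical.
# Minimal sketch: Naive Bayes spam filter on a tiny hypothetical dataset
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",                # spam (hypothetical)
    "limited offer click here",            # spam (hypothetical)
    "meeting scheduled for monday",        # ham (hypothetical)
    "please review the attached report",   # ham (hypothetical)
]
labels = [1, 1, 0, 0]   # 1 = spam, 0 = ham

vectorizer = CountVectorizer()             # bag-of-words features
X = vectorizer.fit_transform(emails)

model = MultinomialNB()
model.fit(X, labels)

test = vectorizer.transform(["free prize offer"])
print(model.predict(test))                 # expected to lean towards the spam class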
8. Predictions while Commuting
Traffic Predictions: We all use GPS navigation services. While we do, our current locations and velocities are saved at a central server for managing traffic, and this data is then used to build a map of current traffic. While this helps in managing traffic and performing congestion analysis, the underlying problem is that relatively few cars are equipped with GPS. Machine learning in such scenarios helps to estimate the regions where congestion can be expected on the basis of daily experience.
9. Virtual Personal Assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using voice instructions. These assistants can help us in various ways just by our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.
10. Online Fraud Detection:
Machine learning makes our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways a fraudulent transaction can take place, such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To detect this, a feed-forward neural network can check whether a transaction is genuine or fraudulent.
Training Phase:
The model learns patterns and relationships from a labeled dataset, called the training dataset.
It involves adjusting parameters using techniques like gradient descent to minimize errors.
The goal is to optimize the model's performance on the training data.
Testing Phase:
The trained model is evaluated on an unseen dataset, called the testing dataset.
This phase assesses the model's ability to generalize and perform well on new data.
Metrics such as accuracy, precision, recall, and F1 score are used for evaluation.
Key Difference:
The training phase focuses on learning from data, while the testing phase measures how well the learning generalizes to new
data. A balance between both ensures the model avoids overfitting or underfitting.
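A small sketch of the two phases, assuming a toy synthetic dataset from scikit-learn; the numbers and model choice are illustrative only.
# Sketch: training phase vs. testing phase with common evaluation metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)              # training phase: parameters are fitted here

y_pred = model.predict(X_test)           # testing phase: evaluate on unseen data
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))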
Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to
find out the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of
SVM.
Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect its position are termed Support Vectors. Since these vectors support the hyperplane, they are called support vectors.
• The dimensions of the hyperplane depend on the number of features in the dataset: if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane.
• We always create the hyperplane that has the maximum margin, i.e., the maximum distance between the hyperplane and the nearest data points of either class.
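A minimal sketch contrasting a linear SVM with a non-linear (RBF-kernel) SVM on toy data; the dataset and parameters are assumptions chosen only for illustration.
# Sketch: linear vs. non-linear SVM classifiers
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)   # not linearly separable
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_tr, y_tr)    # straight-line decision boundary
rbf_svm = SVC(kernel="rbf").fit(X_tr, y_tr)          # non-linear decision boundary

print("linear SVM accuracy:", linear_svm.score(X_te, y_te))
print("RBF SVM accuracy   :", rbf_svm.score(X_te, y_te))
print("number of support vectors (RBF):", rbf_svm.support_vectors_.shape[0])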
Embedded method
Embedded methods perform feature selection as part of the model-training process itself, e.g., Lasso (L1) regularization and feature importances from tree-based models.
10. Explain wrapper method in detail.
• Unlike the filter method, the wrapper method does not rely on statistical tests for feature selection; it evaluates subsets of features by actually training a model on them.
• There are 3 basic mechanisms in this method.
– 1) Forward selection: Forward selection is an iterative method in which we start with no features in the model. In each iteration we add the feature that best improves the model, until adding a new variable no longer improves the performance of the model.
– e.g. A, B, C, D, E are the independent features.
A -> Model -> accuracy
The model is trained with feature (attribute) A and its accuracy is checked. In the next iteration, feature B is also added to train the model and the accuracy is checked again.
A,B -> Model -> accuracy
• The two accuracies are compared; if the accuracy improves after adding feature B in the second iteration, then B is added to the feature set. If the accuracy does not improve after adding a new feature, that feature is discarded. A sketch of this procedure is shown below.
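A minimal forward-selection sketch using scikit-learn's SequentialFeatureSelector; the dataset and estimator are assumptions chosen only to illustrate the iterative add-one-feature idea described above.
# Sketch: forward selection — start with no features, add the one that helps most
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SequentialFeatureSelector

X, y = load_iris(return_X_y=True)
estimator = LogisticRegression(max_iter=1000)

selector = SequentialFeatureSelector(
    estimator, n_features_to_select=2, direction="forward", cv=5
)
selector.fit(X, y)
print("selected feature mask:", selector.get_support())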
2. Backward elimination
• In backward elimination we start with all the features and remove the least significant feature at each iteration, which improves the performance of the model. We repeat this until no further improvement is observed on removing features.
A statistical test is used to find the feature that has the lowest impact on the target variable, i.e., whose correlation with the target variable is negligible. A chi-square test can be used to pick the feature for elimination: a p-value is computed for each feature, and
if p < 0.05, the feature is useful;
else (p > 0.05), the feature is not useful and can be eliminated.
3. Recursive Feature Elimination
It is a greedy optimization algorithm which aims to find the best-performing feature subset. It repeatedly creates models and sets aside the best- or worst-performing feature at each iteration. It constructs the next model with the remaining features until all the features are exhausted, and then ranks the features based on the order of their elimination; a sketch is shown below.
These wrapper techniques are most practical when the dataset is fairly small, because the model has to be trained many times.
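A minimal sketch of Recursive Feature Elimination with scikit-learn's RFE; the dataset and estimator are illustrative assumptions.
# Sketch: Recursive Feature Elimination — repeatedly drop the weakest feature
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=0)

rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)

print("kept features  :", rfe.support_)   # True for retained features
print("feature ranking:", rfe.ranking_)   # 1 = best; higher = eliminated earlier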
11. What is an Expert System? What are its components? Explain with a diagram.
What is Expert System?
● An expert system is a computer program that is designed to solve complex problems and to provide decision-making ability
like a human expert.
● It performs this by extracting knowledge from its knowledge base using the reasoning and inference rules according to the
user queries.
● The expert system is a part of AI, and the first ES was developed in the 1970s, one of the first successful applications of artificial intelligence.
● It solves the most complex issue as an expert by extracting the knowledge stored in its knowledge base.
● The system helps in decision making for complex problems using both facts and heuristics like a human expert.
● It is called so because it contains the expert knowledge of a specific domain and can solve any complex problem of that
particular domain.
● These systems are designed for a specific domain, such as medicine, science, etc.
● The performance of an expert system is based on the expert's knowledge stored in its knowledge base.
● The more knowledge stored in the KB, the more that system improves its performance.
● One of the common examples of an ES is a suggestion of spelling errors while typing in the Google search box.
Components of ES
The components of ES include −
I. Knowledge Base
II. Inference Engine
III. User Interface
Knowledge Base
It contains domain-specific and high-quality knowledge. Knowledge is required to exhibit intelligence. The success of any ES
majorly depends upon the collection of highly accurate and precise knowledge.
What is Knowledge?
Data is a collection of facts. Information is data organized as facts about the task domain. Data, information, and past experience combined together are termed knowledge.
Knowledge representation
It is the method used to organize and formalize the knowledge in the knowledge base. It is in the form of IF-THEN-ELSE rules.
Knowledge Acquisition
The success of any expert system majorly depends on the quality, completeness, and accuracy of the information stored in the
knowledge base.
The knowledge base is formed by readings from various experts, scholars, and the Knowledge Engineers.
The knowledge engineer is a person with the qualities of empathy, quick learning, and case analyzing skills.
Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducing a correct, flawless solution. In the case of a knowledge-based ES, the Inference Engine acquires and manipulates the knowledge from the knowledge base to arrive at a particular solution.
The Inference Engine uses the following strategies −
Forward Chaining
It is a strategy of an expert system to answer the question, “What can happen next?”
Here, the Inference Engine follows the chain of conditions and derivations and finally deduces the outcome. It considers all the
facts and rules, and sorts them before concluding to a solution. This strategy is followed for working on conclusion, result, or
effect. For example, prediction of share market status as an effect of changes in interest rates.
Backward Chaining
With this strategy, an expert system finds out the answer to the question, “Why this happened?”
On the basis of what has already happened, the Inference Engine tries to find out which conditions could have happened in the
past for this result. This strategy is followed for finding out cause or reason. For example, diagnosis of blood cancer in humans.
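The forward chaining strategy described above can be pictured as repeatedly firing IF-THEN rules over a set of known facts. A minimal sketch in plain Python follows; the rules and facts are invented purely for illustration and are not from any real expert system.
# Sketch: forward chaining — derive new facts from IF-THEN rules until nothing changes
rules = [
    ({"has_fever", "has_cough"}, "flu_suspected"),   # IF fever AND cough THEN flu suspected
    ({"flu_suspected"}, "recommend_rest"),           # IF flu suspected THEN recommend rest
]
facts = {"has_fever", "has_cough"}                   # known facts (working memory)

changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)                    # fire the rule, add its conclusion
            changed = True

print(facts)  # {'has_fever', 'has_cough', 'flu_suspected', 'recommend_rest'}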
User Interface
The user interface provides interaction between the user of the ES and the ES itself. It generally uses natural language processing, so that it can be used by a user who is well-versed in the task domain but not necessarily an expert in Artificial Intelligence.
Unit – II
1. What is classification? How to check accuracy of binary classification
Assignment
2. Explain terminologies of classification
1) Classifier - an algorithm that maps input data to a specific category.
2) Classification Model - a model that predicts the class from the input data given during training.
3) Feature - an individual measurable property or phenomenon that is observed.
4) Binary Classification - classification with exactly two classes; each instance is mapped to one of the two classes, e.g. positive or negative numbers, even or odd numbers, spam mail or ham mail, etc.
5) Multiclass Classification - classification with more than two classes; each sample is assigned one and only one label or target.
4. Explain the role of confusion matrix for checking the accuracy of the model
Refer above
5. Explain Naive bayes classifier with example
Naive Bayes Classifier
The Naive Bayes Classifier is a probabilistic machine learning algorithm based on Bayes' Theorem. It assumes that features are
independent of each other (naive assumption), simplifying the computation of probabilities. It is particularly effective for
classification tasks with large datasets and works well with text data like spam filtering or sentiment analysis.
Bayes’ Theorem: P(A|B) = P(B|A) · P(A) / P(B), where P(A|B) is the posterior probability of class A given evidence B, P(B|A) is the likelihood, P(A) is the prior probability, and P(B) is the evidence.
Working of Naive Bayes
For a new instance, the posterior probability of each class is computed by multiplying the class prior with the conditional probabilities of the observed feature values (treated as independent); the class with the highest posterior probability is predicted.
Example
Problem: Predict whether an email is spam or not based on two features:
- Contains the word "Offer" (yes/no).
- Contains the word "Win" (yes/no).
Advantages:
- Simple and fast.
- Handles high-dimensional data well.
Limitations:
- Assumes feature independence, which may not hold true in real-world data.
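A minimal sketch of the spam example above, using Bernoulli Naïve Bayes on the two binary features ("Offer" and "Win"); the tiny training table is hypothetical.
# Sketch: Naive Bayes on two binary features — contains "Offer", contains "Win"
from sklearn.naive_bayes import BernoulliNB

# Columns: [contains "Offer", contains "Win"]; hypothetical training data
X = [[1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [1, 1]]
y = [1, 1, 0, 0, 0, 1]   # 1 = spam, 0 = not spam

model = BernoulliNB()
model.fit(X, y)

# Predict for a new email containing "Offer" but not "Win"
print(model.predict([[1, 0]]), model.predict_proba([[1, 0]]))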
One-vs-One (OvO) approach: for a problem with n classes, a separate binary classifier is trained for every pair of classes, giving n(n−1)/2 classifiers.
○ Each classifier distinguishes between two classes at a time.
● Prediction:
For a new instance, each classifier votes for one of its two classes. The class with the most votes is the final prediction.
● Example:
For 3 classes A, B, C:
○ Train three classifiers:
■ Classifier 1: A vs. B
■ Classifier 2: A vs. C
■ Classifier 3: B vs. C
● Advantages:
○ Often more accurate as it focuses on simpler binary problems.
○ Handles imbalances between pairs of classes better.
● Disadvantages:
○ Computationally expensive, especially for a large number of classes n.
○ More complex to implement due to the number of classifiers.
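A minimal sketch of the one-vs-one scheme using scikit-learn's OneVsOneClassifier wrapper; the three-class iris dataset stands in for classes A, B, C.
# Sketch: one-vs-one — one binary classifier per pair of classes, majority vote wins
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)             # 3 classes -> 3 pairwise classifiers
ovo = OneVsOneClassifier(LinearSVC(max_iter=5000))
ovo.fit(X, y)

print("number of pairwise classifiers:", len(ovo.estimators_))  # n(n-1)/2 = 3
print("prediction for first sample   :", ovo.predict(X[:1]))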
Mean squared error (MSE) is simply the average of the squared errors, i.e., the squared differences between the actual and predicted values of the dependent variable: MSE = (1/n) Σ (yᵢ − ŷᵢ)².
An MSE value of 0 indicates an ideal regression model, which is rare, whereas a higher MSE value indicates poor performance of the regression model.
Root mean squared error (RMSE) is yet another method of measuring the performance of a regression model, and it is often more useful than MSE.
It is obtained by taking the square root of the MSE, RMSE = √MSE, and is expressed in the same units as the dependent variable, which makes it easier to interpret.
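A small sketch computing MSE and RMSE with NumPy on made-up actual and predicted values.
# Sketch: MSE and RMSE on hypothetical actual vs. predicted values
import numpy as np

y_actual = np.array([3.0, 5.0, 7.5, 10.0])
y_pred   = np.array([2.5, 5.5, 7.0, 11.0])

mse = np.mean((y_actual - y_pred) ** 2)   # average of squared errors
rmse = np.sqrt(mse)                        # same units as the dependent variable
print("MSE :", mse)    # 0.4375
print("RMSE:", rmse)   # ~0.661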
12. Explain polynomial regression in detail.
● Polynomial Regression is a regression algorithm that models the relationship between a dependent(y) and
independent variable(x) as nth degree polynomial. The Polynomial Regression equation is given below:
● y = b0 + b1x1 + b2x1^2 + b3x1^3 + ...... + bnx1^n
● It is also called the special case of Multiple Linear Regression in ML. Because we add some polynomial terms to the
Multiple Linear regression equation to convert it into Polynomial Regression.
● It is a linear model with some modification in order to increase the accuracy.
● The dataset used in Polynomial regression for training is of non-linear nature.
● It makes use of a linear regression model to fit the complicated and non-linear functions and datasets.
● Hence, "In Polynomial regression, the original features are converted into Polynomial features of required degree
(2,3,..,n) and then modeled using a linear model."
The need of Polynomial Regression in ML can be understood in the below points:
● If we apply a linear model to a linear dataset, it gives a good result, as we saw in Simple Linear Regression; but if we apply the same model without any modification to a non-linear dataset, it produces a very poor fit.
● As a result the loss function increases, the error rate is high, and accuracy decreases. For such cases, where the data points are arranged in a non-linear fashion, we need the Polynomial Regression model.
● We can understand this better by comparing how a linear model behaves on a linear dataset versus a non-linear dataset.
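A minimal sketch of the conversion described above — expanding the original feature into polynomial features and fitting a linear model — on a small synthetic non-linear dataset (assumed only for illustration).
# Sketch: polynomial regression = polynomial feature expansion + linear regression
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + 0.5 * x.ravel() ** 2 + rng.normal(0, 0.3, 50)  # non-linear data

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(x)            # columns: x, x^2

model = LinearRegression().fit(X_poly, y)
print("coefficients:", model.coef_)       # approximately [2.0, 0.5]
print("intercept   :", model.intercept_)  # approximately 1.0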
13. Explain the following terms in brief: i) bias, ii) variance, iii) overfitting, iv) underfitting
1. Bias:
Bias refers to the error introduced by approximating a real-world problem, which may be complex, with a simplified model. A
high-bias model makes strong assumptions about the data, which can lead to systematic errors and underfitting.
● High Bias: The model is too simple and fails to capture the underlying patterns of the data (underfitting).
● Low Bias: The model is complex enough to represent the data accurately.
Example: A linear model trying to fit non-linear data will have high bias because it's oversimplifying the relationships.
2. Variance:
Variance refers to the model's sensitivity to small fluctuations in the training data. A model with high variance will capture noise
in the data, leading to overfitting, where the model fits the training data too closely and fails to generalize well to new data.
● High Variance: The model is too complex and fits the noise of the training data.
● Low Variance: The model is stable and less sensitive to fluctuations in the data.
Example: A very complex model (like a high-degree polynomial) might perform very well on training data but fail to generalize
to unseen data because it fits even the noise.
3. Underfitting:
Underfitting occurs when the model is too simple to capture the underlying patterns in the data. It results from high bias and
leads to poor performance on both the training and test sets.
● Cause: Usually due to a model that is not complex enough, lacks the capacity, or is trained for too few epochs.
● Consequence: The model performs poorly on both the training and test data, as it has not learned enough from the
data.
Example: Using a linear regression model to fit a dataset with clear non-linear relationships.
4. Overfitting:
Overfitting occurs when the model learns not only the underlying patterns but also the noise in the training data. It performs
very well on training data but poorly on unseen test data because it fails to generalize.
● Cause: Usually due to a model that is too complex or trained for too long.
● Consequence: The model has low error on the training set but high error on the test set.
Example: A decision tree with too many branches might perfectly classify the training data but perform poorly on new data.
● Bias: Error due to overly simplistic assumptions in the model (underfitting).
● Variance: Error due to the model being too sensitive to small changes in the training data (overfitting).
● Underfitting: Model is too simple to capture data patterns (high bias).
● Overfitting: Model is too complex, capturing both patterns and noise (high variance).
Algorithm:
• Step1: Load Data set
• Step2: Initialize General Hypothesis and Specific Hypothesis.
• Step3: For each training example
• Step4: If example is positive example
• if attribute_value == hypothesis_value:
• Do nothing
• else:
• replace attribute value with '?' (Basically generalizing it)
• Step5: If the example is a negative example
• make the general hypothesis more specific.
(A sketch of the positive-example step is given after the link below.)
https://www.youtube.com/watch?v=O2wYwFOMQ24
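A minimal sketch of the positive-example step of the algorithm above (generalising the specific hypothesis by replacing mismatching attribute values with '?'); the toy "EnjoySport"-style data is a common textbook example assumed here only for illustration, and the step for negative examples (specialising the general hypothesis) is omitted for brevity.
# Sketch: generalising the specific hypothesis from positive examples (Find-S style)
examples = [
    (["Sunny", "Warm", "Normal", "Strong"], "Yes"),   # positive
    (["Sunny", "Warm", "High",   "Strong"], "Yes"),   # positive
    (["Rainy", "Cold", "High",   "Strong"], "No"),    # negative (handled by the G-set step)
]

S = None  # most specific hypothesis, initialised from the first positive example
for attributes, label in examples:
    if label != "Yes":
        continue                        # negatives are used to specialise G, not S
    if S is None:
        S = list(attributes)
    else:
        # replace attribute values that disagree with '?' (generalisation)
        S = [s if s == a else "?" for s, a in zip(S, attributes)]

print(S)  # ['Sunny', 'Warm', '?', 'Strong']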
The output of the logistic function (a value between 0 and 1) is interpreted as a probability, and a threshold (usually 0.5) is applied to decide the class label. If P(Y=1|X) > 0.5, the instance is classified as 1; otherwise, it is classified as 0.
Types of Logistic Regression
There are three primary types of logistic regression, based on the number of classes in the classification problem:
Binary Logistic Regression
Used for classification tasks with two classes (e.g., Spam vs. Not Spam).
The model predicts the probability of one class, P(Y=1|X), and uses a threshold (0.5) to classify the instance.
Example: Predicting whether a patient has a disease (1) or not (0) based on features like age, weight, etc.
Multinomial Logistic Regression (or Multiclass Logistic Regression)
Used when the target variable has more than two categories (classes).
It generalizes binary logistic regression by calculating the probability of each class using multiple equations (one vs all
approach).
The model outputs a probability distribution over all possible classes. The class with the highest probability is selected.
Example: Classifying animals into categories such as Dog, Cat, and Rabbit based on features like size, fur type, etc.
Ordinal Logistic Regression
Used when the target variable has more than two categories that follow a natural order.
Example: Predicting a customer rating such as Low, Medium, or High.
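A minimal sketch of binary logistic regression with the 0.5 threshold described above; the dataset is synthetic and purely illustrative.
# Sketch: binary logistic regression — probabilities thresholded at 0.5
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

probs = model.predict_proba(X[:3])[:, 1]   # P(Y=1 | X) for three samples
labels = (probs > 0.5).astype(int)          # apply the 0.5 threshold
print(probs, labels)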
Unit III
1. What is least square method? Explain in detail.
Refer notes
2. Explain Univariate Linear Regression.
3. Write a short note on Multivariate Linear Regression.
➢ Multivariate Regression is one of the simplest Machine Learning algorithms. It comes under the class of Supervised Learning algorithms, i.e., algorithms that are provided with a labeled training dataset. Some of the problems that can be solved using this model are:
➢ A researcher has collected data on three psychological variables, four academic variables (standardized test scores),
and the type of educational program the student is in for 600 high school students. She is interested in how the set of
psychological variables is related to the academic variables and the type of program the student is in.
➢ A doctor has collected data on cholesterol, blood pressure, and weight. She also collected data on the eating habits
of the subjects (e.g., how many ounces of red meat, fish, dairy products, and chocolate consumed per week). She
wants to investigate the relationship between the three measures of health and eating habits.
➢ Multivariate Regression is a type of machine learning algorithm that involves multiple data variables for analysis.
➢ It is mostly considered a supervised machine learning algorithm. The steps involved in multivariate regression analysis are feature selection and feature engineering, normalizing the features, selecting the loss function and hypothesis parameters, optimizing the loss function, and testing the hypothesis to generate the regression model.
➢ The major advantage of multivariate regression is to identify the relationships among the variables associated with
the data set. It helps to find the correlation between the dependent and multiple independent variables. Multivariate
linear regression is a commonly used machine learning algorithm.
➢ Multivariate Regression lets us model more than one independent variable together with more than one dependent variable, and it finds the (linear) relation between the variables.
➢ It is used to predict the behaviour of the outcome variables and their association with the predictor variables as the predictor variables change.
➢ It can be applied to many practical fields like politics, economics, medical, research works and many different kinds of
businesses.
➢ Multivariate regression is a simple extension of multiple regression.
➢ Examples of Multivariate Regression
○ An e-commerce company has collected data about its customers, such as age, purchase history, and gender, and wants to find the relationship between these dependent and independent variables.
○ A gym trainer has collected data about his clients, such as their health, eating habits (which kinds of products a client consumes every week), and weight, and wants to find a relation between these variables. A sketch of a multivariate regression fit is shown below.
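A minimal sketch of multivariate linear regression (several independent variables and several dependent variables) using scikit-learn's multi-output LinearRegression; all data is synthetic.
# Sketch: multivariate regression — several predictors, several outcome variables
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # 3 independent variables
Y = np.column_stack([
    2 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, 100),   # outcome 1
    X[:, 1] - 3 * X[:, 2] + rng.normal(0, 0.1, 100),   # outcome 2
])

model = LinearRegression().fit(X, Y)     # fits both outcomes at once
print("coefficient matrix shape:", model.coef_.shape)  # (2 outcomes, 3 predictors)
print(model.predict(X[:1]))              # predictions for both outcomes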
5. Explain the concept of least squares regression to find the line of best fit for the above data.
Step 1: Calculate the slope ‘m’ using the least squares formula: m = (n Σxy − Σx Σy) / (n Σx² − (Σx)²), and then the intercept b = (Σy − m Σx) / n, so that the line of best fit is y = mx + b.
b. Add all the multiplied values and call the result the weighted sum.
◆ The output can be represented as “1” or “0.” It can also be represented as “1” or “-1”, depending on which activation function is used. A sketch of this weighted-sum-and-threshold step is shown below.
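A minimal sketch of the weighted-sum-and-threshold step described above, implementing a perceptron for the logical AND function in plain Python; the weights and data are illustrative assumptions.
# Sketch: perceptron — weighted sum of inputs passed through a step activation
def perceptron(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum >= 0 else 0    # step activation: output "1" or "0"

# Hand-picked weights that realise the logical AND of two binary inputs
weights, bias = [1.0, 1.0], -1.5
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), "->", perceptron([a, b], weights, bias))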
8. Describe Soft Margin SVM.
Support Vector Machines (SVMs) are powerful supervised learning algorithms primarily used for classification tasks. A Soft
Margin SVM is an extension of the standard Hard Margin SVM, designed to handle data that is not perfectly linearly separable.
It allows for some misclassifications by introducing a margin of tolerance, which makes the model more flexible and robust
when dealing with noisy or overlapping data. In a typical Hard Margin SVM, the goal is to find a hyperplane that perfectly
separates the two classes with the largest possible margin, without allowing any misclassification. However, in real-world
scenarios, data is often noisy, and a perfect separation may not be possible. Soft Margin SVM allows for some misclassifications
to achieve a better overall model that generalizes well.
The key concept of Soft Margin SVM is to balance between maximizing the margin and allowing some misclassifications.
Example of Soft Margin SVM
Consider a binary classification problem where we want to classify data points into two classes: positive (+1) and negative (-1).
Some data points may be noisy, and perfect separation is not possible.
Using Soft Margin SVM, the classifier will find a hyperplane that separates the two classes while allowing some data points to fall within the margin or be misclassified. The regularization parameter C controls the trade-off between margin width and the number of misclassifications.
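A minimal sketch showing how the regularization parameter C controls the soft-margin trade-off; the noisy, overlapping dataset is synthetic.
# Sketch: soft-margin SVM — small C tolerates misclassifications, large C penalises them
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)  # overlapping classes

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: support vectors={clf.support_vectors_.shape[0]}, "
          f"training accuracy={clf.score(X, y):.2f}")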
Unit IV
1. Explain KNN classification in detail.
Assignment
2. What is clustering ? Types of clustering.
Assignment
3. Explain K-means clustering algorithm with example.
• K-Means Clustering is an Unsupervised Learning algorithm which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process; for example, if K=2 there will be two clusters, for K=3 there will be three clusters, and so on.
The working of the K-Means algorithm is explained in the steps below:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids (they need not come from the input dataset).
Step-3: Assign each data point to its closest centroid, which forms the K predefined clusters.
Step-4: Calculate the variance and place a new centroid for each cluster (the mean of its points).
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid.
Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
OR( example refer below if algo is asked refer above)
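A minimal sketch of K-Means on synthetic blob data, assuming K=3; scikit-learn performs the assign-and-update loop described in the steps above.
# Sketch: K-Means clustering with K=3 on synthetic data
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # unlabeled data

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)            # assign each point to its nearest centroid

print("cluster centroids:\n", kmeans.cluster_centers_)
print("first 10 cluster assignments:", labels[:10])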
8. Explain the following terms wrt decision tree:Splitting, decision node, pruning, sub tree
● Root Node: Root node is from where the decision tree starts. It represents the entire dataset, which further gets
divided into two or more homogeneous sets.
● Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after getting a leaf node.
● Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given
conditions.
● Branch/Sub Tree: A tree formed by splitting the tree.
● Pruning: Pruning is the process of removing the unwanted branches from the tree.
● Parent/Child node: The root node of the tree is called the parent node, and other nodes are called the child nodes.
Entropy:
Entropy is a metric to measure the impurity in a given attribute; it specifies the randomness in the data. Entropy can be calculated as:
Entropy(S) = − Σᵢ pᵢ log₂(pᵢ), where pᵢ is the proportion of examples in S belonging to class i.
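A small sketch computing the entropy formula above for a set with 9 positive and 5 negative examples (the classic textbook value of about 0.940).
# Sketch: entropy of a set with 9 positive and 5 negative examples
import math

counts = [9, 5]
total = sum(counts)
entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
print(round(entropy, 3))  # 0.94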
Unit V
1. Explain Ensemble Method
Ensemble methods are machine learning techniques that combine predictions from multiple individual models to improve
overall performance. The goal of ensemble learning is to reduce errors, improve accuracy, and create a robust model that
outperforms individual models. By aggregating predictions, ensemble methods can better handle variance, bias, and model
overfitting.
Types of Ensemble Methods
Bagging (Bootstrap Aggregating):(refer below)
Boosting:(refer below)
Stacking:
● Definition: Combines multiple base models by training a meta-model (a second-level model) that learns to aggregate
the predictions of base models.
● Example: Combining predictions from logistic regression, decision trees, and SVMs into a single model.
● Advantages: Exploits the strengths of different algorithms.
Voting:
● Definition: Aggregates predictions from multiple models using voting (for classification) or averaging (for regression).
● Types:
○ Hard Voting: Takes the majority class label from all models.
○ Soft Voting: Takes the average of predicted probabilities and selects the class with the highest probability.
● Example: Combining predictions from decision trees, SVM, and k-NN classifiers.
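A minimal sketch of hard voting with the three classifiers mentioned in the example above; the dataset and hyperparameters are assumptions.
# Sketch: hard-voting ensemble of a decision tree, an SVM, and k-NN
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
ensemble = VotingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("svm", SVC()),
                ("knn", KNeighborsClassifier())],
    voting="hard",                      # majority class label wins
)
ensemble.fit(X, y)
print("training accuracy:", ensemble.score(X, y))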
2. What is Bagging?
● Bagging often considers homogeneous weak learners, learns them independently from each other in parallel, and combines them following some kind of deterministic averaging process.
● Bagging is used when our objective is to reduce the variance of a decision tree. The idea is to create several subsets of data from the training sample, chosen randomly with replacement. Each subset of data is then used to train its own decision tree, so we end up with an ensemble of models. The average of the predictions from the numerous trees is used, which is more robust than a single decision tree.
● Random Forest is an extension of bagging. It takes one additional step: in addition to taking random subsets of the data, it also makes a random selection of features rather than using all features to grow the trees. When we have numerous such random trees, the result is called a Random Forest.
● These are the steps taken to implement a Random Forest:
○ Consider a training dataset with X observations and Y features. First, a sample is taken randomly from the training dataset with replacement.
○ A tree is grown to its maximum size.
○ The above steps are repeated, and the prediction is given based on the collection of predictions from the n trees. A sketch comparing plain bagging with a Random Forest is given below.
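A minimal sketch contrasting plain bagging of decision trees with a Random Forest (bagging plus random feature selection); the data and parameters are illustrative assumptions.
# Sketch: bagging of decision trees vs. Random Forest
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0)  # also subsamples features

print("bagging CV accuracy      :", cross_val_score(bagging, X, y, cv=5).mean())
print("random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())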
3. What is Boosting?
Boosting:
● Boosting often considers homogeneous weak learners, learns them sequentially in a very adaptive way (each base model depends on the previous ones), and combines them following a deterministic strategy.
● Boosting is another ensemble procedure for building a collection of predictors. We fit consecutive trees, usually on random samples, and at each step the objective is to reduce the net error of the prior trees.
● If a given input is misclassified by the current hypothesis, its weight is increased so that the next hypothesis is more likely to classify it correctly; combining the whole set at the end converts weak learners into a better-performing model.
● Gradient Boosting is an extension of the boosting procedure.
● Gradient Boosting = Gradient Descent + Boosting
● It utilizes a gradient descent algorithm that can optimize any differentiable loss function. An ensemble of trees is constructed one at a time, and the individual trees are summed sequentially. Each new tree tries to reduce the remaining loss (the difference between the actual and predicted values).
Bagging vs. Boosting:
● Bagging: various training data subsets are randomly drawn with replacement from the whole training dataset. Boosting: each new subset contains the elements that were misclassified by previous models.
● Bagging attempts to tackle over-fitting (it tries to reduce variance). Boosting tries to reduce bias.
● If the classifier is unstable (high variance), we apply bagging. If the classifier is steady and straightforward (high bias), we apply boosting.
● In bagging, every model receives an equal weight. In boosting, models are weighted by their performance.
● Bagging's objective is to decrease variance, not bias. Boosting's objective is to decrease bias, not variance.
● Bagging is the easiest way of combining predictions that belong to the same type. Boosting is a way of combining predictions that belong to different types.
● In bagging, every model is constructed independently. In boosting, new models are affected by the performance of previously developed models.
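A minimal sketch of gradient boosting, where each new tree is fitted to reduce the remaining error of the ensemble so far; the data and parameters are assumptions.
# Sketch: gradient boosting — trees added sequentially to reduce the remaining error
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gbm.fit(X_tr, y_tr)
print("test accuracy:", gbm.score(X_te, y_te))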
Sequence prediction attempts to predict the elements of a sequence on the basis of the preceding elements.
“A prediction model is trained with a set of training sequences. Once trained, the model is used to perform sequence predictions. A prediction consists of predicting the next items of a sequence.” This task has numerous applications such as web page prefetching, consumer product recommendation, weather forecasting, and stock market prediction.
Applications
● Time-Series Forecasting:
○ Predicting stock prices, weather, or sales trends based on historical data.
● Natural Language Processing (NLP):
○ Predicting the next word in a sentence (e.g., autocomplete in keyboards).
● Recommender Systems:
○ Suggesting the next movie, song, or product based on user behavior.
● Biological Sequences:
○ Predicting DNA or protein sequences for genetic research.
● Autonomous Systems:
○ Predicting future actions in robotics or navigation tasks.
• On a certain planet there are various fruits of different sizes (1-5); some of them are poisonous and others are not. The only criterion for deciding whether a fruit is poisonous is its size. Our task is to train a classifier which predicts whether a given fruit is poisonous or not. The only information we have is that a fruit of size 1 is not poisonous, a fruit of size 5 is poisonous, and beyond a particular size all fruits are poisonous.
• The first approach is to check each and every size of fruit, which consumes time and resources.
• The second approach is to apply binary search and find the transition point (the decision boundary). This approach uses less data and gives the same result as a linear search; a sketch is given below.
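A minimal sketch of the second approach: a binary search for the transition size, assuming a hypothetical oracle is_poisonous that returns the label for a queried size.
# Sketch: binary search for the decision boundary (smallest poisonous size)
def is_poisonous(size, threshold=3):
    """Hypothetical oracle: all fruits at or above the threshold size are poisonous."""
    return size >= threshold

lo, hi = 1, 5                 # size 1 is known safe, size 5 is known poisonous
while hi - lo > 1:
    mid = (lo + hi) // 2
    if is_poisonous(mid):
        hi = mid              # boundary is at mid or below
    else:
        lo = mid              # boundary is above mid
print("smallest poisonous size:", hi)   # 3 with the assumed threshold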