ML 1234

What is Machine Learning?
Machine Learning is a branch of artificial intelligence that develops algorithms by learning the hidden patterns of datasets and using them to make predictions on new, similar data, without being explicitly programmed for each task.
Traditional machine learning combines data with statistical tools to predict an output that can be used to make actionable insights.
Machine learning is used in many different applications, from image and speech recognition to natural language processing, recommendation systems, fraud detection, portfolio optimization, task automation, and so on. Machine learning models are also used to power autonomous vehicles, drones, and robots, making them more intelligent and adaptable to changing environments.

History of machine learning -
The early days -
Machine learning history starts in 1943 with the first mathematical model of neural networks, presented in the scientific paper "A logical calculus of the ideas immanent in nervous activity" by Walter Pitts and Warren McCulloch.
Then, in 1949, the book The Organization of Behavior by Donald Hebb was published. The book had theories on how behavior relates to neural networks and brain activity and would go on to become one of the monumental pillars of machine learning development.
In 1950 Alan Turing created the Turing Test to determine whether a computer has real intelligence. To pass the test, a computer must be able to fool a human into believing it is also human.

Playing games and plotting routes -
The first-ever computer learning program was written in 1952 by Arthur Samuel. The program played the game of checkers, and the IBM computer improved at the game the more it played, studying which moves made up winning strategies and incorporating those moves into its program.
Then in 1957 Frank Rosenblatt designed the first neural network for computers, the perceptron, which simulated the thought processes of the human brain.
Twelve years later, in 1979, students at Stanford University invented the 'Stanford Cart', which could navigate obstacles in a room on its own. And in 1981, Gerald Dejong introduced the concept of Explanation Based Learning (EBL), where a computer analyses training data and creates a general rule it can follow by discarding unimportant data.

Big steps forward -
In the 1990s, work on machine learning shifted from a knowledge-driven approach to a data-driven approach. Scientists began creating programs for computers to analyze large amounts of data and draw conclusions — or "learn" — from the results.
And in 1997, IBM's Deep Blue shocked the world by beating the world champion at chess.
In 2006, the term "deep learning" was coined for new algorithms that let computers "see" and distinguish objects and text in images and videos.
Four years later, in 2010, Microsoft revealed that its Kinect technology could track 20 human features at a rate of 30 times per second, allowing people to interact with the computer via movements and gestures. The following year, IBM's Watson beat its human competitors at Jeopardy.
Google Brain was developed in 2011, and its deep neural network could learn to discover and categorize objects much the way a cat does. The following year, the tech giant's X Lab developed a machine learning algorithm that is able to autonomously browse YouTube videos to identify the videos that contain cats.
In 2014, Facebook developed DeepFace, a software algorithm that is able to recognize or verify individuals in photos to the same level as humans can.

2015 - Present day
Amazon launched its own machine learning platform in 2015. Microsoft also created the Distributed Machine Learning Toolkit, which enabled the efficient distribution of machine learning problems across multiple computers.
Then more than 3,000 AI and Robotics researchers, endorsed by Stephen Hawking, Elon Musk and Steve Wozniak (among many others), signed an open letter warning of the danger of autonomous weapons which select and engage targets without human intervention.

Need of machine learning -
Use cases of Machine Learning Technology
Machine Learning is broadly used in every industry and has a wide range of applications, especially ones that involve collecting, analyzing, and responding to large sets of data. The importance of Machine Learning can be understood through these important applications.
Some important applications in which machine learning is widely used are given below:
1. Healthcare: Machine Learning is widely used in the healthcare industry. It helps healthcare researchers to analyze data points and suggest outcomes. Natural language processing has helped to give accurate insights for better patient outcomes. Further, machine learning has improved treatment methods by analyzing external data on patients' conditions in terms of X-ray, Ultrasound, CT-scan, etc. NLP, medical imaging, and genetic information are key areas of machine learning that improve the diagnosis, detection, and prediction system in the healthcare sector.
2. Automation: This is one of the significant applications of machine learning that helps to make the system automated. It helps machines to perform repetitive tasks without human intervention. As a machine learning engineer or data scientist, you may be expected to solve a given task many times with no errors, which is not practically possible for humans. Hence machine learning has developed various models to automate the process, having the capability of performing iterative tasks in less time.
3. Banking and Finance: Machine Learning is a subset of AI that uses statistical models to make accurate predictions. In the banking and finance sector, machine learning has helped in many ways, such as fraud detection, portfolio management, risk management, chatbots, document analysis,
high-frequency trading, mortgage underwriting, AML detection, anomaly detection, credit risk score detection, KYC processing, etc. Hence, machine learning is widely applied in the banking and finance sector to reduce error as well as time.

4. Transportation and Traffic Prediction: This is one of the most common applications of Machine Learning, used by individuals in their daily routine. It helps to ensure highly secure routes, generate accurate ETAs, predict vehicle breakdowns, drive prescriptive analytics, etc. Although machine learning has solved many transportation problems, it still requires more improvement. Statistical machine learning algorithms help to build a smart transportation system. Further, deep learning has explored the complex interactions of roads, highways, traffic, environmental elements, crashes, etc. Hence, machine learning technology has improved daily traffic management as well as the collection of traffic data to predict insights into routes and traffic.

5. Image Recognition: This is one of the most common applications of machine learning, used to detect images over the internet. Further, various social media sites such as Facebook use image recognition for tagging images of your Facebook friends with the auto friend tagging suggestion feature. Further, nowadays almost all mobile devices come with face detection features. Using this feature, you can secure your mobile data with face unlocking, so if anyone tries to access your mobile device, they cannot open it without face recognition.

6. Speech Recognition: Speech recognition is one of the biggest achievements of machine learning applications. It enables users to search content without writing text or, in other words, to 'search by voice'. It can search content/products on YouTube, Google, Amazon, etc. platforms by your voice. This technology is referred to as speech recognition. It is a process of converting voice instructions into text; hence it is also known as 'Speech to text' or 'Computer speech recognition'. Some important examples of speech recognition are Google Assistant, Siri, Cortana, Alexa, etc.

7. Product Recommendation: This is one of the biggest achievements made by machine learning, which helps various e-commerce and entertainment companies like Flipkart, Amazon, Netflix, etc., to digitally advertise their products over the internet. When anyone searches for a product, they start getting advertisements for the same product while surfing the internet on the same browser. This is possible through machine learning algorithms that work on users' interests or past experience and accordingly recommend products. For example, when we search for a laptop on the Amazon platform, we also start seeing many other laptops with the same categories and criteria. Similarly, when we use Netflix, we find recommendations for entertainment series, movies, etc. Hence, this is also possible through machine learning algorithms.

8. Virtual Personal Assistance: This feature helps us in many ways, such as searching content using voice instruction, calling a number using voice, searching a contact in your mobile, playing music, opening an email, scheduling an appointment, etc. Nowadays, you have all seen advertising like "Alexa! Play the Music"; this is also done with the help of machine learning. Google Assistant, Alexa, Cortana, Siri, etc., are a few common applications of machine learning. These virtual personal assistants record our voice instructions, send them over to the server on a cloud, decode them using ML algorithms, and act accordingly.

9. Email Spam and Malware Detection & Filtering: Machine learning also helps us in filtering emails into different categories such as spam, important, general, etc. In this way, users can easily identify whether an email is useful or spam. This is also possible through machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve Bayes classifier. Content filter, header filter, rules-based filter, permission filter, general blacklist filter, etc., are some important spam filters used by Google.

10. Self-driving cars: This is one of the most exciting applications of machine learning. Machine learning plays a vital role in the manufacturing of self-driving cars. It uses an unsupervised learning method to train car models to detect people and objects while driving. Tata and Tesla are among the most popular car manufacturing companies working on self-driving cars. Hence, it is a big revolution in the technological era, also achieved with the help of machine learning.

11. Credit card fraud detection: Credit card frauds have become very easy targets for online hackers. As the culture of online/digital payments increases, the risk of credit/debit card fraud increases in parallel. Machine Learning also helps developers to detect and analyze frauds in online transactions. It develops a novel fraud detection method for streaming transaction data, with an objective to analyze the past transaction details of the customers and extract the behavioral patterns. Further, cardholders are clustered into various categories by their transaction amount so that the behavioral pattern of each group can be extracted respectively. Hence, credit card fraud detection is a novel approach using an Aggregation Strategy and Feedback Mechanism of machine learning.

12. Stock Marketing and Trading: Machine learning also helps in the stock marketing and trading sector, where it uses historical trends or past experience for predicting the market risk. As share marketing is another name for marketing risk, machine learning reduces it to some extent and predicts data against marketing risk. Machine learning's long short-term memory neural network is used for the prediction of stock market trends.

13. Language Translation: The use of machine learning can be seen in language translation. It uses sequence-to-sequence learning algorithms for translating one language into another. Further, it also uses image recognition techniques to identify and translate text from one language to another. Similarly, Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation system that translates text into our familiar language, and it is called automatic translation.

Features of machine learning -
There are several advantages of using machine learning, including:
1. Improved accuracy: Machine learning algorithms can analyze large amounts of data and identify patterns that may not be apparent to humans. This can lead to more accurate predictions and decisions.
2. Automation: Machine learning models can automate tasks that would otherwise be done by humans, freeing up time and resources.
3. Real-time performance: Machine learning models can analyze data in real time, allowing for quick decision making.
4. Scalability: Machine learning models can be easily scaled up or down to handle changes in the amount of data.
5. Cost-effectiveness: Machine learning can reduce the need for human labor, which can lead to cost savings over time.
6. Ability to learn from experience: Machine learning models can improve over time as they are exposed to more data, which enables them to learn from their mistakes and improve their performance.
7. Better predictions: Machine learning models can make predictions with greater accuracy than traditional statistical models.
8. Predictive Maintenance: Machine learning models can help identify patterns in sensor data that are indicative of equipment failure, allowing for preventative maintenance to be scheduled before an issue occurs.

Block diagrammatic representation of Machine Learning.
> Diagrammatic representation of machine-learning methods, datasets and methods of validation used in prediction of PPIs. (A) Classification of different machine-learning methods into supervised and unsupervised approaches. (B) Training, testing and blind datasets for k-fold cross-validation. (C) Training and testing datasets for bootstrap validation.
1) Image Recognition
It is based on the Facebook project named "Deep Face," which is responsible for face recognition and person identification in the picture.

2) Speech Recognition
While using Google, we get an option of "Search by voice"; it comes under speech recognition, and it's a popular application of machine learning.
Speech recognition is a process of converting voice instructions into text, and it is also known as "Speech to text" or "Computer speech recognition." At present, machine learning algorithms are widely used in various applications of speech recognition. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

3) Traffic prediction
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.
It predicts the traffic conditions, such as whether traffic is cleared, slow-moving, or heavily congested, in two ways:
* Real-time location of the vehicle from the Google Maps app and sensors
* Average time taken on past days at the same time.
Everyone who is using Google Maps is helping this app to become better. It takes information from the user and sends it back to its database to improve the performance.

4) Product recommendations
Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendation to the user. Whenever we search for some product on Amazon, we start getting an advertisement for the same product while surfing the internet on the same browser, and this is because of machine learning.
Google understands the user's interest using various machine learning algorithms and suggests products as per customer interest.

6) Email Spam and Malware Filtering
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always receive important mail in our inbox with the important symbol and spam emails in our spam box, and the technology behind this is Machine learning. Below are some spam filters used by Gmail:
*Content Filter
*Header filter
*General blacklists filter
*Rules-based filters
*Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve Bayes classifier are used for email spam filtering and malware detection.

7) Virtual Personal Assistant
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us in finding information using our voice instructions. These assistants can help us in various ways just by our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.
These virtual assistants use machine learning algorithms as an important part.
These assistants record our voice instructions, send them over the server on a cloud, decode them using ML algorithms, and act accordingly.

8) Online Fraud Detection
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways that a fraudulent transaction can take place, such as fake accounts, fake IDs, and stealing money in the middle of a transaction. So to detect this, a Feed Forward Neural network helps us by checking whether it is a genuine transaction or a fraud transaction.
For each genuine transaction, the output is converted into some hash values, and these values become the input for the next round. For each genuine transaction there is a specific pattern, which changes for a fraud transaction; hence, the network detects it and makes our online transactions more secure.

9) Stock Market trading
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in shares, so for this machine learning's long short-term memory neural network is used for the prediction of stock market trends.

10) Medical Diagnosis
In medical science, machine learning is used for disease diagnosis. With this, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.
It helps in finding brain tumors and other brain-related diseases easily.

11) Automatic Language Translation
Nowadays, if we visit a new place and are not aware of the language, it is not a problem at all, as machine learning helps us here too by converting the text into our known languages. Google's GNMT (Google Neural Machine Translation) provides this feature, which is a neural machine translation system that translates text into our familiar language, and it is called automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used with image recognition and translates text from one language to another.

Unit – 2
Dimensionality Reduction

>Dimensionality reduction is the process of reducing the number of features (or dimensions) in a dataset while retaining as much information as possible. This can be done for a variety of reasons, such as to reduce the complexity of a model, to improve the performance of a learning algorithm, or to make it easier to visualize the data. There are several techniques for dimensionality reduction, including principal component analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA). Each technique uses a different method to project the data onto a lower-dimensional space while preserving important information.

> Dimensionality reduction is a technique used to reduce the number of features in a dataset while retaining as much of the important information as possible. In other words, it is a process of transforming high-dimensional data into a lower-dimensional space that still preserves the essence of the original data.

In machine learning, high-dimensional data refers to data with a large number of features or variables. The curse of dimensionality is a common problem in machine learning, where the performance of the model deteriorates as the number of features increases. This is because the complexity of the model increases with the number of features, and it becomes more difficult to find a good solution. In addition, high-dimensional data can also lead to overfitting, where the model fits the training data too closely and does not generalize well to new data.

Dimensionality reduction can help to mitigate these problems by reducing the complexity of the model and improving its generalization performance. There are two main approaches to dimensionality reduction: feature selection and feature extraction.

1) Feature Selection:
Feature selection involves selecting a subset of the original features that are most relevant to the problem at hand. The goal is to reduce the dimensionality of the dataset while retaining the most important features. There are several methods for feature selection, including filter methods, wrapper methods, and embedded methods. Filter methods rank the features based on their relevance to the target variable, wrapper methods use the model performance as the criterion for selecting features, and embedded methods combine feature selection with the model training process.

2) Feature Extraction:
Feature extraction involves creating new features by combining or transforming the original features. The goal is to create a set of features that captures the essence of the original data in a lower-dimensional space. There are several methods for feature extraction, including principal component analysis (PCA), linear discriminant analysis (LDA), and t-distributed stochastic neighbor embedding (t-SNE). PCA is a popular technique that projects the original features onto a lower-dimensional space while preserving as much of the variance as possible.
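To make the feature selection idea concrete, here is a minimal sketch of a filter method using scikit-learn's SelectKBest. The Iris data and the choice of k = 2 are only illustrative assumptions and are not part of the material above; PCA-based feature extraction is shown later in the PCA section.

# A minimal sketch of filter-based feature selection with scikit-learn.
# The Iris data stands in for any tabular dataset.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)            # 4 original features

# Filter method: score each feature against the target (ANOVA F-test)
# and keep only the k highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)        # (150, 4) -> (150, 2)
print("Selected feature indices:", selector.get_support(indices=True))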
Row Vector
A matrix having only one row is called a row vector.
* Column Vector
A matrix having only one column is called a column vector.

A dataset is a collection of data in which data is arranged in some order. A dataset can contain any data, from a series of an array to a database table. The below table shows an example of a dataset:

Country   Age   Salary   Purchased
India     38    48000    No
France    43    45000    Yes
Germany   30    54000    No
France    48    65000    No
Germany   40             Yes
India     35    58000    Yes

A tabular dataset can be understood as a database table or matrix, where each column corresponds to a particular variable, and each row corresponds to the fields of the dataset. The most supported file type for a tabular dataset is the "Comma Separated File," or CSV. But to store "tree-like data," we can use a JSON file more efficiently.

Types of data in datasets
1) Numerical data: Such as house price, temperature, etc.
2) Categorical data: Such as Yes/No, True/False, Blue/green, etc.
3) Ordinal data: These data are similar to categorical data but can be measured on the basis of comparison.

Note: A real-world dataset is of huge size, which is difficult to manage and process at the initial level. Therefore, to practice machine learning algorithms, we can use any dummy dataset.

*Properly prepared and pre-processed datasets are significant for machine learning projects.
*They provide the foundation for training accurate and reliable models. However, working with enormous datasets can introduce difficulties in terms of management and processing.
*To address these difficulties, efficient data management strategies and processing algorithms are required.

Types of datasets
Machine learning covers different domains, each requiring specific sorts of datasets. A few common sorts of datasets utilized in machine learning include:
1) Image Datasets:
Image datasets contain a collection of images and are normally utilized in computer vision tasks such as image classification, object detection, and image segmentation.
Examples:
*ImageNet
*MNIST
2) Text Datasets:
Text datasets comprise textual information, like articles, books, or social media posts. These datasets are utilized in NLP techniques like sentiment analysis, text classification, and machine translation.
Examples:
*Gutenberg Task dataset
*IMDb film reviews dataset
3) Time Series Datasets:
Time series datasets consist of data points gathered over time. They are generally utilized in forecasting, anomaly detection, and trend analysis.
Examples:
*Stock market data
*Climate data
*Sensor readings.
4) Tabular Datasets:
Tabular datasets are structured data organized in tables or spreadsheets. They contain rows representing instances or samples and columns representing features or attributes. Tabular datasets are utilized for tasks like regression and classification. The dataset given earlier in the article is an illustration of a tabular dataset.

Here we have used nm, which is a short name for Numpy, and it will be used in the whole program.
Matplotlib: The second library is Matplotlib, which is a Python 2D plotting library, and with this library we need to import the sub-library pyplot. This library is used to plot any type of chart in Python for the code. It will be imported as below:
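The import statements referred to above are not shown in the text; a minimal version is sketched below. The nm alias for Numpy comes from the text itself, while the mpt alias for pyplot and the pandas import (needed for the read_csv step that follows) are illustrative assumptions.

# Importing the libraries (aliases other than "nm" are illustrative choices)
import numpy as nm                  # numerical computations, used as "nm" throughout
import matplotlib.pyplot as mpt     # pyplot sub-library of Matplotlib, for plotting charts
import pandas as pd                 # pandas, used below to import and manage the dataset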
data_set= pd.read_csv('Dataset.csv')
Here, data_set is the name of the variable used to store our dataset, and inside the function we have passed the name of our dataset. Once we execute the above line of code, it will successfully import the dataset into our code. We can also check the imported dataset by clicking on the variable explorer section and then double-clicking on data_set. Consider the below image:

There are mainly two ways to handle missing data, which are:
By deleting the particular row: The first way is commonly used to deal with null values. In this way, we just delete the specific row or column which consists of null values. But this way is not very efficient, and removing data may lead to loss of information, which will not give an accurate output.
By calculating the mean: In this way, we calculate the mean of the column or row which contains a missing value and put it in place of the missing value. This strategy is useful for features which have numeric data, such as age, salary, year, etc. Here, we will use this approach.
To handle missing values, we will use the Scikit-learn library in our code, which contains various libraries for building machine learning models. Here we will use the Imputer class…
array([['India', 38.0, 68000.0],
       ['France', 43.0, 45000.0],
       ['Germany', 30.0, 54000.0],
       ['France', 48.0, 65000.0],

Since a machine learning model works entirely on mathematics and numbers, a categorical variable in our dataset may create trouble while building the model. So it is necessary to encode these categorical variables into numbers.
For the Country variable:
Firstly, we will convert the country variable into categorical data. To do this, we will use the LabelEncoder() class from the preprocessing library.

Column standardization
Geometrically, column standardization means squishing the data points such that the mean vector comes to the origin and the variance (by either squishing or expanding) on any axis becomes 1 in the transformed space. Column standardization is often called mean centering and variance scaling (squishing/expanding).
Standardization or Z-Score Normalization is the transformation of features by subtracting the mean and dividing by the standard deviation, z = (x − μ) / σ. This is often called the Z-score.
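Putting the steps above together, here is a minimal end-to-end sketch written against current scikit-learn, where the older Imputer class is exposed as SimpleImputer. The column positions and the file name Dataset.csv follow the example dataset above and are assumptions about how the data is laid out.

# Hedged sketch of the preprocessing steps described above.
import numpy as nm
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

data_set = pd.read_csv('Dataset.csv')            # columns: Country, Age, Salary, Purchased
x = data_set.iloc[:, :-1].values                 # independent variables
y = data_set.iloc[:, -1].values                  # dependent variable (Purchased)

# Handling missing values by replacing them with the column mean
imputer = SimpleImputer(missing_values=nm.nan, strategy='mean')
x[:, 1:3] = imputer.fit_transform(x[:, 1:3])     # Age and Salary columns

# Encoding the categorical Country and Purchased variables into numbers
x[:, 0] = LabelEncoder().fit_transform(x[:, 0])
y = LabelEncoder().fit_transform(y)

# Column standardization (Z-score): zero mean and unit variance per column
x = StandardScaler().fit_transform(x.astype(float))
print(x[:3])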
* Covariance of a data matrix
Covariance matrices represent the covariance values of each pair of variables in multivariate data. These values show the distribution magnitude and direction of multivariate data in a multidimensional space and allow you to gather information about how data spreads among two dimensions.
A covariance matrix is a type of matrix used to describe the covariance values between two items in a random vector. It is also known as the variance-covariance matrix because the variance of each element is represented along the matrix's major diagonal and the covariance is represented among the non-diagonal elements. A covariance matrix is usually a square matrix. It is also positive semi-definite and symmetric. This matrix comes in handy in stochastic modeling and principal component analysis.

* Principal Component Analysis (PCA)
As the number of features or dimensions in a dataset increases, the amount of data required to obtain a statistically significant result increases exponentially. This can lead to issues such as overfitting, increased computation time, and reduced accuracy of machine learning models; this is known as the curse of dimensionality, a set of problems that arise while working with high-dimensional data.
As the number of dimensions increases, the number of possible combinations of features increases exponentially, which makes it computationally difficult to obtain a representative sample of the data, and it becomes expensive to perform tasks such as clustering or classification. Additionally, some machine learning algorithms can be sensitive to the number of dimensions, requiring more data to achieve the same level of accuracy as lower-dimensional data.
To address the curse of dimensionality, feature engineering techniques are used, which include feature selection and feature extraction. Dimensionality reduction is a type of feature extraction technique that aims to reduce the number of input features while retaining as much of the original information as possible.

The Principal Component Analysis (PCA) technique was introduced by the mathematician Karl Pearson in 1901. It works on the condition that while the data in a higher dimensional space is mapped to data in a lower dimensional space, the variance of the data in the lower dimensional space should be maximum.
*Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. PCA is the most widely used tool in exploratory data analysis and in machine learning for predictive models.
*Principal Component Analysis (PCA) is an unsupervised learning technique used to examine the interrelations among a set of variables. It is also known as general factor analysis, where regression determines a line of best fit.
*The main goal of Principal Component Analysis (PCA) is to reduce the dimensionality of a dataset while preserving the most important patterns or relationships between the variables, without any prior knowledge of the target variables.
Principal Component Analysis (PCA) is used to reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables, retaining most of the sample's information, and useful for the regression and classification of data.
Principal Component Analysis (PCA) is a technique for dimensionality reduction that identifies a set of orthogonal axes, called principal components, that capture the maximum variance in the data. The principal components are linear combinations of the original variables in the dataset and are ordered in decreasing order of importance. The total variance captured by all the principal components is equal to the total variance in the original dataset.
*The first principal component captures the most variation in the data, the second principal component captures the maximum variance that is orthogonal to the first principal component, and so on.
*Principal Component Analysis can be used for a variety of purposes, including data visualization, feature selection, and data compression. In data visualization, PCA can be used to plot high-dimensional data in two or three dimensions, making it easier to interpret. In feature selection, PCA can be used to identify the most important variables in a dataset. In data compression, PCA can be used to reduce the size of a dataset without losing important information.
*In Principal Component Analysis, it is assumed that the information is carried in the variance of the features; that is, the higher the variation in a feature, the more information that feature carries.
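A short sketch of how a covariance matrix and PCA can be computed in practice follows; the Iris data and the choice of two components are illustrative assumptions, not part of the notes above.

# Sketch: computing a covariance matrix and applying PCA with scikit-learn.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)      # PCA works on centered (and here scaled) data

# Covariance matrix: variances on the diagonal, covariances off the diagonal
cov_matrix = np.cov(X_std, rowvar=False)       # shape (4, 4), symmetric, positive semi-definite
print(cov_matrix.round(2))

# Project onto the two principal components that capture the most variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
print("Explained variance ratio:", pca.explained_variance_ratio_)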
Unit 3
Supervised Learning

Supervised Machine Learning
Supervised learning is the type of machine learning in which machines are trained using well "labelled" training data, and on the basis of that data, machines predict the output. Labelled data means some input data is already tagged with the correct output.
In supervised learning, the training data provided to the machines works as the supervisor that teaches the machines to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.
Supervised learning is a process of providing input data as well as correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function to map the input variable (x) to the output variable (y).
In the real world, supervised learning can be used for Risk Assessment, Image Classification, Fraud Detection, spam filtering, etc.

How Supervised Learning Works
In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data. Once the training process is completed, the model is tested on the basis of test data (a subset of the dataset held out from training), and then it predicts the output.
The working of supervised learning can be easily understood by the below example and diagram:
o If the given shape has four sides, and all the sides are equal, then it will be labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of the number of sides and predicts the output.
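A minimal sketch of this train-then-test workflow is shown below, using the shapes example with the number of sides as the only feature. The tiny hand-made dataset and the choice of a decision tree classifier are illustrative assumptions.

# Hedged sketch: supervised learning on the shapes example above.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Labelled training data: feature = number of sides, label = shape name
X = [[3], [3], [4], [4], [6], [6], [3], [4], [6]]
y = ["triangle", "triangle", "square", "square", "hexagon", "hexagon",
     "triangle", "square", "hexagon"]

# Hold out a test set; stratify keeps every shape in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = DecisionTreeClassifier().fit(X_train, y_train)   # learn the mapping x -> y
print(model.predict([[4], [3]]))                          # -> ['square' 'triangle']
print("Test accuracy:", model.score(X_test, y_test))      # 1.0 on this toy data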
Types of supervised Machine learning Algorithms:
Supervised learning can be further divided into two types of problems:

1. Regression
Regression algorithms are used if there is a relationship between the input variable and the output variable. They are used for the prediction of continuous variables, such as Weather forecasting, Market Trends, etc. Below are some popular Regression algorithms which come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification
Classification algorithms are used when the output variable is categorical, which means there are two classes such as Yes-No, Male-Female, True-False, etc. Spam Filtering is one example. Below are some popular Classification algorithms which come under supervised learning:
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines

K-Nearest Neighbor (KNN) Algorithm for Machine Learning
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on the Supervised Learning technique.
o The K-NN algorithm assumes the similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on the similarity. This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm.
o The K-NN algorithm can be used for Regression as well as for Classification, but mostly it is used for Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumption on the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, it performs an action on the dataset.
o The KNN algorithm at the training phase just stores the dataset, and when it gets new data, it classifies that data into a category that is most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to a cat and a dog, but we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new data set similar to the cat and dog images, and based on the most similar features it will put it in either the cat or the dog category.

How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:
o Step-1: Select the number K of the neighbors
o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these k neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Our model is ready.

Suppose we have a new data point and we need to put it in the required category. Consider the below image:
o Firstly, we will choose the number of neighbors, so we will choose k=5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated between A(X1, Y1) and B(X2, Y2) as d = √((X2 − X1)² + (Y2 − Y1)²).
o By calculating the Euclidean distance we got the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the below image:
o As we can see, the 3 nearest neighbors are from category A; hence this new data point must belong to category A.

Naïve Bayes Classifier Algorithm
o The Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes' theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional training dataset.
o The Naïve Bayes Classifier is one of the simplest and most effective classification algorithms, helping to build fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
o Some popular examples of the Naïve Bayes Algorithm are spam filtration, sentiment analysis, and classifying articles.
The Naïve Bayes algorithm is comprised of two words, Naïve and Bayes, which can be described as:
o Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying that it is an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
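A small sketch of the K-NN procedure just described (k = 5, Euclidean distance) is given below using scikit-learn; the two-category toy points are made up purely for illustration.

# Hedged sketch: K-NN classification of a new point into category A or B.
from sklearn.neighbors import KNeighborsClassifier

# Toy data: category A clusters near (1, 1), category B near (5, 5)
X = [[1, 1], [1, 2], [2, 1], [2, 2], [1.5, 1.5],
     [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5]]
y = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X, y)                      # lazy learner: essentially just stores the dataset

new_point = [[2.5, 2.5]]           # new data point to categorize
print(knn.predict(new_point))      # -> ['A'] (majority of the 5 nearest neighbours)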
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, and it is used to determine the probability of a hypothesis with prior knowledge. It depends on conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) · P(A) / P(B)
Where,
P(A|B) is the Posterior probability: Probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood probability: Probability of the evidence given that the hypothesis is true.
P(A) is the Prior Probability: Probability of the hypothesis before observing the evidence.
P(B) is the Marginal Probability: Probability of the evidence.

Advantages of Naïve Bayes Classifier:
o Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in Multi-class predictions as compared to the other algorithms.
o It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:
o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship between features.

Applications of Naïve Bayes Classifier:
o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because the Naïve Bayes Classifier is an eager learner.
o It is used in text classification such as Spam filtering and Sentiment analysis.

Types of Naïve Bayes Model:
There are three types of Naïve Bayes Model, which are given below:
o Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, then the model assumes that these values are sampled from the Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e. which category a particular document belongs to, such as Sports, Politics, Education, etc. The classifier uses the frequency of words for the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present or not in a document. This model is also well known for document classification tasks.

Decision Tree in Machine Learning
A decision tree is a type of supervised learning algorithm that is commonly used in machine learning to model and predict outcomes based on input data. It is a tree-like structure where each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf node represents the final decision or prediction. The decision tree algorithm falls under the category of supervised learning. It can be used to solve both regression and classification problems.
There are specialized terms associated with decision trees that denote various components and facets of the tree structure and the decision-making procedure:
Root Node: A decision tree's root node, which represents the original choice or feature from which the tree branches, is the highest node.
Internal Nodes (Decision Nodes): Nodes in the tree whose choices are determined by the values of particular attributes. These nodes have branches that go to other nodes.
Leaf Nodes (Terminal Nodes): The termini of the branches, where choices or forecasts are decided upon. There are no further branches on leaf nodes.
Branches (Edges): Links between nodes that show how decisions are made in response to particular circumstances.
Splitting: The process of dividing a node into two or more sub-nodes based on a decision criterion. It involves selecting a feature and a threshold to create subsets of data.
Parent Node: A node that is split into child nodes; the original node from which a split originates.
Pruning: The process of removing branches or nodes from a decision tree to improve its generalisation and prevent overfitting.
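Returning to the Naïve Bayes types above, here is a minimal Gaussian Naïve Bayes sketch with scikit-learn; the Iris data and the 75/25 split are illustrative assumptions.

# Hedged sketch: Gaussian Naïve Bayes (continuous features assumed normally distributed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

nb = GaussianNB().fit(X_train, y_train)        # learns P(feature | class) and the class priors
print("Test accuracy:", nb.score(X_test, y_test))
print("Class priors:", nb.class_prior_.round(2))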
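To connect the decision-tree terminology to something runnable, the sketch below fits a shallow tree and prints its structure: the top split is the root node, indented splits are internal decision nodes joined by branches, and the "class:" lines are leaf nodes. The depth limit and dataset are illustrative choices.

# Hedged sketch: a small decision tree and its printed structure.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)   # shallow tree, easier to read
tree.fit(iris.data, iris.target)

# Text rendering of the fitted tree: root, internal nodes, branches, leaves
print(export_text(tree, feature_names=iris.feature_names))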
y = a0 + a1x + ε
Here, y is the dependent (target) variable, x is the independent (predictor) variable, a0 is the intercept of the line, a1 is the linear regression coefficient (the slope of the line), and ε is the random error.
The linear regression model provides a sloped straight line representing the relationship between the variables. Consider the below image:

Note: Logistic regression uses the concept of predictive modeling as regression; therefore, it is called logistic regression, but it is used to classify samples; therefore, it falls under the classification algorithms.
o The sigmoid function is a mathematical function used to map the predicted values to probabilities.
o It maps any real value into another value within the range of 0 and 1.
o The value of the logistic regression must be between 0 and 1, and it cannot go beyond this limit, so it forms a curve like the "S" form. The S-form curve is called the Sigmoid function or the logistic function.
o In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1. Values above the threshold tend to 1, and values below the threshold tend to 0.
The Logistic regression equation can be obtained from the Linear Regression equation. The mathematical steps to get the Logistic Regression equation are given below:
o We know the equation of the straight line can be written as:
y = a0 + a1x
o In Logistic Regression, y can be between 0 and 1 only, so let's divide the above equation by (1 − y):
y / (1 − y); 0 for y = 0, and infinity for y = 1
o But we need a range between -[infinity] and +[infinity]; taking the logarithm of the equation, it becomes:
log[y / (1 − y)] = a0 + a1x
The above is the final equation for Logistic Regression.
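A minimal sketch tying the sigmoid and threshold discussion to code is given below; scikit-learn's LogisticRegression is used, and the one-feature toy data is an illustrative assumption.

# Hedged sketch: logistic regression outputs probabilities in (0, 1) via the sigmoid.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: one feature x, class 1 becomes likely as x grows
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba([[2], [4.5], [7]])[:, 1]   # P(y = 1), always between 0 and 1
print(probs.round(3))
print(clf.predict([[2], [7]]))     # threshold at 0.5 -> classes [0 1]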
Support Vector Machine (SVM) Algorithm
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put a new data point in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram in which there are two different categories that are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm. We will first train our model with lots of images of cats and dogs so that it can learn about the different features of cats and dogs, and then we test it with this strange creature. As the support vector machine creates a decision boundary between these two classes of data (cat and dog) and chooses the extreme cases (support vectors), it will see the extreme cases of cat and dog. On the basis of the support vectors, it will classify it as a cat. Consider the below diagram:
The SVM algorithm can be used for face detection, image classification, text categorization, etc.

Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has two tags (green and blue), and the dataset has two features x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.

Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data, we have used two dimensions, x and y, so for non-linear data we will add a third dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space will become as in the below image:
Since we are in 3-d space, it looks like a plane parallel to the x-axis. If we convert it into 2-d space with z = 1, then it will become as:
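A small sketch contrasting linear and non-linear SVM in scikit-learn is given below; the concentric-circles data stands in for the non-linearly separable case above, and the RBF kernel plays the role of the added dimension z = x² + y². The dataset and parameters are illustrative assumptions.

# Hedged sketch: linear vs. non-linear (kernel) SVM on data no straight line can split.
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Non-linearly separable data: one class inside a circle, the other around it
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("Linear SVM accuracy:", linear_svm.score(X, y))   # struggles, roughly chance level
print("RBF SVM accuracy:", rbf_svm.score(X, y))         # close to 1.0
print("Support vectors per class:", rbf_svm.n_support_)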