ML All Units Mca 3rd Semester Anna University
COURSE: II MCA
SUBJECT: MACHINE LEARNING CODE: MC- 4301
SYLLABUS
UNIT- I
INTRODUCTION
Human Learning - Types – Machine Learning - Types - Problems not to be solved - Applications –
Languages / Tools– Issues. Preparing to Model: Introduction - Machine Learning Activities – Types of
data - Exploring structure of data - Data Quality and remediation - Data Pre-Processing.
UNIT - II
MODEL EVALUATION AND FEATURE ENGINEERING
Model Selection - Training Model - Model Representation and Interpretability – Evaluating Performance of
a Model - Improving Performance of a Model - Feature Engineering: Feature Transformation - Feature
Subset Selection.
UNIT - III
BAYESIAN LEARNING
Basic Probability Notation - Inference - Independence - Bayes’ Rule. Bayesian Learning: Maximum
Likelihood and Least Squared error hypothesis - Maximum Likelihood hypotheses for predicting
probabilities - Minimum description Length principle - Bayes optimal classifier - Naive Bayes classifier -
Bayesian Belief networks - EM algorithm.
UNIT – IV
PARAMETRIC MACHINE LEARNING
Logistic Regression: Classification and representation – Cost function – Gradient descent – Advanced
optimization – Regularization - Solving the problems on overfitting. Perceptron – Neural Networks – Multi
– Class Classification - Backpropagation – Non-linearity with activation functions (Tanh, Sigmoid, Relu,
PRelu) - Dropout as regularization.
UNIT - V
NON PARAMETRIC MACHINE LEARNING
k- Nearest Neighbors- Decision Trees – Branching – Greedy Algorithm - Multiple Branches – Continuous
attributes – Pruning. Random Forests: ensemble learning. Boosting – Adaboost algorithm. Support Vector
Machines – Large Margin Intuition – Loss Function - Hinge Loss – SVM Kernels.
REFERENCES
Ethem Alpaydin, "Introduction to Machine Learning (Adaptive Computation and Machine Learning Series)", Third Edition, MIT Press, 2014.
Tom M. Mitchell, "Machine Learning", India Edition, 1st Edition, McGraw-Hill Education Private Limited, 2013.
Saikat Dutt, Subramanian Chandramouli and Amit Kumar Das, "Machine Learning", 1st Edition, Pearson Education, 2019.
Christopher M. Bishop, "Pattern Recognition and Machine Learning", Revised Edition, Springer, 2016.
Aurelien Geron, "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow", 2nd Edition, O'Reilly, 2019.
Stephen Marsland, "Machine Learning – An Algorithmic Perspective", Second Edition, Chapman and Hall/CRC Machine Learning and Pattern Recognition Series, 2014.
Unit : I
Human Learning - Types – Machine Learning - Types - Problems not to be solved - Applications –
Languages / Tools– Issues. Preparing to Model: Introduction - Machine Learning Activities –
Types of data - Exploring structure of data - Data Quality and remediation - Data Pre-Processing.
INTRODUCTION
Human Learning is the process of obtaining new understanding, knowledge, behaviors, skills, values,
attitudes, and preferences.
The ability to learn possessed by humans is called human learning.
Human Learning Systems is an approach which holds the complexity of the real world, and enables
us to work effectively in that complexity.
Machine learning is a subfield of artificial intelligence, which is broadly defined as the capability of
a machine to imitate intelligent human behavior.
Machine Learning is the field of study that gives computers the capability to learn without being
explicitly programmed.
ML is one of the most exciting technologies that one would have ever come across.
As the name suggests, it gives the computer the capability that makes it more similar to humans: the ability to learn.
Machine learning is actively being used today, perhaps in many more places than one would expect.
TYPES OF HUMAN LEARNING:
Classical Conditioning
Observational Learning
Operant Conditioning.
Classical Conditioning:
Classical conditioning is a type of learning that happens unconsciously.
When you learn through classical conditioning, an automatic conditioned response is paired with a specific stimulus.
This creates a behavior.
Observational Learning:
Observational learning is the process of learning by watching the behaviors of others.
The targeted behavior is watched, memorized, and then mimicked.
Also known as shaping and modeling, observational learning is most common in children as they imitate the behaviors of adults.
Operant Conditioning
Operant conditioning is sometimes referred to as instrumental conditioning.
It is a method of learning that uses rewards and punishment to modify behavior.
Through operant conditioning, behavior that is rewarded is likely to be repeated, and behavior that is punished will rarely occur.
Supervised Learning
Supervised learning is the type of learning in which machines are trained using well "labelled"
training data, and on the basis of that data, machines predict the output.
The labelled data means some input data is already tagged with the correct output.
In supervised learning, the training data provided to the machines works as the supervisor that teaches
the machines to predict the output correctly.
It applies the same concept as a student learning under the supervision of a teacher.
Supervised learning is a process of providing input data as well as correct output data to the machine
learning model.
The aim of a supervised learning algorithm is to find a mapping function to map the input variable(x)
with the output variable(y).
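A minimal sketch of this mapping idea in Python with scikit-learn (the tiny fruit dataset and feature names below are invented for illustration):

# Supervised learning: learn a mapping from inputs X to labels y.
from sklearn.tree import DecisionTreeClassifier

# Toy labelled data: [weight_in_grams, surface_smoothness] -> 0 = apple, 1 = orange
X = [[150, 0.9], [170, 0.8], [140, 0.3], [130, 0.2]]
y = [0, 0, 1, 1]

model = DecisionTreeClassifier()
model.fit(X, y)                      # training: the labels act as the "supervisor"
print(model.predict([[160, 0.85]]))  # predict the class of an unseen fruit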
Unsupervised Learning
Unsupervised learning is a type of machine learning in which models are trained using unlabeled
dataset and are allowed to act on that data without any supervision.
Reinforcement Learning
Reinforcement learning is a feedback-based type of learning; it is discussed in detail later in this unit.
Categories of Supervised Learning
Supervised learning is further divided into two categories:
1) Classification
Classification algorithms are used when the output variable is categorical, such as Yes/No or Spam/Not Spam.
2) Regression
Regression algorithms are used to solve regression problems, in which there is a relationship
between the input and output variables.
These are used to predict continuous output variables, such as market trends, weather prediction,
etc.
Some popular Regression algorithms are given below:
a) Simple Linear Regression Algorithm
b) Multivariate Regression Algorithm
c) Decision Tree Algorithm
d) Lasso Regression
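As an illustration of the first of these, a minimal simple-linear-regression sketch with scikit-learn (the toy numbers are invented):

# Simple linear regression: predict a continuous output from one input.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]           # input variable, e.g. years of experience
y = [30000, 35000, 40000, 45000]   # continuous output, e.g. salary

reg = LinearRegression().fit(X, y)
print(reg.predict([[5]]))          # follows the learned linear trend -> ~50000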
Advantages:
Since supervised learning works with a labelled dataset, we can have an exact idea about the
classes of objects.
These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
These algorithms are not able to solve complex tasks.
It may predict the wrong output if the test data is different from the training data.
It requires lots of computational time to train the algorithm.
APPLICATIONS OF SUPERVISED LEARNING
Image Segmentation:
Supervised Learning algorithms are used in image segmentation.
In this process, image classification is performed on different image data with pre-defined labels.
Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis purposes.
It is done by using medical images and past data labelled with disease conditions.
With such a process, the machine can identify a disease for the new patients.
Fraud Detection :
Supervised Learning classification algorithms are used for identifying fraud transactions, fraud
customers, etc.
It is done by using historic data to identify the patterns that can lead to possible fraud.
Spam detection
In spam detection & filtering, classification algorithms are used.
These algorithms classify an email as spam or not spam.
The spam emails are sent to the spam folder.
Speech Recognition
Supervised learning algorithms are also used in speech recognition.
The algorithm is trained with voice data, and it can then perform various identification tasks,
such as voice-activated passwords, voice commands, etc.
Unsupervised Machine Learning
Unsupervised learning is different from the supervised learning technique;
In unsupervised machine learning, the machine is trained using the unlabeled dataset, and the
machine predicts the output without any supervision.
In unsupervised learning, the models are trained with the data that is neither classified nor labelled,
and the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorise the unsorted dataset
according to the similarities, patterns, and differences.
Machines are instructed to find the hidden patterns from the input dataset.
The machine will discover its patterns and differences, such as color difference, shape difference,
and predict the output when it is tested with the test dataset.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types:
Clustering
Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the data.
It is a way to group the objects into a cluster such that the objects with the most similarities
remain in one group and have fewer or no similarities with the objects of other groups.
An example of the clustering algorithm is grouping the customers by their purchasing behaviour.
Some of the popular clustering and related unsupervised algorithms are given below (the last two are dimensionality-reduction techniques often used alongside clustering):
a) K-Means Clustering algorithm
b) Mean-shift algorithm
c) DBSCAN Algorithm
d) Principal Component Analysis
e) Independent Component Analysis
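A minimal sketch of the first of these, K-Means, with scikit-learn (the toy 2-D points are invented for illustration):

# K-Means clustering: group unlabeled points into k clusters.
from sklearn.cluster import KMeans

# Unlabeled points, e.g. customers described by (annual spend, visits)
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignment for each point
print(kmeans.cluster_centers_)   # the two learned cluster centres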
2) Association
Association rule learning is an unsupervised learning technique, which finds interesting relations
among variables within a large dataset.
The main aim of this learning algorithm is to find the dependency of one data item on another
data item and map those variables accordingly so that it can generate maximum profit.
This algorithm is mainly applied in Market Basket analysis, Web usage mining, continuous
production, etc.
Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, and FP-
growth algorithm.
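A minimal Apriori sketch, assuming the third-party mlxtend library is installed (pip install mlxtend); the baskets are invented:

# Association rule learning with the Apriori algorithm.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

frequent = apriori(onehot, min_support=0.5, use_colnames=True)   # frequent itemsets
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])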
ADVANTAGES AND DISADVANTAGES OF UNSUPERVISED LEARNING:-
Advantages:
These algorithms can be used for more complicated tasks than supervised ones, because they
work on unlabeled datasets.
Unsupervised algorithms are preferable for many tasks, as an unlabeled dataset is easier to obtain
than a labelled one.
Disadvantages:
The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the
algorithms are not trained with the exact output in advance.
Working with unsupervised learning is more difficult, as it works with unlabelled datasets that
do not map inputs to known outputs.
Applications of Unsupervised Learning
Network Analysis:
Unsupervised learning is used in document network analysis of text data, for example to identify
plagiarism and copyright issues in scholarly articles.
Recommendation Systems:
Recommendation systems widely use unsupervised learning techniques for building
recommendation applications for different web applications and e-commerce websites.
Anomaly Detection:
Anomaly detection is a popular application of unsupervised learning, which can identify unusual
data points within the dataset.
It is used to discover fraudulent transactions.
Singular Value Decomposition:
Singular Value Decomposition or SVD is used to extract particular information from the
database.
For example, extracting information of each user located at a particular location.
Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies between Supervised and
Unsupervised machine learning.
It represents the intermediate ground between Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training data) algorithms and uses the combination of
labelled and unlabeled datasets during the training period.
Although Semi-supervised learning is the middle ground between supervised and unsupervised
learning and operates on the data that consists of a few labels, it mostly consists of unlabeled data.
Because labels are costly, corporate datasets often have only a few of them.
Semi-supervised learning thus differs from supervised and unsupervised learning, which are
defined by the presence and absence of labels respectively.
To overcome the drawbacks of supervised learning and unsupervised learning algorithms, the
concept of Semi-supervised learning is introduced.
The main aim of semi-supervised learning is to effectively use all the available data, rather than only
labelled data like in supervised learning.
Initially, similar data is clustered using an unsupervised learning algorithm, and this clustering
then helps to turn the unlabeled data into labelled data.
This is done because labelled data is comparatively more expensive to acquire than unlabeled data.
For Example.
Supervised learning is where a student is under the supervision of an instructor at home and
college.
Further, if that student is self-analyzing the same concept without any help from the instructor, it
comes under unsupervised learning.
Under semi-supervised learning, the student has to revise himself after analyzing the same
concept under the guidance of an instructor at college.
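One way to sketch this idea in code is scikit-learn's self-training wrapper, which pseudo-labels the unlabeled samples (marked -1) starting from a few labelled ones; the toy data is invented:

# Semi-supervised learning via self-training.
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression

X = [[0.0], [0.2], [0.1], [0.9], [1.0], [0.8]]
y = [0, -1, -1, 1, -1, -1]    # only two labelled samples; -1 means "no label"

model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)               # the labelled points seed the labelling of the rest
print(model.predict([[0.15], [0.85]]))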
Advantages and Disadvantages of Semi-Supervised Learning
Advantages:
It is simple and easy to understand the algorithm.
It is highly efficient.
It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.
Disadvantages:
Iteration results may not be stable.
We cannot apply these algorithms to Network-Level Data.
Accuracy is Low.
Reinforcement Learning
Reinforcement Learning works on a feedback-based process, in which an agent automatically explores its
surroundings by taking actions, learning from experience, and thereby improving its performance.
The Agent gets rewarded for each good action and gets punished for each bad action.
The Goal of Reinforcement Learning is to maximize the rewards.
In Reinforcement Learning, there is No Labeled Data like Supervised Learning.
The Agents learn from their experiences only.
The Reinforcement Learning Process is similar to a Human Learning.
For example,
A child learns various things by experiences in his day-to-day life.
A typical example of Reinforcement Learning is playing a game, where the game is the environment.
The agent's moves are scored against the goals.
Agent receives feedback in terms of punishment and rewards.
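A tiny Q-learning sketch of this loop (the 5-cell track, rewards, and hyper-parameters are all invented for illustration):

# The agent learns to walk right along a 1-D track to reach a reward.
import random

n_states, actions = 5, [-1, +1]          # move left / move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda b: Q[(s, b)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0   # reward only at the goal
        # Q-update: move Q(s,a) toward reward + discounted best future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = s2

print([max(actions, key=lambda b: Q[(s, b)]) for s in range(n_states - 1)])  # learned policy: all +1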
Categories of Reinforcement Learning
Two Types of Reinforcement learning are,
1) Positive Reinforcement Learning:
Positive Reinforcement Learning specifies increasing the tendency that the required behaviour
would occur again by adding something.
It enhances the strength of the behaviour of the agent and positively impacts it.
2) Negative Reinforcement Learning:
Negative Reinforcement Learning works exactly Opposite to the Positive Reinforcement
Learning.
It increases the tendency that the specific behaviour would occur again by avoiding the negative
condition.
Real-world Use cases of Reinforcement Learning
Video Games:
Reinforcement Learning algorithms are much popular in gaming applications.
It is used to gain super-human performance.
Resource Management:
Resource Management with Deep Reinforcement Learning shows how to use Reinforcement
Learning in computers to automatically learn to allocate and schedule resources among waiting jobs,
in order to minimize the average job slowdown.
Robotics:
Reinforcement Learning is widely used in Robotics applications.
Robots are used in the industrial and manufacturing area.
Robots are made more powerful with Reinforcement Learning.
There are different industries that have their vision of building Intelligent Robots using AI and
Machine Learning Technology.
Text Mining
Text Mining is one of the great applications of NLP.
Natural Language Processing (NLP) is an upcoming technology that drives various forms of AI
by creating a faultless and interactive interface between humans and machines, with the
help of Reinforcement Learning.
Advantages and Disadvantages of Reinforcement Learning
Advantages
It helps in solving complex real-world problems which are difficult to solve with conventional
techniques.
The learning model of RL is similar to the learning of human beings; hence highly accurate results
can be obtained.
Helps in achieving long term results.
Disadvantage
Reinforcement Learning algorithms are not preferred for simple problems.
Reinforcement Learning algorithms require huge data and computations.
Too much reinforcement learning can lead to an overload of states which can weaken the results.
Problems Not to be Solved by Machine Learning
1) Reasoning Power
Reasoning power is an area where Machine Learning has not been successful; machine reasoning
is quite different from human reasoning.
Algorithms available today are mainly oriented towards specific use-cases and are narrowed
down when it comes to applicability.
They cannot reason about why a particular outcome occurs the way it does, or inspect their own
outputs.
For instance, if an image recognition algorithm identifies apples and oranges in a given scenario,
it cannot say if the apple (or orange) has gone bad or not, or why is that fruit an apple or orange.
Mathematically, we can explain all of this learning process, but from an algorithmic
perspective, neither the algorithms nor we can state the natural property involved.
In other words, Machine Learning algorithms lack the ability to reason beyond their intended
application.
2) Contextual Limitation
NLP algorithms can process text and speech information.
NLP may learn letters, words, sentences or even the syntax, but where it falls short is the
situation and background of the language.
Algorithms do not understand the context of the language used.
So, ML does not have an overall idea of the situation.
It is limited to memorized patterns rather than thinking about what is actually going on.
3) Scalability
When ML implementations are organized on a significant scale, everything depends on the data as well
as its scalability.
Data is growing at a huge rate and has many forms which largely affect the scalability of an ML
project.
Algorithms cannot do much about this unless they are updated constantly for new changes to
handle data.
This is where ML regularly requires human involvement in terms of scalability and remains
unsolved mostly.
In addition, growing data has to be dealt with in the right way if shared on an ML platform, which
again needs examination through knowledge and judgement that current ML lacks.
4) Regulatory Restriction For Data in ML
ML usually needs considerable amounts (actually, massive) of data in stages such as Training,
Cross-Validation etc.
The data includes private as well as general information, which adds complication.
Most TECH companies have privatized data and these data are the ones which are actually
useful for ML applications.
There is sometimes a risk of failure or of wrong usage of data, especially in critical areas such
as medical research, health insurance, etc.
Even when data is anonymised (not relatable or identifiable), the anonymisation can at times be weak
and leave the data without protection.
Hence regulatory rules are enforced heavily when it comes to using private data.
5) Internal Working Of Deep Learning
Nowadays, the Deep Learning (DL) powers applications such as voice recognition, image
recognition and so on through artificial neural networks.
But, the internal working of DL is still unknown and yet to be solved.
Advanced DL algorithms still confuse researchers in terms of its working and efficiency.
Millions of neurons form the neural networks in DL, increasing the abstraction at every level
in a way that cannot be fully understood.
This is why deep learning is dubbed a "black box": its internal workings are unknown.
APPLICATIONS OF MACHINE LEARNING
Machine learning is growing very rapidly day by day.
We are using machine learning in our daily life even without knowing it, such as in Google Maps, Google Assistant, Alexa, etc.
Real-world applications of Machine Learning are:
1) Image Recognition:
Image recognition is one of the most common applications of machine learning.
It is used to identify objects, persons, places, digital images, etc.
The popular use case of image recognition and face detection is, Automatic friend tagging
suggestion:
Facebook provides us a feature of auto friend tagging suggestion.
Whenever we upload a photo with our Facebook friends, we automatically get a tagging
suggestion with names; the technology behind this is machine learning's face
detection and recognition algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face
recognition and person identification in the picture.
2) Speech Recognition
While using Google, we get an option of "Search by voice," it comes under speech recognition, and
it's a popular application of machine learning.
Speech recognition is a process of converting voice instructions into text, and it is also known as
"Speech to text", or "Computer speech recognition."
At present, machine learning algorithms are widely used by various applications of speech
recognition.
Google Assistant and Alexa use speech recognition technology to follow voice
instructions.
3) Traffic Prediction:
If we want to visit a new place, we take help of Google Maps, which shows us the Correct Path with
the Shortest Route and Predicts the Traffic Conditions.
It predicts the traffic conditions such as whether Traffic is Cleared, Slow-Moving, or Heavily
Congested with the help of two ways:
Real-time location of the vehicle from the Google Maps app and sensors
Average time taken on past days at the same time of day
Everyone who uses Google Maps is helping to make the app better.
It takes information from the user and sends it back to its database to improve performance.
4) Product Recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendation to the user.
Whenever we search for some product on Amazon, we start getting advertisements for the
same product while surfing the internet in the same browser, and this is because of machine learning.
Google understands the user interest using various machine learning algorithms and suggests the
product as per customer interest.
Similarly, when we use Netflix, we find recommendations for entertainment series, movies,
etc., and this is also done with the help of machine learning.
5) Self-Driving Cars:
One of the most exciting applications of machine learning is Self-Driving Cars.
Machine learning plays a significant role in self-driving cars.
Tesla, one of the most popular car manufacturers, is working on self-driving cars.
It uses machine learning methods to train the car models to detect people and objects while
driving.
6) Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, and spam.
We always receive an important mail in our inbox with the important symbol and spam emails in our
spam box, and the technology behind this is Machine learning.
Below are some spam filters used by Gmail:
Content Filter
Header filter
General blacklists filter
Rules-based filters
Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naive Bayes
classifier are used for Email Spam Filtering and Malware Detection.
7) Virtual Personal Assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana and Siri.
As the name suggests, they help us in finding the information using our voice instruction.
These assistants can help us in various ways just by our voice instructions such as Play music, call
someone, Open an email, Scheduling an appointment, etc.
These virtual assistants use machine learning algorithms as an important part.
These assistants record our voice instructions, send them to a server in the cloud, decode them using
ML algorithms, and act accordingly.
8) Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions.
Whenever we perform an online transaction, there are various ways that fraud can take place,
such as fake accounts, fake IDs, and money being stolen in the middle of a
transaction.
So to detect this, Feed Forward Neural network helps us by checking whether it is a genuine
transaction or a fraud transaction.
For each genuine transaction, the output is converted into some hash values, and these values
become the input for the next round.
For each genuine transaction, there is a specific pattern which changes for a fraudulent transaction;
hence the network detects it and makes our online transactions more secure.
9) Stock Market Trading:
Machine learning is widely used in stock market trading.
In the stock market, there is always a risk of ups and downs in share prices, so machine
learning's Long Short-Term Memory (LSTM) neural network is used for the prediction of stock market
trends.
10) Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis.
With this, medical technology is growing very fast and is now able to build 3D models that can predict the
exact position of lesions in the brain.
It helps in finding brain tumors and other brain-related diseases easily.
11) Automatic Language Translation:
Nowadays, if we visit a new place and we are not aware of the language then it is not a problem at
all, as for this also machine learning helps us by converting the text into our known languages.
Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural
machine translation system that translates text into our familiar language, which is called automatic
translation.
The technology behind automatic translation is a sequence-to-sequence learning algorithm, which,
used together with image recognition, translates text from one language to another.
LANGUAGES USED IN ML
Artificial Intelligence is a very important technology for developing and building new computer programs and systems that can simulate various intelligence processes like learning, reasoning, etc.
Several programming languages are used for machine learning; some of them are Python, R, Lisp, Java, C++, Julia, and Prolog.
1) Python
Python is one of the most powerful and easy programming languages; it was first released in 1991.
Most of the developers choose Python as their favourite programming language for
developing machine learning solutions.
Python is a user friendly language.
Python is a platform-independent language and also provides an extensive framework for Deep
Learning, Machine Learning, and Artificial Intelligence.
Python is also a portable language as it is used on various platforms such as Linux, Windows, Mac
OS, and UNIX.
Features
It is easy to learn than any other programming language.
It is also a Dynamically-Typed Language.
Python is an Object-Oriented Language.
It provides extensive community support and a framework for ML and DL.
It is Open-Source software.
Large standard sets of libraries.
Interpreted language.
Python is an ideal programming language for Machine Learning, Natural Language Processing
(NLP), Neural Networks, etc.
2) Java
Java is one of the most widely used programming languages among developers
and programmers for developing machine learning solutions.
Java is also a platform-independent language as it can also be easily
implemented on various platforms.
Java is an object-oriented and scalable programming language.
Java allows virtual machine technology that helps to create a single version of the app and provides
support to your business.
The best thing about Java is that once it is written and compiled on one platform, you do not need to
compile it again for other platforms.
This is known as the WORA (Write Once, Run Anywhere) principle.
Features of Java
Portability
Cross-platform.
Easy to learn and use.
Easy-to-code Algorithms.
Built-in garbage collector.
Standard Widget Toolkit.
Simplified work with large-scale projects.
Better user interaction.
Easy to debug.
3) Prolog
Prolog is one of the oldest programming languages used for Machine Learning solutions.
Prolog stands for "Programming in Logic",
It was developed by the French scientist Alain Colmerauer in the early 1970s.
Prolog is a declarative language rather than imperative.
Features of Prolog
Supports basic mechanisms such as
Pattern Matching,
Tree-based data structuring, and automatic backtracking.
4) Lisp
Lisp is widely used for scientific research in the fields of natural languages, theorem proofs, and to
solve artificial intelligence problems.
Lisp was originally created as a practical mathematical notation for programs.
Lisp programming language is the Second Oldest Language after
FORTRAN; it is still being used because of its Crucial Features.
LISP programming was invented by John McCarthy.
LISP is one of the most efficient programming languages for solving specific problems.
It is mainly used for machine learning and logic problems.
It has also influenced the creation of other programming languages for AI, like R and Julia.
It is very flexible, but it is not user friendly and lacks well-known libraries and familiar syntax.
For this reason, it is not preferred by many programmers.
Features of LISP
The program can be easily modified, similar to data.
Make use of recursion for control structure rather than iteration.
Automatic garbage collection.
We can easily execute data structures as programs.
An object can be created dynamically.
5) R
R is one of the great languages for statistical processing in programming.
R is a free, open-source programming language used for data analysis purposes.
It may not be the perfect language for AI, but it provides great performance while dealing with large
numbers.
R contains several packages that are specially designed for AI, which are:
Gmodels:
This package provides different tools for the Model Fitting Task.
TM :
It is a great framework that is used for text mining applications.
RODBC:
It is an ODBC interface.
OneR:
This package is used to implement the One Rule Machine Learning classification
algorithm.
Features of R programming
R is an open-source programming language, which is free of cost, and also you can add packages
for other functionalities.
R provides strong & interactive graphics capability to users.
It enables you to perform complex statistical calculations.
It is widely used in machine learning and AI due to its high-performance capabilities.
6) Julia
Julia is one of the newer languages on the list and was created to focus on performance computing in
scientific and technical fields.
Julia includes several features that directly apply to AI programming.
Julia is a comparatively new language, which is mainly suited for
numerical analysis and computational science.
Features of Julia
Common numeric data types.
Arbitrary precision values.
Robust mathematical functions.
Built-in package manager.
Dynamic type system.
Ability to work for both parallel and distributed computing.
Macros and Meta programming capabilities.
Support for multiple dispatches.
Support for C functions.
7) C++
The C++ language has been around for a long time, but it is still a top
and popular programming language among developers.
It provides better handling for AI models while developing.
Although C++ may not be the first choice of developers for AI
programming, various machine learning and deep learning libraries are written in the C++ language.
Features of C++
C++ is one of the fastest languages, and it can be used in statistical techniques.
It can be used with ML algorithms for fast execution.
Most of the libraries and packages available for Machine learning and AI are written in C++.
It is a user friendly and simple language.
MACHINE LEARNING TOOLS
Machine learning is one of the most revolutionary technologies that are making lives simpler.
It is a subfield of Artificial Intelligence, which analyses the data, build the model, and make
predictions.
There are different tools, software, and platform available for machine learning and also new
software and tools are evolving day by day.
Among these machine learning tools, choosing the best tool for your model is a challenging task.
If you choose the right tool for your model, you can make it faster and more efficient.
1) TensorFlow
TensorFlow is one of the most popular open-source libraries used to train and
build both machine learning and deep learning models.
It is developed by Google Brain Team.
It is much popular among machine learning enthusiasts, and they use it for
building different ML applications.
It offers a powerful library, tools, and resources for numerical computation, specifically for large
scale machine learning and deep learning projects.
It enables data scientists/ML developers to build and organize machine learning applications
efficiently.
Features:
TensorFlow enables us to build and train our ML models easily.
It also enables you to run the existing models using the TensorFlow.js
It provides multiple abstraction levels that allow the user to select the correct resource as per the
requirement.
It helps in building a neural network.
This is open-source software and highly flexible.
It also enables the developers to perform numerical computations using data flow graphs.
Run-on GPUs and CPUs, and also on various Mobile Computing Platforms.
GPUs can process many pieces of data simultaneously, making them useful for machine learning,
video editing, and gaming applications.
It makes it easy to organize and train the model in the cloud.
2) PyTorch
PyTorch is an open-source machine learning framework, which is
based on the Torch library.
This framework is free and open-source, developed by FAIR (Facebook's AI Research lab).
It is one of the popular ML frameworks, which can be used for various applications, including
computer vision and natural language processing.
PyTorch has Python and C++ interfaces; however, the Python interface is more interactive.
Different deep learning software is built on top of PyTorch, such as PyTorch Lightning, Hugging
Face's Transformers, Tesla Autopilot, etc.
It supports GPU.
Features:
It is well suited for deep learning research, offering good speed and flexibility.
It can also be used on cloud platforms.
It includes tutorial courses, various tools, and libraries.
It also provides a dynamic computational graph that makes this library more popular.
It allows changing the network behaviour on the fly, without any delay.
It is freely available.
3) Google Cloud ML Engine
It is a hosted platform where ML developers and data scientists build and run optimum-quality
machine learning models.
It provides a managed service that allows developers to easily create ML
models with any type of data and of any size.
Features:
Provides machine learning model training, building, deep learning and
predictive modeling.
The Two Services, namely, Prediction and Training, can be used independently or combinedly.
It can be used by enterprises, i.e., for identifying clouds in a satellite image, responding faster to
emails of customers.
It can be widely used to train a complex model.
4) Amazon Machine Learning (AML)
Amazon Machine Learning (AML) is a cloud-based and robust machine
learning software application, which is widely used for building machine
learning models and making predictions.
It integrates data from multiple sources, including Amazon Redshift, Amazon S3, or
RDS.
Features
Enables the users to identify the patterns, build mathematical models, and
make predictions.
It provides support for three types of models,
Multi-Class Classification
Binary Classification
Regression.
It permits users to import the model into or export the model out from Amazon Machine
Learning.
It also provides core concepts of machine learning, including ML models, Data sources,
Evaluations, Real-time predictions and Batch predictions.
It enables the user to retrieve predictions with the help of batch APIs for bulk requests or real-
time APIs for individual requests.
5) Google ML kit for Mobile
For Mobile app developers, Google brings ML Kit, which is packaged
with the expertise of machine learning and technology to create more
robust, optimized, and personalized apps.
This tools kit can be used for face detection, text recognition, landmark
detection, image labeling, and barcode scanning applications.
Features:
The ML kit is optimized for mobile.
It provides easy-to-use APIs that enables powerful use cases in your mobile apps.
It includes Vision API and Natural Language APIS to detect faces, text, and objects, and identify
different languages & provide reply suggestions.
DATA PRE-PROCESSING
Data Preprocessing is a process of preparing the raw data and making it suitable for a machine
learning model.
It is the first and crucial step while creating a machine learning model.
When creating a machine learning project and working with data, it is mandatory to clean the data
(handling noise, missing values, and unusable formats) and put it in a formatted way.
Data preprocessing comprises the required tasks for cleaning the data and making it suitable for a
machine learning model, which also increases the accuracy and efficiency of the model.
It involves below steps:
1) Getting the Dataset
2) Importing Libraries
3) Importing Datasets
4) Finding Missing Data
5) Encoding Categorical Data
6) Splitting Dataset into Training and Test Set
7) Feature Scaling
1) Get the Dataset
The data collected for a particular problem in a proper format is known as the Dataset.
Each dataset is different from another dataset.
To use the dataset in our code, we usually put it into a CSV file; sometimes we may need to use
an HTML or xlsx file.
Datasets may be of different formats for different purposes; for example, the dataset for a
machine learning model built for business purposes will be different from the dataset
required for a patient.
2) Importing Libraries
To perform data preprocessing using Python, we need to import some predefined Python
libraries.
These libraries are used to perform some specific jobs.
There are three specific libraries that we will use for data preprocessing, which are:
Numpy:
It includes mathematical operations; it is the fundamental package for scientific calculation in Python.
Matplotlib:
A Python 2D plotting library, used to plot any type of chart.
Pandas:
It is used for importing and managing the datasets.
It is an open-source data manipulation and analysis library.
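A minimal sketch of these imports with their conventional aliases ("data.csv" is an illustrative file name):

import numpy as np                 # numerical operations on arrays
import matplotlib.pyplot as plt    # 2-D plotting of charts
import pandas as pd                # importing and managing datasets

dataset = pd.read_csv("data.csv")  # load a CSV dataset
print(dataset.head())              # quick look at the first rows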
3) Importing the Datasets
To import the datasets which we have collected for our machine learning project.
But before importing a dataset, we have to set the current directory as the working directory.
The following steps are used to set a working directory in the Spyder IDE:
Save your Python file in the directory that contains the dataset.
Go to the File explorer option in Spyder IDE, and select the required directory.
Click the F5 button or the Run option to execute the file.
4) Handling Missing data:
The next step of data preprocessing is to handle missing data in the datasets.
If our dataset contains some missing data, then it may create a huge problem for our machine
learning model.
Two Different Ways to handle Missing Data:
A) By Deleting the Particular Row:
This approach commonly deals with null values: we delete the specific row or column that contains
null values.
However, it is not an efficient way, as removing data may lead to loss of information, which will
not give accurate output.
B) By Calculating the Mean:
To solve the above problem, we calculate the mean of the column or row which contains the
missing values and put it in the place of the missing value.
This strategy is useful for features which have numeric data, such as age, salary, year, etc.
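A minimal mean-imputation sketch using scikit-learn's SimpleImputer (the toy ages and salaries are invented):

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[25, 50000], [30, np.nan], [np.nan, 60000]])

imputer = SimpleImputer(strategy="mean")   # replace NaNs with the column mean
print(imputer.fit_transform(X))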
5) Encoding Categorical Data:
In a dataset, categorical data is data which has some categories, such as Country or Purchased.
A machine learning model works on mathematics and numbers, so categorical values make it
difficult to build a model.
To solve this, it is necessary to encode these categorical variables into numbers, as sketched below.
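A minimal encoding sketch with pandas (the Country/Purchased values are invented):

import pandas as pd

df = pd.DataFrame({"Country": ["India", "France", "India"],
                   "Purchased": ["Yes", "No", "Yes"]})

df = pd.get_dummies(df, columns=["Country"])                # one-hot: one column per country
df["Purchased"] = df["Purchased"].map({"No": 0, "Yes": 1})  # label encoding
print(df)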
6) Splitting the Dataset into the Training set and Test set
In Machine Learning Data Preprocessing, we divide our dataset into a training set and test set.
The training dataset is kept completely separate from the test dataset.
If we train our model very well and its training accuracy is very high, its performance may still
decrease when we provide a new dataset to it.
So we always try to make a model that performs well with the training set and also with the test dataset.
Training Set:
A subset of the dataset used to train the machine learning model; the output for this subset is already known.
Test Set:
A subset of the dataset used to test the machine learning model; the model predicts the output for the test set.
During the first stage, we split the arrays of the dataset into random train and test subsets.
In the second stage, these are further divided into:
x_train : features for the training data
x_test : features for the testing data
y_train : dependent variables (labels) for the training data
y_test : dependent variables (labels) for the testing data
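A minimal splitting sketch with scikit-learn's train_test_split (toy data invented; an 80/20 split is one common choice):

from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]         # toy features
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # toy labels

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 80% train, 20% test
print(len(x_train), len(x_test))            # 8 2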
7) Feature Scaling
Feature scaling is the final step of data preprocessing in machine learning.
It is a technique to standardize the independent variables of the dataset in a specific range.
In feature scaling, we put our variables in the same range and on the same scale, so that no
variable dominates another variable.
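A minimal scaling sketch using scikit-learn's StandardScaler (the toy age/salary values are invented):

from sklearn.preprocessing import StandardScaler

X = [[25, 50000], [30, 60000], [35, 70000]]   # columns on very different scales

scaler = StandardScaler()
print(scaler.fit_transform(X))   # each column now has mean 0 and unit variance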
Unit : I completed
UNIT - II
MODEL EVALUATION AND FEATURE ENGINEERING
Model Selection - Training Model - Model Representation and Interpretability – Evaluating Performance of
a Model - Improving Performance of a Model - Feature Engineering: Feature Transformation - Feature
Subset Selection.
UNIT - II
MODEL EVALUATION AND FEATURE ENGINEERING
MODEL SELECTION
Model Selection in Machine Learning is the process of choosing the best suited model for a
particular problem.
Selecting a Model depends on various factors such as the dataset, task, nature of the model etc.
In Machine Learning, different types of models are available.
Some of them are Logistic Regression, K-Means Clustering, Neural Networks, Support Vector
Machines, Hierarchical Clustering, Decision Trees, and Random Forests.
Logistic Regression – Supervised Learning
Logistic Regression is used for predicting the categorical dependent variable using a given set of
independent variables.
Logistic Regression predicts the output of a categorical dependent variable.
A categorical or discrete value can be Yes or No, 0 or 1, True or False, etc.; but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
K – Means Clustering – Unsupervised Learning
It is an iterative algorithm that divides the unlabeled dataset into k different clusters, in such a way that each data point belongs to only one group of points with similar properties.
Neural Network – Deep Learning
A neural network is a method in machine learning that teaches computers to process data in a way that is inspired by the human brain.
It is a type of machine learning process, called Deep Learning, that uses interconnected nodes or neurons in a layered structure that resembles the human brain.
Model Selection in Machine Learning is based on:
Data
Task
1) Based on Types of Data
If we have different types of data, we choose a specific model based on the data we have.
CNN Model:
a) Used for processing data that has a grid pattern, such as images.
TRAINING MODEL
Training a model simply means learning (determining) good values for all the weights and the bias
from labeled examples.
In supervised learning, a machine learning algorithm builds a model by examining many examples and attempting to find a model that minimizes loss; this process is called Empirical Risk Minimization.
How to Train a Model
Training of machine learning models starts after the machine learning model is built; it is trained in order to get appropriate results.
To train a machine learning model, one needs a huge amount of pre-processed data.
Pre-processed data means data in structured form with reduced null values, etc.
The best model to choose depends on the associated attributes, the volume of the available dataset,
the number of features, complexity, etc.
However, in practice, it is recommended that we always start with the simplest model that can be
applied to the particular problem and then gradually enhance the complexity and test the accuracy
with the help of parameter tuning and cross-validation.
MODEL REPRESENTATION AND INTERPRETABILITY
MODEL REPRESENTATION
Models are representations of a selected part or aspect (often referred to as target system, parent
system, original, or prototype) of the external world.
This is the model's target system.
Model Representation of the data is to provide a useful view point into the data's key qualities.
In order to train a model, choose the best set of data with features to represent.
How to Represent a Model (Conversion of Raw Data into a Feature Vector)
Feature engineering means transforming raw data into a feature vector.
UNIT: II COMPLETED
UNIT: III- BAYESIAN LEARNING
Basic Probability Notation - Inference - Independence - Bayes' Rule - Bayesian Learning: Maximum
Likelihood and Least Squared Error Hypothesis - Maximum Likelihood Hypotheses for Predicting
Probabilities - Minimum Description Length Principle - Bayes Optimal Classifier - Naive Bayes
Classifier - Bayesian Belief Networks - EM Algorithm.
Introduction
UNIT: III
TYPES OF PROBABILITY
Conditional Probability
The conditional probability is the probability of one event given the occurrence of another
event, often described in terms of events A and B from two dependent random variables, e.g. X and Y.
If A and B are dependent events, the conditional probability of A given B means the
probability of occurrence of A when the event B has already happened. It is denoted by P(A/B)
and is defined by
P(A/B) = P(A ∩ B) / P(B), if P(B) ≠ 0
Similarly, P(B/A) = P(A ∩ B) / P(A), if P(A) ≠ 0
Joint probability is a statistical measure that calculates the probability of two events occurring
together at the same point in time. These two events are usually coined event A and event B, and
their joint probability can formally be written as P(A and B).
The joint probability distribution (or "joint" for short) completely specifies an agent's
probability assignments to all propositions in the domain (both simple and complex).
Example:
For two different events A and B that intersect, the joint probability of A and B is
P(A and B) = P(A ∩ B).
AXIOMS OF PROBABILITY
Example:
INDEPENDENCE
BAYES' THEOREM:
Bayes' Theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two
random events.
Bayes' Theorem was named after the British mathematician Thomas Bayes. Bayesian
inference is an application of Bayes' theorem, which is fundamental to Bayesian statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
Bayes' Theorem allows updating the probability prediction of an event by observing new
information of the real world.
Example: If cancer corresponds to one's age, then by using Bayes' Theorem we can
determine the probability of cancer more accurately with the help of age.
Bayes' Theorem can be derived using the product rule and the conditional probability of event A with
known event B:
From the product rule we can write: P(A ∧ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A: P(A ∧ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:
P(A|B) = P(B|A) P(A) / P(B)
In the generalized form, where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events:
P(Ai|B) = P(B|Ai) P(Ai) / Σj P(B|Aj) P(Aj)
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).
This is very useful in cases where we have good estimates of these three terms and
want to determine the fourth one.
Suppose we want to perceive the effect of some unknown cause and want to compute
that cause; then Bayes' rule becomes:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
Example 1:
Question: From a standard deck of playing cards, a single card is drawn. The probability that
the card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e. the probability
that a drawn face card is a king.
Given data: P(King) = 4/52; every king is a face card, so P(Face|King) = 1; there are three face
cards (jack, queen, king) in each of the four suits, so P(Face) = 12/52.
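Substituting these given values into Bayes' rule, the arithmetic can be checked in a few lines of Python:

# P(King|Face) = P(Face|King) * P(King) / P(Face)
p_king = 4 / 52            # four kings in a 52-card deck
p_face = 12 / 52           # twelve face cards (J, Q, K of each suit)
p_face_given_king = 1.0    # every king is a face card

print(p_face_given_king * p_king / p_face)   # 0.333..., i.e. 1/3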
BAYESIAN LEARNING
Introduction
Bayesian Learning views the problem of constructing hypotheses from data as a sub-problem
of the more fundamental problem of making predictions.
The idea is to use hypotheses as intermediaries between data and predictions.
First, the probability of each hypothesis is estimated, given the data.
Predictions are then made from the hypotheses, using the posterior probabilities of the
hypotheses to weight the predictions.
For example,
Consider the problem of predicting tomorrow's weather.
Suppose the available experts are divided into two camps: some suggest model A, and
some suggest model B. The Bayesian method, rather than choosing between A and B,
gives some weight to each based on their likelihood.
The likelihood will depend on how much the known data support each of the two models.
MAXIMUM LIKELIHOOD HYPOTHESIS
Introduction
Science involves the creation of hypotheses (or theories), and the testing of those theories by
comparing their predictions with experimental observations.
In many cases, the conclusions of an experiment are obvious – the theory is supported or disproven.
In other cases, the uncertainty in the data does not determine whether a hypothesis is right or wrong
– only how likely it is to be right: the "probability of the hypothesis".
In order to do this, our hypothesis must be detailed enough for us to work out how likely we
would have been to get the results we observe, assuming that the hypothesis is true.
Let X1, X2, X3, ..., Xn be an independent and identically distributed random sample drawn from a
population having the probability density function f(X, θ). Then the joint density of X1, X2, X3, ..., Xn
is given by
L(θ) = f(X1, θ) · f(X2, θ) · f(X3, θ) · ... · f(Xn, θ) = ∏ᵢ₌₁ⁿ f(Xᵢ, θ)
This is known as the likelihood function, and it is denoted by L; the maximum likelihood hypothesis
is the value of θ that maximizes L(θ).
The Least Squares Method is a statistical procedure to find the best fit for a set of data points by
minimizing the sum of the offsets or residuals of points from the plotted curve.
Least squares estimates are calculated by fitting a regression line to the points of a data set
such that the line has the minimal sum of squared deviations (least squared error).
Least squares regression is used to predict the behavior of dependent variables.
Least-squared-error hypotheses arise when the observed target values are generated by adding
random noise to the true target value, where this random noise is drawn independently for each
example from a Normal distribution with zero mean.
Maximizing the likelihood function determines the parameters that are most likely to
produce the observed data.
Maximum likelihood estimation involves defining a likelihood function for calculating the
conditional probability of observing the data sample given a probability distribution and its
distribution parameters. This approach can be used to search a space of possible distributions
and parameters.
Given data, the maximum likelihood estimate (MLE) for the parameter p is the value of p that
maximizes the likelihood P(data | p).
MLE is the technique which helps us determine the parameters of the distribution that best
describe the given data.
These values are a good representation of the given data, but may not best describe the
population; we can use MLE in order to get stronger parameter estimates.
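A minimal numeric illustration of MLE (a Bernoulli coin example, invented here): for n flips with k heads, the likelihood L(p) = p^k (1-p)^(n-k) is maximized at p = k/n, which the grid search below confirms:

import numpy as np

n, k = 10, 7                                # 10 flips, 7 heads
p_grid = np.linspace(0.01, 0.99, 999)       # candidate parameter values
likelihood = p_grid**k * (1 - p_grid)**(n - k)

print(p_grid[np.argmax(likelihood)])        # ~0.7, the grid maximum
print(k / n)                                # 0.7, the closed-form MLE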
A Bayesian belief network is a key computer technology for dealing with probabilistic events and for
solving problems which have uncertainty.
"A Bayesian network is a probabilistic graphical model which represents a set of variables
and their conditional dependencies using a Directed Acyclic Graph."
It is also called a Bayes Network, Belief Network, Decision Network, or Bayesian Model.
Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and they also use probability theory for prediction and anomaly detection.
Real-world applications are probabilistic in nature, and to represent the relationship between
multiple events, we need a Bayesian network. It can also be used in various tasks including
prediction, anomaly detection, diagnostics, automated insight, reasoning, time-series prediction,
and decision making under uncertainty.
A Bayesian network can be used for building models from data and experts' opinions, and it
consists of two parts:
1) Directed Acyclic Graph
2) Table of Conditional Probabilities.
The generalized form of a Bayesian network that represents and solves decision problems under
uncertain knowledge is known as an Influence Diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
Each node corresponds to a random variable, and a variable can be continuous or discrete.
Arcs or directed arrows represent the causal relationships or conditional probabilities between random
variables. These directed links or arrows connect pairs of nodes in the graph.
These links represent that one node directly influences the other node; if there is no directed link,
the nodes are independent of each other.
EM – ALGORITHM
Algorithm:
How it works:
In the first ("Expectation" or E) step, we use the observed data in order to estimate or guess the
values of the missing or incomplete data.
The next step is known as the "Maximization" step or M-step. In this step, we use the complete
data generated in the preceding Expectation step in order to update the values of the
parameters; it is basically used to update the hypothesis.
Finally, it is checked whether the values are converging or not; if yes, then stop,
otherwise repeat the Expectation and Maximization steps until convergence occurs.
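In practice the EM loop is usually run through a library; for instance, scikit-learn's GaussianMixture fits a mixture model with EM internally (the two toy clusters below are generated for illustration):

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.concatenate([np.random.normal(0, 1, 100),
                    np.random.normal(5, 1, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # runs E- and M-steps
print(gmm.means_.ravel())           # recovered component means, near 0 and 5
print(gmm.converged_, gmm.n_iter_)  # convergence flag and EM iterations used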
Flow chart for EM algorithm
Usage of EM Algorithm
Advantages of EM Algorithm
Disadvantages of EM ALGORITHM
UNIT: IV
Introduction
Assumptions can greatly simplify the learning process, but can also limit what can be learned.
Algorithms that simplify the function to a known form are called parametric machine learning
algorithms.
Examples for the Parametric Machine Learning Algorithms are ,
1) Logistic Regression
2) Linear Discriminant Analysis
3) Perceptron
4) Naive Bayes
5) Simple Neural Networks
LOGISTIC REGRESSION
In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function, which predicts two maximum values (0 or 1).
The curve from the logistic function indicates the likelihood of something, such as whether cells
are cancerous or not, or whether a mouse is overweight or not based on its weight, etc.
Put another way, we are modeling the probability that an input (X) belongs to the default class
(Y = 1); we can write this formally as:
P(X) = P(Y=1 | X)
It is one of the simplest ML algorithms and can be used for various classification problems such
as spam detection, diabetes prediction, cancer detection, etc.
Generally, Logistic Regression means Binary Logistic Regression, having binary target
variables, but there can be two more categories of target variables that can be predicted by it.
Based on those numbers of categories, it is classified into 3 types,
1) Binary or Binomial
In such a kind of classification, a dependent variable will have only two possible types either 1
or 0.
For Example, these variables may represent success or failure, Yes or No, Win or Loss, Pass or
Fail.
2) Multinomial
In this type, the dependent variable can have 3 or more possible un-ordered types, i.e. types
having no quantitative significance.
3) Ordinal
In this type, the dependent variable can have 3 or more possible ordered types. For example, these
variables may represent "Poor", "Good", "Very Good", or "Excellent", and each category can have
scores like 0, 1, 2, 3, or "Low", "Medium", "High".
Before diving into the implementation of logistic regression, we must be aware of its
assumptions:
In binary logistic regression, the target variables must always be binary, and the
desired outcome is represented by the factor level 1.
The independent variables should not exhibit multi-collinearity, which means the
independent variables must be independent of each other.
We should choose a large sample size for logistic regression.
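A minimal binary logistic regression sketch with scikit-learn (the hours-studied data is invented):

from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [6], [7], [8]]   # e.g. hours studied
y = [0, 0, 0, 1, 1, 1]               # fail / pass

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[4.5]]))    # probabilistic values between 0 and 1
print(clf.predict([[4.5]]))          # thresholded class label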
TYPES OF CLASSIFICATION
1) Binary Classification
2) Multi-Class Classification
3) Multi-Label Classification
4) Imbalanced Classification
Binary Classification:-
Multi-Class Classification
It is used when there are three or more classes and the data we want to classify belongs
exclusively to one of those classes.
For example, classifying a set of images of fruits which may be oranges, apples, or pears.
Multi-class classification makes the assumption that each sample is assigned to one and only one
label: a fruit can be either an apple or a pear, but not both at the same time.
Many algorithms used for binary classification can be used for multi-class classification.
Popular algorithms that can be used for multi-class classification include:
K-Nearest Neighbors.
Decision Trees.
Naive Bayes.
Random Forest.
Gradient Boosting.
Instead, heuristic methods can be used to split a multi-class classification problem into
multiple binary classification datasets and train a binary classification model for each.
Two examples of these heuristic methods include:
1) One-vs-Rest (OvR)
2) One-vs-One (OvO)
One-vs-Rest (OvR)
One-vs-One (OvO)
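A hedged sketch of both heuristics using scikit-learn's OneVsRestClassifier and OneVsOneClassifier wrappers around logistic regression; the iris dataset is chosen only for illustration.

# One-vs-Rest and One-vs-One wrappers around a binary classifier.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_iris(return_X_y=True)          # 3 classes

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # 3 binary models: one per class vs the rest
print(len(ovo.estimators_))  # 3 binary models: one per pair of classes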
Multi-Label Classification
COST FUNCTION
A cost function is a mechanism utilized in supervised machine learning: the cost function
returns the error between predicted outcomes and the actual outcomes.
A loss function is defined as the error for one sample, whereas the cost function is the
average loss across a number of samples in a given dataset.
Loss functions measure how far an estimated value is from its true value.
A loss function maps decisions to their associated costs.
Loss functions are not fixed; they change depending on the task in hand and the goal to be met.
The loss function is used to minimize the error in the algorithm while computing the errors in
the given datasets.
It is used to quantify how well or badly the model is performing.
It is divided into two categories:
1) Regression Loss
2) Classification Loss
a) Binary Classification Cost Functions
b) Multi-Class Classification Cost Functions
A loss function is a measure of how well a prediction model is able to predict the
expected outcome.
Regression models deal with predicting a continuous value, for example the salary of an
employee, the price of a car, loan prediction, etc.
A cost function used in a regression problem is called a “Regression Cost Function”.
The calculation is based on distance-based error.
1) Mean Squared Error (MSE)
The corresponding cost function is the mean of these squared errors.
The MSE loss function penalizes the model for making large errors by squaring them, and
this property makes the MSE cost function less robust to outliers.
Therefore, it should not be used if the data is prone to many outliers.
For an MSE curve where the true target value is 100 and the predicted values range between
-10,000 and 10,000, the MSE loss (Y-axis) reaches its minimum value at prediction (X-axis) = 100.
The range is 0 to ∞.
MSE = (Sum of Squared Errors) / n
2) Mean Absolute Error (MAE)
It is another loss function used for regression models.
MAE is based on the absolute differences between our target and predicted variables: the
MAE loss function is defined as the average of the absolute differences between the actual
and the predicted values.
It measures the average magnitude of errors in a set of predictions, without considering
their directions.
A related measure is the Mean Bias Error (MBE), which averages the raw residuals/errors
without taking absolute values.
The range is also 0 to ∞.
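A minimal sketch comparing the two regression cost functions on assumed true and predicted values.

# Computing the MSE and MAE cost functions for assumed values.
import numpy as np

y_true = np.array([100.0, 80.0, 60.0])
y_pred = np.array([110.0, 75.0, 30.0])

mse = np.mean((y_true - y_pred) ** 2)   # squaring penalizes large errors more
mae = np.mean(np.abs(y_true - y_pred))  # direction of the error is ignored
print(mse, mae)                          # approx. 341.67 vs 15.0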
Hinge Loss
It is also known as multi-class SVM loss.
In simple terms, the score of the correct category should be greater than the sum of the
scores of all incorrect categories by some safety margin (usually one).
Hinge loss is applied for maximum-margin classification, mainly for support vector machines.
It is a convex function used in convex optimizers.
GRADIENT DESCENT
Stochastic gradient descent refers to calculating the derivative from each training data
instance and applying the update immediately.
The parameters are updated after every iteration in which only a single example has
been processed.
It is much faster than batch gradient descent.
When the number of training examples is large, however, processing one example at a time
adds overhead for the system, as the number of iterations will be quite large.
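A minimal sketch of stochastic gradient descent for logistic regression, updating the weights after every single example; the synthetic data, learning rate and epoch count are assumptions for illustration.

# Stochastic gradient descent: one immediate update per training instance.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(5):
    for i in rng.permutation(len(X)):   # visit instances one at a time
        p = sigmoid(X[i] @ w + b)
        grad = p - y[i]                  # derivative of the log-loss
        w -= lr * grad * X[i]            # update immediately
        b -= lr * grad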
REGULARIZATION
Fitting a model amounts to solving the optimization problem
β̂ = argmin_β L(β, X, Y)
Regularization theory adds a second term to this optimization problem, which we will term R:
β̂ = argmin_β L(β, X, Y) + λ R(β)
The two most common choices of R are the L1 and L2 penalties:
R_L1(β) = Σ_{i=0..m} |β_i|        R_L2(β) = Σ_{i=0..m} β_i²
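As a hedged illustration of the effect of the two penalties, scikit-learn's LogisticRegression accepts both (its parameter C plays the role of 1/λ); the dataset here is synthetic and chosen only for illustration.

# L1 vs L2 regularization: L1 drives some coefficients exactly to zero.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

l2 = LogisticRegression(penalty="l2", C=1.0).fit(X, y)
l1 = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)

# L1 produces a sparse model; L2 only shrinks coefficients toward zero.
print((l1.coef_ == 0).sum(), (l2.coef_ == 0).sum())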
Overfitting means that the noise [irrelevant or meaningless information] or random variations
in the training data are picked up and learned as concepts by the model.
Overfitting is a modeling error which occurs when a function is too closely fit to a limited set
of data points.
Underfitting refers to a model that can neither model the training data nor generalize to new data.
Underfitting occurs when a statistical model or machine learning algorithm cannot capture the
underlying trend of the data.
Underfitting occurs when the model or the algorithm does not fit the data well enough.
This problem can be addressed by pruning a tree after it has learned, in order to remove some
of the detail it has picked up.
Note: Pruning is a technique that reduces the size of decision trees by removing sections of the
tree that are non-critical and redundant.
Underfitting occurs if the model or algorithm shows low variance but high bias.
Overfitting occurs if the model or algorithm shows high variance but low bias.
Overfitting arises when a model tries to fit the training data so well that it cannot generalize
to new observations.
Underfitting models perform poorly on both the training and test data sets.
PERCEPTRON
A perceptron works by taking in some numerical inputs along with what is known as weights
and a bias.
The perceptron multiplies these inputs by the respective weights (this is known as the
weighted sum).
These products are then added together along with the bias.
The activation function takes the weighted sum and the bias as inputs and returns a final output.
A perceptron consists of four parts:
1) Input values
2) Weights and a bias
3) A weighted sum
4) Activation function
Assume we have a single neuron and three inputs x1, x2, x3 multiplied by the weights
w1, w2, and w3 respectively.
This function is called the weighted sum because it is the sum of the inputs multiplied by
their weights.
Here the output falls within the range 0 to 1.
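A minimal sketch of this computation with assumed inputs, weights and bias, using a simple step activation.

# A single perceptron: three inputs x1, x2, x3, weights w1, w2, w3, a bias,
# and a step activation (all values below are assumed for illustration).
import numpy as np

x = np.array([1.0, 0.5, -1.0])   # inputs x1, x2, x3
w = np.array([0.4, 0.6, 0.2])    # weights w1, w2, w3
b = 0.1                          # bias

weighted_sum = np.dot(w, x) + b  # sum of weighted inputs plus the bias

def step(z):                     # activation: fire (1) or not (0)
    return 1 if z >= 0 else 0

print(weighted_sum, step(weighted_sum))   # 0.6, 1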
NEURAL NETWORKS
Neural networks are artificial systems that were inspired by biological neural networks.
These systems learn to perform tasks by being exposed to various datasets and examples
without any task-specific rules.
The idea is that the system generates identifying characteristics from the data it has been
passed, without being programmed with a pre-built understanding of these datasets.
Neural networks are based on computational models for threshold logic.
Threshold logic is a combination of algorithms and mathematics.
Neural networks are based either on the study of the brain or on the application of neural
networks to artificial intelligence.
The work has led to improvements in finite automata theory.
Neural networks learn via supervised learning; supervised machine learning involves
an input variable x and an output variable y.
The algorithm learns from a training dataset, iteratively making predictions on the data and
checking them against the correct answers.
The learning stops when the algorithm reaches an acceptable level of performance.
Unsupervised machine learning has input data X and no corresponding output variables.
The goal is to model the underlying structure of the data in order to understand more about it.
The keywords for supervised machine learning are classification and regression.
For unsupervised machine learning, the keywords are clustering and association.
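A minimal sketch of a supervised forward pass through a single sigmoid neuron; the weights and inputs are assumed values for illustration.

# Forward pass through one sigmoid neuron on three assumed input vectors.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([2.0, 1.5])                 # assumed weights
b = 0.5                                  # assumed bias
inputs = np.array([[1.0, 1.5], [2.0, 0.5], [1.5, 1.5]])

for x in inputs:
    print(sigmoid(x @ W + b))            # outputs close to 1 for these inputs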
Multi-layer Neural Networks
Multiclass classification means a classification problem where the task is to classify between
more than two classes.
Multiclass classification is a popular problem in supervised machine learning: a classification
task with more than two classes, e.g., classifying a set of images of fruits which may be
oranges, apples, or pears.
An imbalanced dataset refers to a problem with classification problems where the classes
are not represented equally.
In multiclass classification, we train a classifier using our training data, and use this
classifier for classifying new examples.
Load the dataset from its source and split the dataset into “training” and “test” data.
In machine learning, multiclass or multinomial classification is the problem of classifying
instances into one of three or more classes (classifying instances into one of two classes is
called binary classification).
BACK PROPAGATION
Static Back-Propagation:
It is one kind of backpropagation network which produces a mapping from a static input to a
static output.
It is useful for solving static classification issues like optical character recognition.
Recurrent Back-Propagation:
Recurrent backpropagation is fed forward until a fixed value is achieved. After that, the
error is computed and propagated backward.
The main difference between the two methods is that the mapping is rapid in static
back-propagation, while it is non-static in recurrent backpropagation.
The actual performance of backpropagation on a specific problem depends on the input data.
Backpropagation can be quite sensitive to noisy data.
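A minimal sketch of backpropagation for a small two-layer sigmoid network trained on the XOR problem; the architecture, learning rate and iteration count are assumptions for illustration.

# Backpropagation on a tiny 2-layer sigmoid network (XOR data).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

for step in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: propagate the error from the output layer back
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)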
ACTIVATION FUNCTION
Activation functions are mathematical equations that determine the output of a neural network.
The activation function is a non-linear transformation that we apply to the input before
sending it to the next layer of neurons or finalizing it as output.
An activation function is a very important feature of an artificial neural network; it basically
decides whether the neuron should be activated or not.
In a neural network, numeric data points, called inputs, are fed into the neurons in the
input layer.
Each neuron has a weight, and multiplying the input number by the weight gives the
output of the neuron, which is transferred to the next layer.
In artificial neural networks, the activation function defines the output of a node given an
input or set of inputs.
Modern neural networks use a technique called backpropagation to train the model, which
places an increased computational strain on the activation function and its derivative function.
The activation function is a mathematical “gate” between the input feeding the current
neuron and its output going to the next layer.
It can be as simple as a step function that turns the neuron output on and off, depending on
a rule or threshold.
It can be a transformation that maps the input signals into the output signals that are needed
for the neural network to function.
Neural networks use non-linear activation functions, which can help the network learn
complex data, compute and learn almost any function representing a question, and
provide accurate predictions.
The sigmoid and tanh activations produce an S-shaped curve.
Parametric ReLU
The Parameterized ReLU or Parametric ReLU activation function is a variant of ReLU. It is
similar to Leaky ReLU [the Leaky ReLU function is an improved version of the ReLU
activation function], with a slight change in dealing with negative input values: the slope of
the negative part of the function is learned adaptively during the training phase.
If the learned slope a is 0, PReLU becomes ReLU; the positive part is linear in both cases.
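A minimal sketch of the activation functions named in the syllabus (tanh, sigmoid, ReLU, PReLU); note that the PReLU slope a is fixed here for illustration, whereas in practice it is learned during training.

# The four activation functions, evaluated on a few assumed inputs.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def prelu(z, a):
    return np.where(z > 0, z, a * z)   # a = 0 reduces PReLU to ReLU

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))        # S-shaped, range (0, 1)
print(np.tanh(z))        # S-shaped, range (-1, 1)
print(relu(z))           # zero for negative inputs
print(prelu(z, 0.1))     # small slope 0.1 on the negative side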
DROPOUT AS REGULARIZATION
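Dropout randomly deactivates a fraction of neuron activations during training, so the network cannot rely on any single neuron; this acts as a regularizer. A minimal sketch of inverted dropout in NumPy follows (the keep probability of 0.8 is an assumed value).

# Inverted dropout: zero a random mask of activations at training time and
# scale the survivors, so nothing needs to change at test time.
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8   # assumed keep probability

def dropout(activations, training=True):
    if not training:
        return activations                     # test time: use all neurons
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob      # preserve the expected value

h = np.array([0.5, 1.2, 0.9, 0.3])
print(dropout(h))                              # some entries zeroed at random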
Unit: IV Completed
UNIT: V - NON PARAMETRIC MACHINE LEARNING
Non-parametric methods seek to best fit the training data in constructing the mapping function.
Benefits:
They have the ability to generalize to unseen data.
They are able to fit a large number of functional forms from the training data.
They do not make strong assumptions about the form of the mapping function [they make
predictions based on the training patterns for a new data instance].
Examples:
1) k-Nearest Neighbors
2) Decision Trees like CART…
3) Support Vector Machines
Limitations:
1) More Data: They require a lot more training data to estimate the mapping function.
2) Slower: They are a lot slower to train as they often have far more parameters to train.
3) Overfitting: There is more risk of overfitting the training data, and it is harder to explain
why specific predictions are made.
K-NEAREST NEIGHBORS
K-Nearest Neighbors is one of the simplest machine learning algorithms, based on the
supervised learning technique.
The K-NN algorithm assumes similarity between the new case/data and the available cases,
and puts the new case into the category that is most similar to the available categories.
The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a
well-suited category by using the K-NN algorithm.
The K-NN algorithm can be used for regression as well as for classification, but mostly it is
used for classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumption about
the underlying data.
It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead it stores the dataset, and at the time of classification it performs an
action on the dataset.
The KNN algorithm at the training phase just stores the dataset, and when it gets new data,
it classifies that data into the category that is most similar to the new data.
Example:
We have an image of a creature that looks similar to both a cat and a dog, and we want to
know whether it is a cat or a dog.
Using the KNN algorithm, the identification works on a similarity measure.
Our KNN model finds the features of the new data set that are similar to the cat and dog images.
Based on the most similar features, it will put the creature in either the cat or the dog category.
The K-NN working can be explained on the basis of the below algorithm:
Step-1: Select the number K of the neighbors
Step-2: Calculate the Euclidean distance of K number of neighbors
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data points to that category for which the number of the
neighbor is maximum.
Step-6: Our model is ready.
Suppose that by calculating the Euclidean distance we get three nearest neighbors in
category A and two nearest neighbors in category B.
Since the majority of the nearest neighbors are from category A, the new data point is
assigned to category A.
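A minimal sketch of these steps with K = 5 on an assumed two-class toy dataset; as in the worked example, three of the five nearest neighbours fall in category A.

# K-NN with K = 5: Euclidean distance plus a majority vote.
import numpy as np
from collections import Counter

X = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]], dtype=float)
y = np.array(["A", "A", "A", "B", "B", "B"])
new_point = np.array([3.0, 3.0])

dist = np.sqrt(((X - new_point) ** 2).sum(axis=1))  # Step 2: Euclidean distance
nearest = np.argsort(dist)[:5]                      # Step 3: K nearest neighbors
votes = Counter(y[nearest])                         # Step 4: count per category
print(votes.most_common(1)[0][0])                   # Step 5: "A" wins the vote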
DECISION TREES
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where you cannot further classify the nodes; the
final node is called a leaf node.
Example:
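As a hedged stand-in for the worked example, the sketch below fits scikit-learn's DecisionTreeClassifier to the iris dataset and prints the learned branches (the dataset choice is an assumption for illustration).

# Fitting a decision tree and printing its branches, root node to leaf nodes.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)

print(export_text(tree, feature_names=load_iris().feature_names))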
BRANCHING
A branching algorithm consists of multiple branching rules, and in each node of the search
tree one of these rules is selected to be applied based on various criteria.
The approach is best known for yielding fast exact algorithms for the 3-Satisfiability problem.
In logic and computer science, the Boolean satisfiability problem (sometimes called the
propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT)
is the problem of determining if there exists an interpretation that satisfies a given
Boolean formula.
Satisfiable: If the Boolean variables can be assigned values such that the formula turns
out to be TRUE, then we say that the formula is satisfiable.
Unsatisfiable: If it is not possible to assign such values, then we say that the formula is
unsatisfiable.
For example, the formula “a AND NOT a” is unsatisfiable.
GREEDY ALGORITHM
Greedy algorithms work by recursively constructing a set of objects from the smallest
possible constituent parts.
Recursion is an approach to problem solving in which the solution to a particular problem
depends on solutions to smaller instances of the same problem.
Greedy is an algorithmic model that builds up a solution piece by piece, always choosing the
next piece that offers the most clear and immediate benefit.
Problems where choosing a locally optimal option at each step also leads to a globally
optimal solution are the best fit for greedy algorithms.
The advantage of using a greedy algorithm is that solutions to smaller instances of the
problem can be straightforward and easy to understand.
The disadvantage is that it is entirely possible that the most optimal short-term solutions may
lead to the worst possible long-term outcome.
Greedy algorithms can be used for optimization, or for finding solutions close to optimal in
the case of NP-Hard problems [Non-deterministic Polynomial-time Hard, meaning that
provably solving such a problem in polynomial time would also solve thousands of open
problems that have been open for decades].
MULTIPLE BRANCHES
6 MAJOR BRANCHES OF ARTIFICIAL INTELLIGENCE (AI)
1) Machine learning
Machine Learning is the technique that gives computers the potential to learn without being
programmed, and it is classified into:
a) Supervised Learning:
b) Unsupervised Learning:
c) Reinforcement Learning:
2) Neural Network
A neural network replicates the human brain, which comprises an enormous number of
neurons; coding brain-like neurons into a system or a machine is what a neural network does.
3) Robotics
4) Expert Systems
An expert system refers to a computer system that mimics the decision-making intelligence
of a human expert.
The key features of expert systems include being extremely responsive, reliable,
understandable and high-performing.
5) Fuzzy Logic
Fuzzy logic is a technique that represents and modifies uncertain information by measuring
the degree to which the hypothesis is correct.
6) Natural Language Processing
NLP is a method that deals with searching, analyzing, understanding and deriving
information from the text form of data.
CONTINUOUS ATTRIBUTE
Attribute is a data field that represents the characteristics or features of a data object.
For a customer, object attributes can be customer Id, address, etc.
A set of attributes used to describe a given object are known as Attribute Vector or Feature
Vector.
Continuous Attribute:
It is of quantitative type.
The data can have an infinite number of states; continuous data is of float type.
There can be many values between 2 and 3.
PRUNING
The process of adjusting Decision Tree to minimize “misclassification error” is called pruning.
Tree pruning is performed in order to remove anomalies in the training data due to noise or outliers.
The pruned trees are smaller and less complex.
Types of Pruning:-
1) Pre-pruning (stopping the growth of the tree early)
2) Post-pruning (removing branches after the tree is fully grown)
RANDOM FORESTS: ENSEMBLE LEARNING
Ensemble learning refers to algorithms that combine the predictions from two or more models.
Ensemble methods combine several decision trees to produce better predictive performance
than utilizing a single decision tree.
A random forest algorithm consists of many decision trees.
The main principle behind the ensemble model is that a group of weak learners come together
to form a strong learner.
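A hedged sketch of a random forest as an ensemble of decision trees, using scikit-learn on a synthetic dataset (an assumption made only for illustration).

# A random forest: many decision trees whose votes form a stronger learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(len(forest.estimators_))   # 100 individual decision trees
print(forest.predict(X[:3]))     # majority vote across the trees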
BOOSTING
The basic principle behind the working of the boosting algorithm is to generate multiple weak
learners and combine their predictions to form one strong rule.
After multiple iterations, the weak learners are combined to form a strong learner that will
predict a more accurate outcome.
The two main ensemble techniques are:
1) Bagging
2) Boosting
Bagging
ADABOOST ALGORITHM
The AdaBoost algorithm, short for Adaptive Boosting, is a boosting technique used as an
ensemble method in machine learning.
It is called Adaptive Boosting because the weights are re-assigned to each instance, with
higher weights assigned to incorrectly classified instances.
AdaBoost learns from the mistakes by increasing the weight of misclassified data points.
Step 2: Calculate the weighted error rate of the decision tree. The higher the weight, the
more the corresponding error will be weighted.
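A hedged sketch using scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision stump; the synthetic dataset is an assumption for illustration.

# AdaBoost: misclassified points receive higher weights in the next round.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)
ada = AdaBoostClassifier(n_estimators=50).fit(X, y)  # 50 boosted stumps
print(ada.score(X, y))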
GRADIENT BOOSTING
Step 4: Repeat Step 1 (until the number of trees we set to train is reached).
Gradient Boosting makes a new prediction by simply adding up the predictions of all the trees.
SUPPORT VECTOR MACHINES
A Support Vector Machine (SVM) is a supervised machine learning algorithm which can be
used for both classification and regression challenges.
It is mostly used in classification problems.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes, so that we can easily put new data points in the
correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is termed a
Support Vector Machine.
Types of SVM
1) Linear SVM:
Linear SVM is used for linearly separable data: if a dataset can be classified into two
classes by using a single straight line, then such data is termed linearly separable data,
and the classifier used is called a Linear SVM classifier.
2) Non-Linear SVM:
Non-Linear SVM is used for non-linearly separable data: if a dataset cannot be classified
by using a straight line, then such data is termed non-linear data, and the classifier used
is called a Non-linear SVM classifier.
How does it work?
Suppose we have a dataset that has two tags (green and blue), and the dataset has two
features, x1 and x2.
We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue.
Consider the below images:
LARGE MARGIN INTUITION
An intuition for large-margin classification: insisting on a large margin reduces the capacity
of the model, since the range of angles at which the fat decision surface can be placed is
smaller than for a thin decision hyperplane.
The SVM classifier creates a maximum-margin hyperplane that lies in a transformed input
space and splits the example classes, while maximizing the distance to the nearest cleanly
split examples.
SVM methods have been used as powerful tools for solving classification problems in a wide
range of application fields.
Sometimes people refer to SVMs as large margin classifiers. We will consider what that
means and what an SVM hypothesis looks like.
The SVM cost function uses two cost terms, cost1 (used when y = 1) and cost0 (used when
y = 0). What does it take to make these terms small?
LOSS FUNCTION
HINGE LOSS
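The hinge loss is defined as max(0, 1 − y·f(x)) for labels y in {−1, +1}: it is zero once the correct class's score clears the margin of one. A minimal sketch with assumed scores and labels follows.

# Hinge loss: zero once the margin exceeds 1, linear penalty otherwise.
import numpy as np

y = np.array([1, -1, 1, 1])                  # true labels in {-1, +1}
scores = np.array([2.3, -0.5, 0.3, -1.0])    # assumed classifier scores f(x)

loss = np.maximum(0, 1 - y * scores)
print(loss)          # [0.  0.5 0.7 2. ]
print(loss.mean())   # average hinge loss over the samples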
SVM KERNELS
SVM algorithms use a set of mathematical functions that are defined as the kernel.
The function of the kernel is to take data as input and transform it into the required form.
Different SVM algorithms use different types of kernel functions.
These functions can be of different types: for example linear, nonlinear, polynomial, radial
basis function (RBF), and sigmoid.
Kernel functions can be introduced for sequence data, graphs, text, images, as well as vectors.
The most used type of kernel function is RBF, because it has a localized and finite response
along the entire x-axis.
The kernel functions return the inner product between two points in a suitable feature space,
thus defining a notion of similarity, with little computational cost even in very
high-dimensional spaces.
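A hedged sketch comparing the kernel types named above using scikit-learn's SVC on the synthetic two-moons dataset (chosen for illustration because it is not linearly separable).

# Comparing SVM kernels; RBF is the default kernel in SVC.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y))   # non-linear kernels fit the moons better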
Unit: V Completed