
ML 1

The document discusses well-posed learning problems and provides examples of tasks, performance measures, and experiences for problems such as spam filtering, checkers playing, handwriting recognition, robot driving, fruit prediction, and face recognition. It also discusses key aspects of machine learning such as how machine learning works, features of machine learning, and the need for machine learning.
Copyright
© All Rights Reserved

Well posed learning problems

Well-Posed Learning Problem – A computer program is said to learn from experience E with
respect to some task T and some performance measure P if its performance on T, as measured
by P, improves with experience E.
A problem can be framed as a well-posed learning problem if it specifies three things –
o Task
o Performance Measure
o Experience

Some examples of well-posed learning problems are –
1. Filtering emails as spam or not spam
o Task – Classifying emails as spam or not spam
o Performance Measure – The fraction of emails correctly classified as spam or not spam
o Experience – Observing you label emails as spam or not spam
2. A checkers learning problem
o Task – Playing checkers
o Performance Measure – Percent of games won against opponents
o Experience – Playing practice games against itself
3. Handwriting Recognition Problem
o Task – Recognizing and classifying handwritten words within images
o Performance Measure – Percent of words correctly classified
o Experience – A database of handwritten words with given classifications
4. A Robot Driving Problem
o Task – Driving on public four-lane highways using vision sensors
o Performance Measure – Average distance traveled before an error
o Experience – A sequence of images and steering commands recorded while observing a
human driver
5. Fruit Prediction Problem
o Task – Recognizing different kinds of fruit
o Performance Measure – The number of fruit varieties correctly predicted
o Experience – Training the machine on a large dataset of fruit images
6. Face Recognition Problem
o Task – Recognizing different faces
o Performance Measure – The number of faces correctly predicted
o Experience – Training the machine on a large dataset of different face images
7. Automatic Translation of documents
o Task – Translating a document from one language to another
o Performance Measure – How accurately the text is converted from one language to the other
o Experience – Training the machine on a large dataset of text in different languages
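The idea that performance P on task T improves with experience E can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the "learner" simply memorises labelled words, and the names `train`, `predict`, `performance` and the sample words are all hypothetical.

```python
# Task T: label a word as "spam" or "ham".
# Experience E: labelled (word, label) pairs seen so far.
# Performance P: fraction of test words classified correctly.

def train(examples):
    """Memorise the label of every word seen in the experience."""
    return dict(examples)

def predict(model, word, default="ham"):
    """Return the memorised label, or a default guess for unseen words."""
    return model.get(word, default)

def performance(model, test_set):
    """P: fraction of test items classified correctly."""
    correct = sum(predict(model, w) == label for w, label in test_set)
    return correct / len(test_set)

experience = [("winner", "spam"), ("meeting", "ham"),
              ("lottery", "spam"), ("prize", "spam")]
test_set = [("winner", "spam"), ("lottery", "spam"),
            ("prize", "spam"), ("meeting", "ham")]

# P measured after the learner has seen 0, 1, 2, ... examples from E.
scores = [performance(train(experience[:n]), test_set)
          for n in range(len(experience) + 1)]
print(scores)  # [0.25, 0.5, 0.5, 0.75, 1.0]
```

Because P never decreases as more of E is seen, this toy problem fits the definition; a realistic learner would generalise to unseen inputs rather than memorise.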

Machine Learning
The Machine Learning Tutorial covers both the fundamentals and more complex ideas of machine
learning. Students and professionals in the workforce can benefit from our machine learning
tutorial.
Machine learning is a rapidly developing field of technology that allows computers to learn
automatically from past data. It employs a variety of algorithms to build mathematical models
and make predictions from historical data or information. It is currently used for a variety of
tasks, including speech recognition, email filtering, auto-tagging on Facebook, recommender
systems, and image recognition.

You will learn about the many different methods of machine learning, including reinforcement
learning, supervised learning, and unsupervised learning, in this machine learning tutorial.
Regression and classification models, clustering techniques, hidden Markov models, and various
sequential models will all be covered.

What is Machine Learning


In the real world, we are surrounded by humans, who can learn from their experiences, and by
computers or machines, which work on our instructions. But can a machine also learn from
experience or past data the way a human does? This is where machine learning comes in.

Introduction to Machine Learning

A subset of artificial intelligence known as machine learning focuses primarily on the creation of
algorithms that enable a computer to independently learn from data and previous experiences.
Arthur Samuel first used the term "machine learning" in 1959. It could be summarized as follows:

Without being explicitly programmed, machine learning enables a machine to automatically learn
from data, improve performance from experiences, and predict things.

Machine learning algorithms build a mathematical model from sample historical data, known as
training data, and the model then makes predictions or decisions without being explicitly
programmed to do so. To develop predictive models, machine learning brings together statistics
and computer science: algorithms that learn from historical data are either constructed or
reused. Performance generally improves as we provide more relevant data, although the gains
are not strictly proportional to the quantity of data.

A machine is said to learn if its performance improves as it gains more data.

How does Machine Learning work


A machine learning system learns from previous data, builds prediction models, and predicts
the output whenever it receives new data. The accuracy of the predicted output depends on the
amount and quality of the data: more data generally helps to build a better model.

Let's say we have a complex problem in which we need to make predictions. Instead of writing
the logic by hand, we feed the data to generic algorithms, which build the logic from the data
and predict the output. Machine learning has changed the way we approach such problems.
The operation of a machine learning algorithm can be summarized as a block diagram: past data
is fed to the learning algorithm, which builds a model; new data is then passed to the model
to produce an output.
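As a minimal sketch of "feed data to a generic algorithm, get a model, predict on new data", the snippet below learns a single decision threshold from labelled examples. The function names `fit_threshold` and `predict` and the study-hours data are assumptions made for illustration only.

```python
def fit_threshold(samples):
    """Learn a decision threshold from (value, label) pairs, labels 0 or 1.

    The threshold is the midpoint between the two class means.
    """
    zeros = [x for x, y in samples if y == 0]
    ones = [x for x, y in samples if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, x):
    """Apply the learned rule to a new input."""
    return 1 if x > threshold else 0

# Hypothetical past data: (hours studied, passed exam?)
past_data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]

threshold = fit_threshold(past_data)   # the "model" built from past data
print(threshold)                       # 4.5
print(predict(threshold, 2.5))         # 0
print(predict(threshold, 6.5))         # 1
```

The same two functions would work unchanged on any other one-feature, two-class dataset, which is the sense in which the algorithm is generic and the logic comes from the data.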

Features of Machine Learning:


o Machine learning uses data to detect patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining in that it also deals with huge amounts of data.

Need for Machine Learning


The demand for machine learning is steadily rising, because it can perform tasks that are too
complex for a person to implement directly. Humans cannot manually process vast amounts of
data, so we need computer systems to do it, and this is where machine learning comes in to
simplify our lives.

We can train machine learning algorithms by providing them with a large amount of data and
letting them explore the data, build models, and predict the required output automatically.
The performance of a machine learning algorithm depends on the amount of data, and it can be
measured with a cost function. Machine learning can save both time and money.

The importance of machine learning is easy to see from its use cases. Machine learning is
currently used in self-driving cars, cyber fraud detection, face recognition, friend
suggestions on Facebook, and so on. Top companies such as Netflix and Amazon have built
machine learning models that use vast amounts of data to analyze user interest and recommend
products accordingly.

The following key points show the importance of machine learning:

o Rapid growth in the production of data
o Solving complex problems that are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data

Design a Learning System in Machine Learning


As Arthur Samuel's description is usually summarized, machine learning enables a machine to
learn automatically from data, improve its performance with experience, and predict things
without being explicitly programmed.
In simple words, when we feed training data to a machine learning algorithm, the algorithm
produces a mathematical model, and with the help of that model the machine makes predictions
and takes decisions without being explicitly programmed. Also, the more training data the
machine works with, the more experience it gains and the better its results become.

Example: In a driverless car, the algorithm is fed training data on how to drive on highways
and on busy and narrow streets, covering factors such as speed limits, parking, and stopping
at signals. A logical and mathematical model is built from that data, and the car then
operates according to the model. Again, the more data is fed in, the better the output.

Designing a Learning System in Machine Learning:


According to Tom Mitchell, “A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.”
Example: In spam e-mail detection,
o Task, T: To classify mails as Spam or Not Spam.
o Performance measure, P: Percentage of mails correctly classified as “Spam” or “Not Spam”.
o Experience, E: A set of mails labelled “Spam” or “Not Spam”.
Steps for designing a learning system are:

Step 1- Choosing the Training Experience: The first and very important task is to choose the
training data or training experience that will be fed to the machine learning algorithm. The
data or experience we feed to the algorithm has a significant impact on the success or failure
of the model, so it should be chosen wisely.
Below are the attributes of the training experience that affect the success or failure of the
model:
o The first attribute is whether the training experience provides direct or indirect
feedback regarding the choices made. For example: while playing chess, direct feedback
tells the learner that choosing a different move in a given position would increase its
chances of success, whereas indirect feedback consists only of the move sequences and
final outcomes of games.
o The second important attribute is the degree to which the learner controls the sequence
of training examples. For example: when training data is first fed to the machine its
accuracy is very low, but as it gains experience by playing again and again against itself
or an opponent, it receives feedback and adjusts its play accordingly.
o The third important attribute is how well the training experience represents the
distribution of examples over which performance will be measured. For example, a machine
learning algorithm gains experience by going through many different cases and examples;
the more closely those examples resemble the situations it will later be tested on, the
better its performance will be.

Step 2- Choosing the Target Function: The next important step is choosing the target function.
This means deciding exactly what type of knowledge will be learned, for example a NextMove
function that, given the current board, selects the best move from the set of legal moves. For
example: while playing chess, after the opponent has moved, the machine learning algorithm
considers the possible legal moves and decides which one to play in order to succeed.

Step 3- Choosing a Representation for the Target Function: Once the algorithm knows all the
possible legal moves, the next step is to choose a representation with which to pick the best
one, e.g. linear equations, a hierarchical graph representation, or a tabular form. Using
this representation, the NextMove function selects, out of the available moves, the one
expected to give the highest success rate. For example: if the machine has 4 possible moves
in a chess position, it chooses the one that the representation scores as most likely to lead
to success.
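A linear representation of the target function can be sketched as a weighted sum of board features, with the move chosen by scoring the board that each legal move leads to. The feature set, the weights, and the names `evaluate` and `choose_move` below are illustrative assumptions, not values taken from the text.

```python
def evaluate(weights, features):
    """Linear evaluation: V(b) = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def choose_move(weights, candidates):
    """Pick the legal move whose resulting board scores highest."""
    return max(candidates, key=lambda move: evaluate(weights, candidates[move]))

# Hypothetical features of the board after each of 4 legal moves:
# (own pieces, opponent pieces, own pieces under threat)
candidates = {
    "a": (12, 12, 2),
    "b": (12, 11, 1),
    "c": (11, 12, 0),
    "d": (12, 11, 3),
}
weights = [0.0, 1.0, -1.0, -0.5]   # assumed weights, not learned here

best = choose_move(weights, candidates)
print(best)  # b
```

With these assumed weights, move "b" scores 0.5 and the others score below zero, so it is selected. Learning the weights is a separate step.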

Step 4- Choosing a Function Approximation Algorithm: An optimized move cannot be chosen from
the training data alone. The algorithm has to work through a set of training examples, use
them to approximate which moves should be chosen, and then receive feedback on those choices.
For example: when training data for playing chess is fed to the algorithm, it will sometimes
fail and sometimes succeed; from each failure or success it estimates which move to choose
next and what its success rate is likely to be.
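The text does not name a specific approximation algorithm. One commonly taught choice for a linear evaluation function is the least mean squares (LMS) rule, which nudges each weight in proportion to the prediction error. The sketch below assumes that rule; the training pairs, learning rate, and function names are hypothetical.

```python
def evaluate(weights, features):
    """Linear evaluation: V(b) = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, target, lr=0.1):
    """One LMS step: w_i <- w_i + lr * (target - V(b)) * x_i, with x_0 = 1."""
    error = target - evaluate(weights, features)
    xs = (1.0,) + tuple(features)
    return [w + lr * error * x for w, x in zip(weights, xs)]

# Hypothetical training pairs: (board features, training value)
training = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0)]

weights = [0.0, 0.0, 0.0]
for _ in range(200):                      # repeated passes over the examples
    for features, target in training:
        weights = lms_update(weights, features, target)

print(round(evaluate(weights, (1.0, 0.0)), 3))   # close to 1.0
print(round(evaluate(weights, (0.0, 1.0)), 3))   # close to -1.0
```

Each pass reduces the gap between the predicted and the training values, which is the feedback loop that the step describes.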

Step 5- Final Design: The final design emerges once the system has worked through many
examples, failures and successes, and correct and incorrect decisions, and has learned from
them what the next step should be. Example: Deep Blue, the IBM chess computer, defeated world
champion Garry Kasparov in a 1997 match and became the first computer to beat a reigning
world chess champion in a match under tournament conditions. Note that Deep Blue relied
mainly on large-scale search and an expert-tuned evaluation function rather than on learning
alone.

Issues in Machine Learning


1. Inadequate Training Data
The major issue that arises when using machine learning algorithms is the lack of both
quality and quantity of data. Data plays a vital role in machine learning, and many data
scientists report that inadequate, noisy, and unclean data severely hamper machine learning
algorithms. For example, even a simple task may require thousands of samples, and an advanced
task such as speech or image recognition may need millions of examples. Data quality is just
as important for the algorithms to work well, yet it is often lacking in machine learning
applications. Data quality can be affected by factors such as the following:

o Noisy data- It leads to inaccurate predictions, which affects decisions as well as accuracy
in classification tasks.
o Incorrect data- It leads to faulty models and faulty results, and so reduces the accuracy
of the output.
o Generalizing of output data- Sometimes generalizing from the output data becomes complex,
which results in comparatively poor future actions.

2. Poor quality of data


As discussed above, data plays a significant role in machine learning, and it must be of good
quality. Noisy, incomplete, inaccurate, and unclean data lead to lower classification
accuracy and low-quality results. Hence, data quality is another major, common problem when
working with machine learning algorithms.
3. Non-representative training data
For a trained model to generalize well, the training sample must be representative of the new
cases we want to generalize to. The training data should cover the cases that have already
occurred as well as those that are occurring now.

If we train a model on non-representative data, its predictions will be less accurate. A
machine learning model is considered ideal if it predicts well for general cases and provides
accurate decisions. If there is too little training data, sampling noise makes the training
set non-representative. A model trained on it will not predict accurately and may be biased
against one class or group.

Hence, we should use representative training data to guard against bias and to make accurate
predictions without drift.

4. Overfitting and Underfitting


Overfitting:

Overfitting is one of the most common issues faced by machine learning engineers and data
scientists. It occurs when a model is too complex relative to the amount and noisiness of the
training data, so that it starts capturing noise and inaccurate entries in the training set
instead of the underlying pattern. This hurts the model's performance on new data. A related
problem arises when the training set is skewed. Suppose the training data contains 1000
mangoes, 1000 apples, 1000 bananas, and 5000 papayas. There is then a considerable
probability that an apple will be identified as a papaya, because the training set is heavily
biased towards papayas, and predictions suffer as a result. Highly flexible, non-linear
methods are more prone to overfitting because they can build unrealistic models of the data.
Using simpler models, such as linear and parametric algorithms, is one way to reduce it.

Methods to reduce overfitting:

o Increase the amount of training data.
o Reduce model complexity by selecting a model with fewer parameters.
o Apply Ridge regularization or Lasso regularization.
o Use early stopping during the training phase.
o Reduce the noise in the training data.
o Reduce the number of attributes in the training data.
o Constrain the model.
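To illustrate how regularization constrains a model, the sketch below fits a one-feature model y ≈ w·x with and without a ridge penalty. It uses the closed-form ridge solution for this special case (no intercept); the function name `fit_ridge_1d`, the data, and the penalty value are assumptions chosen to keep the arithmetic easy to follow.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge solution for y ~ w*x (no intercept):

    w = sum(x*y) / (sum(x*x) + lam)
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit_ridge_1d(xs, ys, lam=0.0)    # ordinary least squares
w_ridge = fit_ridge_1d(xs, ys, lam=14.0)   # penalised fit

print(w_plain)  # 2.0
print(w_ridge)  # 1.0
```

The penalty shrinks the weight toward zero. On clean data like this it only adds bias, but on noisy data with many features the same shrinkage keeps the model from chasing noise.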

Underfitting:

Underfitting is just the opposite of overfitting. It occurs when the model is too simple to
capture the underlying structure of the data, like an undersized pair of trousers. It is
typically seen when we have limited data in the data set or when we try to fit a linear model
to non-linear data. In such scenarios the rules of the model are too simple for the data set,
so the model makes inaccurate predictions even on the training data.

Methods to reduce underfitting:

o Increase model complexity.
o Remove noise from the data.
o Train on more and better features.
o Reduce the constraints on the model.
o Increase the number of epochs to get better results.
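The effect of a model that is too simple can be seen by fitting non-linear data with a linear feature and then with a better feature. This is a minimal sketch with made-up data (y = x²); `fit_scale` and `mse` are hypothetical helper names.

```python
def fit_scale(features, ys):
    """Least-squares fit of y ~ a * f, returning the coefficient a."""
    return sum(f * y for f, y in zip(features, ys)) / sum(f * f for f in features)

def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]                      # non-linear data: y = x^2

a_lin = fit_scale(xs, ys)                     # too simple: y ~ a * x
a_quad = fit_scale([x * x for x in xs], ys)   # better feature: y ~ a * x^2

err_lin = mse([a_lin * x for x in xs], ys)
err_quad = mse([a_quad * x * x for x in xs], ys)

print(err_lin > err_quad)  # True
print(err_quad)            # 0.0
```

The linear model has a large error even on its own training data, which is the signature of underfitting; adding the squared feature removes it.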

5. Monitoring and maintenance


Because every machine learning model must keep producing output that generalizes well, regular
monitoring and maintenance are essential. As requirements and data change, the code has to be
updated, and resources for monitoring the model become necessary as well.

6. Getting bad recommendations


A machine learning model is trained for a specific context, and when that context changes the
model can produce bad recommendations, a symptom of concept drift. For example, a customer
was looking for certain gadgets at one point in time, but the customer's requirements have
since changed; the model still shows the same recommendations even though the customer's
expectations are different. This incident is called data drift. It generally occurs when new
data is introduced or the interpretation of the data changes. We can overcome it by regularly
monitoring the data and updating the model to match current expectations.
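One simple way to monitor for such a shift is to compare recent values of a feature with the values seen at training time. The sketch below is a heuristic only: the function `drifted`, the threshold of three standard errors, and the sample values are all illustrative assumptions, not a method prescribed by the text.

```python
from statistics import mean, stdev

def drifted(reference, recent, z_threshold=3.0):
    """Flag drift when the recent mean is far from the reference mean.

    Distance is measured in standard errors of the recent window,
    using the spread of the reference data.
    """
    std_err = stdev(reference) / len(recent) ** 0.5
    return abs(mean(recent) - mean(reference)) > z_threshold * std_err

# Hypothetical feature values observed at training time
reference = [10, 12, 11, 9, 10, 11, 12, 10]

same = [11, 10, 12, 9]        # recent window that looks like the reference
shifted = [20, 22, 19, 21]    # recent window after customer behavior changed

print(drifted(reference, same))     # False
print(drifted(reference, shifted))  # True
```

When the check fires, the usual response is to retrain or update the model on more recent data.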

7. Lack of skilled resources


Although machine learning and artificial intelligence are growing continuously in the market,
they are still young industries in comparison to others, and skilled people are in short
supply. We need people with in-depth knowledge of mathematics, science, and technology to
develop and manage machine learning systems.

8. Customer Segmentation
Customer segmentation is also an important issue when developing a machine learning algorithm:
we need to distinguish the customers who act on the recommendations shown by the model from
those who do not even look at them. Hence, an algorithm is needed that recognizes customer
behavior and triggers relevant recommendations for each user based on past experience.
9. Process Complexity of Machine Learning
The machine learning process is very complex, which is another major issue faced by machine
learning engineers and data scientists. Machine learning and artificial intelligence are
relatively new technologies that are still in an experimental phase and changing continuously.
Much of the work is trial and error, so the probability of error is higher than expected. The
process also involves analyzing the data, removing data bias, training on the data, and
applying complex mathematical calculations, which makes the procedure complicated and quite
tedious.

10. Data Bias


Data bias is also a big challenge in machine learning. It arises when certain elements of the
dataset are weighted more heavily or given more importance than others. Biased data leads to
inaccurate results, skewed outcomes, and other analytical errors. We can address it by
determining where the data is actually biased and then taking the necessary steps to reduce
the bias.

Methods to remove data bias:

o Research customer segmentation more thoroughly.
o Be aware of your general use cases and potential outliers.
o Combine inputs from multiple sources to ensure data diversity.
o Include bias testing in the development process.
o Analyze data regularly and keep tracking errors so they can be resolved easily.
o Review the collected and annotated data.
o Use multi-pass annotation for tasks such as sentiment analysis, content moderation, and
intent recognition.

11. Lack of Explainability


This means that the outputs of a model cannot be easily understood, because it is not clear
how the model arrived at them for a given input. This lack of explainability reduces the
credibility of machine learning algorithms.

12. Slow implementations and results


This issue is also very common in machine learning. Models can produce accurate results, but
getting there is often time-consuming: slow programs, excessive requirements, and overloaded
data mean that accurate results take longer than expected. The model also needs continuous
monitoring and maintenance to keep delivering accurate results.
13. Irrelevant features
Although machine learning models are intended to give the best possible outcome, if we feed
garbage data as input, the result will also be garbage. Hence, we should use relevant features
in our training sample. A machine learning model is more likely to be good if the training
data has a good set of features with few or no irrelevant ones.

Types of Machine Learning

1. Supervised Machine Learning


Supervised learning is when a model is trained on a “Labelled Dataset”. Labelled datasets
contain both input and output parameters. Supervised learning algorithms learn a mapping from
inputs to the correct outputs. Both the training and the validation datasets are labelled.

Let’s understand it with the help of an example.


Example: Consider a scenario where you have to build an image classifier to differentiate
between cats and dogs. If you feed labelled images of dogs and cats to the algorithm, the
machine learns from them to tell a dog from a cat. When we input new dog or cat images that it
has never seen before, it uses the learned model to predict whether each one is a dog or a
cat. This is how supervised learning works, and this particular task is image classification.
There are two main categories of supervised learning:
o Classification
o Regression

Classification
Classification deals with predicting categorical target variables, which represent discrete
classes or labels, such as Yes or No, Male or Female, Red or Blue. For instance, classifying
emails as spam or not spam, or predicting whether a patient has a high risk of heart disease.
Classification algorithms learn to map the input features to one of the predefined classes
present in the dataset. Real-world examples include spam detection and email filtering.

Here are some classification algorithms:

o Logistic Regression
o Support Vector Machine
o Random Forest
o Decision Tree
o K-Nearest Neighbors (KNN)
o Naive Bayes
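As a small illustration of one algorithm from this list, the sketch below implements K-Nearest Neighbors in plain Python: a new point gets the majority label of its k closest training points. The two features and the training data are made up for the example.

```python
from collections import Counter

def knn_predict(train, point, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features: (number of links, number of exclamation marks)
train = [((0, 0), "not spam"), ((1, 0), "not spam"), ((0, 1), "not spam"),
         ((5, 4), "spam"), ((6, 5), "spam"), ((4, 6), "spam")]

print(knn_predict(train, (5, 5)))      # spam
print(knn_predict(train, (0.5, 0.5)))  # not spam
```

The output is always one of the labels present in the training data, which is what makes this a classification rather than a regression method.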

Regression
Regression, on the other hand, deals with predicting continuous target variables, which
represent numerical values. For example, predicting the price of a house based on its size,
location, and amenities, or forecasting the sales of a product. Regression algorithms learn to
map the input features to a continuous numerical value.
Regression algorithms model the relationship between input and output variables, which may be
linear or non-linear. They are used to predict continuous outputs such as market trends and
weather measurements.
Here are some regression algorithms:
o Linear Regression
o Polynomial Regression
o Ridge Regression
o Lasso Regression
o Decision tree
o Random Forest
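Linear regression, the first algorithm in this list, can be sketched for a single input feature with the ordinary least squares formulas. The house sizes and prices below are invented so that the relationship is exactly linear; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical data: house size (square metres) -> price (thousands)
sizes = [50, 60, 80, 100]
prices = [150, 180, 240, 300]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 70 + intercept   # price estimate for a 70 m^2 house

print(round(slope, 6), round(intercept, 6), round(predicted, 6))
```

Unlike a classifier, the model returns a number on a continuous scale, here an estimate of about 210 for an unseen house size.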

Advantages of Supervised Machine Learning


o Supervised learning models can achieve high accuracy because they are trained on labelled
data.
o The decision-making process of supervised learning models is often interpretable.
o It can often make use of pre-trained models, which saves time and resources compared with
developing new models from scratch.

Disadvantages of Supervised Machine Learning


o It is limited to the patterns present in the training data and may struggle with unseen or
unexpected patterns.
o It can be time-consuming and costly because it relies on labelled data.
o It may generalize poorly to new data.

Applications of Supervised Learning


Supervised learning is used in a wide variety of applications, including:
o Image classification: Identify objects, faces, and other features in images.
o Natural language processing: Extract information from text, such as sentiment,
entities, and relationships.
o Speech recognition: Convert spoken language into text.
o Recommendation systems: Make personalized recommendations to users.
o Predictive analytics: Predict outcomes, such as sales, customer churn, and stock prices.
o Medical diagnosis: Detect diseases and other medical conditions.
o Fraud detection: Identify fraudulent transactions.
o Autonomous vehicles: Recognize and respond to objects in the environment.
o Email spam detection: Classify emails as spam or not spam.
o Quality control in manufacturing: Inspect products for defects.
o Credit scoring: Assess the risk of a borrower defaulting on a loan.
o Gaming: Recognize characters, analyze player behavior, and create NPCs.
o Customer support: Automate customer support tasks.
o Weather forecasting: Make predictions for temperature, precipitation, and other
meteorological parameters.
o Sports analytics: Analyze player performance, make game predictions, and optimize
strategies.

2. Unsupervised Machine Learning


Unsupervised learning is a type of machine learning technique in which an algorithm discovers
patterns and relationships using unlabeled data. Unlike supervised learning, unsupervised
learning doesn’t involve providing the algorithm with labeled target outputs. The primary
goal of unsupervised learning is often to discover hidden patterns, similarities, or clusters
within the data, which can then be used for various purposes, such as data exploration,
visualization, dimensionality reduction, and more.


Let’s understand it with the help of an example.


Example: Consider a dataset that contains information about the purchases you made from a
shop. Through clustering, the algorithm can group you with other customers who show similar
purchasing behavior, revealing potential customer segments without predefined labels. This
type of information can help businesses target customers as well as identify outliers.
There are two main categories of unsupervised learning:
o Clustering
o Association
Clustering
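Clustering is the process of grouping data points into clusters based on their similarity.
This technique is useful for identifying patterns and relationships in data without the need
for labeled examples.
Here are some clustering algorithms, together with two related unsupervised techniques:
o K-Means Clustering algorithm
o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis (a dimensionality reduction technique rather than a
clustering algorithm)
o Independent Component Analysis (a signal separation technique rather than a clustering
algorithm)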
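K-Means, the first algorithm listed, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below runs it on one-dimensional data; the spending figures and the initial centroids are assumptions chosen for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """K-means (Lloyd's algorithm) on 1-D data with given initial centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if the cluster is empty).
        centroids = [sum(c) / len(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids, clusters

# Hypothetical monthly spend of six customers
spend = [10, 12, 11, 50, 52, 51]

centroids, clusters = kmeans_1d(spend, centroids=[10, 50])
print(centroids)  # [11.0, 51.0]
print(clusters)   # [[10, 12, 11], [50, 52, 51]]
```

No labels were supplied: the two groups of low and high spenders emerge from the similarity of the values alone.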

Association
Association rule learning is a technique for discovering relationships between items in a
dataset. It identifies rules stating that the presence of one item implies the presence of
another item with a specific probability.
Here are some association rule learning algorithms:
o Apriori Algorithm
o Eclat
o FP-growth Algorithm
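The "specific probability" attached to a rule is usually expressed through two standard quantities, support and confidence, which the algorithms above compute at scale. The sketch below calculates them directly for one rule; the shopping baskets are made up.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

# Hypothetical shopping baskets
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]

print(support(baskets, {"bread", "milk"}))                 # 0.5
print(round(confidence(baskets, {"bread"}, {"milk"}), 3))  # 0.667
```

Here the rule "bread implies milk" holds in two of the three baskets that contain bread, so its confidence is about 0.667.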

Advantages of Unsupervised Machine Learning


o It helps to discover hidden patterns and relationships in the data.
o It is used for tasks such as customer segmentation, anomaly detection, and data
exploration.
o It does not require labeled data, which reduces the effort of data labeling.
o Techniques such as autoencoders and dimensionality reduction can be used to extract
meaningful features from raw data.

Disadvantages of Unsupervised Machine Learning


o Without labels, it may be difficult to assess the quality of the model’s output.
o The clusters found may be hard to interpret and may not have a meaningful interpretation.

Applications of Unsupervised Learning


Here are some common applications of unsupervised learning:
o Clustering: Group similar data points into clusters.
o Anomaly detection: Identify outliers or anomalies in data.
o Dimensionality reduction: Reduce the dimensionality of data while preserving its
essential information.
o Recommendation systems: Suggest products, movies, or content to users based on
their historical behavior or preferences.
o Topic modeling: Discover latent topics within a collection of documents.
o Density estimation: Estimate the probability density function of data.
o Image and video compression: Reduce the amount of storage required for multimedia
content.
o Data preprocessing: Help with data preprocessing tasks such as data cleaning,
imputation of missing values, and data scaling.
o Market basket analysis: Discover associations between products.
o Genomic data analysis: Identify patterns or group genes with similar expression
profiles.
o Image segmentation: Segment images into meaningful regions.
o Community detection in social networks: Identify communities or groups of
individuals with similar interests or connections.
o Customer behavior analysis: Uncover patterns and insights for better marketing and
product recommendations.
o Content recommendation: Classify and tag content to make it easier to recommend
similar items to users.
o Exploratory data analysis (EDA): Explore data and gain insights before defining
specific tasks.

3. Semi-Supervised Learning
Semi-supervised learning is a machine learning approach that sits between supervised and
unsupervised learning, so it uses both labelled and unlabelled data. It is particularly useful
when obtaining labeled data is costly, time-consuming, or resource-intensive, for example when
labeling requires special skills and resources.
We use these techniques when a small part of the data is labeled and the remaining large
portion is unlabeled. We can use unsupervised techniques to predict labels and then feed these
labels to supervised techniques. This is mostly applicable to image data sets, where usually
not all images are labeled.


Let’s understand it with the help of an example.


Example: Consider that we are building a language translation model. Having labeled
translations for every sentence pair can be resource-intensive. Semi-supervised learning
allows the model to learn from both labeled and unlabeled sentence pairs, making it more
accurate. This technique has led to significant improvements in the quality of machine
translation services.

Types of Semi-Supervised Learning Methods


There are a number of different semi-supervised learning methods, each with its own
characteristics. Some of the most common ones include:
 Graph-based semi-supervised learning: This approach uses a graph to represent the
relationships between the data points. The graph is then used to propagate labels from
the labeled data points to the unlabeled data points.
 Label propagation: This approach iteratively propagates labels from the labeled data
points to the unlabeled data points, based on the similarities between the data points.
 Co-training: This approach trains two different machine learning models on different
views (feature subsets) of the labeled data. Each model then labels the unlabeled
examples it is most confident about, and those examples are added to the other model's
training set.
 Self-training: This approach trains a machine learning model on the labeled data and
then uses the model to predict labels for the unlabeled data. The model is then retrained
on the labeled data together with the predicted labels for the unlabeled data.
 Generative adversarial networks (GANs): GANs are a type of deep learning model in
which two neural networks, a generator and a discriminator, are trained against each
other to generate synthetic data. In semi-supervised learning, the generated samples
can be used as additional unlabeled data.
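The self-training loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the nearest-centroid classifier, the one-dimensional toy data, the confidence rule, and the names `fit_centroids`, `predict`, `self_train` and `min_margin` are all invented for the example.

```python
from statistics import mean

def fit_centroids(points, labels):
    """Return the mean point of each class (a toy 'model')."""
    return {c: mean(p for p, l in zip(points, labels) if l == c)
            for c in set(labels)}

def predict(centroids, x):
    """Return (label, margin). The margin is how much closer the nearest
    centroid is than the runner-up; it serves as a crude confidence score.
    Assumes at least two classes."""
    dists = sorted((abs(x - m), c) for c, m in centroids.items())
    return dists[0][1], dists[1][0] - dists[0][0]

def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident points and retrain on them."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        confident, rest = [], []
        for x in pool:
            label, margin = predict(centroids, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:          # nothing new to learn from: stop
            break
        for x, label in confident:  # add the pseudo-labeled points
            xs.append(x)
            ys.append(label)
        pool = [x for x, _ in rest]
    return fit_centroids(xs, ys), pool

# Two labeled points, five unlabeled ones (hypothetical data).
centroids, still_unlabeled = self_train(
    [1.0, 9.0], ["low", "high"], [2.0, 3.0, 8.0, 7.0, 5.0])
print(centroids, still_unlabeled)
```

With this toy data the points 2, 3, 7 and 8 are pseudo-labeled in the first round, while the ambiguous point 5 (equally far from both centroids) stays unlabeled. Only keeping confident predictions is a common design choice, because wrong pseudo-labels would otherwise be fed back into training.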
Advantages of Semi-Supervised Machine Learning
 It can lead to better generalization than supervised learning on the labeled data alone,
as it uses both labeled and unlabeled data.
 It can be applied to a wide range of data.

Disadvantages of Semi-Supervised Machine Learning


 Semi-supervised methods can be more complex to implement compared to other
approaches.
 It still requires some labeled data that might not always be available or easy to obtain.
 Noisy or unrepresentative unlabeled data can hurt model performance.

Applications of Semi-Supervised Learning


Here are some common applications of semi-supervised learning:
 Image Classification and Object Recognition: Improve the accuracy of models by
combining a small set of labeled images with a larger set of unlabeled images.
 Natural Language Processing (NLP): Enhance the performance of language models
and classifiers by combining a small set of labeled text data with a vast amount of
unlabeled text.
 Speech Recognition: Improve the accuracy of speech recognition by leveraging a
limited amount of transcribed speech data and a more extensive set of unlabeled audio.
 Recommendation Systems: Improve the accuracy of personalized recommendations by
supplementing a sparse set of user-item interactions (labeled data) with a wealth of
unlabeled user behavior data.
 Healthcare and Medical Imaging: Enhance medical image analysis by utilizing a
small set of labeled medical images alongside a larger set of unlabeled images.

4. Reinforcement Machine Learning


Reinforcement learning is a learning method in which an agent interacts with its environment
by taking actions and learning from the resulting rewards and errors. Trial and error and delayed
reward are the most relevant characteristics of reinforcement learning. In this technique, the
model keeps improving its performance by using reward feedback to learn a behavior or pattern.
These algorithms are trained for a particular problem, e.g. the Google self-driving car, or AlphaGo,
where a bot competes with humans and even with itself to become a better and better player of the
game of Go. Each interaction adds to the agent's experience, which serves as its training data.
So, the more it experiences, the better trained it becomes.
Here are some of the most common reinforcement learning algorithms:
 Q-learning: Q-learning is a model-free RL algorithm that learns a Q-function, which
maps each state-action pair to a value. The Q-function estimates the expected
cumulative reward of taking a particular action in a given state.
 SARSA (State-Action-Reward-State-Action): SARSA is another model-free RL
algorithm that learns a Q-function. However, unlike Q-learning, SARSA updates the
Q-function using the next action that is actually taken, rather than the best available
next action.
 Deep Q-learning: Deep Q-learning is a combination of Q-learning and deep learning.
Deep Q-learning uses a neural network to represent the Q-function, which allows it to
learn complex relationships between states and actions.
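The difference between Q-learning and SARSA is easiest to see in their update rules. The sketch below uses the standard tabular updates; the state names, the reward, the starting values, and the learning rate `alpha` and discount factor `gamma` are hypothetical values chosen only for illustration.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

actions = ["left", "right"]

def fresh_table():
    """Q-table where only ('s1', 'right') already looks valuable."""
    Q = defaultdict(float)
    Q[("s1", "right")] = 10.0
    return Q

# Same transition (s0, right, reward 1, s1) seen by both algorithms.
Q_off = fresh_table()
q_learning_update(Q_off, "s0", "right", 1.0, "s1", actions)

Q_on = fresh_table()
# The agent happened to explore and took 'left' in s1.
sarsa_update(Q_on, "s0", "right", 1.0, "s1", "left")

print(Q_off[("s0", "right")], Q_on[("s0", "right")])
```

Q-learning credits the transition with the value of the best next action, whereas SARSA credits it with the value of the exploratory action that was really taken, so the SARSA estimate ends up lower in this example.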
Reinforcement Machine Learning

Let’s understand it with the help of examples.


Example: Consider that you are training an AI agent to play a game like chess. The agent explores
different moves and receives positive or negative feedback based on the outcome. Reinforcement
learning also finds applications in other settings where agents learn to perform tasks by
interacting with their surroundings.

Types of Reinforcement Machine Learning


There are two main types of reinforcement learning:
Positive reinforcement
 Rewards the agent for taking a desired action.
 Encourages the agent to repeat the behavior.
 Examples: Giving a treat to a dog for sitting, providing a point in a game for a correct
answer.
Negative reinforcement
 Removes an undesirable stimulus to encourage a desired behavior.
 Encourages the agent to repeat the behavior that removed the stimulus.
 Examples: Turning off a loud buzzer when a lever is pressed, avoiding a penalty by
completing a task.
Advantages of Reinforcement Machine Learning
 It supports autonomous decision-making and is well suited to tasks that require learning
a sequence of decisions, such as robotics and game playing.
 It is preferred for achieving long-term results that are very difficult to achieve otherwise.
 It can be used to solve complex problems that cannot be solved by conventional
techniques.

Disadvantages of Reinforcement Machine Learning


 Training reinforcement learning agents can be computationally expensive and
time-consuming.
 Reinforcement learning is not a good fit for simple problems.
 It needs a lot of data and a lot of computation, which can make it impractical and costly.
Applications of Reinforcement Machine Learning
Here are some applications of reinforcement learning:
 Game Playing: RL can teach agents to play games, even complex ones.
 Robotics: RL can teach robots to perform tasks autonomously.
 Autonomous Vehicles: RL can help self-driving cars navigate and make decisions.
 Recommendation Systems: RL can enhance recommendation algorithms by learning
user preferences.
 Healthcare: RL can be used to optimize treatment plans and drug discovery.
 Natural Language Processing (NLP): RL can be used in dialogue systems and
chatbots.
 Finance and Trading: RL can be used for algorithmic trading.
 Supply Chain and Inventory Management: RL can be used to optimize supply chain
operations.
 Energy Management: RL can be used to optimize energy consumption.
 Game AI: RL can be used to create more intelligent and adaptive NPCs in video games.
 Adaptive Personal Assistants: RL can be used to improve personal assistants.
 Virtual Reality (VR) and Augmented Reality (AR): RL can be used to create
immersive and interactive experiences.
 Industrial Control: RL can be used to optimize industrial processes.
 Education: RL can be used to create adaptive learning systems.
 Agriculture: RL can be used to optimize agricultural operations.

Association Rule Learning


Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another and maps these dependencies so that they can be
exploited, for example to increase profit. It tries to find interesting relations or associations
among the variables of a dataset, using different rules to discover those relations in the database.

Association rule learning is one of the important concepts of machine learning, and it is
employed in market basket analysis, web usage mining, continuous production, etc. Market
basket analysis is a technique used by big retailers to discover the associations between items.
We can understand it through the example of a supermarket, where products that are frequently
purchased together are placed together.

For example, if a customer buys bread, they are also likely to buy butter, eggs, or milk, so these
products are stocked on the same shelf or nearby. Consider the below diagram:
Association rule learning can be divided into three types of algorithms:

1. Apriori
2. Eclat
3. F-P Growth Algorithm

We will understand these algorithms in later chapters.

How does Association Rule Learning work?


Association rule learning works on the concept of an if-then statement, such as if A then B.

Here the "if" element is called the antecedent, and the "then" element is called the consequent.
A relationship in which we find an association between two single items is known as single
cardinality. Association rule learning is all about creating rules, and as the number of items
increases, the cardinality increases accordingly. So, to measure the associations between
thousands of data items, there are several metrics. These metrics are given below:

o Support
o Confidence
o Lift

Let's understand each of them:


Support
Support is how frequently an itemset appears in the dataset. It is defined as the fraction of the
transactions that contain the itemset X. For an itemset X and T transactions in total, it can be
written as:

$$\mathrm{Support}(X) = \frac{\mathrm{Freq}(X)}{T}$$

where Freq(X) is the number of transactions that contain X.

Confidence
Confidence indicates how often the rule has been found to be true, that is, how often the items X
and Y occur together in the dataset given that X occurs. It is the ratio of the number of
transactions that contain both X and Y to the number of transactions that contain X:

$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{Freq}(X, Y)}{\mathrm{Freq}(X)}$$

Lift
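It is the strength of a rule, which can be defined by the formula below:

$$\mathrm{Lift}(X \Rightarrow Y) = \frac{\mathrm{Support}(X, Y)}{\mathrm{Support}(X) \times \mathrm{Support}(Y)}$$

It is the ratio of the observed support to the support expected if X and Y were independent of
each other. There are three possible cases: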

o Lift = 1: The occurrences of the antecedent and the consequent are independent of each other.
o Lift > 1: The two itemsets are positively dependent on each other; the higher the lift, the stronger
the dependence.
o Lift < 1: One item is a substitute for the other, which means the presence of one item has a
negative effect on the presence of the other.
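As a worked illustration of the three metrics, the sketch below computes support, confidence and lift for the rule bread ⇒ butter. The five transactions are made-up data and the function names are assumptions for the example; the formulas follow the standard definitions (support as a fraction of transactions, confidence as support of X and Y together divided by support of X, lift as confidence divided by support of Y).

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """How often Y appears in the transactions that contain X."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Observed co-occurrence relative to what independence would predict."""
    return confidence(x, y, transactions) / support(y, transactions)

x, y = {"bread"}, {"butter"}
print("support(bread)         =", support(x, transactions))
print("support(bread, butter) =", support(x | y, transactions))
print("confidence             =", confidence(x, y, transactions))
print("lift                   =", lift(x, y, transactions))
```

Here bread appears in 4 of 5 baskets and bread with butter in 3 of 5, so the confidence is about 0.75 and the lift about 1.25. A lift above 1 suggests that buying bread makes buying butter more likely than it would be by chance.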

Types of Association Rule Learning


Association rule learning can be divided into three algorithms:

Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work on
databases that contain transactions. It uses a breadth-first search and a hash tree to count
candidate itemsets efficiently.

It is mainly used for market basket analysis and helps to understand which products are likely to
be bought together. It can also be used in the healthcare field to find drug reactions for patients.
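The breadth-first, level-by-level search of Apriori can be sketched as follows. This is a simplified illustration: it counts candidates with a plain scan instead of the hash tree mentioned above, it only finds frequent itemsets (it does not generate rules), and the function name `apriori`, the toy transactions and the `min_support` threshold are assumptions for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        """Keep the candidates whose support reaches the threshold."""
        kept = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                kept[c] = s
        return kept

    # Level 1: frequent single items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev
                             for sub in combinations(c, k - 1))}
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
result = apriori(baskets, min_support=0.3)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what makes the search tractable: an itemset can only be frequent if all of its subsets are frequent, so {bread, butter, eggs} is discarded without counting because {butter, eggs} is not frequent.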
Eclat Algorithm
Eclat stands for Equivalence Class Transformation. This algorithm uses a depth-first search
technique to find frequent itemsets in a transaction database. It is often faster than the
Apriori algorithm.
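A minimal sketch of the depth-first idea is shown below. It assumes the usual vertical data layout for Eclat, in which each item is stored with the set of transaction ids that contain it and support is obtained by intersecting those sets. The function name `eclat`, the toy baskets and the `min_count` threshold (an absolute count rather than a fraction) are assumptions for the example.

```python
def eclat(transactions, min_count):
    """Return {itemset: count} of frequent itemsets, found depth-first."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        """Grow prefix by each candidate item, then recurse (depth-first)."""
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect tid-sets with the remaining items to get the
            # supports of the next, longer itemsets.
            nxt = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    nxt.append((other, common))
            if nxt:
                extend(itemset, nxt)

    start = sorted(((item, tids) for item, tids in tidsets.items()
                    if len(tids) >= min_count), key=lambda pair: pair[0])
    extend(frozenset(), start)
    return frequent

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
found = eclat(baskets, min_count=2)
for itemset, count in found.items():
    print(sorted(itemset), count)
```

Because supports come from set intersections, the database is scanned only once to build the tid-sets, which is one reason this approach can outperform a level-by-level scan.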

F-P Growth Algorithm


The F-P growth algorithm stands for Frequent Pattern growth, and it is an improved version of the
Apriori Algorithm. It represents the database in the form of a tree structure that is known as a
frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.
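The tree structure can be illustrated with a short sketch that only builds the frequent pattern tree; the mining step that extracts patterns from it is omitted for brevity. The class name `FPNode`, the function `build_fp_tree`, the toy baskets and the `min_count` threshold are assumptions made for this example.

```python
from collections import Counter

class FPNode:
    """One node of the tree: an item plus how many transactions pass through it."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Build the tree from frequent items, most frequent items nearest the root."""
    counts = Counter(item for t in transactions for item in set(t))
    keep = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None)
    for t in transactions:
        # Drop infrequent items; order the rest by descending count, then name.
        ordered = sorted((i for i in set(t) if i in keep),
                         key=lambda i: (-keep[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:            # new branch for an unseen prefix
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1             # shared prefixes just bump the count
            node = child
    return root, keep

def show(node, depth=0):
    """Print the tree with indentation."""
    for item in sorted(node.children):
        child = node.children[item]
        print("  " * depth + f"{item}:{child.count}")
        show(child, depth + 1)

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
root, item_counts = build_fp_tree(baskets, min_count=2)
show(root)
```

Transactions that share a prefix share a path in the tree, so the four baskets containing bread are compressed into a single `bread` node with count 4 under the root.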

Applications of Association Rule Learning


It has various applications in machine learning and data mining. Below are some popular
applications of association rule learning:

o Market Basket Analysis: It is one of the popular examples and applications of association rule
mining. This technique is commonly used by big retailers to determine the association between
items.
o Medical Diagnosis: Association rules can support diagnosis, as they help in identifying the
probability of a particular illness.
o Protein Sequence: Association rules help in determining the synthesis of artificial proteins.
o It is also used for catalog design, loss-leader analysis, and many other applications.
