ML Unit-1 Notes
MACHINE LEARNING
• Machine learning is a growing technology which enables computers to learn automatically from
past data.
• Machine learning uses various algorithms for building mathematical models and making
predictions using historical data or information.
• Currently, it is being used for various tasks such as image recognition, speech recognition, email
filtering, Facebook auto-tagging, recommender system, and many more.
Arthur Samuel
• The term machine learning was first introduced by Arthur Samuel in 1959. We can define it in a
summarized way as:
• Machine learning enables a machine to automatically learn from data, improve
performance from experiences, and predict things without being explicitly programmed.
Definition of learning
A computer program is said to learn from experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks T, as measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognising and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience E: A sequence of images and steering commands recorded while observing a human driver
iii) A chess learning problem
• Task T: Playing chess
• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself
A computer program which learns from experience is called a machine learning program or simply a learning
program. Such a program is sometimes also referred to as a learner.
All of the most successful speech recognition systems employ machine learning in some form.
For example, the SPHINX system (e.g., Lee 1989) learns speaker-specific strategies for recognizing
the primitive sounds (phonemes) and words from the observed speech signal. Neural network
learning methods (e.g., Waibel et al. 1989) and methods for learning hidden Markov models
(e.g., Lee 1989) are effective for automatically customizing to individual speakers, vocabularies,
microphone characteristics, background noise, etc. Similar techniques have potential applications
in many signal-interpretation problems.
Machine learning methods have been used to train computer-controlled vehicles to steer correctly
when driving on a variety of road types. For example, the ALVINN system (Pomerleau 1989)
has used its learned strategies to drive unassisted at 70 miles per hour for 90 miles on public
highways among other cars. Similar techniques have possible applications in many sensor-based
control problems.
Machine learning methods have been applied to a variety of large databases to learn general
regularities implicit in the data. For example, decision tree learning algorithms have been used
by NASA to learn how to classify celestial objects from the second Palomar Observatory Sky
Survey (Fayyad et al. 1995). This system is now used to automatically classify all objects in the
Sky Survey, which consists of three terabytes of image data.
The most successful computer programs for playing games such as backgammon are based on
machine learning algorithms. For example, the world's top computer program for backgammon,
TD-GAMMON (Tesauro 1992, 1995), learned its strategy by playing over one million practice
games against itself. It now plays at a level competitive with the human world champion. Similar
techniques have applications in many practical problems where very large search spaces must be
examined efficiently.
For any learning system, we must know the three elements — T (Task), P (Performance
Measure), and E (Training Experience). At a high level, the process of a learning system works as described below.
The learning process starts with task T, performance measure P, and training experience E, and the objective
is to find an unknown target function. The target function is the exact knowledge to be learned from the
training experience, and it is unknown. For example, in the case of credit approval, the learning system will
have customer application records as experience, and the task would be to classify whether a given
customer application is eligible for a loan. In this case, the training examples can be represented as
(x1, y1), (x2, y2), ..., (xn, yn), where X represents the customer application details and y represents the status of
credit approval.
With these details, what is that exact knowledge to be learned from the training experience?
So the target function to be learned in the credit approval learning system is a mapping function f : X → y.
This function represents the exact knowledge defining the relationship between input variable X and
output variable y.
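To make the mapping f : X → y concrete, here is a minimal sketch of learning such a function from labelled records. The feature names and data below are hypothetical, and logistic regression is just one possible choice of learner.

```python
# Hypothetical sketch: learning f : X -> y for credit approval.
# Feature values are made up for illustration.
from sklearn.linear_model import LogisticRegression

# X: customer application details (income in $1000s, years employed, open loans)
X = [[45, 5, 1], [120, 12, 0], [23, 1, 3], [80, 7, 2], [15, 0, 4], [95, 10, 0]]
# y: status of credit approval (1 = approved, 0 = rejected)
y = [1, 1, 0, 1, 0, 1]

model = LogisticRegression()         # one possible hypothesis class for f
model.fit(X, y)                      # learn an approximation of f from (xi, yi)

new_applicant = [[60, 3, 1]]
print(model.predict(new_applicant))  # predicted credit-approval status
```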
Designing a learning system involves a sequence of design choices. The design choices will be to decide the following key components:
1. Choosing the training experience
2. Choosing the target function
3. Choosing a representation for the target function
4. Choosing a learning algorithm for approximating the target function
5. The final design
We will look into the game of checkers as a learning problem and apply the above design choices. For a
checkers learning problem, the three elements will be:
• Task T: Playing checkers
• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself
During the design of the checkers learning system, the type of training experience available to the
learning system will have a significant effect on the success or failure of the learning.
1. Direct or Indirect training experience — In the case of direct training experience, individual board
states and the correct move for each board state are given.
In the case of indirect training experience, the move sequences for a game and the final result (win, loss,
or draw) are given for a number of games. How to assign credit or blame to individual moves is the
credit assignment problem.
2. Teacher or Not — Supervised — The training experience will be labeled, which means all the board
states will be labeled with the correct move. So the learning takes place in the presence of a
supervisor or a teacher.
Unsupervised — The training experience will be unlabeled, which means the board states will not
have moves attached. The learner generates random games and plays against itself with no supervision
or teacher involvement.
Semi-supervised — Learner generates game states and asks the teacher for help in finding the
correct move if the board state is confusing.
3. Is the training experience good — Do the training examples represent the distribution of examples
over which the final system performance will be measured? Performance is best when training
examples and test examples are from the same/a similar distribution.
The checkers player learns by playing against itself, so its experience is indirect. It may not encounter
moves that are common in human expert play. Once the proper training experience is available, the next
design step will be choosing the target function.
When you are playing the game of checkers, at any moment you must choose the best move from
among the legal possibilities, thinking and applying the learning that you have gained from
experience. Here the learning is: for a specific board state, you move a checker such that the
board state tends towards a winning situation. Now the same learning has to be defined in terms of
the target function.
With direct experience, the checkers learning system needs only to learn how to choose
the best move from a large search space. We need to find a target function that will help
us choose the best move among the alternatives. Let us call this function ChooseMove and use the
notation ChooseMove : B → M to indicate that this function accepts as input any board from the
set of legal board states B and produces as output some move from the set of legal moves M.
With indirect experience, it becomes difficult to learn such a function directly. An alternative is to
assign a real-valued score to each board state.
So let the target function be V : B → R, indicating that it accepts as input any board from the set of legal board
states B and produces as output a real score. This function assigns higher scores to better board
states.
If the system can successfully learn such a target function V, then it can easily use it to select the best
move from any board position.
Let us therefore define the target value V(b) for an arbitrary board state b in B, as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can
be achieved starting from b and playing optimally until the end of the game.
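The four cases above can be written directly as code. The sketch below assumes hypothetical helper predicates (is_final, is_won, is_lost, best_final_state) that are not defined here; as the next paragraph explains, the recursive fourth case is what makes this definition nonoperational.

```python
# Sketch of the ideal target value V(b). The helpers is_final, is_won,
# is_lost and best_final_state are hypothetical placeholders.
def V(b):
    if is_final(b) and is_won(b):
        return 100
    if is_final(b) and is_lost(b):
        return -100
    if is_final(b):                    # final and neither won nor lost: a draw
        return 0
    # Nonoperational case: requires searching ahead to the end of the game.
    return V(best_final_state(b))      # b' reached by optimal play from b
```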
Definition (4) is recursive: to determine the value V(b) for a particular board state, it requires
searching ahead for the optimal line of play, all the way to the end of the game. This definition is
therefore not efficiently computable by our checkers-playing program, and we say that it is a
nonoperational definition.
The goal of learning, in this case, is to discover an operational description of V ; that is, a description
that can be used by the checkers-playing program to evaluate states and select moves within realistic
time bounds.
It may be very difficult in general to learn such an operational form of V perfectly. We expect learning
algorithms to acquire only some approximation ^V to the target function.
Now that we have specified the ideal target function V, we must choose a representation that
the learning program will use to describe the function ^V that it will learn. As with earlier design
choices, we again have many options. We could, for example, allow the program to represent ^V using a
large table with a distinct entry specifying the value for each distinct board state. Or we could allow it to
represent ^V using a collection of rules that match against features of the board state, or a quadratic
polynomial function of predefined board features, or an artificial
neural network. In general, this choice of representation involves a crucial trade-off. On one hand, we
wish to pick a very expressive representation to allow representing as close an approximation as
possible to the ideal target function V.
On the other hand, the more expressive the representation, the more training data the program
will require in order to choose among the alternative hypotheses it can represent. To keep the
discussion brief, let us choose a simple representation:
for any given board state, the function ^V will be calculated as a linear combination of the following
board features:
x1(b) — number of black pieces on board b
x2(b) — number of red pieces on b
x3(b) — number of black kings on b
x4(b) — number of red kings on b
x5(b) — number of red pieces threatened by black (i.e., which can be taken on black’s next turn)
x6(b) — number of black pieces threatened by red
^V(b) = w0 + w1·x1(b) + w2·x2(b) + w3·x3(b) + w4·x4(b) + w5·x5(b) + w6·x6(b)
where w0 through w6 are numerical coefficients, or weights, to be obtained by a learning algorithm.
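As a minimal sketch, the linear representation of ^V can be computed as below. The feature-extraction function board_features is a hypothetical placeholder that would return the values x1(b) through x6(b) for a given board.

```python
# Linear evaluation function: V_hat(b) = w0 + w1*x1(b) + ... + w6*x6(b).
def v_hat(board, w):
    # board_features is a hypothetical helper returning [x1(b), ..., x6(b)]
    x = board_features(board)
    return w[0] + sum(w_i * x_i for w_i, x_i in zip(w[1:], x))

weights = [0.5, 1.0, -1.0, 2.0, -2.0, 1.5, -1.5]  # w0..w6, arbitrary initial values
```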
Let Successor(b) denote the next board state following b for which it is again the program's turn to
move, and let ^V be the learner's current approximation to V. Using this information, assign the training value
V_train(b) for any intermediate board state b as follows:
V_train(b) ← ^V(Successor(b))
Adjusting the weights
Now it is time to define the learning algorithm for choosing the weights that best fit the set of
training examples. One common approach is to define the best hypothesis as the one that minimizes the
squared error E between the training values and the values predicted by the hypothesis:
E = Σ (V_train(b) − ^V(b))², summed over the training examples (b, V_train(b)).
The learning algorithm should incrementally refine the weights as more training examples become available,
and it needs to be robust to errors in the training data. The Least Mean Squares (LMS) training rule is one
such algorithm: for each training example, it adjusts each weight wi a small amount in the direction that
reduces the error:
wi ← wi + η · (V_train(b) − ^V(b)) · xi(b)
where η is a small constant called the learning rate.
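A minimal sketch of this LMS weight update, reusing the v_hat function above; the learning rate value, the game trace, and the successor function are assumptions for illustration.

```python
# LMS rule: for each training example (b, V_train(b)),
#   w_i <- w_i + eta * (V_train(b) - V_hat(b)) * x_i(b)
def lms_update(board, v_train, w, eta=0.01):
    x = [1.0] + board_features(board)   # x0 = 1 pairs with the bias weight w0
    error = v_train - v_hat(board, w)   # how far the current prediction is off
    return [w_i + eta * error * x_i for w_i, x_i in zip(w, x)]

# Training values come from the rule V_train(b) <- V_hat(Successor(b)):
# for each intermediate board b in a game trace,
#   weights = lms_update(b, v_hat(successor(b), weights), weights)
```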
The final design of our checkers learning system can be naturally described by four distinct
program modules that represent the central components in many learning systems.
1. The Performance System — Takes a new board as input and outputs a trace of the game it played
against itself.
2. The Critic — Takes the trace of a game as an input and outputs a set of training examples of the
target function.
3. The Generalizer — Takes training examples as input and outputs a hypothesis that estimates the
target function. Good generalization to new cases is crucial.
4. The Experiment Generator — Takes the current hypothesis (currently learned function) as input and
outputs a new problem (an initial board state) for the performance system to explore.
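The interaction of the four modules can be sketched as a simple training loop; each function below is a hypothetical stand-in for the corresponding module.

```python
# Hypothetical sketch of the final design: the four modules in a training loop.
def train_checkers_player(initial_weights, n_games):
    hypothesis = initial_weights
    for _ in range(n_games):
        problem = experiment_generator(hypothesis)       # new initial board state
        trace = performance_system(problem, hypothesis)  # play a game, record it
        examples = critic(trace)                         # (board, V_train) pairs
        hypothesis = generalizer(examples, hypothesis)   # e.g., LMS weight updates
    return hypothesis
```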
Data Issues
1. Data Quality: Noisy, missing, or biased data can affect model performance.
2. Data Imbalance: Class imbalance can lead to biased models.
3. Data Leakage: Letting test data influence training can result in overly optimistic performance estimates (a sketch of avoiding this follows the list).
4. Data Preprocessing: Incorrect preprocessing can affect model performance.
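For instance, data leakage (issue 3) is commonly avoided by splitting the data before any fitting. A minimal scikit-learn sketch, using made-up arrays:

```python
# Avoiding data leakage: fit the scaler on training data only,
# then apply the same transform to the held-out test data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 4)             # made-up feature matrix
y = np.random.randint(0, 2, size=100)  # made-up binary labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = StandardScaler().fit(X_train)  # statistics from training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)     # no test information leaks into training
```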
Model Issues
1. Overfitting: Models that are too complex can memorize training data.
2. Underfitting: Models that are too simple can fail to capture patterns.
3. Model Selection: Choosing the right model for the problem can be challenging.
4. Hyperparameter Tuning: Finding optimal hyperparameters can be time-consuming (a grid-search sketch follows the list).
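Hyperparameter tuning (issue 4) is often automated with a grid search over candidate values. A minimal scikit-learn sketch, with an arbitrary parameter grid:

```python
# Grid search with cross-validation to pick a decision tree's depth,
# balancing underfitting (too shallow) against overfitting (too deep).
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(DecisionTreeClassifier(),
                    param_grid={"max_depth": [2, 4, 8, None]},
                    cv=5)               # 5-fold cross-validation
grid.fit(X, y)
print(grid.best_params_)                # depth with the best validation score
```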
Evaluation Issues
1. Bias and Fairness: Ensuring that models are fair and unbiased.
2. Data Privacy: Protecting sensitive data and ensuring compliance with regulations.
3. Model Transparency: Providing transparency into model decisions and data usage.
These are just some of the common issues in Machine Learning. By being aware of these issues, you can take
steps to mitigate them and develop more effective and responsible ML models.
There are so many different types of Machine Learning systems that it is useful to classify them in broad
categories, based on the following criteria:
1. Whether or not they are trained with human supervision (supervised, unsupervised, semi-supervised,
and Reinforcement Learning)
2. Whether or not they can learn incrementally on the fly (online versus batch learning)
3. Whether they work by simply comparing new data points to known data points, or instead by detecting
patterns in the training data and building a predictive model, much like scientists do (instance-based
versus model-based learning).
1. Supervised Machine Learning: As its name suggests, supervised machine learning is based
on supervision.
• It means in the supervised learning technique, we train the machines using the "labelled" dataset,
and based on the training, the machine predicts the output.
• The main goal of the supervised learning technique is to map the input variable(x) with the output
variable(y). Some real-world applications of supervised learning are Risk Assessment, Fraud
Detection, Spam filtering, etc.
Categories of Supervised Machine Learning:
• Supervised machine learning can be classified into two types of problems, which are given below:
• Classification
• Regression
Classification: Classification algorithms are used to solve classification problems in which the output
variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc.
• Some real-world examples of classification algorithms are Spam Detection, Email filtering, etc.
Some popular classification algorithms are given below:
• Random Forest Algorithm
• Decision Tree Algorithm
• Logistic Regression Algorithm
• Support Vector Machine Algorithm
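A minimal sketch of supervised classification with the Random Forest algorithm listed above, using scikit-learn's built-in iris dataset in place of a real spam-filtering corpus:

```python
# Supervised classification: learn to map inputs X to categorical labels y.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)           # train on labelled examples
print(clf.score(X_test, y_test))    # accuracy on unseen data
```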
2. Unsupervised Machine Learning: In unsupervised learning, the machine is trained on an unlabelled
dataset and discovers patterns in the data without any supervision.
Unsupervised learning can be further classified into two types, which are given below:
• Clustering
• Association
1) Clustering:
• The clustering technique is used when we want to find the inherent groups from the data.
• It is a way to group the objects into a cluster such that the objects with the most similarities
remain in one group and have fewer or no similarities with the objects of other groups.
• An example of the clustering algorithm is grouping the customers by their purchasing behavior.
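A minimal sketch of the customer-grouping example with k-means clustering; the two features and the number of clusters are assumptions for illustration.

```python
# Clustering customers by purchasing behaviour (made-up data):
# each row is [annual spend in $1000s, number of purchases per year].
from sklearn.cluster import KMeans

customers = [[2, 5], [3, 6], [25, 40], [30, 45], [3, 4], [28, 42]]
kmeans = KMeans(n_clusters=2, n_init=10).fit(customers)
print(kmeans.labels_)   # cluster assignment for each customer, e.g. [0 0 1 1 0 1]
```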
2) Association:
• Association rule learning is an unsupervised learning technique, which finds interesting relations
among variables within a large dataset.
• The main aim of this learning algorithm is to find the dependency of one data item on another
data item and map those variables accordingly so that it can generate maximum profit.
• Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-growth
algorithm.
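A minimal sketch of the Apriori algorithm mentioned above, assuming the third-party mlxtend library and a tiny made-up transaction table (call signatures may vary across library versions):

```python
# Association rule mining, step 1: find frequently co-occurring item sets
# with the Apriori algorithm (using the third-party mlxtend library).
import pandas as pd
from mlxtend.frequent_patterns import apriori

# One-hot encoded transactions (made-up): rows = baskets, columns = items.
baskets = pd.DataFrame({
    "bread":  [1, 1, 0, 1, 1],
    "butter": [1, 1, 0, 1, 0],
    "milk":   [0, 1, 1, 1, 1],
}, dtype=bool)

frequent = apriori(baskets, min_support=0.4, use_colnames=True)
print(frequent)  # itemsets such as {bread, butter} with their support values
# mlxtend's association_rules() can then derive if-then rules from these itemsets.
```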
Disadvantages:
• The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the
algorithms are not trained with the exact output in advance.
• Working with unsupervised learning is more difficult, as it works with an unlabelled dataset that
does not map inputs to outputs.
3. Reinforcement Learning:
• Reinforcement learning works on a feedback-based process, in which an AI agent (a
software component) automatically explores its surroundings by trial and error: taking actions,
learning from experience, and improving its performance.
• The agent gets rewarded for each good action and punished for each bad action; hence the goal of
a reinforcement learning agent is to maximize the rewards.
• In reinforcement learning, there is no labelled data like supervised learning, and agents learn from
their experiences only.
• The reinforcement learning process is similar to how a human being learns; for example, a child learns
various things through experience in day-to-day life.
• An example of reinforcement learning is playing a game, where the game is the environment, the
agent's moves at each step define the states, and the goal of the agent is to get a high score.
• The agent receives feedback in the form of punishments and rewards.
• Due to its way of working, reinforcement learning is employed in different fields such as Game
Theory, Operations Research, Information Theory, and multi-agent systems.
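A minimal sketch of the reward-driven update at the heart of many reinforcement learning methods (here, tabular Q-learning); the states, actions, and rewards below are hypothetical.

```python
# Tabular Q-learning: the agent nudges its value estimate for (state, action)
# towards the observed reward plus the best value expected from the next state.
from collections import defaultdict

Q = defaultdict(float)        # Q[(state, action)] -> estimated long-term value
alpha, gamma = 0.1, 0.9       # learning rate and discount factor
actions = ["left", "right"]   # hypothetical action set

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# e.g. the agent took "right" in state 0, got reward +1, and landed in state 1:
update(0, "right", 1.0, 1)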
Categories of Reinforcement Learning:
• Reinforcement learning is categorized mainly into two types of methods/algorithms: Positive
Reinforcement Learning and Negative Reinforcement Learning.