
MACHINE LEARNING

Unit – I
Introduction- Artificial Intelligence, Machine Learning, Deep learning, Types of Machine Learning
Systems, Main Challenges of Machine Learning. Statistical Learning: Introduction, Supervised and
Unsupervised Learning, Training and Test Loss, Trade-offs in Statistical Learning, Estimating Risk
Statistics, Sampling distribution of an estimator, Empirical Risk Minimization.

TOPIC-1: Introduction- Artificial Intelligence, Machine Learning, Deep learning:


• Artificial Intelligence (AI): In today's world, technology is growing very fast, and we encounter new technologies every day.
• One of the booming technologies of computer science is Artificial Intelligence, which is poised to create a new revolution in the world by making machines intelligent.
• Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial means "man-made" and Intelligence means "thinking power"; hence AI means "man-made thinking power."
• So we can define AI as: "a branch of computer science by which we can create intelligent machines that can behave like humans, think like humans, and make decisions."
• Artificial Intelligence exists when a machine has human-like skills such as learning, reasoning, and problem solving.

Why Artificial Intelligence?


• With the help of AI, you can create software or devices that solve real-world problems easily and accurately, for example in healthcare, marketing, and traffic management.
• With the help of AI, you can create your own personal virtual assistant, such as Cortana, Google Assistant, or Siri.
• With the help of AI, you can build robots that work in environments where human survival would be at risk.
• AI opens a path to other new technologies, new devices, and new opportunities.

Machine Learning:

• Machine learning is a growing technology that enables computers to learn automatically from past data.
• Machine learning uses various algorithms to build mathematical models and make predictions from historical data or information.
• Currently, it is used for tasks such as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender systems, and many more.
• The term machine learning was first introduced by Arthur Samuel in 1959. We can define it in a summarized way as:
• Machine learning enables a machine to automatically learn from data, improve its performance from experience, and predict outcomes without being explicitly programmed.

Deep Learning:
• Deep learning is a branch of machine learning, which is itself a subset of artificial intelligence.
• Just as neural networks imitate the human brain, so does deep learning: in deep learning, nothing is programmed explicitly.
• Basically, it is a class of machine learning that makes use of numerous layers of nonlinear processing units to perform feature extraction as well as transformation.
• IDEA: Deep learning is implemented with the help of Neural Networks, and the motivation behind neural networks is the biological neuron, which is nothing but a brain cell.
• Deep learning is a collection of statistical machine learning techniques for learning feature hierarchies, based on artificial neural networks.
• Example of Deep Learning: recognizing objects or handwritten digits in images using a multi-layer (deep) neural network.

TOPIC-2: Types of Machine Learning Systems

There are so many different types of Machine Learning systems that it is useful to classify them in broad
categories, based on the following criteria:

1. Whether or not they are trained with human supervision (supervised, unsupervised, semi-supervised, and Reinforcement Learning)
2. Whether or not they can learn incrementally on the fly (online versus batch learning)
3. Whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do (instance-based versus model-based learning).

1. Supervised Machine Learning: As its name suggests, supervised machine learning is based on
supervision.
• It means in the supervised learning technique, we train the machines using the "labelled" dataset,
and based on the training, the machine predicts the output.
• The main goal of the supervised learning technique is to map the input variable (x) to the output variable (y). Some real-world applications of supervised learning are risk assessment, fraud detection, spam filtering, etc.
Categories of Supervised Machine Learning:
• Supervised machine learning can be classified into two types of problems, which are given below:
• Classification
• Regression
Classification: Classification algorithms are used to solve classification problems in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc.

• The classification algorithms predict the categories present in the dataset.

• Some real-world examples of classification are spam detection and email filtering; a minimal code sketch is given after the algorithm list below.
Some popular classification algorithms are given below:
• Random Forest Algorithm

• Decision Tree Algorithm


• Logistic Regression Algorithm
• Support Vector Machine Algorithm
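
As an illustration of how a supervised classifier is trained on a labelled dataset, here is a minimal sketch using scikit-learn's logistic regression; scikit-learn and the tiny dataset are assumptions made for demonstration and are not part of the original notes.

```python
# Minimal supervised classification sketch (illustrative only).
# Assumes scikit-learn is installed; the tiny dataset below is made up.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labelled dataset: each row of X is a feature vector, y holds the class labels.
X = [[0.5, 1.2], [1.0, 0.8], [3.2, 3.5], [2.9, 4.1], [0.3, 0.7], [3.8, 2.9]]
y = [0, 0, 1, 1, 0, 1]  # categorical output: class 0 or class 1

# Hold out part of the data to test how well the learned mapping generalizes.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

model = LogisticRegression()          # could equally be a decision tree, random forest, or SVM
model.fit(X_train, y_train)           # learn the mapping x -> y from labelled examples
print(model.predict(X_test))          # predicted categories for unseen inputs
print(model.score(X_test, y_test))    # accuracy on the held-out test set
```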
Regression:
• Regression algorithms are used to solve regression problems, in which the output variable is a continuous real value and the model learns the relationship between the input and output variables.
• They are used to predict continuous outputs, such as market trends, weather forecasts, etc.; a minimal code sketch is given after the algorithm list below.
Some popular Regression algorithms are given below:
• Simple Linear Regression Algorithm
• Multivariate Regression Algorithm
• Decision Tree Algorithm
• Lasso Regression
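
A corresponding sketch for regression, where the output is a continuous value rather than a category; again scikit-learn and the made-up data are assumptions for illustration only.

```python
# Minimal supervised regression sketch (illustrative only; data is made up).
import numpy as np
from sklearn.linear_model import LinearRegression

# Feature: e.g. years of experience; response: a continuous quantity such as salary.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = LinearRegression()     # could also be a decision-tree or lasso regressor
model.fit(X, y)                # fit the relationship between input and output
print(model.predict([[6.0]]))  # predict a continuous value for a new input
```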

Advantages and Disadvantages of Supervised Learning:
Advantages:
• Since supervised learning works with a labelled dataset, we have an exact idea of the classes of objects.
• These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
• These algorithms are not able to solve complex tasks.
• It may predict the wrong output if the test data is different from the training data.
• It requires lots of computational time to train the algorithm.

2. Unsupervised Machine Learning:


• Unsupervised learning is different from the supervised learning technique; as its name suggests,
there is no need for supervision.
• It means that in unsupervised machine learning, the machine is trained using an unlabeled dataset, and the machine predicts the output without any supervision.
• The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences.
• Machines are instructed to find the hidden patterns from the input dataset.

Categories of Unsupervised Machine Learning:

Unsupervised Learning can be further classified into two types, which are given below:
• Clustering
• Association
1) Clustering:
• The clustering technique is used when we want to find the inherent groups in the data.
• It is a way of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with objects in other groups.
• An example of clustering is grouping customers by their purchasing behaviour; a minimal code sketch is given after the algorithm list below.

Some of the popular clustering algorithms are given below:


• K-Means Clustering algorithm
• Mean-shift algorithm
• DBSCAN Algorithm
• Principal Component Analysis (strictly a dimensionality-reduction technique, often used before clustering)
• Independent Component Analysis (likewise used for feature/signal separation rather than clustering itself)
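
A minimal clustering sketch, assuming scikit-learn and a made-up set of customer feature vectors; K-Means groups the points without any labels being provided.

```python
# Minimal unsupervised clustering sketch (illustrative only; data is made up).
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: e.g. [annual spend, number of visits] per customer.
X = np.array([[5, 2], [6, 3], [50, 40], [55, 38], [4, 1], [52, 42]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)     # cluster assignments discovered from the data alone
print(labels)                      # e.g. [0 0 1 1 0 1] -- two purchasing-behaviour groups
print(kmeans.cluster_centers_)     # the centre of each discovered group
```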

2) Association:
• Association rule learning is an unsupervised learning technique, which finds interesting relations
among variables within a large dataset.
• The main aim of this learning algorithm is to find the dependency of one data item on another and map the variables accordingly so that the discovered rules can be exploited, for example to maximize profit in market-basket analysis.
• Some popular algorithms for association rule learning are the Apriori algorithm, Eclat, and FP-Growth.
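
To show the idea behind association rules without relying on a particular library, here is a small hand-rolled sketch that computes support and confidence for one candidate rule over a made-up set of transactions; real implementations such as Apriori or FP-Growth automate this search over all frequent itemsets.

```python
# Tiny association-rule sketch (illustrative only; transactions are made up).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Candidate rule: {bread} -> {milk}
antecedent, consequent = {"bread"}, {"milk"}
supp = support(antecedent | consequent, transactions)
conf = supp / support(antecedent, transactions)   # estimate of P(milk | bread)
print(f"support={supp:.2f}, confidence={conf:.2f}")
```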

Advantages and Disadvantages of Unsupervised Learning Algorithm:


Advantages:
• These algorithms can be used for more complicated tasks than supervised ones, because they work on unlabeled data.
• Unsupervised algorithms are preferable for many tasks because unlabeled data is easier to obtain than a labelled dataset.

Disadvantages:
• The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the algorithm is not trained with the exact output in advance.
• Working with unsupervised learning is more difficult, as it works with an unlabeled dataset that does not map to a known output.

3. Semi-Supervised Learning:
• Semi-Supervised learning is a type of Machine Learning algorithm that lies between
Supervised and Unsupervised machine learning.
• It represents the intermediate ground between supervised learning (with labelled training data) and unsupervised learning (with no labelled training data), and uses a combination of labelled and unlabeled data during training.
• To overcome the drawbacks of supervised and unsupervised learning algorithms, the concept of semi-supervised learning was introduced.
• We can picture these algorithms with an example: supervised learning is where a student is under the supervision of an instructor at home and at college.
• If the student then analyzes the same concept on their own without any help from the instructor, that corresponds to unsupervised learning.
• Under semi-supervised learning, the student revises the concept on their own after first studying it under the guidance of an instructor at college.
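
A minimal sketch of the idea, assuming scikit-learn: its LabelPropagation estimator treats points labelled -1 as unlabeled and spreads the few known labels to them. The data here is made up for illustration.

```python
# Minimal semi-supervised sketch (illustrative only; data is made up).
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
# Only two points are labelled; -1 marks the unlabeled majority.
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelPropagation()
model.fit(X, y)               # uses both the labelled and the unlabeled points
print(model.transduction_)    # labels inferred for every training point
```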
Advantages:
• The algorithm is simple and easy to understand.
• It is highly efficient.
• It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.

Disadvantages:
• Iteration results may not be stable.
• We cannot apply these algorithms to network-level data.
• Accuracy is low.

4. Reinforcement Learning:
• Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by trial and error, taking actions, learning from experience, and improving its performance.
• The agent is rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.
• In reinforcement learning there is no labelled data as in supervised learning; agents learn from their experience only.

• The reinforcement learning process is similar to how a human being learns; for example, a child learns various things through experience in day-to-day life.
• An example of reinforcement learning is playing a game, where the game is the environment, the agent's moves at each step define the states, and the goal of the agent is to achieve a high score.
• The agent receives feedback in terms of rewards and punishments.
• Because of the way it works, reinforcement learning is employed in different fields such as game theory, operations research, information theory, and multi-agent systems.
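
To make the loop of states, actions, and rewards concrete, here is a minimal tabular Q-learning sketch on a made-up one-dimensional grid where the agent is rewarded only for reaching the rightmost cell. Q-learning is one specific reinforcement learning algorithm, and everything in the code (environment, parameters) is an illustrative assumption, not part of the notes above.

```python
# Tiny tabular Q-learning sketch (illustrative only; the environment is made up).
import random

N_STATES = 5          # positions 0..4 in a one-dimensional grid
ACTIONS = [-1, +1]    # move left or move right
GOAL = 4              # reaching the rightmost cell gives a reward

alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount factor, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(200):
    state = 0
    while state != GOAL:
        if random.random() < epsilon or Q[state][0] == Q[state][1]:
            a = random.randrange(2)                      # explore (or break ties randomly)
        else:
            a = max((0, 1), key=lambda i: Q[state][i])   # exploit the best-known action
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else 0.0      # positive feedback only at the goal
        # Q-learning update: move the estimate toward reward + discounted future value.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

print([round(max(q), 2) for q in Q])  # learned state values; they increase toward the goal
```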
Categories of Reinforcement Learning:
• Reinforcement learning is categorized mainly into two types of methods/algorithms:

• Positive Reinforcement Learning: positive reinforcement increases the tendency that the required behaviour will occur again by adding something desirable. It strengthens the agent's behaviour and has a positive impact on it.
• Negative Reinforcement Learning: negative reinforcement works in the opposite way to positive RL. It increases the tendency that a specific behaviour will occur again by removing or avoiding a negative condition.

Real-world Use cases of Reinforcement Learning
• Video Games
• Robotics
• Text Mining

TOPIC-3: Main Challenges of Machine Learning:


1) Lack Of Quality Data
One of the main issues in Machine Learning is the absence of good data. While developers would rather spend their time improving algorithms, in practice much of their effort goes into obtaining and cleaning the data those algorithms depend on.
 Data can be noisy, which results in inaccurate predictions.
 Incorrect or incomplete information can also lead to faulty programs produced through Machine Learning.
2) Fault In Credit Card Fraud Detection
Although AI-driven software helps to detect credit card fraud successfully, there are issues in Machine Learning that can undermine the process.
3) Getting Bad Recommendations
Recommendation engines are quite common today. While some are dependable, others may not provide the desired results, and Machine Learning algorithms tend only to reinforce what these engines have already suggested.

4) Talent Deficit
Although many people are drawn into the ML industry, there are still very few experts who can take complete ownership of this technology.

5) Implementation
Organizations often already have analytics engines in place when they decide to move up to ML. Integrating newer ML techniques with existing processes is a complicated task.
6) Making The Wrong Assumptions
Many ML models cannot handle datasets containing missing data points. As a result, features with a large proportion of missing values may have to be removed.
7) Deficient Infrastructure
ML requires a tremendous amount of data-processing capacity. Legacy systems often cannot handle the workload and buckle under the pressure.

8) Having Algorithms Become Obsolete When Data Grows


ML algorithms always require large amounts of data for training. Frequently, they are trained on a specific dataset and then used to predict future data, and keeping the predictions accurate as the data grows demands significant ongoing effort.
9) Absence Of Skilled Resources
Another issue in Machine Learning is that deep analytics and ML, in their present form, are still new technologies, so skilled practitioners are scarce.

10) Customer Segmentation
Consider a user's behavioural data over a trial period, together with the relevant previous behaviour. An algorithm is needed to recognize which customers will convert to the paid version of a product and which will not.

Supervised learning techniques commonly applied to such problems include:

 Neural Networks
 Naive Bayesian Model
 Classification
 Support Vector Machines
 Regression
 Random Forest Model
11) Complexity
Although Machine Learning and Artificial Intelligence are booming, much of the field is still in an experimental phase, actively undergoing trial and error.
12) Slow Results
Another common issue in Machine Learning is slowness. Machine Learning models can be highly efficient and accurate, but the results take time to produce.
13) Maintenance
The required results for different tasks are bound to change over time, and hence the data needed changes as well, so models must be maintained and retrained.

14) Concept Drift


This occurs when the statistical properties of the target variable change over time, making the delivered results inaccurate. It forces the models to decay, because they cannot easily adapt to or be upgraded for these changes.
15) Data Bias
This occurs when certain aspects of a dataset are given more weight or representation than others.

16) High Chances Of Error


Many algorithms are built on biased programming or biased datasets. Such a model will not deliver the right output and produces irrelevant information.
17) Lack Of Explainability
Machine Learning is often termed a “black box”, as deciphering the outcomes of an algorithm is often complex and the explanations obtained are sometimes of little use.

TOPIC-4 Statistical Learning: Introduction
• While structuring and visualizing data are important aspects of data science, the main challenge lies in the mathematical analysis of the data.
• When the goal is to interpret the model and quantify the uncertainty in the data, this analysis is
usually referred to as statistical learning.
There are two major goals for modeling data:
• 1) to accurately predict some future quantity of interest, given some observed data, and
• 2) to discover unusual or interesting patterns in the data.

TOPIC-5 Supervised and Unsupervised Learning:


1. Feature, Response:
• Given an input or feature vector x, one of the main goals of machine learning is to predict an output or response variable y.
• For example, x could be a digitized signature and y a binary variable that indicates whether the signature is genuine or forged.
2. Prediction function:
• Another example is where x represents the weight and smoking habits of an expectant mother and y the birth weight of the baby.
• The data science attempt at this prediction is encoded in a mathematical function g, called the prediction function, which takes as input x and outputs a guess g(x) for y.
3. Regression, classification:
• In regression problems, the response variable y can take any real value.

• In contrast, when y can only lie in a finite set, say y ∈ {0, . . . , c − 1}, then predicting y is conceptually the same as classifying the input x into one of c categories, and so prediction becomes a classification problem.
4. Loss function:
• We can measure the accuracy of a prediction y′ = g(x) with respect to a given response y by using a loss function Loss(y, y′).
• In a regression setting the usual choice is the squared-error loss (y − y′)².

TOPIC-6 Training and Test Loss:
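
The training loss of a fitted model is its average loss over the data used to fit it, while the test loss is its average loss over fresh, held-out data; a low training loss combined with a much higher test loss signals overfitting. A minimal sketch of this comparison, assuming scikit-learn, the squared-error loss, and made-up data:

```python
# Training vs. test loss sketch (illustrative only; data is made up).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=40)  # noisy target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# A very flexible model can drive the training loss down while the test loss grows.
model = make_pipeline(PolynomialFeatures(degree=12), LinearRegression())
model.fit(X_train, y_train)

train_loss = mean_squared_error(y_train, model.predict(X_train))  # squared-error training loss
test_loss = mean_squared_error(y_test, model.predict(X_test))     # squared-error test loss
print(f"training loss = {train_loss:.4f}, test loss = {test_loss:.4f}")
```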

TOPIC-7 Tradeoffs in Statistical Learning:

TOPIC-8 Estimating Risk:

1. IN-SAMPLE RISK:

2. CROSS-VALIDATION
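
Cross-validation estimates the risk by repeatedly splitting the data into training and validation folds, fitting on one part and measuring the loss on the held-out part, then averaging. A minimal sketch, assuming scikit-learn, squared-error loss, and made-up data:

```python
# K-fold cross-validation sketch (illustrative only; data is made up).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.3, size=40)

# 5-fold CV: fit on 4 folds, measure squared-error loss on the held-out fold, repeat.
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print("estimated risk (mean squared error):", -scores.mean())
```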

TOPIC-9 Sampling distributions of estimators
Since our estimators are statistics (particular functions of random variables), their distribution can be derived from the joint distribution of X1, . . . , Xn.
It is called the sampling distribution because it is based on the joint distribution of the random sample.
Given a sampling distribution, we can:
– calculate the probability that an estimator will not differ from the parameter θ by more than a specified amount;
– obtain interval estimates rather than point estimates after we have a sample (an interval estimate is a random interval such that the true parameter lies within this interval with a given probability, say 95%);
– choose between two estimators; we can, for instance, calculate the mean-squared error of an estimator, Eθ[(θ̂ − θ)²], using the distribution of θ̂.
Sampling distributions of estimators depend on sample size, and we want to know exactly how the
distribution changes as we change this size so that we can make the right trade-offs between cost and
accuracy.
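
To make this concrete, the sketch below simulates the sampling distribution of the sample mean (as an estimator of the true mean θ) for two sample sizes and estimates its mean-squared error Eθ[(θ̂ − θ)²]. NumPy and the made-up exponential population are assumptions for illustration.

```python
# Simulating the sampling distribution of an estimator (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0            # true mean of the (made-up) exponential population

for n in (10, 100):    # two sample sizes, to see how the distribution tightens
    # Draw many independent samples of size n and compute the estimator on each.
    estimates = rng.exponential(scale=theta, size=(5000, n)).mean(axis=1)
    mse = np.mean((estimates - theta) ** 2)            # Monte Carlo estimate of E[(theta_hat - theta)^2]
    within = np.mean(np.abs(estimates - theta) < 0.5)  # P(|theta_hat - theta| < 0.5)
    print(f"n={n:3d}: mse={mse:.4f}, P(|error| < 0.5) = {within:.3f}")
```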

TOPIC-10 Empirical Risk Minimization:


• Empirical Risk Minimization (ERM) is a fundamental concept in machine learning, yet surprisingly many practitioners are not familiar with it.
• Understanding ERM is essential for understanding the limits of machine learning algorithms and forms a good basis for practical problem-solving skills.
• The theory behind ERM is the theory that explains the VC dimension, Probably Approximately Correct (PAC) learning, and other fundamental concepts.

ERM is a nice idea, if used with care.

As an example, consider a regression problem with a training set of 15 points.

The ERM principle is an inference principle which consists in finding the model f̂ by minimizing the empirical risk:

    f̂ = arg min_{f : X → Y} R_emp(f)

where the empirical risk is an estimate of the risk, computed as the average of the loss function ℓ over the training sample D = {(X_i, Y_i)}, i = 1, . . . , N:

    R_emp(f) = (1/N) Σ_{i=1}^{N} ℓ(f(X_i), Y_i)
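
A minimal sketch of the ERM principle with squared-error loss: over a small model class (polynomials of a fixed degree, fitted here with NumPy's least-squares polynomial fit, which is one way of minimizing the empirical risk over that class), pick the fit that minimizes the average loss on the training sample. The data, the degree, and the use of NumPy are illustrative assumptions.

```python
# Empirical risk minimization sketch (illustrative only; data is made up).
import numpy as np

rng = np.random.default_rng(0)
N = 15
X = rng.uniform(-1, 1, size=N)
Y = X ** 2 + rng.normal(scale=0.1, size=N)   # training sample D = {(X_i, Y_i)}

def empirical_risk(f, X, Y):
    """R_emp(f): average squared-error loss of f over the training sample."""
    return np.mean((f(X) - Y) ** 2)

# Model class: polynomials of degree 2. np.polyfit minimizes the squared-error
# empirical risk over this class, i.e. it performs ERM for this simple setting.
coeffs = np.polyfit(X, Y, deg=2)
f_hat = np.poly1d(coeffs)

print("coefficients of f_hat:", np.round(coeffs, 3))
print("empirical risk R_emp(f_hat):", round(empirical_risk(f_hat, X, Y), 4))
```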

