
MODULE 1

INTRODUCTION TO MACHINE LEARNING

Introduction to Machine Learning: How do machines learn, Examples of Machine Learning Problems, Structure of Learning, Learning versus Designing, Training versus Testing, Characteristics of Machine Learning tasks, Predictive and Descriptive tasks.
Machine Learning Models: Geometric Models, Logical Models, Probabilistic Models.
Features: Feature Types, Feature Construction and Transformation, Feature Selection.
INTRODUCTION
"Machine learning is a field of study that gives computers the ability to learn and make predictions without being explicitly programmed."

ML is a subfield of Artificial Intelligence. It is based on the idea that computers can learn from historical experience, make vital decisions, and predict future events without human intervention.
In recent years, Machine Learning has garnered a lot of attention around the world, and it has become one of the most important ways that people use Artificial Intelligence.
Features of Machine Learning
• Machine learning is a data-driven technology. Organizations generate large amounts of data on a daily basis, and by spotting notable relationships in that data, they can make better decisions.

• A machine can learn from past data on its own and automatically improve.

• From a given dataset, it detects various patterns in the data.

• For big organizations, branding is important, and machine learning makes it easier to target a relatable customer base.

• It is similar to data mining because it also deals with huge amounts of data.
Difference Between Machine Learning and Artificial Intelligence
• Artificial Intelligence (AI) and Machine Learning (ML) are two closely related but distinct fields within the broader field of computer science.

• AI is a discipline that focuses on creating intelligent machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and natural language processing.

• It involves the development of algorithms and systems that can reason, learn, and make decisions based on input data.
• On the other hand, Machine Learning (ML) is a subfield of AI that involves teaching machines to learn from data without being explicitly programmed.

• ML algorithms can identify patterns and trends in data and use them to make predictions and decisions.

• ML is used to build predictive models, classify data, and recognize patterns, and is an essential tool for many AI applications.
Artificial Intelligence vs Machine Learning:
1. Origin: The term "Artificial Intelligence" was originally used in 1956 by John McCarthy, who also hosted the first AI conference; the term "Machine Learning" was first used in 1952 by IBM computer scientist Arthur Samuel, a pioneer in artificial intelligence and computer games.
2. Definition: AI stands for Artificial Intelligence, where intelligence is defined as the ability to acquire and apply knowledge; ML stands for Machine Learning, which is defined as the acquisition of knowledge or skill.
3. Aim: AI aims to increase the chance of success, not accuracy; ML aims to increase accuracy, but it does not care about success.
4. Solution: AI will go for finding the optimal solution; ML will go for a solution whether it is optimal or not.
5. Data: AI can work with structured, semi-structured, and unstructured data; ML works mainly with structured and semi-structured data.
6. Key uses: AI's key uses include Siri, customer service via chatbots, expert systems, machine translation like Google Translate, and intelligent humanoid robots such as Sophia; the most common uses of ML include Facebook's automatic friend suggestions, Google's search algorithms, banking fraud analysis, stock price forecasting, and online recommender systems.
7. Autonomy: AI systems can be designed to work autonomously or with minimal human intervention; in contrast, ML algorithms require human involvement to set up, train, and optimize the system.
Machine learning (ML) is a discipline of artificial intelligence (AI) that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention.
Supervised Learning
Supervised learning uses labeled data (data with known answers) to train algorithms to:
Classify Data
Predict Outcomes
Supervised learning can classify data, such as "what is spam in an e-mail", based on known spam examples.
Supervised learning can predict outcomes, such as what kind of video you will like, based on the videos you have played.
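As a minimal sketch of this workflow (the tiny e-mail dataset and labels below are invented for illustration, assuming scikit-learn is available), a classifier can learn "what is spam" from labeled examples:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at 10 am", "free money claim now", "lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (the known answers)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)    # turn text into word-count features
model = MultinomialNB().fit(X, labels)  # learn from the labeled data
print(model.predict(vectorizer.transform(["claim your free prize"])))  # expected: [1]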

Unsupervised Learning
Unsupervised learning is used to discover undefined relationships, such as meaningful patterns in data.
It is about creating computer algorithms that can improve themselves.
It is expected that machine learning will shift toward unsupervised learning to allow programmers to solve problems without creating models.
Reinforcement Learning
Reinforcement learning is based on unsupervised learning but receives feedback from the user on whether the decisions are good or bad. The feedback contributes to improving the model.

Self-Supervised Learning
Self-supervised learning is similar to unsupervised learning because it works with data without human-added labels.
The difference is that unsupervised learning uses techniques such as clustering, grouping, and dimensionality reduction, while self-supervised learning draws its own labels from the data itself.
Examples of Machine Learning Problems
• Credit Card Fraud Detection: Given credit card transactions for a customer in a month,
identify those transactions that were made by the customer and those that were not. A
program with a model of this decision could refund those transactions that were fraudulent.

• Digit Recognition: Given zip codes handwritten on envelopes, identify the digit for each handwritten character. A model of this problem would allow a computer program to read and understand handwritten zip codes and sort envelopes by geographic region.

• Speech Understanding: Given an utterance from a user, identify the specific request made by
the user. A model of this problem would allow a program to understand and make an
attempt to fulfil that request. The iPhone with Siri has this capability.

• Face Detection: Given a digital photo album of many hundreds of digital photographs, identify those photos that include a given person. A model of this decision process would allow a program to organize photos by person. Some cameras and software like iPhoto have this capability.
• Product Recommendation: Given a purchase history for a customer and a large inventory of products, identify those products in which that customer will be interested and likely to purchase. A model of this decision process would allow a program to make recommendations to a customer and motivate product purchases. Amazon has this capability; also think of Facebook, GooglePlus and LinkedIn, which recommend users to connect with you after you sign up.
• Medical Diagnosis: Given the symptoms exhibited in a patient and a database of
anonymized patient records, predict whether the patient is likely to have an illness. A model of this
decision problem could be used by a program to provide decision support to medical professionals.
• Stock Trading: Given the current and past price movements for a stock, determine whether the
stock should be bought, held or sold. A model of this decision problem could provide decision
support to financial analysts.
• Customer Segmentation: Given the pattern of behaviour by a user during a trial period and the past behaviours of all users, identify those users that will convert to the paid version of the product and those that will not. A model of this decision problem would allow a program to trigger customer interventions to persuade the customer to convert early or better engage in the trial.
• Shape Detection: Given a user hand drawing a shape on a touch screen and a database of known
shapes, determine which shape the user was trying to draw. A model of this decision would allow a
program to show the platonic version of that shape the user drew to make crisp diagrams. The
Instaviz iPhone app does this.
Differences between Learning vs Designing:
• Learning is a process by which a system improves its performance from past experience; designing is a process of building a system based on various requirements.
• Learning does not require testing; designing requires testing.
• Learning gains experience from past data; a design gains experience when data is fed to it.
• Learning represents the data with the help of various functions; in designing, no such representation of the data is required.
• Learning at times preprocesses the data and filters out noisy data; designing does not preprocess the data at all.
• Learning requires a measuring device; designing requires a problem description.
• Learning requires learning skills, whereas designing requires designing skills.
• Clustering, description, and regression are used in the learning process; decision trees and tables are used in the designing process.
Training data vs Testing data

There are two key types of data used in machine learning: training data and testing data.
They each have a specific function to perform when building and evaluating machine learning models.
Machine learning algorithms learn from the data in datasets: they discover patterns, gain knowledge, make choices, and evaluate those decisions.
What is Training Data?
• Training data is used to train the machine learning model, whereas testing data is used to determine the performance of the trained model.
What is Testing Data?
• You will need unseen data to test your machine learning model after it has been built using your training data. This data is known as testing data, and it can be used to assess the progress and efficiency of your algorithms' training, as well as to modify or optimize them for better results.
Features: Training Data vs Testing Data

• Purpose: Training data is used to train the machine-learning model; the more training data a model has, the more accurate predictions it can make. Testing data is used to evaluate the model's performance.
• Exposure: By using the training data, the model gains knowledge and becomes more accurate in its predictions. The testing data is not exposed to the model until evaluation; this guarantees that the model cannot learn the testing data by heart and produce flawless forecasts.
• Distribution: The distribution of the training data should be similar to the distribution of the actual data the model will encounter; the testing data should likewise reflect real-world data so that the evaluation is realistic.
• Use: Training data is used to fit the model; the model's performance is assessed by making predictions on the testing data and comparing them to the actual labels, which also helps detect overfitting.
• Size: Training data is typically larger; testing data is typically smaller.
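As a hedged illustration of the split described above (the dataset and model choice here are invented for the example, assuming scikit-learn), one common pattern uses train_test_split:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))             # synthetic features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic labels

# Hold out a smaller testing set; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)    # fit on training data
print(accuracy_score(y_test, model.predict(X_test)))  # evaluate on unseen testing data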


Characteristics of Machine learning tasks
• Predictive modeling: Data is used by machine learning algorithms to create models that
forecast future events. These models can be used to determine the risk of a loan default
or the likelihood that a consumer would make a purchase, among other things.
• Automation: Machine learning algorithms automate the process of finding patterns in
data, requiring less human involvement and enabling more precise and effective analysis.
• Scalability: Machine learning techniques are well suited for processing big data because
they are made to handle massive amounts of data. As a result, businesses can make
decisions based on information gleaned from such data.
• Generalization: Machine learning algorithms are capable of discovering broad patterns in data that can then be applied to analyze fresh, unseen data. Even when the data used to train the model is not immediately applicable to the task at hand, the learned patterns are useful for forecasting future events.
• Adaptiveness: As new data becomes available, machine learning algorithms are built to
learn and adapt continuously. As a result, they can enhance their performance over time,
becoming more precise and efficient as more data is made available to them.
Descriptive and predictive analysis.
Descriptive analysis is used to understand the past and predictive
analysis is used to predict the future.
Both of these concepts are important in machine learning because a
clear understanding of the problem and its implications is the best
way to make the right decisions.
The main differences between descriptive and predictive data mining are:
Purpose:
• Descriptive data mining is used to describe the data and identify patterns and relationships.
• Predictive data mining is used to make predictions about future events.
Approach:
• Descriptive data mining involves analyzing historical data to identify patterns and relationships.
• Predictive data mining involves using statistical models and machine learning algorithms to identify patterns
and relationships that can be used to make predictions.
Output:
• Descriptive data mining produces summaries and visualizations of the data.
• Predictive data mining produces models that can be used to make predictions.
Timeframe:
• Descriptive data mining is focused on analyzing historical data.
• Predictive data mining is focused on making predictions about future events.
Applications:
• Descriptive data mining is used in applications such as market segmentation, customer profiling, and product
recommendation.
• Predictive data mining is used in applications such as fraud detection, risk assessment, and demand
forecasting.
Components of Learning
Basic components of the learning process
The learning process, whether by a human or a machine, can be divided into four components, namely: data storage, abstraction, generalization, and evaluation. Figure 1.1 illustrates the various components and the steps involved in the learning process.
1. Data storage
Facilities for storing and retrieving huge amounts of data are an important component of the learning process. Humans and computers alike utilize data storage as a foundation for advanced reasoning.
• In a human being, the data is stored in the brain and retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory, and similar devices to store data, and use cables and other technology to retrieve it.
2. Abstraction
Abstraction is the process of extracting knowledge about stored data. This involves creating general concepts
about the data as a whole. The creation of knowledge involves application of known models and creation of new
models. The process of fitting a model to a dataset is known as training. When the model has been trained, the
data is transformed into an abstract form that summarizes the original information.
3. Generalization
The term generalization describes the process of turning the knowledge about stored data into a form that can be utilized for future action. These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen before. In generalization, the goal is to discover those properties of the data that will be most relevant to future tasks.
4. Evaluation
Evaluation is the last component of the learning process. It is the process of giving feedback to the user to measure the utility of the learned knowledge. This feedback is then utilised to effect improvements in the whole learning process.
Learning Models
• Machine learning is concerned with using the right features to build the right models that achieve the right tasks. The basic idea of learning models can be divided into three categories:

 Using a logical expression (Logical models)
 Using the geometry of the instance space (Geometric models)
 Using probability to classify the instance space (Probabilistic models)

Models can further be distinguished as grouping and grading models.
Logical models
• Logical models use a logical expression to divide the instance space into segments and hence construct grouping models. A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome. Once the data is grouped using a logical expression, the data is divided into homogeneous groupings for the problem we are trying to solve. For example, for a classification problem, all the instances in the group belong to one class.
There are mainly two kinds of logical models:
 Tree models
 Rule models
Rule models consist of a collection of implications or IF-THEN rules. For tree-based models, the 'if-part' defines a segment and the 'then-part' defines the behaviour of the model for this segment.
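As an illustrative sketch (assumed, not from the slides), a small scikit-learn decision tree makes the segment/behaviour idea concrete; export_text prints the learned IF-THEN structure:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# Each "if feature <= threshold" line defines a segment; each leaf defines
# the model's behaviour (the predicted class) for that segment.
print(export_text(tree, feature_names=["sepal len", "sepal wid", "petal len", "petal wid"]))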
Geometric Models

In geometric models, features can be described as points in two dimensions (x- and y-axis) or in a three-dimensional space (x, y, and z). There are two ways we could impose similarity:
• We could use geometric concepts like lines or planes to segment (classify) the instance space. These are called Linear models.
• Alternatively, we can use the geometric notion of distance to represent similarity. In this case, if two points are close together, they have similar values for their features and thus can be classed as similar. We call such models Distance-based models.
Example: Linear Regression
• Linear regression is a type of supervised machine learning algorithm that computes the linear relationship between a dependent variable and one or more independent features.
• When the number of independent features is 1, it is known as univariate linear regression; in the case of more than one feature, it is known as multivariate linear regression.
• The goal of the algorithm is to find the best-fit line equation that can predict the values based on the independent variables.
The linear regression model gives a sloped straight line describing the relationship within the variables. The graph presents the linear relationship between the dependent variable and the independent variable: when the value of x (independent variable) increases, the value of y (dependent variable) likewise increases. The red line is referred to as the best-fit straight line; based on the given data points, we try to plot a line that models the points the best.
Y = a + bX, where Y is the dependent variable (the variable plotted on the Y axis), X is the independent variable (plotted on the X axis), b is the slope of the line, and a is the y-intercept.
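A minimal sketch (illustrative data, assuming scikit-learn) of fitting Y = a + bX and reading back the intercept a and slope b:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])  # independent variable
y = np.array([3, 5, 7, 9, 11])           # dependent variable (here y = 1 + 2x)
reg = LinearRegression().fit(X, y)
print(reg.intercept_, reg.coef_[0])      # a (y-intercept) = 1.0, b (slope) = 2.0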
Probabilistic Models
• Probabilistic models see features and target variables as random variables.
• The process of modelling represents and manipulates the level of uncertainty with respect to these variables.
• There are two types of probabilistic models: predictive and generative.
• Predictive probabilistic models use the idea of a conditional probability distribution P(Y | X), from which Y can be predicted from X.
• Generative models estimate the joint distribution P(Y, X). Once we know the joint distribution, we can derive any conditional or marginal distribution involving the same variables. Probabilistic models use the idea of probability to classify new entities; Naïve Bayes is an example of a probabilistic classifier.
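As a small illustration (the joint table below is made up for the example), a conditional and a marginal can be derived from a generative model's joint distribution with plain numpy:

import numpy as np

# rows = Y in {0, 1}, columns = X in {0, 1}; the entries sum to 1
joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])
p_x = joint.sum(axis=0)      # marginal P(X) = [0.5, 0.5]
p_y_given_x = joint / p_x    # conditional P(Y | X), column-wise
print(p_y_given_x)           # [[0.6, 0.2], [0.4, 0.8]]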
What is Bayes' Theorem in ML?
• Machine learning is one of the most recent developments in artificial intelligence, and the Bayes theorem is a key idea in it.
• The Bayes theorem is frequently referred to as the Bayes rule or Bayes' law.
• One of the most well-known theorems in machine learning, the Bayes theorem helps determine the likelihood that one event will occur, given uncertain information, when another event has already happened.
• The mathematical formulation of the Bayes theorem is:

P(A|B) = [P(B|A) × P(A)] / P(B)
Naïve Bayes Algorithm
• Naïve Bayes algorithm is a supervised learning algorithm, which is
based on Bayes theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional
training dataset.
• The Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• Some popular examples of the Naïve Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.
Why is it called Naïve Bayes?
• The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be described as:
• Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.
• Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.

P(A) is the Prior Probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal Probability: the probability of the evidence.
P(A|B) is the Posterior Probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood: the probability of the evidence given that hypothesis A is true.
Working of Naïve Bayes' Classifier:

The working of the Naïve Bayes' Classifier can be understood with the help of the example below:
• Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we need to follow the steps below:

1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Now, use Bayes' theorem to calculate the posterior probability.

• Problem: If the weather is sunny, should the player play or not?
Dataset of weather conditions (Outlook, Play):
0 Rainy Yes, 1 Sunny Yes, 2 Overcast Yes, 3 Overcast Yes, 4 Sunny No, 5 Rainy Yes, 6 Sunny Yes,
7 Overcast Yes, 8 Rainy No, 9 Sunny No, 10 Sunny Yes, 11 Rainy No, 12 Overcast Yes, 13 Overcast Yes

Frequency table for the weather conditions:
Weather    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4

Likelihood table for the weather conditions:
Weather    No            Yes
Overcast   0             5             5/14 = 0.35
Rainy      2             2             4/14 = 0.29
Sunny      2             3             5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71

Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
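Completing the arithmetic with the values taken straight from the tables above (a short Python sketch):

p_sunny_given_yes = 3 / 10   # P(Sunny | Yes) from the likelihood table
p_yes = 10 / 14              # P(Yes)
p_sunny = 5 / 14             # P(Sunny)
p_sunny_given_no = 2 / 4     # P(Sunny | No)
p_no = 4 / 14                # P(No)
print(p_sunny_given_yes * p_yes / p_sunny)  # P(Yes|Sunny) = 0.60
print(p_sunny_given_no * p_no / p_sunny)    # P(No|Sunny)  = 0.40

Since P(Yes|Sunny) = 0.60 is greater than P(No|Sunny) = 0.40, the player should play on a sunny day.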
Advantages of Naïve Bayes Classifier:

• Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
• It can be used for binary as well as multi-class classification.
• It performs well in multi-class predictions compared to the other algorithms.
• It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:

• Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship
between features.

Applications of Naïve Bayes Classifier:

• It is used for credit scoring.
• It is used in medical data classification.
• It can be used for real-time predictions because the Naïve Bayes Classifier is an eager learner.
• It is used in text classification, such as spam filtering and sentiment analysis.
Distance Metrics Used in Machine Learning
• Distance metrics are a key part of several machine learning algorithms.
• These distance metrics are used in both supervised and unsupervised learning, generally to calculate the similarity between data points.
• An effective distance metric improves the performance of our machine learning model, whether that's for classification tasks or clustering.
• Let's say you need to create clusters using a clustering algorithm such as k-means, or you use the k-nearest neighbour algorithm (kNN), which uses nearest neighbours to solve a classification or regression problem. Commonly used metrics include:
1. Euclidean Distance
2. Manhattan Distance
3. Minkowski Distance
4. Hamming Distance
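A brief sketch (assuming SciPy is available; the points are arbitrary) computing the four metrics listed above:

from scipy.spatial import distance

u, v = [1, 0, 2, 3], [4, 4, 2, 1]
print(distance.euclidean(u, v))       # straight-line distance: sqrt(29) ≈ 5.39
print(distance.cityblock(u, v))       # Manhattan distance: 3 + 4 + 0 + 2 = 9
print(distance.minkowski(u, v, p=3))  # Minkowski distance with p = 3
print(distance.hamming([1, 0, 1], [1, 1, 0]))  # fraction of differing positions: 2/3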
Feature Transformation Techniques in Machine Learning

Most machine learning algorithms are statistics dependent, meaning that all of the algorithms indirectly use a statistical approach to solve the complex problems in the data.
However, a normal distribution of the data cannot be expected every time with every type of dataset, which means data that is not normally distributed needs preprocessing and cleaning before a machine learning algorithm is applied to it.
Transformers are functions applied to data that is not normally distributed, and once applied there is a high chance of obtaining normally distributed data.
There are 3 types of feature transformation techniques:
1. Function Transformers
2. Power Transformers
3. Quantile Transformers
Function Transformers
• Function transformers are the type of feature transformation technique that uses a particular function to transform the data toward a normal distribution. Here the particular function is applied to the data observations.
• There are 5 types of function transformers that are commonly used, and they solve the issue of normal distribution almost every time:
1. Log Transform
2. Square Transform
3. Square Root Transform
4. Reciprocal Transform
5. Custom Transform
Algorithms for Supervised Learning
There are several algorithms available for supervised learning. Some of
the widely used algorithms of supervised learning are as shown below:
• k-Nearest Neighbours
• Decision Trees
• Naive Bayes
• Logistic Regression
• Support Vector Machines
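As a hedged example of one listed algorithm (toy data, assuming scikit-learn), k-nearest neighbours classifies a point by its closest labeled neighbours:

from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]  # two well-separated classes
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[0.5, 0.5], [5.5, 5.5]]))  # expected: [0 1]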
Log Transform
• The log transform is one of the simplest transformations of the data, in which the log is applied to every single observation, and the result of the log is considered the final data to feed to the machine learning algorithms.
• The log transform performs well on right-skewed data: it transforms right-skewed data into approximately normally distributed data.

import numpy as np
from sklearn.preprocessing import FunctionTransformer

data = np.random.exponential(scale=2.0, size=(100, 1))  # illustrative right-skewed data
transform = FunctionTransformer(func=np.log1p)
transformed_data = transform.fit_transform(data)
Square Transform
• The square transform is the type of transformer in which the square of the data is considered instead of the raw data. In simple words, in this transform the square function is applied to the data, and the square of every single observation is considered as the final transformed data.
import numpy as np
transformed_data = np.square(data)
Square Root Transform
• In this transform, the square root of the data is calculated. This transform performs well on left-skewed data and efficiently transforms left-skewed data into normally distributed data.
import numpy as np
transformed_data = np.sqrt(data)
Reciprocal Transform
• In this transformation, the reciprocal of every observation is considered. This transform is useful for some datasets, as the reciprocal of the observations can work well for achieving a normal distribution.
import numpy as np
transformed_data = np.reciprocal(data)
Custom Transforms
• The log and square root transforms cannot be used on every dataset, as every dataset can have different patterns and complexity. Based on domain knowledge of the data, custom transformations can be applied to transform the data toward a normal distribution. The custom transform can be any function, like sin, cos, tan, cube, etc.
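A short sketch of a custom transform (the cube function here is just an illustrative choice), wrapped with scikit-learn's FunctionTransformer like the earlier examples:

import numpy as np
from sklearn.preprocessing import FunctionTransformer

data = np.array([[1.0], [2.0], [3.0]])           # illustrative data
cube = FunctionTransformer(func=lambda x: np.power(x, 3))
print(cube.fit_transform(data))                  # [[1.], [8.], [27.]]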
Some broad categories of models:
• Geometric models: e.g. k-nearest neighbours, linear regression, support vector machines, logistic regression, …
• Probabilistic models: e.g. Naïve Bayes, Gaussian process regression, conditional random fields, …
• Logical models: e.g. tree models and rule models.
