
Lecture 2 Unit 1

The document provides an overview of the Machine Learning life cycle, detailing its seven major steps: Gathering Data, Data Preparation, Data Wrangling, Data Analysis, Train Model, Test Model, and Deployment. It also categorizes Machine Learning into four types: Supervised, Unsupervised, Semi-Supervised, and Reinforcement Learning, explaining their methodologies and applications. Additionally, it discusses various algorithms used in classification and regression tasks, along with their advantages and disadvantages.

Uploaded by

for181fun
Copyright: © All Rights Reserved

The Yenepoya Institute of Arts, Science, Commerce and

Management (YIASCM)

Course: IV Semester BCA (All specializations) & BSc

Introduction to Machine
Learning
Lecture 3: Unit-1:- Introduction to Machine Learning
Objective
• At the end of this session the learner will be able to understand
⮚ Machine learning Life cycle
⮚ Types of Machine Learning
• Summary
• Reference
Machine learning Life cycle
• Machine learning gives computer systems the ability to learn
automatically without being explicitly programmed.
• But how does a machine learning system work?
• It can be described using the machine learning life cycle.
• The machine learning life cycle is a cyclic process for building an
efficient machine learning project.
• The main purpose of the life cycle is to find a solution to the problem
or project.
• The machine learning life cycle involves seven major steps:
1. Gathering Data
2. Data Preparation
3. Data Wrangling
4. Data Analysis
5. Train Model
6. Test Model
7. Deployment
Gathering Data
• Data Gathering is the first step of the machine learning life cycle.
• The goal of this step is to identify and obtain all the data relevant to the problem.
• In this step, we need to identify the different data sources, as data can
be collected from various sources such as files, database, internet,
or mobile devices.
• It is one of the most important steps of the life cycle.
• The quantity and quality of the collected data will determine the
efficiency of the output.
• In general, the more good-quality data we have, the more accurate the
prediction will be.
• This step includes the below tasks
1. Identify various data sources
2. Collect data
3. Integrate the data obtained from different sources
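The three tasks above can be sketched in a few lines of Python. This is only an illustration: the two sources (a CSV export and a list of records from a mobile app) and the field names customer_id, age and purchases are made up for the example.

```python
import csv
import io

# Hypothetical source 1: a CSV file export (inlined here as text).
CSV_TEXT = "customer_id,age\n1,34\n2,27\n"

# Hypothetical source 2: records collected from a mobile app.
APP_RECORDS = [
    {"customer_id": 2, "purchases": 5},
    {"customer_id": 3, "purchases": 1},
]


def load_csv(text):
    """Collect rows from the CSV source, converting fields to integers."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"customer_id": int(r["customer_id"]), "age": int(r["age"])} for r in reader]


def integrate(*sources):
    """Merge records from all sources into one record per customer_id."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["customer_id"], {}).update(record)
    return [merged[key] for key in sorted(merged)]


dataset = integrate(load_csv(CSV_TEXT), APP_RECORDS)
```

Customer 3 appears in only one source, so the integrated record has no age. Gaps like this are dealt with later, in the data wrangling step.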
Data preparation
• After collecting the data, we need to prepare it for further steps.
• Data preparation is a step where we put our data into a suitable place
and prepare it for use in machine learning training.
• In this step, first, we put all data together, and then randomize the
ordering of data.
• This step can be further divided into two processes
❖Data Exploration:
It is used to understand the nature of data that we have to work with.
We need to understand the characteristics, format, and quality of data.
A better understanding of data leads to an effective outcome. In this step, we
look for correlations, general trends, and outliers.
❖Data Preprocessing:
Now the next step is preprocessing of data for its analysis.
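As a rough sketch of this step, the snippet below randomizes the ordering of a small dataset and then explores it with summary statistics. The records are invented, and the outlier rule (more than 1.5 standard deviations from the mean) is an arbitrary choice made for this tiny example, not a standard prescribed by the lecture.

```python
import random
import statistics

# Made-up (hours_studied, exam_score) records gathered in the previous step.
records = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70), (6, 20)]

# Randomize the ordering so the original collection order cannot bias training.
rng = random.Random(42)  # fixed seed so the shuffle is repeatable
shuffled = records[:]
rng.shuffle(shuffled)

# Data exploration: summary statistics of the scores.
scores = [score for _, score in records]
mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)

# Flag possible outliers: scores far away from the mean.
outliers = [s for s in scores if abs(s - mean_score) > 1.5 * std_score]
print("Possible outliers:", outliers)
```

Here the score of 20 stands out from the general trend (scores rising with hours studied), which is the kind of observation data exploration is meant to surface.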
Data Wrangling
• Data wrangling is the process of cleaning and converting raw data into a
useable format.
• It is the process of cleaning the data, selecting the variable to use, and
transforming the data in a proper format to make it more suitable for
analysis in the next step.
• It is one of the most important steps of the complete process. Cleaning of
data is required to address the quality issues.
• Not all of the collected data is necessarily useful to us, so some of it
may have to be discarded.
• In real-world applications, collected data may have various issues,
including:
✔Missing Values
✔Duplicate data
✔Invalid data
✔Noise
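A minimal sketch of how the first three issues might be handled is shown below. The records, the valid age range (0 to 120) and the choice to fill missing ages with the mean are all assumptions made for illustration; noise usually needs problem-specific treatment and is not covered here.

```python
# Made-up raw records: (name, age). None marks a missing value.
raw = [
    ("asha", 21),
    ("ravi", None),    # missing value
    ("asha", 21),      # duplicate record
    ("meena", -5),     # invalid value: age cannot be negative
    ("john", 25),
]


def wrangle(rows):
    """Drop duplicates and invalid ages, then fill missing ages with the mean."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:          # remove duplicate data
            seen.add(row)
            unique.append(row)

    # Remove invalid data (missing values are kept for now).
    valid = [(n, a) for n, a in unique if a is None or 0 <= a <= 120]

    # Fill missing values with the mean of the known ages.
    known = [a for _, a in valid if a is not None]
    mean_age = sum(known) / len(known)
    return [(n, a if a is not None else mean_age) for n, a in valid]


clean = wrangle(raw)
```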
Data Analysis
• Now the cleaned and prepared data is passed on to the analysis step.
• This step involves:
✔Selecting analytical techniques
✔Building models
✔Reviewing the results
• The aim of this step is to build a machine learning model to analyze the
data using various analytical techniques and review the outcome.
• It starts with determining the type of problem, where we
select a machine learning technique such
as Classification, Regression, Cluster analysis, Association, etc., then build
the model using the prepared data, and evaluate the model.
Train Model
• The next step is to train the model. In this step, we train our
model to improve its performance and get a better outcome for the problem.
• We use datasets to train the model using various machine learning
algorithms.
• Training a model is required so that it can understand the various
patterns, rules, and features.
Test Model
• Once our machine learning model has been trained on a given
dataset, then we test the model.
• In this step, we check for the accuracy of our model by providing a
test dataset to it.
• Testing the model determines the percentage accuracy of the model
as per the requirement of project or problem.
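The train and test steps can be sketched together as follows. The data, the 70/30 split and the very simple "model" (a single threshold placed midway between the two class means) are assumptions chosen to keep the example short; real projects would use one of the algorithms covered in later units.

```python
import random

# Made-up labelled data: (hours_studied, passed), where passed is 0 or 1.
data = [(hours, int(hours >= 5)) for hours in range(1, 11)]

# Hold out 30% of the shuffled data for testing (the ratio is an arbitrary choice).
rng = random.Random(0)
shuffled = data[:]
rng.shuffle(shuffled)
split = len(shuffled) * 7 // 10
train, test = shuffled[:split], shuffled[split:]


def fit_threshold(rows):
    """'Train' a one-feature model: the midpoint between the two class means."""
    passed = [x for x, y in rows if y == 1]
    failed = [x for x, y in rows if y == 0]
    return (sum(passed) / len(passed) + sum(failed) / len(failed)) / 2


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


threshold = fit_threshold(train)
predictions = [int(x >= threshold) for x, _ in test]
test_accuracy = accuracy(predictions, [y for _, y in test])
print(f"Test accuracy: {test_accuracy:.0%}")
```

The key point is that accuracy is measured on examples the model did not see during training, which gives a fairer estimate of how it will behave after deployment.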
Deployment
• The last step of machine learning life cycle is deployment, where we
deploy the model in the real-world system.
• If the above-prepared model is producing an accurate result as per
our requirement with acceptable speed, then we deploy the model in
the real system.
• But before deploying the project, we will check whether it is
improving its performance using available data or not.
• The deployment phase is similar to making the final report for a
project.
Types of Machine Learning
• Machine learning is a subset of AI, which enables the machine to
automatically learn from data, improve performance from past
experiences, and make predictions.
• Machine learning contains a set of algorithms that work on a huge
amount of data.
• Data is fed to these algorithms to train them, and on the basis of
training, they build the model & perform a specific task.
• These ML algorithms help to solve different business problems like
Regression, Classification, Forecasting, Clustering, and Associations,
etc.
• Based on the methods and way of learning, machine learning is divided
into mainly four types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
Supervised Machine Learning
• Supervised machine learning is based on supervision.
• It means in the supervised learning technique, we train the machines
using the "labelled" dataset, and based on the training, the machine
predicts the output.
• The labelled data specifies that some of the inputs are already
mapped to the output.
• More precisely, we first train the machine with the
input and corresponding output, and then we ask the machine to
predict the output using the test dataset.
• Suppose we have an input dataset of cat and dog images.
• So, first, we will provide the training to the machine to understand the
images, such as the shape & size of the tail of cat and dog, Shape of eyes,
colour, height (dogs are taller, cats are smaller), etc.
• After completion of training, we input the picture of a cat and ask the
machine to identify the object and predict the output.
• Now, the machine is well trained, so it will check all the features of the
object, such as height, shape, colour, eyes, ears, tail, etc., and find that it's a
cat. So, it will put it in the Cat category.
• This is the process of how the machine identifies the objects in Supervised
Learning.
• The main goal of the supervised learning technique is to map the input
variable(x) with the output variable(y).
• Some real-world applications of supervised learning are Risk Assessment,
Fraud Detection, Spam filtering, etc.
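The idea of mapping an input x to an output y from labelled examples can be sketched with a k-nearest-neighbours classifier (one of the algorithms listed for later units). Instead of images, this simplified example assumes each animal has already been reduced to two made-up features, height and weight.

```python
import math
from collections import Counter

# Made-up labelled training data: (height_cm, weight_kg) -> label.
training = [
    ((25, 4), "cat"), ((23, 5), "cat"), ((28, 6), "cat"),
    ((60, 25), "dog"), ((55, 20), "dog"), ((70, 30), "dog"),
]


def predict(features, k=3):
    """Label a new example by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict((26, 5)))   # a small, light animal
print(predict((65, 28)))  # a tall, heavy animal
```

The labels in the training data are what make this supervised: the machine is told the correct answer for every training example.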
Categories of Supervised Machine Learning
• Supervised machine learning can be classified into two types of
problems, which are given below:
1. Classification
2. Regression
Classification
• Classification is a process of categorizing data or objects into
predefined classes or categories based on their features or attributes.
• In machine learning, classification is a type of supervised
learning technique where an algorithm is trained on a labeled dataset
to predict the class or category of new, unseen data.
• The main objective of classification is to build a model that
can accurately assign a label or category to a new
observation based on its features.
• For example, a classification model might be trained on a
dataset of images labeled as either dogs or cats and then
used to predict the class of new, unseen images of dogs or
cats based on their features such as color, texture, and
shape.
• Some real-world applications of classification
are Spam Detection, Email filtering, etc.
Types of Classification
• Classification is of two types:
a. Binary Classification
b. Multiclass Classification
Binary Classification
• In binary classification, the goal is to classify
the input into one of two classes or
categories.
• Example – On the basis of the given health
conditions of a person, we have to
determine whether the person has a
certain disease or not.
Multiclass Classification
• In multi-class classification, the goal is to
classify the input into one of several
classes or categories.
• For Example – On the basis of data about
different species of flowers, we have to
determine which species our observation
belongs to.
Types of classification algorithms
• There are various types of classifiers.
⮚ Linear Classifiers
⮚ Non-linear Classifiers
Linear Classifiers
• Linear models create a linear decision boundary between classes.
They are simple and computationally efficient.

Non-linear Classifiers
• Non-linear models create a non-linear decision boundary between
classes.
• They can capture more complex relationships between the input
features and the target variable.
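To make the idea of a linear decision boundary concrete, here is a small sketch of the perceptron, a classic linear classifier that is not named in this lecture and is used here only as an illustration. It learns a straight line w1*x1 + w2*x2 + b = 0 that separates two classes; the training points are made up and deliberately linearly separable.

```python
# Made-up, linearly separable training data: (x1, x2) -> class -1 or +1.
training = [
    ((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
    ((4, 4), +1), ((5, 4), +1), ((4, 5), +1),
]


def train_perceptron(data, max_epochs=100):
    """Learn weights w and bias b of the boundary w1*x1 + w2*x2 + b = 0."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), label in data:
            score = w[0] * x1 + w[1] * x2 + b
            if label * score <= 0:      # misclassified (or exactly on the boundary)
                w[0] += label * x1      # move the boundary towards the correct side
                w[1] += label * x2
                b += label
                mistakes += 1
        if mistakes == 0:               # every training point is on the correct side
            break
    return w, b


def classify(point, w, b):
    """Return +1 or -1 depending on which side of the line the point lies."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


weights, bias = train_perceptron(training)
```

A non-linear classifier would be needed if no single straight line could separate the two classes.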
Popular classification algorithms
(Detailed study in Upcoming Units)
• We will learn these algorithms in upcoming classes:
1. Random Forest Algorithm
2. Decision Tree Algorithm
3. Logistic Regression Algorithm
4. Support Vector Machine Algorithm
5. K-Nearest Neighbours
6. Naive Bayes
Regression
• Regression algorithms are used to solve regression problems, in which
the output variable is a continuous value that depends on the input
variables. The relationship may be linear, as in linear regression, or
non-linear.
• These are used to predict continuous output variables, such as market
trends, weather prediction, etc.
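As an illustration of predicting a continuous output, the sketch below fits a simple linear regression line y = slope * x + intercept using the least-squares formulas. The data points are made up and lie exactly on the line y = 2x + 1 so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data following y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict the output for an unseen input x = 6
```

Unlike a classifier, the model returns a number on a continuous scale rather than a category.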
Popular Regression algorithms
1. Simple Linear Regression Algorithm
2. Multivariate Regression Algorithm
3. Decision Tree Algorithm
4. Lasso Regression
Advantages of Supervised Learning
• Since supervised learning works with a labelled dataset, we can
have an exact idea about the classes of objects.
• These algorithms are helpful in predicting the output on the basis of
prior experience.
Disadvantages of Supervised Learning
• These algorithms can struggle with very complex tasks.
• It may predict the wrong output if the test data is different from the
training data.
• Training the algorithm can require a lot of computation time.
Applications of Supervised Learning
1. Image Segmentation:
Supervised Learning algorithms are used in image segmentation. In
this process, image classification is performed on different image
data with predefined labels.
2. Medical Diagnosis:
Supervised algorithms are also used in the medical field for
diagnosis purposes. It is done by using medical images and past
data labelled with disease conditions. With such a
process, the machine can identify a disease for new patients.
3. Fraud Detection - Supervised Learning classification algorithms are
used for identifying fraudulent transactions, fraudulent customers, etc. It is
done by using historic data to identify the patterns that can lead to possible
fraud.
4. Spam detection - In spam detection & filtering, classification
algorithms are used. These algorithms classify an email as spam or not
spam. The spam emails are sent to the spam folder.
5. Speech Recognition - Supervised learning algorithms are also used in
speech recognition. The algorithm is trained with voice data, and
various identifications can be done using the same, such as voice-
activated passwords, voice commands, etc.
Unsupervised Machine Learning
• Unsupervised learning is different from the Supervised learning
technique; as its name suggests, there is no need for supervision.
• It means, in unsupervised machine learning, the machine is
trained using the unlabeled dataset, and the machine predicts
the output without any supervision.
• In unsupervised learning, the models are trained with the data
that is neither classified nor labelled, and the model acts on that
data without any supervision.
• The main aim of the unsupervised learning algorithm is to group or
categorize the unsorted dataset according to the similarities, patterns,
and differences.
• Machines are instructed to find the hidden patterns from the input
dataset.
• Let's take an example to understand it more precisely; suppose there is
a basket of fruit images, and we input it into the machine learning
model.
• The images are totally unknown to the model, and the task of the
machine is to find the patterns and categories of the objects.
• So, now the machine will discover its patterns and differences, such as
colour difference, shape difference, and predict the output when it is
tested with the test dataset.
Categories of Unsupervised Machine
Learning
• Unsupervised Learning can be further classified into two types:
1. Clustering
2. Association
Clustering
• The clustering technique is used when we want to find the inherent
groups from the data.
• It is a way to group the objects into a cluster such that the objects
with the most similarities remain in one group and have fewer or no
similarities with the objects of other groups.
• An example of the clustering algorithm is grouping the customers by
their purchasing behaviour.
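The grouping idea can be sketched with a bare-bones version of K-Means. The customer data (visits per month, average basket size) is invented, and the initialisation (simply taking the first k points as starting centroids) is a naive choice made to keep the example deterministic; practical implementations use better initialisation.

```python
import math

# Made-up customers: (visits_per_month, average_basket_size).
customers = [(1, 2), (9, 8), (2, 1), (8, 9), (1, 1), (9, 9)]


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly re-assigning and re-centring."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = k_means(customers, k=2)
```

No labels are supplied anywhere: the two groups (occasional small shoppers and frequent large shoppers) emerge from the similarities in the data alone.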
Popular clustering algorithms (Detailed study in
Upcoming Units)
1. K-Means Clustering algorithm
2. Mean-shift algorithm
3. DBSCAN Algorithm
Related unsupervised techniques that reduce dimensionality rather than
form clusters:
4. Principal Component Analysis
5. Independent Component Analysis
Association
• Association rule learning is an unsupervised learning technique, which
finds interesting relations among variables within a large dataset.
• The main aim of this learning algorithm is to find the dependency of one
data item on another data item and map those variables accordingly so
that it can generate maximum profit.
• This algorithm is mainly applied in Market Basket analysis, Web usage
mining, continuous production, etc.
• Some popular algorithms of Association rule learning are
1. Apriori Algorithm
2. Eclat
3. FP-growth algorithm.
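The dependency of one item on another is usually measured with support and confidence, the two quantities that algorithms such as Apriori build on. The sketch below computes them for a made-up set of shopping baskets; it illustrates the measures only and is not an implementation of Apriori, Eclat or FP-growth.

```python
# Made-up market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


rule_support = support({"bread", "milk"})          # 3 of 5 baskets
rule_confidence = confidence({"bread"}, {"milk"})  # 3 of the 4 bread baskets
```

A rule such as "customers who buy bread also buy milk" is considered interesting when both its support and its confidence are high enough.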
Advantages of Unsupervised Learning
• These algorithms can be used for more complicated tasks than the
supervised ones because they work on unlabeled
data.
• Unsupervised algorithms are preferable for various tasks as getting
the unlabeled dataset is easier as compared to the labelled dataset.
Disadvantages of Unsupervised Learning
• The output of an unsupervised algorithm can be less accurate as the
dataset is not labelled, and the algorithms are not trained with the exact
output in advance.
• Working with unsupervised learning is more difficult as it works with
an unlabelled dataset that is not mapped to any output.
Applications of Unsupervised Learning
• Network Analysis: Unsupervised learning is used for identifying
plagiarism and copyright in document network analysis of text data
for scholarly articles.
• Recommendation Systems: Recommendation systems widely use
unsupervised learning techniques for building recommendation
applications for different web applications and e-commerce websites.
• Anomaly Detection: Anomaly detection is a popular application of
unsupervised learning, which can identify unusual data points within
the dataset. It is used to discover fraudulent transactions.
• Singular Value Decomposition: Singular Value Decomposition or SVD
is used to extract particular information from the database. For
example, extracting information of each user located at a particular
location.
Semi-Supervised Learning
• Semi-Supervised learning is a type of Machine Learning algorithm
that lies between Supervised and Unsupervised machine learning.
• It represents the intermediate ground between Supervised (With
Labelled training data) and Unsupervised learning (with no labelled
training data) algorithms and uses the combination of labelled and
unlabeled datasets during the training period.
• To overcome the drawbacks of supervised learning and unsupervised
learning algorithms, the concept of Semi-supervised learning is
introduced.
• The main aim of semi-supervised learning is to effectively use all the
available data, rather than only labelled data like in supervised learning.
• Initially, similar data is clustered using an unsupervised learning
algorithm, and the clusters then help to assign labels to the unlabeled
data.
• This is useful because labelled data is comparatively more expensive to
acquire than unlabeled data.
• We can imagine these algorithms with an example.
• Supervised learning is where a student is under the supervision of an
instructor at home and college.
• Further, if that student is self-analysing the same concept without any
help from the instructor, it comes under unsupervised learning.
• Under semi-supervised learning, the student revises the concept on his
own after first analyzing it under the guidance of an instructor
at college.
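The idea described earlier in this section, using a few labelled examples to label similar unlabelled data, can be sketched as follows. This is one simple scheme among many and is an assumption of this example: at each round, the unlabelled point closest to any already-labelled point inherits that point's label. The points and the labels "A" and "B" are made up.

```python
import math

# A few expensive labelled examples and several cheap unlabelled ones (made up).
labelled = [((1.0, 1.0), "A"), ((9.0, 9.0), "B")]
unlabelled = [(2.0, 1.5), (3.0, 2.5), (8.0, 8.5), (7.0, 7.5)]


def self_label(labelled, unlabelled):
    """Spread labels outwards: the closest unlabelled point copies its neighbour's label."""
    labelled = list(labelled)
    remaining = list(unlabelled)
    while remaining:
        # Find the (unlabelled point, labelled example) pair with the smallest distance.
        point, (_, label) = min(
            ((p, item) for p in remaining for item in labelled),
            key=lambda pair: math.dist(pair[0], pair[1][0]),
        )
        labelled.append((point, label))
        remaining.remove(point)
    return labelled


result = dict(self_label(labelled, unlabelled))
```

After this step all six points carry a label, so a supervised algorithm can be trained on the enlarged dataset.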
Advantages of Semi-supervised Learning
• The algorithm is simple and easy to understand.
• It is highly efficient.
• It is used to solve drawbacks of Supervised and Unsupervised
Learning algorithms.
Disadvantages of Semi-supervised Learning
• Iteration results may not be stable.
• We cannot apply these algorithms to network-level data.
• Accuracy can be low.
Reinforcement Learning
• Reinforcement learning works on a feedback-based process, in which an
AI agent (a software component) automatically explores its surroundings by
trial and error ("hit and trial"), taking actions, learning from experience,
and improving its performance.
• The agent is rewarded for each good action and punished for each bad
action; hence the goal of a reinforcement learning agent is to maximize the
rewards.
• In reinforcement learning, there is no labelled data like supervised learning,
and agents learn from their experiences only.
• The reinforcement learning process is similar to a human being; for
example, a child learns various things by experiences in his day-to-day
life.
• An example of reinforcement learning is to play a game, where the Game
is the environment, moves of an agent at each step define states, and the
goal of the agent is to get a high score.
• Agent receives feedback in terms of punishment and rewards.
• Due to its way of working, reinforcement learning is employed in
different fields such as Game theory, Operation Research, Information
theory, multi-agent systems.
• A reinforcement learning problem can be formalized using a Markov
Decision Process (MDP). In an MDP, the agent constantly interacts with the
environment and performs actions; at each action, the environment
responds and generates a new state.
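The agent, environment, state, action and reward loop can be sketched with tabular Q-learning, a common RL algorithm that this lecture does not name and that is used here only as an example. The environment is a made-up corridor of five states in which reaching the right-most state pays a reward of 1; the learning rate, discount factor and episode count are arbitrary choices, and the agent explores with purely random actions to keep the code short.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Environment: return (next_state, reward). Reaching the goal pays 1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    """Learn action values Q(state, action) from experience."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)   # explore by trial and error
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate towards reward + discounted future value.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# Greedy policy: in each non-goal state, pick the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)}
```

No labelled data is involved: the agent only ever sees the reward signal, yet the learned policy moves right in every state because that is what maximizes the reward.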
Categories of Reinforcement Learning
1. Positive Reinforcement Learning: Positive reinforcement learning
specifies increasing the tendency that the required behaviour would
occur again by adding something. It enhances the strength of the
behaviour of the agent and positively impacts it.
2. Negative Reinforcement Learning: Negative reinforcement learning
works exactly opposite to the positive RL. It increases the tendency
that the specific behaviour would occur again by avoiding the
negative condition.
Real-world Use cases of Reinforcement
Learning
• Video Games:
RL algorithms are very popular in gaming applications, where they are used
to achieve super-human performance. Well-known systems that use RL
algorithms are AlphaGo and AlphaGo Zero, which play the board game Go.
• Resource Management:
The "Resource Management with Deep Reinforcement Learning"
paper showed how to use RL to automatically learn to allocate
and schedule computer resources among waiting jobs in order to minimize
average job slowdown.
• Robotics:
RL is widely being used in Robotics applications. Robots are used in
the industrial and manufacturing area, and these robots are made
more powerful with reinforcement learning. There are different
industries that have their vision of building intelligent robots using AI
and Machine learning technology.
• Text Mining:
Text-mining, one of the great applications of NLP, is now being
implemented with the help of Reinforcement Learning by Salesforce
company.
Advantages of Reinforcement Learning
• It helps in solving complex real-world problems which are difficult to
solve with general techniques.
• The learning model of RL is similar to the learning of human beings;
hence it can produce highly accurate results.
• Helps in achieving long term results.
Disadvantages of Reinforcement Learning
• RL algorithms are not preferred for simple problems.
• RL algorithms require huge data and computations.
• Too much reinforcement learning can lead to an overload of states
which can weaken the results.
Summary
• The machine learning life cycle encompasses problem definition, data
collection, preprocessing, feature engineering, model selection,
training, evaluation, tuning, deployment, and ongoing monitoring.
• This iterative process ensures the development and maintenance of
effective machine learning models.
• Types of Machine Learning
1. Supervised Learning: Trains on labeled data for predictive modeling.
2. Unsupervised Learning: Discovers patterns in unlabeled data without
predefined outputs.
3. Semi-Supervised Learning: Uses a combination of labeled and unlabeled
data for training.
4. Reinforcement Learning: Learns optimal actions through interaction with
an environment and feedback.
5. Deep Learning: Utilizes neural networks with multiple layers to
automatically learn hierarchical representations.
6. Transfer Learning: Pre-trains on one task and fine-tunes for a related task,
leveraging existing knowledge.
Reference
• “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow:
Concepts, Tools, and Techniques to Build Intelligent Systems” by Aurélien
Géron. (Publisher: O’Reilly Media, Year: 2019)
• “Pattern Recognition and Machine Learning” by Christopher M. Bishop.
(Publisher: Springer, Year: 2006)
• “Machine Learning: A Probabilistic Perspective” by Kevin P. Murphy.
(Publisher: The MIT Press, Year: 2012)
• “Python Machine Learning” by Sebastian Raschka and Vahid Mirjalili.
(Publisher: Packt Publishing, Year: 2019)
