ML - CSA 301 - ML Perspective and Issues

Amity School of Engineering & Technology

Dr. Kuldeep N. Tripathi

Assistant Professor
ASET-CSE

CSA 301: MACHINE LEARNING TECHNIQUES


Machine Learning: Perspectives


Machine Learning: Perspective

• Machine learning involves searching a very large space of possible hypotheses to
determine the one that best fits the observed data.

• The goal in machine learning is to recognize the patterns in a dataset in a
general way. Once the patterns are recognized, they can be used to model the data,
to interpret the data, or to predict the outcome for new data that has not been
seen before.

• Machine learning is a subfield of artificial intelligence, and machine learning
algorithms are used in related fields such as natural language processing and
computer vision.

• In general, there are three types of learning: supervised learning,
unsupervised learning, and reinforcement learning. The names convey the main idea
behind each.


Supervised Machine Learning:

• In supervised learning, the system learns under the supervision of known
outputs (labels), so supervised algorithms are preferred when the dataset contains
output information.

• For example, suppose you run a medical statistics company and have a dataset
of patient features such as blood pressure, blood sugar level, and heart rate,
together with a record of whether each patient has experienced heart disease.

• By training a machine learning algorithm on this data, the system can find a
pattern relating the features to the probability of heart disease. The algorithm
can then predict whether a new patient is at risk of heart disease, so that a
doctor can take precautions and possibly save the patient's life.
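The heart disease example can be sketched in a few lines. This is only an illustration under stated assumptions: the patient records are invented toy values, the function name `predict` is hypothetical, and a k-nearest-neighbour vote stands in for whatever supervised algorithm is actually chosen.

```python
import math
from collections import Counter

# Invented toy records (an assumption, not real patient data):
# (blood pressure, blood sugar, heart rate) -> 1 if heart disease was recorded, else 0
TRAIN = [
    ((120, 90, 70), 0),
    ((118, 85, 65), 0),
    ((125, 95, 72), 0),
    ((160, 180, 95), 1),
    ((155, 170, 90), 1),
    ((165, 190, 100), 1),
]

def predict(features, train=TRAIN, k=3):
    """Return the majority label among the k labelled records closest to `features`."""
    by_distance = sorted(train, key=lambda record: math.dist(record[0], features))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# A new, unlabelled patient is classified from the labelled examples.
new_patient = (158, 175, 92)
at_risk = predict(new_patient)
```

The labels in `TRAIN` are the "supervision": without them the vote in `predict` would have nothing to count.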

Supervised Learning Example (two figure slides; images not recovered)

Unsupervised Machine Learning:

• Unsupervised algorithms are preferred when the data does not contain outputs
and you would like to discover the clusters in the dataset.

• An example of unsupervised learning is handwritten digit recognition by
clustering. In this application you know that there should be 10 clusters
{0,1,2,3,4,5,6,7,8,9}, but the problem is that there are countless ways to write a
digit by hand, and everyone writes digits differently.

• How does a computer understand what is written by hand? Here you can use an
unsupervised algorithm such as K-means or the EM algorithm.

• These algorithms start with random initial cluster means and update them
iteratively until they converge, ideally to the true cluster means.
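The iteration described above can be sketched as follows. It is a minimal K-means on one-dimensional toy points rather than digit images; the data, the function name `k_means_1d`, and the seed are assumptions made for illustration.

```python
import random

def k_means_1d(points, k, seed=0, max_iter=100):
    """Minimal K-means: random initial means, then alternate assign and update."""
    rng = random.Random(seed)
    means = rng.sample(points, k)              # random initial cluster means
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Update step: each mean moves to the average of its cluster
        # (an empty cluster keeps its previous mean).
        new_means = [sum(c) / len(c) if c else means[i]
                     for i, c in enumerate(clusters)]
        if new_means == means:                 # nothing moved: converged
            break
        means = new_means
    return sorted(means)

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]     # two well-separated toy groups
means = k_means_1d(points, k=2)
```

On data that is less cleanly separated, different random starts can settle on different means, which is why K-means is often run several times.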

Unsupervised Learning Example (figure slide; image not recovered)

Machine Learning: Perspective (Contd…)

• After training is complete, if you visualize the means of the clusters you can
see that they resemble digits. You then label these clusters with the
corresponding digits, and when the computer encounters a new handwritten digit,
the algorithm assigns it the label of the closest mean.

• Lastly, consider reinforcement learning. Suppose you want to create an
intelligent agent that plays chess.

• In chess, you cannot evaluate moves one at a time in isolation. The agent should
consider a series of moves and then choose the action that maximizes the
utility.

• Therefore the agent should play a number of turns against itself and decide the
best action to take. This type of learning is called reinforcement learning, and it
is commonly used in games.
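Chess is far too large for a short example, so the sketch below swaps in a toy problem: a five-cell corridor where only reaching the last cell is rewarded. It uses tabular Q-learning, which is one reinforcement learning method and not necessarily the one meant in these slides; the environment, constants, and names are all assumptions.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right

def step(state, action):
    """Move within the corridor; only arriving at the last cell gives reward."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(2)                     # explore with random moves
            nxt, reward = step(state, ACTIONS[a])
            # Value of a move = immediate reward + discounted best future value,
            # so a move is judged by the series of moves it makes possible.
            target = reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q = q_learning()
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
```

After training, `policy` holds the preferred move in each non-terminal cell, learned purely from delayed reward.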


Issues in Machine Learning


Issues

• Which algorithm performs best for which types of problems and representations?
• How much training data is sufficient?
• Can prior knowledge be helpful even when it is only approximately correct?
• What is the best strategy for choosing a useful next training experience?
• What specific function should the system attempt to learn?
• How can the learner automatically alter its representation to improve its ability
to represent and learn the target function?

Issues in Machine Learning:

• Machine learning is the process of analyzing data in order to build or train models.
It is everywhere, from Amazon product recommendations to self-driving cars, and it
holds great value throughout.

• Although machine learning is used in many industries and helps organizations make
more informed, data-driven choices than classical methodologies allow, it still has
many problems that cannot be ignored.

• Machine learning professionals face many challenges in building ML skills and
creating an application from scratch. Some common issues in machine learning
follow.

1. Inadequate Training Data / Poor Quality of Data: Data plays a significant role in the
machine learning process. One of the significant issues that machine learning professionals
face is the absence of good-quality data.

• A major issue when using machine learning algorithms is the lack of both quality
and quantity of data.


Issues in Machine Learning:

• Although data plays a vital role in machine learning, many data scientists
report that inadequate, noisy, and unclean data make working with machine
learning algorithms extremely exhausting.

• Unclean and noisy data can also lead the algorithm to make inaccurate or faulty
predictions, which we want to avoid.

• Hence the quality of the data is essential to the quality of the output.

• Therefore, we need to ensure that data preprocessing, which includes removing
outliers, handling missing values, and removing unwanted features, is done with
great care.
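Two of the preprocessing steps named above, dropping missing values and removing outliers, can be sketched with the standard library. The readings are invented, the function name `clean` is hypothetical, and the 1.5 × IQR fence is one common convention rather than the only choice.

```python
import statistics

def clean(values, k=1.5):
    """Drop missing entries, then drop values outside the k * IQR fences."""
    present = [v for v in values if v is not None]      # filter missing values
    q1, _, q3 = statistics.quantiles(present, n=4)      # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if low <= v <= high]     # remove outliers

# Invented heart-rate readings with two gaps and one implausible value.
raw = [72, 75, None, 71, 74, 300, 73, None, 76]
cleaned = clean(raw)
```

Whether a dropped value was noise or a rare but real case is a judgment the code cannot make, so removed rows are worth inspecting.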


Inadequate Training Data / Poor Quality of Data:

• For example, a simple task may require thousands of examples, and an advanced
task such as speech or image recognition may need millions of examples.

• Further, data quality is important for the algorithms to work well, yet poor
data quality is common in machine learning applications. Data quality can be
affected by factors such as the following:

* Noisy data: Noisy data contain a large amount of additional meaningless
information, called noise. Noise leads to inaccurate predictions and lowers
accuracy in classification tasks.

* Incorrect data: Incorrect values lead to faulty results from machine learning
models. Hence, incorrect data also reduce the accuracy of the results.

* Generalizing of output data: Sometimes generalizing the output data becomes
complex, which results in comparatively poor future actions.

Issues in Machine Learning (Contd…):

2. Non-representative training data

• To make sure the trained model generalizes well, the training sample must be
representative of the new cases we need to generalize to. The training data must
cover the cases that have already occurred as well as those that are occurring now.

• Further, if we use non-representative training data, the model makes less
accurate predictions.

• A machine learning model is said to be ideal if it predicts well for generalized cases and
provides accurate decisions. If there is too little training data, the sample contains
sampling noise and becomes a non-representative training set. The model will then be
inaccurate in its predictions and may be biased against one class or group.

• Hence, we should use representative data in training to protect against bias and
make accurate predictions without any drift.


Issues in Machine Learning:

3. Underfitting of Training Data: Underfitting occurs when the model is unable to establish an
accurate relationship between input and output variables. It is like trying to fit into
undersized jeans. It signifies that the model is too simple to capture a precise relationship.

• Underfitting is the opposite of overfitting. It can appear when a machine learning model is
trained on too little data or for too little time; the model then gives incomplete and
inaccurate predictions, and its accuracy suffers.

• In such scenarios, the complexity of the model is too low and its rules are too simple
for the data set, so the model starts making wrong predictions.

• Underfitting occurs when the model is too simple to capture the underlying structure of the
data, like an undersized pair of pants. This generally happens when we have limited data in
the data set or when we try to fit a linear model to non-linear data.
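The last point, a linear model on non-linear data, can be checked numerically. The sketch below fits a least-squares line to points drawn from y = x², then fits the same kind of line using x² as the input feature. The data and function names are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]                      # non-linear target: y = x^2

# Too simple: a straight line in x cannot follow the curve (underfitting).
linear_error = mse(xs, ys, *fit_line(xs, ys))

# Adding a more expressive feature (x^2) lets the same fitter capture the structure.
squared = [x * x for x in xs]
richer_error = mse(squared, ys, *fit_line(squared, ys))
```

The large `linear_error` does not shrink with more points from the same curve, which is the signature of a model that is too simple rather than of too little data.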


Issues in Machine Learning (Contd…):

Methods to reduce Underfitting:

• Increase the training time of the model
• Increase the complexity of the model
• Add more features to the data
• Reduce the regularization parameters
• Remove noise from the data
• Train on more and better features
• Reduce the constraints on the model
• Increase the number of epochs to get better results


Issues in Machine Learning:

4. Overfitting of Training Data: Overfitting refers to a machine learning model that fits its
training data too closely, including the noise in it, which negatively affects its performance
on new data. It is like trying to fit into oversized jeans. Unfortunately, this is one of the
significant issues faced by machine learning professionals.

• It typically arises when the algorithm is trained on noisy and biased data, which affects
its overall performance.

• Overfitting is one of the most common issues faced by Machine Learning engineers and data
scientists.

• When a flexible model is trained on data that contains noise and inaccurate entries, it
starts capturing them as if they were real patterns. This negatively affects the
performance of the model.

• A common cause of overfitting is the use of highly flexible non-linear methods, which
can build unrealistic models of the data.


Issues in Machine Learning (Contd…):

Methods to reduce overfitting:

• Increase the amount of training data.

• Reduce model complexity by selecting a model with fewer parameters
• Apply Ridge or Lasso regularization
• Use early stopping during the training phase
• Reduce the noise in the data
• Reduce the number of attributes in the training data
• Constrain the model
• Analyze the data carefully
• Use data augmentation techniques
• Remove outliers from the training set
• Select a model with fewer features
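As a small illustration of the regularization item in the list above, the sketch below uses the closed-form ridge solution for a one-feature model without an intercept, w = Σxy / (Σx² + λ). The data and the name `ridge_slope` are assumptions; the point is only that a larger penalty λ shrinks the weight and so constrains the model.

```python
def ridge_slope(xs, ys, lam):
    """Slope of y = w * x minimizing squared error plus lam * w**2."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unregularized = ridge_slope(xs, ys, lam=0.0)    # plain least squares
regularized = ridge_slope(xs, ys, lam=14.0)     # penalty pulls the weight toward 0
```

With λ = 0 the fit is ordinary least squares; choosing λ in practice is usually done on held-out validation data.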


Issues in Machine Learning:

• Let’s understand this with the help of an example. Consider a model trained to
differentiate between a cat, a rabbit, a dog, and a tiger. The training data contains 1000 cats,
1000 dogs, 1000 tigers, and 4000 rabbits. There is then a considerable probability that the model
will identify a cat as a rabbit. In this example we had a large amount of data, but it was biased
toward one class; hence the predictions were negatively affected.
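One common way to compensate for such an imbalance is to weight each class inversely to its frequency during training. The sketch below applies the usual "balanced" heuristic, total / (number of classes × class count), to the counts from the example; the function name `class_weights` is an assumption.

```python
def class_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# Class counts taken from the example above.
counts = {"cat": 1000, "dog": 1000, "tiger": 1000, "rabbit": 4000}
weights = class_weights(counts)
# Each rabbit example now counts for less than each cat, dog, or tiger example.
```

Resampling the data so that the classes are closer to equal in size is an alternative with a similar aim.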

5. Machine Learning is a Complex Process: The machine learning industry is young and is
continuously changing. Rapid trial-and-error experiments are being carried out.

• The process is still evolving, so there is a high chance of error, which makes
learning complex.

• It includes analyzing the data, removing data bias, training models, applying complex
mathematical calculations, and a lot more.

• Hence it is a really complicated process, which is another big challenge for machine learning
professionals.

Issues in Machine Learning (Contd…):

• The machine learning process is very complex, which is another major issue faced by
machine learning engineers and data scientists.

• Machine Learning and Artificial Intelligence are relatively new technologies that are still
in an experimental phase and are continuously changing over time.

• Much of the work consists of trial-and-error experiments; hence the probability of error is
higher than expected.

• Further, the process includes analyzing the data, removing data bias, training models, and
applying complex mathematical calculations, which makes the procedure more complicated and
quite tedious.


Issues in Machine Learning:

6. Lack of Training Data: The most important task in the machine learning process is to
train the model on data so that it produces accurate output. Too little training data will
produce inaccurate or heavily biased predictions. Let us understand this with an example.

• Think of training a machine learning algorithm as similar to teaching a child. Suppose you
decide to explain to a child how to distinguish between an apple and a watermelon. You take an
apple and a watermelon and show the child the difference between them based on their color,
shape, and taste.

• In this way, the child will soon become very good at telling the two apart. A
machine learning algorithm, on the other hand, needs a lot of data to make the distinction.

• For complex problems, it may even require millions of examples. Therefore we need
to ensure that machine learning algorithms are trained with sufficient amounts of data.


Issues in Machine Learning:

7. Slow Implementation: Machine learning models can be highly effective in providing
accurate results, but producing them takes a tremendous amount of time.

• Slow programs, data overload, and excessive requirements mean that it usually takes a lot
of time to obtain accurate results. Further, the model requires constant monitoring and
maintenance to deliver the best output.

• This issue is very commonly seen in machine learning models: they are effective at
producing accurate results but are time-consuming.

• Slow programs, excessive requirements, and overloaded data make it take longer than expected
to obtain accurate results. The model also needs continuous maintenance and monitoring
to keep delivering accurate results.


Issues in Machine Learning:

8. Imperfections in the Algorithm When Data Grows: Suppose you have found quality data,
trained a model on it well, and the predictions are precise and accurate.

• Even so, the model may become useless in the future as the data grows.

• The best model of the present may become inaccurate in the future and require
further adjustment. So you need regular monitoring and maintenance to keep the
algorithm working.

• This is one of the most exhausting issues faced by machine learning professionals.


Issues in Machine Learning (Contd…):

9. Monitoring and maintenance
• Because generalized output is mandatory for any machine learning model, regular
monitoring and maintenance are compulsory. Different results for different actions
require the data to change; hence editing the code, as well as resources for
monitoring it, also becomes necessary.

10. Getting bad recommendations

• A machine learning model operates under a specific context; when that context
changes, the result can be bad recommendations and concept drift in the model.
• For example, suppose that at a specific time a customer is looking for some
gadgets. The customer's requirements then change over time, but the machine
learning model keeps showing the same recommendations even though the customer's
expectations have changed. This incident is called data drift.
• It generally occurs when new data is introduced or the interpretation of data
changes. However, we can overcome this by regularly monitoring the data and
updating the model according to the expectations.
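Monitoring for this kind of drift can start very simply, by comparing what customers looked at in a reference period with what they look at now. The sketch below measures the shift with the total variation distance between the two category distributions; the data, the function name, and the 0.3 alert threshold are assumptions chosen for illustration.

```python
from collections import Counter

def total_variation(reference, recent):
    """Half the summed absolute difference between two category distributions."""
    ref_counts, new_counts = Counter(reference), Counter(recent)
    categories = set(ref_counts) | set(new_counts)
    return 0.5 * sum(
        abs(ref_counts[c] / len(reference) - new_counts[c] / len(recent))
        for c in categories
    )

# Invented browsing logs: interest moves from gadgets to books over time.
reference_period = ["gadget"] * 8 + ["book"] * 2
recent_period = ["gadget"] * 2 + ["book"] * 8

score = total_variation(reference_period, recent_period)
drifted = score > 0.3     # hypothetical threshold for triggering a model update
```

A score of 0 means the two periods have the same distribution and 1 means they share no categories at all.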


Issues in Machine Learning (Contd…):

11. Lack of skilled resources
• Although Machine Learning and Artificial Intelligence are continuously growing in the market,
these industries are still young in comparison to others. The shortage of skilled
people is also an issue.

• Hence, we need people with in-depth knowledge of mathematics, science, and
technology to develop and manage machine learning systems.

12. Customer Segmentation

• Customer segmentation is also an important issue while developing a machine learning
algorithm. The task is to distinguish the customers who paid for the recommendations shown by
the model from those who do not even check them.

• Hence, an algorithm is necessary to recognize customer behavior and trigger a relevant
recommendation for the user based on past experience.


Issues in Machine Learning (Contd…):

13. Data Bias

• Data bias is also a big challenge in Machine Learning. These errors exist
when certain elements of the dataset are weighted more heavily or given more
importance than others.

• Biased data leads to inaccurate results, skewed outcomes, and other analytical
errors.

• However, we can resolve this error by determining where the data is actually
biased and then taking the necessary steps to reduce the bias.


Issues in Machine Learning (Contd…):

Methods to remove Data Bias:

• Research customer segmentation more thoroughly.

• Be aware of your general use cases and potential outliers.

• Combine inputs from multiple sources to ensure data diversity.

• Include bias testing in the development process.

• Analyze data regularly and keep tracking errors to resolve them easily.

• Review the collected and annotated data.

• Use multi-pass annotation for tasks such as sentiment analysis, content moderation, and
intent recognition.


Issues in Machine Learning (Contd…):

14. Irrelevant features

• Although machine learning models are intended to give the best possible outcome,
if we feed garbage data as input, the result will also be garbage.

• Hence, we should use relevant features in our training sample. A machine learning
model is said to be good if its training data has a good set of features with few
or no irrelevant features.
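A first, rough screen for irrelevant features is to drop those that show almost no linear correlation with the target. The sketch below does this with a hand-written Pearson correlation; the feature names, the values, and the 0.5 cut-off are assumptions, and a low correlation does not rule out a non-linear relationship.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

def select_features(features, target, threshold=0.5):
    """Keep the names of features whose |correlation| with the target is large enough."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]

# Invented toy columns: one tracks the target, the other does not.
target = [1, 2, 3, 4, 5]
features = {
    "relevant_feature": [10, 20, 30, 40, 50],
    "irrelevant_feature": [2, 1, 0, 1, 2],
}
selected = select_features(features, target)
```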


Thank you
