
ML - CSA 301 - ML Perspective and Issues

The document discusses machine learning techniques from the perspective of an assistant professor. It describes three types of machine learning - supervised learning, unsupervised learning, and reinforcement learning. It provides examples of each type including predicting heart disease risk, handwritten digit recognition, and an AI that plays chess. The document also discusses common issues in machine learning like having inadequate or poor quality training data, and data that is not representative of real-world cases.

Uploaded by

Shatakshi sharma

Amity School of Engineering & Technology

Dr. Kuldeep N. Tripathi


Assistant Professor
ASET-CSE

CSA 301: MACHINE LEARNING TECHNIQUES



Machine Learning: Perspectives


Machine Learning: Perspective


• It involves searching a very large space of possible hypotheses to determine the
one that best fits the observed data.

• The goal of machine learning is to recognize patterns in a dataset in a general
way. Once the patterns are recognized, they can be used to model the data, to
interpret it, or to predict outcomes for new data that has not been seen before.

• Machine learning is a subfield of artificial intelligence, and machine learning
algorithms are used in related fields such as natural language processing and
computer vision.

• In general, there are three types of learning: supervised learning, unsupervised
learning, and reinforcement learning. Each name conveys the main idea behind
the approach.


Supervised Machine Learning:


• In supervised learning, the system learns under the supervision of known outputs
(labels), so supervised algorithms are preferred when the dataset contains output
information.

• For example, suppose you run a medical statistics company and have a dataset
containing patients' features such as blood pressure, blood sugar level, and heart
rate, together with information on whether each patient has experienced heart
disease.

• By training a machine learning algorithm on this data, the system can find a
pattern relating the features to the probability of heart disease. The algorithm can
then predict whether a new patient is at risk, so that a doctor can take precautions
and possibly save the patient's life.
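
The heart disease example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patient records are invented toy values, and logistic regression trained by gradient descent is one possible supervised algorithm, not one the slides prescribe. The names `PATIENTS`, `LABELS`, `train`, and `predict_risk` are hypothetical.

```python
import math

# Hypothetical toy records: (systolic blood pressure, blood sugar, heart rate).
PATIENTS = [
    (120, 90, 70), (115, 85, 65), (130, 100, 72), (125, 95, 68),
    (160, 180, 90), (170, 200, 95), (155, 160, 88), (165, 190, 92),
]
# Known outputs (the "supervision"): 1 = experienced heart disease, 0 = did not.
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def column_stats(rows):
    """Return the per-feature mean and standard deviation."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return means, stds


def scale(row, means, stds):
    """Standardize one record so every feature is on a comparable scale."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, learning_rate=0.1, epochs=500):
    """Fit logistic-regression weights with stochastic gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            predicted = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = predicted - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


MEANS, STDS = column_stats(PATIENTS)
WEIGHTS, BIAS = train([scale(p, MEANS, STDS) for p in PATIENTS], LABELS)


def predict_risk(patient):
    """Estimated probability of heart disease for a new, unseen patient."""
    x = scale(patient, MEANS, STDS)
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS)


print(round(predict_risk((168, 195, 93)), 2))  # resembles the labelled positive cases
print(round(predict_risk((118, 88, 66)), 2))   # resembles the labelled negative cases
```

The labels play the role of the supervisor: the weights are adjusted only because each training record comes with a known outcome.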


Supervised Learning Example


Unsupervised Machine Learning:


• Unsupervised algorithms are preferred when the data contains no outputs and the
aim is to discover clusters in the dataset.

• Handwritten digit recognition can serve as an example of unsupervised learning.
In this application it is known that there should be 10 clusters
{0,1,2,3,4,5,6,7,8,9}, but there are countless ways to write a digit by hand, and
everyone writes digits differently.

• How does a computer understand what is written by hand? One option is an
unsupervised algorithm such as K-means or the EM algorithm.

• These algorithms start with random initial cluster means, and the means iteratively
converge toward the true cluster means.
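
A minimal K-means sketch makes the iteration concrete. It rests on several assumptions: 2-D toy points stand in for digit images, and the initial means are passed in explicitly (instead of being drawn at random) so that the run is reproducible. The helper names `nearest_mean` and `k_means` are hypothetical.

```python
import math


def nearest_mean(point, means):
    """Index of the cluster mean closest to the point (Euclidean distance)."""
    return min(range(len(means)), key=lambda i: math.dist(point, means[i]))


def k_means(points, initial_means, max_iterations=100):
    """Alternate between assigning points and recomputing means until stable."""
    means = [tuple(m) for m in initial_means]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest mean.
        clusters = [[] for _ in means]
        for p in points:
            clusters[nearest_mean(p, means)].append(p)
        # Update step: each mean moves to the centre of its cluster
        # (an empty cluster keeps its previous mean).
        new_means = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, means)
        ]
        if new_means == means:
            break
        means = new_means
    return means


# Toy data with two obvious groups; real digit images would be long pixel vectors.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
MEANS = k_means(POINTS, initial_means=[(0, 0), (1, 1)])

print(MEANS)
print(nearest_mean((9, 9), MEANS))  # cluster index assigned to a new point
```

Once the means are stable, `nearest_mean` is all that is needed to label a new point. Note that K-means can settle in a poor local optimum for unlucky starting means, which is why practical implementations usually try several initializations.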


Un-Supervised Learning Example


Machine Learning: Perspective (Contd…)


• After training is complete, visualizing the cluster means shows that they really
look like digits. Each cluster is then labeled with the corresponding digit, and
when the computer encounters a new handwritten digit, the algorithm assigns it
the label of the closest mean.

• Lastly, consider reinforcement learning. Suppose you want to create an
intelligent agent that plays chess.

• In chess, moves cannot be evaluated one at a time in isolation. The agent should
consider a series of moves and then choose the action that maximizes the utility.

• The agent therefore plays a number of turns against itself to decide on the best
action to take. This type of learning is called reinforcement learning, and it is
commonly used in games.
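
Chess is far too large for a short example, so the sketch below swaps in a much simpler setting: tabular Q-learning on a hypothetical five-state corridor in which the agent is rewarded for reaching the right-hand end. The environment, the constants, and the function names are all assumptions made for illustration. The idea it shares with the chess agent is that the value of a move includes the discounted value of the moves that can follow it.

```python
import random

N_STATES = 5            # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)      # index 0 moves left, index 1 moves right
GAMMA = 0.9             # discount applied to future reward
ALPHA = 0.5             # learning rate


def step(state, action):
    """Move within the corridor; reaching the last state pays a reward of 1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    """Tabular Q-learning with uniformly random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            a = rng.randrange(len(ACTIONS))
            next_state, reward = step(state, ACTIONS[a])
            # Utility of the move = immediate reward + discounted best follow-up.
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
            if state == N_STATES - 1:
                break
    return q


Q_TABLE = train()
# Greedy policy for every non-goal state: pick the action with the higher utility.
POLICY = ["right" if right > left else "left" for left, right in Q_TABLE[:-1]]
print(POLICY)
```

No state is ever labelled with the correct move. The agent learns from the reward alone, which is what separates this setting from supervised learning.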


Issues in Machine Learning


Issues

• Which algorithm performs best for which types of problems and representations?
• How much training data is sufficient?
• Can prior knowledge be helpful even when it is only approximately correct?
• What is the best strategy for choosing a useful next training experience?
• What specific function should the system attempt to learn?
• How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?

Issues in Machine Learning:


• Machine learning is the process of analyzing data to build or train models. It is
used everywhere, from Amazon product recommendations to self-driving cars, and
delivers great value.

• Although machine learning is used in many industries and helps organizations make
more informed, data-driven choices than classical methodologies allow, it still has
many problems that cannot be ignored.

• Machine learning professionals face many challenges when building ML skills and
creating an application from scratch. Some common issues are described here.

1. Inadequate Training Data / Poor Quality of Data: Data plays a significant role in the
machine learning process. One of the significant issues that machine learning professionals
face is the absence of good-quality data.

• The major issue when using machine learning algorithms is a lack of both quality
and quantity of data.


Issues in Machine Learning:

• Although data plays a vital role in machine learning, many data scientists report
that inadequate, noisy, and unclean data make the whole process extremely
exhausting.

• Such data can also lead the algorithm to make inaccurate or faulty predictions.

• Hence the quality of data is essential to the quality of the output.

• Therefore, data preprocessing, which includes removing outliers, filtering missing
values, and removing unwanted features, must be done with great care.
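
The three preprocessing steps named above can be sketched with the standard library alone. The records, the column names, and the choice of the 1.5 × IQR rule for outliers are assumptions made for illustration; other outlier rules are equally valid.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"patient_id": 1, "age": 25, "bp": 118},
    {"patient_id": 2, "age": 30, "bp": 121},
    {"patient_id": 3, "age": 28, "bp": None},   # missing blood pressure
    {"patient_id": 4, "age": 35, "bp": 130},
    {"patient_id": 5, "age": 32, "bp": 125},
    {"patient_id": 6, "age": 27, "bp": 119},
    {"patient_id": 7, "age": 400, "bp": 122},   # implausible age (outlier)
    {"patient_id": 8, "age": 28, "bp": 120},
]


def drop_missing(rows):
    """Keep only rows in which every value is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_outliers(rows, column):
    """Drop rows whose value in `column` lies outside the 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles([r[column] for r in rows], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [r for r in rows if low <= r[column] <= high]


def drop_features(rows, unwanted):
    """Remove columns that carry no predictive information."""
    return [{k: v for k, v in r.items() if k not in unwanted} for r in rows]


CLEAN = drop_features(drop_outliers(drop_missing(RAW), "age"), {"patient_id"})
for row in CLEAN:
    print(row)
```

The order matters here: missing values are filtered first so that the quartiles used for the outlier fences are computed only from complete records.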


Inadequate Training Data / Poor Quality of Data:


• For example, even a simple task may require thousands of samples, and an advanced
task such as speech or image recognition may need millions of examples.

• Data quality is also important for the algorithms to work well, yet poor data
quality is common in machine learning applications. Data quality can be affected
by the following factors:

* Noisy data: Data containing a large amount of additional meaningless information,
called noise. It leads to inaccurate predictions, which affects decisions as well as
accuracy in classification tasks.

* Incorrect data: Data with wrong values leads to faulty results in machine learning
models and may therefore reduce the accuracy of the results.

* Generalizing of output data: Sometimes generalizing the output data becomes
complex, which results in comparatively poor future actions.

Issues in Machine Learning (Contd…):

2. Non-representative training data


• For a model to generalize well, the sample training data must be representative of the
new cases we want to generalize to. The training data should cover cases that have
already occurred as well as those that are occurring now.

• If we use non-representative training data, the model makes less accurate
predictions.

• A machine learning model is considered ideal if it predicts well for generalized cases and
provides accurate decisions. If there is too little training data, sampling noise makes the
training set non-representative. A model trained on such a set will not predict accurately
and will be biased against one class or group.

• Hence, we should use representative data in training to protect against bias and
to make accurate predictions without drift.
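
One common way to keep a sample representative with respect to the class labels is stratified sampling, which draws the same fraction from every class. The sketch below is an illustration only; the function name `stratified_sample` and the 80/20 toy labels are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Pick the same fraction of indices from every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for indices in by_class.values():
        chosen.extend(rng.sample(indices, round(len(indices) * fraction)))
    return sorted(chosen)


# Hypothetical population: 80% healthy, 20% with the disease.
LABELS = ["healthy"] * 80 + ["disease"] * 20
SAMPLE = stratified_sample(LABELS, fraction=0.25)
COUNTS = Counter(LABELS[i] for i in SAMPLE)
print(COUNTS)  # the 80/20 class ratio of the population is preserved
```

A plain random sample of 25 records could by chance contain very few disease cases. Stratifying removes that source of sampling noise for the label, although it does not by itself guarantee that other attributes are represented fairly.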


Issues in Machine Learning:


3. Underfitting of Training Data: Underfitting occurs when the model is unable to establish an
accurate relationship between input and output variables. It is like trying to fit into
undersized jeans: the model is too simple to establish a precise relationship.

• Underfitting is the opposite of overfitting. It can arise when a machine learning model is
trained with too little data; the model then produces incomplete and inaccurate results, and
its accuracy suffers.

• In such scenarios the model lacks the necessary complexity: its rules are too simple for
the data set, and it starts making wrong predictions.

• Underfitting occurs when our model is too simple to capture the underlying structure of the
data, just like undersized trousers. This generally happens when we have limited data in the
data set, or when we try to fit a linear model to non-linear data.
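
The case of a linear model on non-linear data can be shown with a small worked example. The data (y = x² on seven points) and the helper names are assumptions chosen so that the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


XS = [-3, -2, -1, 0, 1, 2, 3]
YS = [x ** 2 for x in XS]  # a non-linear relationship

# A straight line in x is too simple for this data: it underfits.
MSE_LINEAR = mean_squared_error(XS, YS, *fit_line(XS, YS))

# Adding the feature x**2 gives the same linear method enough capacity.
SQUARED = [x ** 2 for x in XS]
MSE_WITH_FEATURE = mean_squared_error(SQUARED, YS, *fit_line(SQUARED, YS))

print(MSE_LINEAR, MSE_WITH_FEATURE)
```

The best straight line here is flat, so its error equals the variance of the targets, while the model with the extra feature fits the data exactly. More training time would not help the first model; only a richer model or better features would.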


Issues in Machine Learning (Contd…):

Methods to reduce Underfitting:

• Increase the training time of the model, for example by increasing the number of epochs
• Increase the complexity of the model
• Add more and better features to the data
• Reduce regularization
• Remove noise from the data
• Reduce the constraints on the model


Issues in Machine Learning:

4. Overfitting of Training Data: Overfitting occurs when a machine learning model fits its
training data too closely, capturing noise along with the underlying pattern, so that its
performance on new data suffers. It is like trying to fit into oversized jeans.

• It often arises when the algorithm is trained on noisy or biased data, which affects its
overall performance.

• Overfitting is one of the most common issues faced by machine learning engineers and data
scientists.

• When a flexible model is trained on a large data set that contains noise and inaccurate
entries, it starts capturing them as if they were patterns. This negatively affects the
performance of the model.

• A common cause of overfitting is the use of highly flexible non-linear methods, which can
build unrealistic models of the data.


Issues in Machine Learning (Contd…):

Methods to reduce overfitting:

• Increase the amount of training data.
• Reduce model complexity by selecting a model with fewer parameters.
• Use Ridge or Lasso regularization.
• Use early stopping during the training phase.
• Reduce the noise in the data.
• Reduce the number of attributes in the training data.
• Constrain the model.
• Analyze the data carefully.
• Use data augmentation techniques.
• Remove outliers from the training set.
• Select a model with fewer features.
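
Of the methods listed, early stopping is the easiest to show in isolation. The sketch below assumes that a validation loss is recorded after every epoch and that training halts once the loss has failed to improve for `patience` consecutive epochs. The function name, the `patience` parameter, and the loss values are hypothetical.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch losses."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training here.
            return epoch, best_epoch
    return len(validation_losses) - 1, best_epoch


# Hypothetical validation losses: they fall at first, then rise as the model overfits.
LOSSES = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.61, 0.70]
STOP_EPOCH, BEST_EPOCH = early_stopping_epoch(LOSSES)
print(STOP_EPOCH, BEST_EPOCH)
```

In practice the model weights saved at `BEST_EPOCH` are the ones kept, since later epochs only improve the fit to the training set.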


Issues in Machine Learning:

• Consider an example: a model trained to differentiate between a cat, a rabbit, a dog, and
a tiger. The training data contains 1000 cats, 1000 dogs, 1000 tigers, and 4000 rabbits.
There is then a considerable probability that the model will identify a cat as a rabbit. In
this example we had a vast amount of data, but it was biased; hence the prediction was
negatively affected.
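
The imbalance in this example can be quantified, and one common heuristic for countering it is to weight each class inversely to its frequency during training. The weighting formula below, total / (number of classes × class count), is an assumption brought in for illustration and is not taken from the slides.

```python
# Class counts from the example above.
COUNTS = {"cat": 1000, "dog": 1000, "tiger": 1000, "rabbit": 4000}
TOTAL = sum(COUNTS.values())

# Share of the training set taken up by each class.
SHARES = {label: n / TOTAL for label, n in COUNTS.items()}

# Inverse-frequency weights: rare classes count more, frequent classes less.
WEIGHTS = {label: TOTAL / (len(COUNTS) * n) for label, n in COUNTS.items()}

for label in COUNTS:
    print(f"{label}: share={SHARES[label]:.3f} weight={WEIGHTS[label]:.4f}")
```

With these weights every class contributes the same total weight (1750) to the training loss, so the 4000 rabbits no longer dominate. Collecting more cat, dog, and tiger images would address the same problem at the data level.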

5. Machine Learning is a Complex Process: The machine learning industry is young and is
continuously changing. Rapid trial-and-error experiments are being carried out.

• The process keeps changing, so there is a high chance of error, which makes learning
complex.

• It includes analyzing the data, removing data bias, training the model, applying complex
mathematical calculations, and a lot more.

• Hence it is a complicated process, which is another big challenge for machine learning
professionals.

Issues in Machine Learning (Contd…):

• The complexity of the machine learning process is another major issue faced by
machine learning engineers and data scientists.

• Machine learning and artificial intelligence are relatively new technologies that are still
in an experimental phase and are continuously changing.

• Much of the work consists of trial-and-error experiments; hence the probability of error is
higher than expected.

• Analyzing the data, removing data bias, training the model, and applying complex
mathematical calculations make the procedure more complicated and quite tedious.


Issues in Machine Learning:

6. Lack of Training Data: The most important task in the machine learning process is to
train the model on data so that it produces accurate output. Too little training data will
produce inaccurate or heavily biased predictions. An example makes this clearer.

• Compare training a machine learning algorithm to teaching a child. Suppose you decide to
explain to a child how to distinguish between an apple and a watermelon. You take an
apple and a watermelon and show the difference between them based on their color,
shape, and taste.

• In this way the child soon becomes very good at telling the two apart. A machine learning
algorithm, on the other hand, needs a lot of data to make the same distinction.

• For complex problems, it may even require millions of examples. Therefore we need
to ensure that machine learning algorithms are trained with sufficient amounts of data.


Issues in Machine Learning:

7. Slow Implementation: Machine learning models can be highly effective at providing
accurate results, but producing those results can take a tremendous amount of time.

• Slow programs, data overload, and excessive requirements mean that it usually takes a
long time to obtain accurate results. Further, the model requires constant monitoring and
maintenance to deliver the best output.

• This issue is very common in machine learning models.


Issues in Machine Learning:

8. Imperfections in the Algorithm When Data Grows: Suppose you have found quality data,
trained a model on it well, and the predictions are precise and accurate.

• The model may still become useless in the future as data grows.

• The best model of the present may become inaccurate in the future and require
further adjustment. So you need regular monitoring and maintenance to keep the
algorithm working.

• This is one of the most exhausting issues faced by machine learning professionals.


Issues in Machine Learning (Contd…):


9. Monitoring and maintenance
• Because every machine learning model must produce output that generalizes,
regular monitoring and maintenance are compulsory. Different results for
different actions require changes to the data; hence editing the code, and
resources for monitoring it, also become necessary.

10. Getting bad recommendations


• A machine learning model operates under a specific context. When that context
changes, the result is bad recommendations and concept drift in the model.
• For example, at one point in time a customer is looking for some gadgets. The
customer's requirements change over time, but the machine learning model keeps
showing the same recommendations even though the customer's expectations have
changed. This situation is called data drift.
• It generally occurs when new data is introduced or the interpretation of data changes.
We can overcome it by regularly updating and monitoring the data according to
current expectations.
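
Monitoring for drift can be as simple as comparing what customers looked at in a reference period with what they look at now. The sketch below uses the total variation distance between the two category distributions. The category names, the data, and the `THRESHOLD` value are assumptions, and a production system would likely use a more robust statistical test.

```python
from collections import Counter


def distribution(items):
    """Relative frequency of each category."""
    counts = Counter(items)
    return {category: n / len(items) for category, n in counts.items()}


def total_variation(reference, recent):
    """Half the summed absolute difference between two distributions (0 to 1)."""
    p, q = distribution(reference), distribution(recent)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Hypothetical browsing logs: interest has shifted away from gadgets.
REFERENCE = ["gadgets"] * 70 + ["books"] * 20 + ["clothing"] * 10
RECENT = ["gadgets"] * 20 + ["books"] * 30 + ["clothing"] * 50

DRIFT = total_variation(REFERENCE, RECENT)
THRESHOLD = 0.2  # hypothetical alert level, to be tuned per application
NEEDS_RETRAINING = DRIFT > THRESHOLD
print(DRIFT, NEEDS_RETRAINING)
```

A drift score above the threshold is a signal to refresh the training data and retrain, instead of continuing to serve recommendations based on outdated behaviour.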


Issues in Machine Learning (Contd…):


11. Lack of skilled resources
• Although machine learning and artificial intelligence are continuously growing in the market,
these fields are still young in comparison to others. The absence of skilled resources in
the form of manpower is also an issue.

• Hence, we need people with in-depth knowledge of mathematics, science, and
technology to develop and manage the scientific content of machine learning.

12. Customer Segmentation


• Customer segmentation is also an important issue when developing a machine learning
algorithm: the task is to identify the customers who paid for the recommendations shown
by the model and those who did not even look at them.

• Hence, an algorithm is necessary to recognize customer behavior and trigger a relevant
recommendation for the user based on past experience.


Issues in Machine Learning (Contd…):


13. Data Bias

• Data bias is also a big challenge in machine learning. These errors arise
when certain elements of the dataset are weighted more heavily or given more
importance than others.

• Biased data leads to inaccurate results, skewed outcomes, and other analytical
errors.

• We can resolve this error by determining where the data is actually biased in
the dataset and then taking the necessary steps to reduce the bias.


Issues in Machine Learning (Contd…):

Methods to remove Data Bias:

• Research customer segmentation more thoroughly.

• Be aware of your general use cases and potential outliers.

• Combine inputs from multiple sources to ensure data diversity.

• Include bias testing in the development process.

• Analyze data regularly and keep track of errors so they can be resolved easily.

• Review the collected and annotated data.

• Use multi-pass annotation for tasks such as sentiment analysis, content moderation, and
intent recognition.


Issues in Machine Learning (Contd…):

14. Irrelevant features

• Although machine learning models are intended to give the best possible outcome,
if we feed garbage data as input, then the result will also be garbage. 

• Hence, we should use relevant features in our training sample. A machine learning
model is considered good if its training data has a good set of features with few
or no irrelevant ones.
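
A simple first check for irrelevant features is to measure how strongly each one correlates with the target and to drop those below a cut-off. The sketch below uses the Pearson correlation. The feature values, the `MIN_CORRELATION` cut-off, and the helper name `pearson` are assumptions, and correlation only detects linear relationships, so this is a screening heuristic and not a complete feature-selection method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return covariance / norm


# Hypothetical features for six patients and whether each had heart disease.
FEATURES = {
    "blood_pressure": [118, 122, 130, 145, 150, 160],
    "shoe_size": [42, 38, 44, 39, 43, 40],
}
TARGET = [0, 0, 0, 1, 1, 1]

MIN_CORRELATION = 0.3  # hypothetical cut-off
SCORES = {name: pearson(values, TARGET) for name, values in FEATURES.items()}
RELEVANT = [name for name, r in SCORES.items() if abs(r) >= MIN_CORRELATION]
print(SCORES)
print(RELEVANT)
```

Here blood pressure tracks the outcome closely while shoe size does not, so only the former is kept as an input to the model.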


Thank you
