ML - CSA 301 - ML Perspective and Issues
Amity School of Engineering & Technology
• The goal of machine learning is to recognize general patterns in a dataset. Once the
patterns are recognized, you can use them to model the data, to interpret the data,
or to predict the outcome for new data that has not been seen before.
• In general, there are three types of learning: supervised learning, unsupervised
learning, and reinforcement learning. Their names convey the main idea behind
each.
• Here is an example. Assume you run a medical statistics company and you have a
dataset containing patients’ features such as blood pressure, blood sugar level,
and heart rate per minute, along with whether each patient has experienced
heart disease.
• By training a machine learning algorithm, your system can find a pattern between
the features and the probability of experiencing heart disease. The algorithm can
then predict whether a new patient is at risk of heart disease, so that a doctor
can take precautions and save a person’s life.
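The heart disease example can be sketched in a few lines. This is a minimal illustration, not a clinical model: it assumes logistic regression trained by gradient descent as the learning algorithm, and the patient records, the function names (`scale`, `predict_risk`, `train`) and the hyperparameters are all invented for the example.

```python
import math

# Hypothetical toy records, invented for illustration only:
# (blood pressure, blood sugar, heart rate) -> 1 if heart disease occurred, else 0.
patients = [
    ((120, 85, 70), 0), ((118, 90, 65), 0), ((125, 95, 72), 0), ((130, 100, 75), 0),
    ((150, 140, 88), 1), ((160, 155, 92), 1), ((155, 150, 85), 1), ((165, 170, 95), 1),
]
features = [x for x, _ in patients]
labels = [y for _, y in patients]

# Standardize each feature so that gradient descent behaves well.
columns = list(zip(*features))
means = [sum(col) / len(col) for col in columns]
stds = [(sum((v - m) ** 2 for v in col) / len(col)) ** 0.5 for col, m in zip(columns, means)]


def scale(x):
    """Rescale one patient's raw features to zero mean and unit variance."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def predict_risk(x, weights, bias):
    """Return the estimated probability of heart disease for raw features x."""
    z = bias + sum(w * v for w, v in zip(weights, scale(x)))
    return 1.0 / (1.0 + math.exp(-z))


def train(epochs=500, learning_rate=0.1):
    """Fit logistic regression weights with batch gradient descent."""
    weights, bias = [0.0] * len(means), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            error = predict_risk(x, weights, bias) - y
            grad_b += error
            for j, v in enumerate(scale(x)):
                grad_w[j] += error * v
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train()
# Two hypothetical new patients the model has not seen before.
high_risk = predict_risk((158, 150, 90), weights, bias)
low_risk = predict_risk((119, 88, 68), weights, bias)
print(f"risk for patient A: {high_risk:.2f}, risk for patient B: {low_risk:.2f}")
```

The labels are what make this supervised learning: every training record comes with the known outcome, and the model learns to map features to that outcome.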
Supervised Learning Example
• How does a computer understand handwriting? One option is to use an unsupervised
algorithm such as K-means or the EM algorithm to group similar handwritten samples.
• These algorithms start with random initial cluster means, which then move
iteratively toward the real cluster means.
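The iteration described above can be sketched as follows. This is a minimal K-means sketch under stated assumptions: the 2-D points are invented stand-ins for features extracted from handwritten samples, and the function name `k_means` and its parameters are chosen for this example only.

```python
import random


def k_means(points, initial_means, max_iters=100):
    """Repeat two steps until the means stop moving:
    assign each point to its nearest mean, then recompute each mean."""
    means = [tuple(m) for m in initial_means]
    for _ in range(max_iters):
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(
                range(len(means)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, means[i])),
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous mean.
        new_means = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, means)
        ]
        if new_means == means:
            break
        means = new_means
    return means


# Hypothetical feature vectors forming two groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (1.0, 2.0),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0), (9.0, 9.5)]

# Start from two randomly chosen points, as the slide describes.
rng = random.Random(0)
final_means = k_means(points, rng.sample(points, 2))
print(final_means)
```

K-means can settle on different means depending on where it starts, so in practice it is common to run it from several random starts and keep the best result.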
• Lastly, let’s talk about reinforcement learning. Assume you want to create an
intelligent agent that plays chess.
• In chess, you can’t evaluate moves one at a time. Your agent should consider a
series of moves and then take the action that maximizes the utility.
• Therefore your agent should play a number of turns against itself and decide the
best action to take. This type of learning is called reinforcement learning, and it
is commonly used in games.
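Chess is far too large for a short example, so the sketch below replaces it with a hypothetical five-position corridor in which the agent is rewarded only on reaching the last position. It assumes tabular Q-learning as the learning rule and uses a single agent rather than self-play. The constants and function names are illustrative choices.

```python
import random

N_STATES = 5        # positions 0..4; reaching position 4 ends the episode
ACTIONS = (-1, 1)   # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2


def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def choose_action(q, state, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, seed=0):
    """Learn the long-term utility of each (state, action) pair by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose_action(q, state, rng)
            next_state, reward = step(state, action)
            future = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * future - q[(state, action)])
            state = next_state
    return q


q_table = train()
# Greedy policy: in each non-terminal state, take the action with the highest utility.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The point to notice is that the reward arrives only at the end, yet the agent learns to value the earlier moves that lead to it. That is the sense in which it considers a series of moves rather than one move at a time.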
Issues
• Which algorithm performs best for which types of problems & representation?
• How much training data is sufficient?
• Can prior knowledge be helpful even when it is only approximately correct?
• What is the best strategy for choosing a useful next training experience?
• What specific function should the system attempt to learn?
• How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?
• Although machine learning is used in many industries and helps organizations make
more informed, data-driven choices that are often more effective than classical methodologies,
it still has many problems that cannot be ignored.
• Machine learning professionals face many challenges when building ML skills and
creating an application from scratch. Here are some of the common ones.
1. Inadequate Training Data / Poor Quality of Data: Data plays a significant role in the
machine learning process. One of the significant issues that machine learning professionals
face is the absence of good quality data.
• The major issue when using machine learning algorithms is the lack of both quality and
quantity of data.
• Because data plays a vital role in machine learning algorithms, many data scientists
report that inadequate, noisy, and unclean data make working with these
algorithms extremely exhausting.
• Unclean and noisy data can also lead the algorithm to make inaccurate or faulty
predictions, which we want to avoid.
• Data quality is also important for the algorithms to work well, yet poor data
quality is common in machine learning applications. Data quality can be affected
by factors such as the following:
* Noisy data: data containing a large amount of additional meaningless information,
called noise. It leads to inaccurate predictions, which affects decisions as well as
accuracy in classification tasks.
* Incorrect data: data with wrong values, which leads to faulty machine learning
models and results. Hence, incorrect data also reduces the accuracy of the
results.
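One simple way to guard against incorrect data is to filter out records with missing or implausible values before training. The sketch below assumes hypothetical patient records and hand-picked plausibility ranges. The field names and limits are illustrative, not clinical guidance.

```python
# Hypothetical records; two of them are deliberately faulty.
records = [
    {"blood_pressure": 120, "heart_rate": 72},
    {"blood_pressure": None, "heart_rate": 80},   # missing value
    {"blood_pressure": 135, "heart_rate": 640},   # implausible reading, likely a typo
    {"blood_pressure": 128, "heart_rate": 68},
]

# Assumed plausibility ranges for each field (illustrative only).
VALID_RANGES = {"blood_pressure": (60, 250), "heart_rate": (30, 220)}


def is_clean(record):
    """A record is clean if every field is present and within its range."""
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None or not low <= value <= high:
            return False
    return True


clean_records = [r for r in records if is_clean(r)]
print(f"kept {len(clean_records)} of {len(records)} records")
```

Range checks catch only the obviously wrong entries. Values that are wrong but plausible need other checks, such as comparing against a second source.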
• Further, if the model is trained on non-representative data, its predictions are less
accurate.
• A machine learning model is considered ideal if it generalizes well and provides
accurate decisions. If there is too little training data, sampling noise makes the training
set non-representative. A model trained on it will not predict accurately and may be
biased against one class or group.
• Hence, we should use representative training data to guard against bias and to
make accurate predictions without drift.
• Underfitting is the opposite of overfitting. It can occur when a machine learning model is
trained with too little data; the model then gives incomplete and inaccurate results, which
destroys its accuracy.
• In such scenarios, the model lacks complexity: its rules are too simple for the data set,
and it starts making wrong predictions.
• Underfitting occurs when the model is too simple to capture the underlying structure of the
data, just like undersized pants. This generally happens when the data set is limited, or
when we try to fit a linear model to non-linear data.
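The last case, a linear model on non-linear data, is easy to reproduce. The sketch below assumes invented data that follows y = x² exactly and fits a straight line by ordinary least squares; the helper names `fit_line` and `mean_squared_error` are chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Non-linear data: y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# A straight line in x is too simple for this data and underfits.
slope, intercept = fit_line(xs, ys)
linear_error = mean_squared_error(xs, ys, slope, intercept)

# Giving the same model the feature x^2 lets it capture the structure.
squared = [x ** 2 for x in xs]
slope_sq, intercept_sq = fit_line(squared, ys)
quadratic_error = mean_squared_error(squared, ys, slope_sq, intercept_sq)

print(f"linear model error: {linear_error:.2f}")
print(f"model with x^2 feature error: {quadratic_error:.2f}")
```

The straight line does poorly even on the data it was trained on, which is the typical sign of underfitting: the error is high on training data as well as on new data.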
4. Overfitting of Training Data: Overfitting refers to a machine learning model that fits its
training data too closely, noise included, which negatively affects its performance on new data.
It is like trying to fit into oversized jeans. Unfortunately, this is one of the significant issues
faced by machine learning professionals.
• It often means that the algorithm was trained with noisy and biased data, which affects its
overall performance.
• Overfitting is one of the most common issues faced by machine learning engineers and data
scientists.
• When a machine learning model is trained on a huge amount of noisy data, it can start
capturing the noise and inaccurate entries in the training data set, which negatively affects
the performance of the model.
• A main cause of overfitting is the use of highly flexible non-linear methods, which can
build unrealistic models of the data.
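A small experiment makes the effect of capturing noise visible. The sketch below assumes invented one-dimensional data whose true rule is "label 1 when x is at least 5", with two training labels deliberately flipped to act as noise. It compares a 1-nearest-neighbour model, which memorizes every training point, with a simple threshold rule. All names and numbers are illustrative.

```python
def nearest_neighbor_predict(train_set, x):
    """Return the label of the closest training point (pure memorization)."""
    _, nearest_label = min(train_set, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def threshold_predict(x):
    """A much simpler model: one fixed cut-off."""
    return 1 if x >= 5 else 0


def accuracy(predict, data):
    return sum(predict(x) == label for x, label in data) / len(data)


# Training labels at x=3 and x=8 are flipped on purpose to simulate noise.
train_set = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
# Clean test data that follows the true rule.
test_set = [(1.4, 0), (2.6, 0), (3.4, 0), (4.4, 0), (5.6, 1), (7.4, 1), (8.4, 1), (9.4, 1)]


def memorizer(x):
    return nearest_neighbor_predict(train_set, x)


print("memorizer:", accuracy(memorizer, train_set), accuracy(memorizer, test_set))
print("threshold:", accuracy(threshold_predict, train_set), accuracy(threshold_predict, test_set))
```

The memorizing model is perfect on the training set because it reproduces the noisy labels, but it is worse on new data than the simpler rule. That gap between training and test accuracy is the usual symptom of overfitting.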
• Let’s understand this with an example. Consider a model trained to differentiate
between a cat, a rabbit, a dog, and a tiger. The training data contains 1000 cats,
1000 dogs, 1000 tigers, and 4000 rabbits. There is then a considerable probability that the
model will identify a cat as a rabbit. In this example we had a vast amount of data, but it
was biased; hence the prediction was negatively affected.
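The imbalance in this example can be measured directly. The sketch below uses the class counts from the example and computes each class's share of the data, together with inverse-frequency class weights, a common heuristic (assumed here, not taken from the slides) for making rare classes count more during training.

```python
from collections import Counter

# Class counts from the example above.
labels = ["cat"] * 1000 + ["dog"] * 1000 + ["tiger"] * 1000 + ["rabbit"] * 4000

counts = Counter(labels)
total = len(labels)

# Share of the training data taken by each class.
shares = {name: n / total for name, n in counts.items()}

# Inverse-frequency weights: total / (number of classes * class count).
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name in counts:
    print(f"{name}: share={shares[name]:.2f}, weight={weights[name]:.4f}")
```

Rabbits make up more than half of the data, so a model can score well by leaning toward "rabbit". Weighting each rabbit example less than each cat, dog, or tiger example is one way to counter that. Collecting more balanced data is another.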
5. Machine Learning is a Complex Process: The machine learning industry is young and
continuously changing. Rapid trial-and-error experiments are being carried out.
• The process keeps changing, so there is a high chance of error, which makes
learning complex.
• It includes analyzing the data, removing data bias, training the model, applying complex
mathematical calculations, and a lot more.
• Hence it is a really complicated process, which is another big challenge for machine learning
professionals.
• The complexity of the machine learning process is another major issue faced by
machine learning engineers and data scientists.
• Machine Learning and Artificial Intelligence are relatively new technologies that are still
in an experimental phase and continuously changing over time.
• Much of the work is trial and error; hence the probability of error is higher
than expected.
• Further, it also includes analyzing the data, removing data bias, training the model, applying
complex mathematical calculations, etc., making the procedure more complicated and quite
tedious.
6. Lack of Training Data: The most important task in the machine learning process is to
train the model on data to achieve accurate output. Too little training data will produce
inaccurate or overly biased predictions. Let us understand this with an example.
• Think of training a machine learning algorithm as similar to teaching a child. One day you
decide to explain to a child how to distinguish between an apple and a watermelon. You take
an apple and a watermelon and show him the difference between them based on their color,
shape, and taste.
• In this way, he will soon become very good at telling the two apart. A machine learning
algorithm, on the other hand, needs a lot of data to make the same distinction.
• For complex problems, it may even require millions of examples. Therefore we need
to ensure that machine learning algorithms are trained with sufficient amounts of data.
7. Slow Implementation: Machine learning models can be highly effective at providing
accurate results, but doing so can take a tremendous amount of time.
• Slow programs, data overload, and excessive requirements usually mean it takes a long time
to get accurate results. Further, a model requires constant monitoring and maintenance to
deliver the best output.
• This issue is very common in machine learning models: they produce accurate results
but are time-consuming.
• Slow programs, excessive requirements, and overloaded data make accurate results take
longer than expected, and the model needs continuous maintenance and monitoring to keep
delivering them.
8. Imperfections in the Algorithm When Data Grows: Suppose you have found quality data,
trained a model on it well, and the predictions are precise and accurate.
• The best model of the present may become inaccurate in the future and require
further adjustment. So you need regular monitoring and maintenance to keep the
algorithm working.
• This is one of the most exhausting issues faced by machine learning professionals.
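Regular monitoring can be as simple as comparing recent accuracy with the accuracy measured when the model was deployed. The sketch below is one hypothetical way to do this; the function name `needs_retraining`, the tolerance, and the sample outcomes are all assumptions made for illustration.

```python
def needs_retraining(recent_outcomes, baseline_accuracy, tolerance=0.05):
    """Flag the model when recent accuracy drops clearly below its baseline.

    recent_outcomes is a list of booleans: True where a recent prediction
    turned out to be correct, False where it was wrong.
    """
    if not recent_outcomes:
        return False
    recent_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return recent_accuracy < baseline_accuracy - tolerance


# Hypothetical monitoring windows for a model that scored 0.90 at deployment.
stable_window = [True] * 9 + [False]        # 90% of recent predictions correct
degraded_window = [True] * 7 + [False] * 3  # 70% of recent predictions correct

print(needs_retraining(stable_window, baseline_accuracy=0.90))
print(needs_retraining(degraded_window, baseline_accuracy=0.90))
```

A check like this only tells you that something changed. Deciding whether to retrain, collect new data, or revise the features still needs a person to look at the cause.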
• Hence, an algorithm is necessary to recognize the customer behavior and trigger a relevant
recommendation for the user based on past experience.
• Data bias is also a big challenge in machine learning. It arises when certain
elements of the dataset are weighted more heavily or given more importance
than others.
• Biased data leads to inaccurate results, skewed outcomes, and other analytical
errors.
• However, we can address this by determining where the data is actually biased
and then taking the necessary steps to reduce the bias.
• Analyze data regularly and keep track of errors so they can be resolved easily.
• Use multi-pass annotation for tasks such as sentiment analysis, content moderation, and
intent recognition.
• Although machine learning models are intended to give the best possible outcome,
if we feed garbage data as input, then the result will also be garbage.
• Hence, we should use relevant features in our training sample. A machine learning
model is said to be good if its training data has a good set of features with few or
no irrelevant features.
Thank you