ML Unit 1


GROW MORE FACULTY OF DIPLOMA ENGINEERING

SUBJECT - FUNDAMENTALS OF MACHINE LEARNING


CODE – 4341603

QUESTION BANK

UNIT 1 - Introduction to Machine Learning

1.1.1 Overview of Human Learning and Machine Learning

Human Learning is the process of acquiring new understanding, knowledge, behaviors, skills,
values, attitudes, and preferences.

The ability to learn is possessed by humans, animals, and some machines; there is also
evidence for some kind of learning in certain plants.

Some learning is immediate, induced by a single event (e.g. being burned by a hot stove), but
much skill and knowledge accumulate from repeated experiences.

The changes induced by learning often last a lifetime, and it is hard to distinguish learned
material that seems to be "lost" from that which cannot be retrieved.

Machine Learning is a system of computer algorithms that can learn from examples through
self-improvement without being explicitly coded by a programmer.

Machine learning is a part of artificial intelligence which combines data with statistical tools
to predict an output that can be used to make actionable insights.

The breakthrough comes with the idea that a machine can singularly learn from the data
(i.e., an example) to produce accurate results.

Machine learning is closely related to data mining and Bayesian predictive modeling.
The machine receives data as input and uses an algorithm to formulate answers.

A typical machine learning task is to provide a recommendation.

1.1.2 Types of Machine Learning

Machine learning is a subset of AI, which enables the machine to automatically learn
from data, improve performance from past experiences, and make predictions.
Machine learning contains a set of algorithms that work on a huge amount of data.
Data is fed to these algorithms to train them, and on the basis of training, they build the model
& perform a specific task.

These ML algorithms help to solve different business problems like Regression, Classification,
Forecasting, Clustering, and Associations, etc.
Based on the methods and way of learning, machine learning is divided into mainly four
types, which are:

1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning
As its name suggests, Supervised machine learning is based on supervision.
It means in the supervised learning technique, we train the machines using the "labelled"
dataset, and based on the training, the machine predicts the output.
Here, labelled data means that the inputs are already mapped to the correct outputs.
More precisely, we first train the machine with the inputs and their corresponding
outputs, and then we ask the machine to predict the output for a test dataset.
Let's understand supervised learning with an example.
Suppose we have an input dataset of cat and dog images.
First, we train the machine to understand the images using features such as
the shape and size of the tail, the shape of the eyes, colour, and height (dogs are
generally taller, cats are smaller).
After completion of training, we input the picture of a cat and ask the machine to identify the
object and predict the output.
Now, the machine is well trained, so it will check all the features of the object, such as height,
shape, colour, eyes, ears, tail, etc., and find that it's a cat.
So, it will put it in the Cat category.
This is the process of how the machine identifies the objects in Supervised Learning.
The main goal of the supervised learning technique is to map the input variable(x) with
the output variable(y).
Some real-world applications of supervised learning are Risk Assessment, Fraud
Detection, Spam filtering, etc.
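
The cat/dog example above can be sketched in a few lines of Python. This is only a minimal illustration, assuming each animal is described by two made-up numeric features (height and weight) instead of an image; the data values and the choice of a k-nearest-neighbours rule are assumptions for illustration, not something prescribed by these notes.

```python
import math

# Hypothetical labelled training data: (height_cm, weight_kg) -> label.
TRAINING_DATA = [
    ((25.0, 4.0), "cat"),
    ((23.0, 3.5), "cat"),
    ((28.0, 5.0), "cat"),
    ((55.0, 22.0), "dog"),
    ((60.0, 30.0), "dog"),
    ((50.0, 18.0), "dog"),
]

def predict(features, k=3):
    """Return the majority label among the k nearest training examples."""
    by_distance = sorted(
        TRAINING_DATA, key=lambda item: math.dist(item[0], features)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return max(set(nearest_labels), key=nearest_labels.count)

# Unseen inputs: the model maps input x to output y.
print(predict((26.0, 4.2)))   # a small, light animal
print(predict((58.0, 25.0)))  # a tall, heavy animal
```

The labelled pairs play the role of the training dataset, and the two calls at the end play the role of the test inputs.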
Categories of Supervised Machine Learning
Supervised machine learning can be classified into two types of problems, which are given
below:

Classification
Regression

a) Classification

Classification algorithms are used to solve classification problems in which the output
variable is categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc.
The classification algorithms predict the categories present in the dataset.
Some real-world examples of classification algorithms are Spam Detection, Email filtering,
etc.
Some popular classification algorithms are given below:

Random Forest Algorithm
Decision Tree Algorithm
Logistic Regression Algorithm
Support Vector Machine Algorithm
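
To make the idea of a categorical output concrete, here is a minimal sketch of logistic regression (one of the algorithms listed above) trained with plain gradient descent. The dataset (hours studied versus pass/fail), the learning rate, and the number of epochs are hypothetical values chosen for illustration; a real project would normally use a library implementation.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical labelled data: hours studied -> passed (1) or failed (0).
hours = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(hours, passed)

def predict(x):
    """Return the predicted category for a new input."""
    return "Yes" if sigmoid(w * x + b) >= 0.5 else "No"

print(predict(2.0), predict(8.0))
```

The model outputs a probability, and the 0.5 threshold turns it into one of the two categories.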

b) Regression
Regression algorithms are used to solve regression problems, in which the output variable is
continuous and is modelled as a function of the input variables (a linear relationship in the
simplest case).
These are used to predict continuous output variables, such as market trends, weather
prediction, etc.
Some popular Regression algorithms are given below:

Simple Linear Regression Algorithm
Multivariate Regression Algorithm
Decision Tree Algorithm
Lasso Regression
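
As a small worked example of simple linear regression, the sketch below fits a straight line y = slope * x + intercept using the standard least-squares formulas. The data points are hypothetical and were chosen to lie exactly on the line y = 2x + 1 so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data lying on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(xs, ys)

def predict(x):
    """Predict a continuous output for a new input."""
    return slope * x + intercept

print(slope, intercept, predict(6.0))
```

Unlike classification, the prediction here is a number on a continuous scale rather than a category.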

Advantages and Disadvantages of Supervised Learning


Advantages:
Since supervised learning works with a labelled dataset, we can have an exact idea
about the classes of objects.
These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

These algorithms can struggle with very complex tasks.
They may predict the wrong output if the test data differs from the training data.
Training the algorithm can require a lot of computational time.

Applications of Supervised Learning


Some common applications of Supervised Learning are given below:

Image Segmentation: Supervised learning algorithms are used in image segmentation.
In this process, image classification is performed on different image data with pre-defined
labels.

Medical Diagnosis: Supervised algorithms are also used in the medical field for
diagnosis purposes. This is done by using medical images and past data labelled with
disease conditions. With such a process, the machine can identify a disease for
new patients.

Fraud Detection - Supervised learning classification algorithms are used for identifying
fraudulent transactions, fraudulent customers, etc. This is done by using historical data to
identify the patterns that can lead to possible fraud.
Spam detection - In spam detection & filtering, classification algorithms are used. These
algorithms classify an email as spam or not spam. The spam emails are sent to the spam
folder.
Speech Recognition - Supervised learning algorithms are also used in speech
recognition. The algorithm is trained with voice data, and various identifications can be
done using the same, such as voice-activated passwords, voice commands, etc.

2. Unsupervised Machine Learning


Unsupervised learning is different from the Supervised learning technique; as its name
suggests, there is no need for supervision.
It means, in unsupervised machine learning, the machine is trained using the unlabeled
dataset, and the machine predicts the output without any supervision.
In unsupervised learning, the models are trained with the data that is neither classified nor
labelled, and the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the
unsorted dataset according to similarities, patterns, and differences.
Machines are instructed to find the hidden patterns in the input dataset.
Let's take an example to understand it more precisely;
suppose there is a basket of fruit images, and we input it into the machine learning model.
The images are totally unknown to the model, and the task of the machine is to find the
patterns and categories of the objects.
So, now the machine will discover its patterns and differences, such as colour difference,
shape difference, and predict the output when it is tested with the test dataset.

Categories of Unsupervised Machine Learning


Unsupervised Learning can be further classified into two types, which are given below:

Clustering
Association

1) Clustering
The clustering technique is used when we want to find the inherent groups from the data.
It is a way to group the objects into a cluster such that the objects with the most similarities
remain in one group and have fewer or no similarities with the objects of other groups.
An example of the clustering algorithm is grouping the customers by their purchasing
behaviour.
Some of the popular clustering algorithms are given below:

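K-Means Clustering algorithm
Mean-shift algorithm
DBSCAN Algorithm

Principal Component Analysis and Independent Component Analysis are often listed alongside
these, but strictly they are dimensionality-reduction techniques rather than clustering
algorithms; they can be applied to the data before clustering.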
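
The customer-grouping example above can be sketched with a bare-bones K-Means loop. This is a simplified illustration, assuming two hypothetical features per customer (visits per month, average spend) and hand-picked starting centroids; practical implementations choose the starting centroids more carefully and stop when the assignments no longer change.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabelled customers: (visits per month, average spend).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])

for centre, members in zip(centroids, clusters):
    print(centre, members)
```

No labels are supplied anywhere; the two groups emerge purely from the distances between the points.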

2) Association

Association rule learning is an unsupervised learning technique that finds interesting
relations among variables within a large dataset.
The main aim of this learning algorithm is to find the dependency of one data item on another
data item and map those variables accordingly so that it can generate maximum profit.
This algorithm is mainly applied in Market Basket analysis, Web usage mining, continuous
production, etc.
Some popular algorithms of Association rule learning are the Apriori algorithm, Eclat, and the
FP-growth algorithm.
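
The "dependency of one data item on another" is usually measured with support and confidence. The sketch below computes both for a handful of hypothetical shopping baskets; it only illustrates the two measures and is not an implementation of Apriori, Eclat, or FP-growth.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears when the antecedent is present."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)

print("support(bread, milk):", support({"bread", "milk"}))
print("confidence(bread -> milk):", confidence({"bread"}, {"milk"}))
```

Here bread and milk appear together in 3 of the 5 baskets, and milk appears in 3 of the 4 baskets that contain bread.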

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages:

These algorithms can be used for more complicated tasks than the supervised ones
because they work on unlabeled datasets.
Unsupervised algorithms are preferable for various tasks, as getting an unlabeled dataset
is easier than getting a labelled one.

Disadvantages:

The output of an unsupervised algorithm can be less accurate, as the dataset is not
labelled and the algorithms are not trained with the exact output beforehand.
Working with unsupervised learning is more difficult, as it works with an unlabelled
dataset in which the inputs are not mapped to outputs.

Applications of Unsupervised Learning


Network Analysis: Unsupervised learning is used for identifying plagiarism and
copyright in document network analysis of text data for scholarly articles.
Recommendation Systems: Recommendation systems widely use unsupervised
learning techniques for building recommendation applications for different web
applications and e-commerce websites.
Anomaly Detection: Anomaly detection is a popular application of unsupervised
learning, which can identify unusual data points within the dataset. It is used to discover
fraudulent transactions.
Singular Value Decomposition: Singular Value Decomposition or SVD is used to extract
particular information from the database. For example, extracting information of each
user located at a particular location.
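
The anomaly detection application above can be illustrated with a very simple statistical rule: flag any value that lies more than a chosen number of standard deviations from the mean. The transaction amounts and the threshold of 2.0 are hypothetical, and real fraud-detection systems use far richer models; this is only a sketch of the idea of finding unusual points without labels.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0, 500.0]
print(find_anomalies(amounts))
```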

3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies between
Supervised and Unsupervised machine learning.
It represents the intermediate ground between Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training data) algorithms and uses the combination of
labelled and unlabeled datasets during the training period.
Although semi-supervised learning is the middle ground between supervised and
unsupervised learning and operates on data that contains a few labels, the data mostly
consists of unlabeled examples.
Labels are costly, so in practice an organization may have only a few labelled examples.
It differs from both supervised and unsupervised learning, which are based on the
presence and absence of labels, respectively.
To overcome the drawbacks of supervised learning and unsupervised learning
algorithms, the concept of Semi-supervised learning is introduced.
The main aim of semi-supervised learning is to effectively use all the available data, rather
than only labelled data like in supervised learning.
Initially, similar data is clustered with an unsupervised learning algorithm, and this then
helps to label the unlabeled data.
This matters because labelled data is comparatively more expensive to acquire than unlabeled data.
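
One simple way to picture how a few labels can spread to unlabeled data is self-training (also called pseudo-labelling). The sketch below is a simplified stand-in for the cluster-then-label idea described above: at each step, the unlabeled point closest to any already-labelled point inherits that point's label. The data, the labels "A" and "B", and the function name are hypothetical.

```python
import math

# Hypothetical data: only two points carry labels.
labelled = [((1.0, 1.0), "A"), ((9.0, 9.0), "B")]
unlabelled = [(1.5, 1.2), (2.5, 2.0), (8.0, 8.5), (7.0, 7.5), (3.0, 2.5)]

def self_train(labelled, unlabelled):
    """Repeatedly label the unlabelled point nearest to a labelled one."""
    labelled = list(labelled)
    remaining = list(unlabelled)
    while remaining:
        # Find the (unlabelled point, labelled example) pair that is closest.
        point, (_, label) = min(
            ((p, ex) for p in remaining for ex in labelled),
            key=lambda pair: math.dist(pair[0], pair[1][0]),
        )
        labelled.append((point, label))  # treat the guess as a new label
        remaining.remove(point)
    return dict(labelled)

labels = self_train(labelled, unlabelled)
for point, label in labels.items():
    print(point, label)
```

Starting from just two labels, every point ends up labelled, which is the sense in which semi-supervised learning uses all the available data.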

We can imagine these algorithms with an example.


Supervised learning is where a student is under the supervision of an instructor at home and
college.
Further, if that student is self-analysing the same concept without any help from the instructor,
it comes under unsupervised learning.
Under semi-supervised learning, the student has to revise himself after analyzing the same
concept under the guidance of an instructor at college.

Advantages and disadvantages of Semi-supervised Learning


Advantages:

The algorithm is simple and easy to understand.
It is highly efficient.
It is used to overcome the drawbacks of Supervised and Unsupervised Learning algorithms.

Disadvantages:

Iteration results may not be stable.
We cannot apply these algorithms to network-level data.
Accuracy can be low.

4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a
software component) automatically explores its surroundings by trial and error, taking
actions, learning from experience, and improving its performance.
The agent is rewarded for each good action and punished for each bad action; hence the
goal of a reinforcement learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data like supervised learning, and agents learn
from their experiences only.
The reinforcement learning process is similar to a human being;
for example, a child learns various things by experiences in his day-to-day life.
An example of reinforcement learning is to play a game, where the Game is the environment,
moves of an agent at each step define states, and the goal of the agent is to get a high score.
Agent receives feedback in terms of punishment and rewards.
Due to its way of working, reinforcement learning is employed in different fields such
as Game theory, Operation Research, Information theory, multi-agent systems.
A reinforcement learning problem can be formalized using a Markov Decision
Process (MDP).
In an MDP, the agent constantly interacts with the environment and performs actions; after each
action, the environment responds and generates a new state.
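
The agent-environment loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm (chosen here for illustration; these notes do not name a specific algorithm). The environment is a hypothetical corridor of five states where the agent starts at state 0 and receives a reward of 1 only when it reaches state 4. All parameter values are assumptions.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right

def step(state, action):
    """Environment: respond to an action with a new state and a reward."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap the episode length
            # Explore at random sometimes (and on ties), otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q

q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

The learned policy lists the preferred move in each non-goal state; the reward signal alone, with no labelled examples, is what shapes it.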

Categories of Reinforcement Learning


Reinforcement learning is categorized mainly into two types of methods/algorithms:

Positive Reinforcement Learning: Positive reinforcement learning means increasing
the tendency that the required behaviour will occur again by adding something. It
enhances the strength of the behaviour of the agent and positively impacts it.
Negative Reinforcement Learning: Negative reinforcement learning works exactly
opposite to the positive RL. It increases the tendency that the specific behaviour would
occur again by avoiding the negative condition.

Real-world Use cases of Reinforcement Learning


Video Games: RL algorithms are very popular in gaming applications, where they are used to reach
super-human performance. Well-known systems that use RL algorithms are
AlphaGo and AlphaGo Zero, which play the board game Go.
Resource Management: The "Resource Management with Deep Reinforcement Learning" paper
showed how to use RL to automatically learn to allocate and schedule computer resources to
waiting jobs in order to minimize average job slowdown.

Robotics:RL is widely being used in Robotics applications. Robots are used in the industrial and
manufacturing area, and these robots are made more powerful with reinforcement learning.
There are different industries that have their vision of building intelligent robots using AI and
Machine learning technology.

Text Mining: Text mining, one of the great applications of NLP, is now being implemented with the
help of reinforcement learning by the Salesforce company.
Advantages and Disadvantages of Reinforcement Learning
Advantages

It helps in solving complex real-world problems that are difficult to solve with general
techniques.
The learning model of RL is similar to the way human beings learn; hence it can produce
very accurate results.
It helps in achieving long-term results.

Disadvantages

RL algorithms are not preferred for simple problems.
RL algorithms require huge amounts of data and computation.
Too much reinforcement learning can lead to an overload of states, which can weaken the
results.

1.1.3 Applications of Machine Learning


We use machine learning in our daily life, often without knowing it, in products such as
Google Maps, Google Assistant, Alexa, etc.

Below are some of the most popular real-world applications of Machine Learning:


1. Image Recognition:
Image recognition is one of the most common applications of machine learning.
It is used to identify objects, persons, places, etc. in digital images.

A popular use case of image recognition and face detection is the automatic friend tagging
suggestion:
Facebook provides a feature that automatically suggests friends to tag.

Whenever we upload a photo with our Facebook friends, we automatically get a tagging
suggestion with names, and the technology behind this is machine learning's face
detection and recognition algorithm.

It is based on the Facebook project named "DeepFace," which is responsible for face
recognition and person identification in the picture.

2. Speech Recognition
While using Google, we get the option to "Search by voice"; this comes under speech
recognition, and it is a popular application of machine learning.

Speech recognition is a process of converting voice instructions into text, and it is also known
as "Speech to text", or "Computer speech recognition."

At present, machine learning algorithms are widely used by various applications of speech
recognition.

Google assistant, Siri, Cortana, and Alexa are using speech recognition technology to follow
the voice instructions.

3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct
path with the shortest route and predicts the traffic conditions.

It predicts whether traffic is clear, slow-moving, or heavily
congested in two ways:

Real-time location of the vehicle from the Google Maps app and sensors
Average time taken on past days at the same time of day

Everyone who uses Google Maps helps to make the app better. It takes information
from the user and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendation to the user.
Whenever we search for some product on Amazon, we start getting advertisements
for the same product while surfing the internet in the same browser, and this is because of
machine learning.
Google understands the user's interests using various machine learning algorithms and
suggests products according to those interests.
Similarly, when we use Netflix, we find recommendations for entertainment series,
movies, etc., and this is also done with the help of machine learning.

5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine
learning plays a significant role in self-driving cars.
Tesla, a well-known car manufacturer, is working on self-driving cars. It uses
machine learning to train its models to detect people and objects while
driving.

6. Email Spam and Malware Filtering:


Whenever we receive a new email, it is filtered automatically as important, normal, or spam.

Important mail arrives in our inbox marked with the important symbol, and spam emails go to
our spam box; the technology behind this is machine learning.

Below are some spam filters used by Gmail:

Content Filter
Header filter
General blacklists filter
Rules-based filters
Permission filters

Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve
Bayes classifier are used for email spam filtering and malware detection.
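
As a rough illustration of how a Naïve Bayes classifier can separate spam from normal mail, the sketch below counts how often each word appears in each class and picks the class with the higher score. The four training messages are hypothetical, and the code is a teaching sketch rather than a description of how Gmail or any real filter works.

```python
import math
from collections import Counter

# Hypothetical labelled emails.
TRAINING = [
    ("win money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "not_spam"),
    ("project report attached", "not_spam"),
]

def train(examples):
    """Count words per class and how many emails each class has."""
    word_counts = {}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, class_counts, vocabulary

def classify(text, model):
    """Return the class with the highest log-probability score."""
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # class prior
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            count = word_counts[label][word] + 1
            score += math.log(count / (n_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify("claim free money", model))
print(classify("monday project meeting", model))
```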

7. Virtual Personal Assistant:


We have various virtual personal assistants such as Google assistant, Alexa, Cortana, Siri.

As the name suggests, they help us in finding the information using our voice instruction.

These assistants can help us in various ways just by our voice instructions such as Play
music, call someone, Open an email, Scheduling an appointment, etc.

These virtual assistants use machine learning algorithms as an important part.

These assistants record our voice instructions, send them to a server in the cloud, decode
them using ML algorithms, and act accordingly.

8. Online Fraud Detection:


Machine learning makes our online transactions safe and secure by detecting fraudulent
transactions.

Whenever we perform an online transaction, there are various ways in which a fraudulent
transaction can take place, such as fake accounts, fake IDs, and money stolen in the middle of
a transaction.
To detect this, a feed-forward neural network helps us by checking whether it is a genuine
transaction or a fraudulent one.
For each genuine transaction, the output is converted into some hash values, and these
values become the input for the next round.

Each genuine transaction follows a specific pattern, which changes for a fraudulent
transaction; the network detects this change and makes our online transactions more secure.

9. Stock Market trading:


Machine learning is widely used in stock market trading.

In the stock market, there is always a risk of ups and downs in share prices, so machine
learning's long short-term memory (LSTM) neural network is used for the prediction of stock
market trends.

10. Medical Diagnosis:


In medical science, machine learning is used for diagnosing diseases.

With this, medical technology is growing very fast and is able to build 3D models that can
predict the exact position of lesions in the brain.

It helps in finding brain tumors and other brain-related diseases easily.

11. Automatic Language Translation:


Nowadays, if we visit a new place and do not know the language, it is not a
problem at all, because machine learning helps us by converting the text into a language we
know.
Google's GNMT (Google Neural Machine Translation) provides this feature. It is a neural
machine translation system that translates text into our familiar language, and this is called
automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning
algorithm, which can be used together with image recognition to translate text from one
language to another.

1.1.4 Tools and Technology for Machine Learning


Machine learning is one of the most revolutionary technologies that is making lives simpler.

It is a subfield of Artificial Intelligence, which analyses the data, builds the model, and
makes predictions.

Due to its popularity and great applications, every tech enthusiast wants to learn and build
new machine learning Apps.
However, to build ML models, it is important to master machine learning tools. Mastering
machine learning tools will enable you to play with the data, train your models, discover new
methods, and create algorithms.

There are different tools, software, and platforms available for machine learning, and new
software and tools are evolving day by day.

Although many machine learning tools are available, choosing the best
tool for your model is a challenging task.
If you choose the right tool for your model, you can make it faster and more efficient. In this
topic, we will discuss some popular and commonly used Machine learning tools and their
features.

1. TensorFlow
TensorFlow is one of the most popular open-source libraries used to train and build both
machine learning and deep learning models.
It was developed by the Google Brain team and also provides a JavaScript library, TensorFlow.js.

It is very popular among machine learning enthusiasts, who use it for building different
ML applications.
It offers a powerful library, tools, and resources for numerical computation, specifically for
large scale machine learning and deep learning projects.

It enables data scientists/ML developers to build and deploy machine learning applications
efficiently.
For training and building the ML models, TensorFlow provides a high-level Keras API, which
lets users easily start with TensorFlow and machine learning.

Features:
Below are some top features:

TensorFlow enables us to build and train our ML models easily.
It also enables you to run existing models using TensorFlow.js.
It provides multiple abstraction levels that allow the user to select the correct resource as
per the requirement.
It helps in building neural networks.
It provides support for distributed computing.
When more flexibility is needed while building a model, it provides eager execution, which
enables immediate iteration and intuitive debugging.
It is open-source software and highly flexible.
It also enables developers to perform numerical computations using data flow graphs.
It runs on GPUs and CPUs, and also on various mobile computing platforms.
It provides automatic differentiation, or autodiff (automatically computing gradients).
It makes it easy to train and deploy the model in the cloud.
TensorFlow.js can be used in two ways, i.e., by installing it through NPM or by including it
with script tags.
It is free to use.
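
To give a feel for what automatic differentiation means, here is a tiny plain-Python sketch based on "dual numbers", where every value carries its derivative along with it. This is not TensorFlow code and does not use TensorFlow's API; the class and function names are hypothetical and exist only to illustrate the concept.

```python
class Dual:
    """A value together with its derivative with respect to the input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(
            self.value * other.value,
            self.value * other.deriv + self.deriv * other.value,
        )

    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f at x and return the derivative df/dx at that point."""
    return f(Dual(x, 1.0)).deriv

def f(x):
    return 3 * x * x + 2 * x + 1   # by hand: f'(x) = 6x + 2

print(derivative(f, 4.0))
```

The derivative is obtained by running the ordinary function once, with no symbolic algebra and no finite-difference approximation, which is the key idea behind autodiff.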

2. PyTorch

PyTorch is an open-source machine learning framework, which is based on the Torch library.

This framework is free and open-source and was developed by FAIR (Facebook's AI Research lab).

It is one of the popular ML frameworks, which can be used for various applications, including
computer vision and natural language processing. PyTorch has Python and C++ interfaces;
however, the Python interface is more interactive.

Various deep learning software is built on top of PyTorch, such as PyTorch Lightning,
Hugging Face's Transformers, Tesla Autopilot, etc.

It specifies a Tensor class containing an n-dimensional array that can perform tensor
computations along with GPU support.

Features:
Below are some top features:

It enables developers to create neural networks using the autograd module.
It is well suited to deep learning research, with good speed and flexibility.
It can also be used on cloud platforms.
It includes tutorial courses, various tools, and libraries.
It also provides a dynamic computational graph, which makes this library more popular.
It allows the network behaviour to be changed arbitrarily at run time without any lag.
It is easy to use due to its hybrid front-end.
It is freely available.
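
The idea of a dynamic computational graph can be sketched in plain Python: the graph is recorded while ordinary code runs, so normal control flow (such as an if statement) decides its shape, and gradients are then obtained by walking the graph backwards. This is not PyTorch code and does not use PyTorch's API; the Value class below is a hypothetical miniature written only to illustrate the concept.

```python
class Value:
    """A scalar that remembers which values it was computed from."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # pairs of (parent, local gradient)

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(
            self.data * other.data,
            ((self, other.data), (other, self.data)),
        )

    def backward(self):
        """Propagate gradients from this value back to its inputs."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):      # outputs first, inputs last
            for parent, local in node._parents:
                parent.grad += local * node.grad   # chain rule

x, w, b = Value(3.0), Value(2.0), Value(1.0)
out = w * x + b
if out.data > 5:          # ordinary Python control flow shapes the graph
    out = out * out
out.backward()
print(out.data, w.grad, x.grad, b.grad)
```

Because the graph is rebuilt on every run, changing the condition or the inputs changes the graph itself, which is what "dynamic" refers to.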

3. Google Cloud ML Engine


While training a classifier with a huge amount of data, a single computer system might not
perform well.
However, various machine learning or deep learning projects require millions or billions of
training examples.

Or the algorithm that is being used may take a long time to execute.
In such cases, one can go for the Google Cloud ML Engine.

It is a hosted platform where ML developers and data scientists build and run high-quality
machine learning models.

It provides a managed service that allows developers to easily create ML models with any
type of data and of any size.

Features:
Below are the top features:

It provides machine learning model training and building, deep learning, and predictive
modelling.
The two services, namely prediction and training, can be used independently or
together.
It can be used by enterprises, e.g., for identifying clouds in a satellite image or responding
faster to customer emails.
It can be used to train complex models.
4. Amazon Machine Learning (AML)


Amazon provides a great number of machine learning tools, and one of them is Amazon
Machine Learning or AML.
Amazon Machine Learning (AML) is a cloud-based and robust machine learning software
application, which is widely used for building machine learning models and making
predictions.
Moreover, it integrates data from multiple sources, including Redshift, Amazon S3, or RDS.

Features
Below are some top features:

AML offers visualization tools and wizards.


Enables the users to identify the patterns, build mathematical models, and make
predictions.
It provides support for three types of models, which are multi-class classification, binary
classification, and regression.
It permits users to import the model into or export the model out from Amazon Machine
Learning.
It also provides core concepts of machine learning, including ML models, Data sources,
Evaluations, Real-time predictions and Batch predictions.
It enables the user to retrieve predictions with the help of batch APIs for bulk requests or
real-time APIs for individual requests.

5. Google ML Kit for Mobile


For mobile app developers, Google offers ML Kit, which packages machine learning expertise
and technology to create more robust, optimized, and personalized apps.
This toolkit can be used for face detection, text recognition, landmark detection, image
labelling, and barcode scanning applications. It can also be used offline.

Features:
Below are some top features:

ML Kit is optimized for mobile.
It includes the advantages of different machine learning technologies.
It provides easy-to-use APIs that enable powerful use cases in your mobile apps.
It includes Vision APIs and Natural Language APIs to detect faces, text, and objects,
identify different languages, and provide reply suggestions.
