Machine Learning Tutorial
This Machine Learning tutorial provides basic and advanced concepts of machine learning. Our machine learning tutorial is designed for students and working professionals.
This machine learning tutorial gives you an introduction to machine learning along with the
wide range of machine learning techniques such as Supervised, Unsupervised,
and Reinforcement learning. You will learn about regression and classification models,
clustering methods, hidden Markov models, and various sequential models.
With the help of sample historical data, which is known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or decisions
without being explicitly programmed. Machine learning brings computer science and
statistics together for creating predictive models. Machine learning constructs or uses algorithms that learn from historical data. The more information we provide, the higher the performance will be.
A machine has the ability to learn if it can improve its performance by gaining
more data.
We can train machine learning algorithms by providing them huge amounts of data and letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be determined by the cost function. With the help of machine learning, we can save both time and money.
The importance of machine learning can be easily understood by its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, etc. Various top companies such as Netflix and Amazon have built machine learning models that use a vast amount of data to analyze user interest and recommend products accordingly.
Following are some key points which show the importance of Machine
Learning:
o Rapid increase in the production of data
o Solving complex problems which are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data
At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it predicts
the output.
The system creates a model using labeled data to understand the dataset and learn about each data point. Once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, just as a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.
Supervised learning can be grouped further into two categories of algorithms:
o Classification
o Regression
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.
The training is provided to the machine with the set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any supervision.
The goal of unsupervised learning is to restructure the input data into new features or a group
of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights from a huge amount of data. It can be further classified into two categories of algorithms:
o Clustering
o Association
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the most reward points, and hence it improves its performance.
A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.
Note: We will learn about the above types of machine learning in detail in later chapters.
o 1940s: "ENIAC", the first electronic general-purpose computer, was invented. After that, stored-program computers such as EDSAC in 1949 and EDVAC in 1951 were built.
o 1943: In 1943, a human neural network was modeled with an electrical circuit. In 1950, scientists started applying this idea to work and analyzed how human neurons might function.
o 1952: Arthur Samuel, a pioneer of machine learning, created a program that helped an IBM computer play the game of checkers. It performed better the more it played.
o 1959: In 1959, the term "Machine Learning" was first coined by Arthur
Samuel.
o The duration from 1974 to 1980 was a tough time for AI and ML researchers, and this period is called the AI winter.
o In this period, machine translation failed, and people lost interest in AI, which led to reduced government funding for research.
o 1959: In 1959, the first neural network was applied to a real-world problem to
remove echoes over phone lines using an adaptive filter.
o 1985: In 1985, Terry Sejnowski and Charles Rosenberg invented a neural
network NETtalk, which was able to teach itself how to correctly pronounce
20,000 words in one week.
o 1997: IBM's Deep Blue intelligent computer won a chess match against the chess expert Garry Kasparov, becoming the first computer to beat a human chess expert.
o 2006: In 2006, computer scientist Geoffrey Hinton gave neural-net research the new name "deep learning," and nowadays it has become one of the most trending technologies.
o 2012: In 2012, Google created a deep neural network which learned to
recognize the image of humans and cats in YouTube videos.
o 2014: In 2014, the chatbot "Eugene Goostman" passed the Turing Test. It was the first chatbot to convince 33% of the human judges that it was not a machine.
o 2014: DeepFace was a deep neural network created by Facebook, which claimed that it could recognize a person with the same precision as a human.
o 2016: AlphaGo beat the world's number two Go player, Lee Sedol. In 2017, it beat Ke Jie, the number one player of the game.
o 2017: In 2017, Alphabet's Jigsaw team built an intelligent system that was able to learn about online trolling. It read millions of comments from different websites to learn to stop online trolling.
Modern machine learning models can be used for making various predictions,
including weather prediction, disease prediction, stock market analysis, etc.
Prerequisites
Before learning machine learning, you must have basic knowledge of the following so that you can easily understand the concepts of machine learning:
Audience
Our Machine learning tutorial is designed to help beginners and professionals.
Problems
We assure you that you will not find any difficulty while learning our Machine
learning tutorial. But if there is any mistake in this tutorial, kindly post the problem or
error in the contact form so that we can improve it.
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, etc. in digital images. A popular use case of image recognition and face detection is Automatic friend tagging suggestion:
It is based on the Facebook project named "Deep Face," which is responsible for face
recognition and person identification in the picture.
2. Speech Recognition
While using Google, we get an option of "Search by voice." This comes under speech recognition, and it's a popular application of machine learning.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions. It predicts traffic using two kinds of information:
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time of day
Everyone who uses Google Maps is helping this app get better. It takes information from the user and sends it back to its database to improve the performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment
companies such as Amazon, Netflix, etc., for product recommendation to the user.
Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet in the same browser; this is because of machine learning.
Google understands the user's interest using various machine learning algorithms and suggests products as per the customer's interest.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars.
Machine learning plays a significant role in self-driving cars. Tesla, a well-known car manufacturing company, is working on self-driving cars. It is using an unsupervised learning method to train the car models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam, with the help of machine learning spam filters such as:
o Content filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
7. Online Fraud Detection:
Machine learning also helps make online transactions secure by detecting fraudulent transactions. For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Each genuine transaction follows a specific pattern, which changes for a fraudulent transaction; hence the system detects the fraud and makes our online transactions more secure.
Machine learning life cycle involves seven major steps, which are given below:
o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment
The most important thing in the complete process is to understand the problem and to know its purpose. Therefore, before starting the life cycle, we need to understand the problem, because a good result depends on a good understanding of the problem.
In the complete life cycle process, to solve a problem, we create a machine learning system called a "model", and this model is created by providing "training". But to train a model we need data; hence, the life cycle starts with collecting data.
1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain all the data relevant to the problem.
In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The quantity and quality of the collected data will determine the efficiency of the output. The more data we have, the more accurate the prediction will be.
By performing the above task, we get a coherent set of data, also called a dataset. It will be used in further steps.
2. Data preparation
After collecting the data, we need to prepare it for further steps. Data preparation is
a step where we put our data into a suitable place and prepare it to use in our
machine learning training.
In this step, first, we put all data together, and then randomize the ordering of data.
3. Data Wrangling
Data wrangling is the process of cleaning and converting raw data into a usable format. It is the process of cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process. Cleaning the data is required to address quality issues.
It is not necessary that data we have collected is always of our use as some of the
data may not be useful. In real-world applications, collected data may have various
issues, including:
o Missing Values
o Duplicate data
o Invalid data
o Noise
It is mandatory to detect and remove the above issues because they can negatively affect the quality of the outcome.
4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step involves selecting analytical techniques, building models, and reviewing the results.
The aim of this step is to build a machine learning model to analyze the data using
various analytical techniques and review the outcome. It starts with the
determination of the type of the problems, where we select the machine learning
techniques such as Classification, Regression, Cluster analysis, Association, etc.
then build the model using prepared data, and evaluate the model.
Hence, in this step, we take the data and use machine learning algorithms to build
the model.
5. Train Model
Now the next step is to train the model. In this step, we train our model to improve its performance for a better outcome of the problem.
We use datasets to train the model using various machine learning algorithms. Training a model is required so that it can understand the various patterns, rules, and features.
6. Test Model
Once our machine learning model has been trained on a given dataset, then we test
the model. In this step, we check for the accuracy of our model by providing a test
dataset to it.
Testing the model determines the percentage accuracy of the model as per the
requirement of project or problem.
7. Deployment
The last step of machine learning life cycle is deployment, where we deploy the
model in the real-world system.
In this topic, we will learn to install Python and an IDE with the help of Anaconda
distribution.
Below some steps are given to show the downloading and installing process of
Anaconda and IDE:
o To download Anaconda in your system, firstly, open your favorite browser and
type Download Anaconda Python, and then click on the first link as given in
the below image. Alternatively, you can directly download it by clicking on this
link, https://www.anaconda.com/distribution/#download-section.
o After clicking on the first link, you will reach the download page of Anaconda, as shown in the below image:
o Since Anaconda is available for Windows, Linux, and Mac OS, you can download it as per your OS by clicking on the available options shown in the below image. It provides both Python 2.7 and Python 3.7 versions; since the latest version is 3.7, we will download the Python 3.7 version. After clicking on the download option, it will start downloading on your computer.
Note: In this topic, we are downloading Anaconda for Windows; you can choose it as per your OS.
o In the next window, you will get two options for installations as given in the
below image. Select the first option (Just me) and click on Next.
o Now you will get a window for installing location, here, you can leave it as
default or change it by browsing a location, and then click on Next. Consider
the below image:
o Once the installation is completed, tick the checkbox if you want to learn more about Anaconda and Anaconda Cloud. Click on Finish to end the process.
Note: Here, we will use the Spyder IDE to run Python programs.
Although these are two related technologies, and people sometimes use them as synonyms for each other, they are still two different terms in various cases.
AI is a bigger concept to create intelligent machines that can simulate human thinking
capability and behavior, whereas, machine learning is an application or subset of AI that
allows machines to learn from data without being programmed explicitly.
Below are some main differences between AI and machine learning along with the
overview of Artificial intelligence and machine learning.
Artificial Intelligence
Artificial intelligence is a field of computer science which makes a computer system that can mimic human intelligence. It comprises the two words "Artificial" and "intelligence", which together mean "a human-made thinking power." Hence we can define it as:
Artificial intelligence is a technology using which we can create intelligent systems that
can simulate human intelligence.
An artificial intelligence system does not need to be pre-programmed; instead, it uses algorithms that can work with their own intelligence. It involves machine learning algorithms such as reinforcement learning and deep learning neural networks. AI is being used in multiple places, such as Siri, Google's AlphaGo, AI in chess playing, etc.
Based on capabilities, AI can be divided into three types:
o Weak AI
o General AI
o Strong AI
Currently, we are working with weak AI and general AI. The future of AI is strong AI, which it is said will be more intelligent than humans.
Machine learning
Machine learning is about extracting knowledge from data. It can be defined as follows:
Machine learning works on algorithms which learn on their own using historical data. It works only for specific domains: if we are creating a machine learning model to detect pictures of dogs, it will only give results for dog images, but if we provide new data such as a cat image, then it will become unresponsive. Machine learning is being used in various places, such as online recommender systems, Google search algorithms, email spam filters, Facebook auto friend tagging suggestions, etc.
Machine learning can be divided into three types:
o Supervised learning
o Reinforcement learning
o Unsupervised learning
Key differences between Artificial Intelligence (AI) and Machine learning (ML):
o In AI, we make intelligent systems to perform any task like a human. In ML, we teach machines with data to perform a particular task and give an accurate result.
o Machine learning and deep learning are the two main subsets of AI. Deep learning is the main subset of machine learning.
o AI has a very wide scope. Machine learning has a limited scope.
o The main applications of AI are Siri, customer support using chatbots, Expert Systems, online game playing, intelligent humanoid robots, etc. The main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
o On the basis of capabilities, AI can be divided into three types: Weak AI, General AI, and Strong AI. Machine learning can also be divided into three types: supervised learning, unsupervised learning, and reinforcement learning.
Before knowing the sources of the machine learning dataset, let's discuss datasets.
What is a dataset?
A dataset is a collection of data in which the data is arranged in some order. A dataset can contain anything from a series or an array to a database table. The below table shows an example of a dataset (note that one Salary value is missing; it will be handled in the preprocessing steps):
Country    Age    Salary    Purchased
India      38     48000     No
Germany    30     54000     No
France     48     65000     No
Germany    40               Yes
Need of Dataset
To work with machine learning projects, we need a huge amount of data, because,
without the data, one cannot train ML/AI models. Collecting and preparing the
dataset is one of the most crucial parts while creating an ML/AI project.
The technology applied behind any ML projects cannot work properly if the dataset
is not well prepared and pre-processed.
During the development of the ML project, the developers completely rely on the
datasets. In building ML applications, datasets are divided into two parts:
o Training dataset: the part of the data used to train the model.
o Test dataset: the part of the data used to check the accuracy of the model after training.
Note: The datasets are of large size, so to download these
datasets, you must have fast internet on your computer.
1. Kaggle Datasets
Kaggle is one of the best sources for providing datasets for Data Scientists and
Machine Learners. It allows users to find, download, and publish datasets in an easy
way. It also provides the opportunity to work with other machine learning engineers
and solve difficult Data Science related tasks.
Kaggle provides a high-quality dataset in different formats that we can easily find
and download.
2. UCI Machine Learning Repository
The UCI Machine Learning Repository is another great source of machine learning datasets. Since the year 1987, it has been widely used by students, professors, and researchers as a primary source of machine learning datasets.
It classifies the datasets as per the problems and tasks of machine learning such
as Regression, Classification, Clustering, etc. It also contains some of the popular
datasets such as the Iris dataset, Car Evaluation dataset, Poker Hand dataset, etc.
3. Datasets via AWS
Anyone can analyze and build various services using shared data via AWS resources. The shared datasets on the cloud help users spend more time on data analysis rather than on data acquisition.
This source provides various types of datasets, with examples and ways to use them. It also provides a search box through which we can search for a required dataset. Anyone can add a dataset or example to the Registry of Open Data on AWS.
5. Microsoft Datasets
Microsoft has launched the "Microsoft Research Open Data" repository with a collection of free datasets in various areas such as natural language processing, computer vision, and domain-specific sciences.
Using this resource, we can download the datasets to use on the current device, or
we can also directly use it on the cloud infrastructure.
6. Awesome Public Dataset Collection
The Awesome public dataset collection provides high-quality datasets arranged in a well-organized list according to topics such as Agriculture, Biology, Climate, Complex Networks, etc. Most of the datasets are free, but some may not be, so it is better to check the license before downloading a dataset.
The link to download the dataset from Awesome public dataset collection
is https://github.com/awesomedata/awesome-public-datasets.
7. Government Datasets
There are different sources to get government-related data. Various countries
publish government data for public use collected by them from different
departments.
The goal of providing these datasets is to increase the transparency of government work and to allow the data to be used in innovative ways. Below are some links to government datasets:
8. Visual Data
Visual Data provides a great number of datasets that are specific to computer vision tasks such as Image Classification, Video Classification, Image Segmentation, etc. Therefore, if you want to build a project on deep learning or image processing, you can refer to this source.
The link for downloading the dataset from this source is https://www.visualdata.io/.
9. Scikit-learn dataset
Scikit-learn is a great source for machine learning enthusiasts. This source provides
both toy and real-world datasets. These datasets can be obtained from
sklearn.datasets package and using general dataset API.
The toy datasets available in scikit-learn can be loaded using predefined functions such as load_iris([return_X_y]), load_boston([return_X_y]), etc., rather than importing a file from an external source (note that load_boston has been removed in recent scikit-learn releases). However, these datasets are not suitable for real-world projects.
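For example, the Iris toy dataset can be loaded in a few lines; a minimal sketch (the shapes are shown in the comments):
from sklearn.datasets import load_iris

iris = load_iris()            # loads the bundled Iris toy dataset
print(iris.data.shape)        # (150, 4): 150 samples, 4 features
print(iris.target_names)      # the three iris species

# Or fetch the feature matrix and labels directly:
X, y = load_iris(return_X_y=True)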
When creating a machine learning project, it is not always the case that we come across clean and formatted data. And while doing any operation with data, it is mandatory to clean it and put it in a formatted way. For this, we use the data preprocessing task.
Why do we need Data Preprocessing?
Real-world data generally contains noise and missing values, and may be in an unusable format which cannot be directly used for machine learning models. Data preprocessing is the required task for cleaning the data and making it suitable for a machine learning model, which also increases the accuracy and efficiency of the model.
Datasets come in different formats for different purposes. For example, if we want to create a machine learning model for a business purpose, the dataset will be different from the dataset required for a liver-patient model; each dataset differs from the others. To use a dataset in our code, we usually put it into a CSV file. However, sometimes we may also need to use an HTML or xlsx file.
Here we will use a demo dataset for data preprocessing; for practice, it can be downloaded from "https://www.superdatascience.com/pages/machine-learning". For real-world problems, we can download datasets online from various sources such as https://www.kaggle.com/uciml/datasets, https://archive.ics.uci.edu/ml/index.php, etc.
We can also create our own dataset by gathering data using various APIs with Python and putting that data into a .csv file.
2) Importing Libraries
In order to perform data preprocessing using Python, we need to import some
predefined Python libraries. These libraries are used to perform some specific jobs.
There are three specific libraries that we will use for data preprocessing, which are:
Numpy: The Numpy Python library is used for including any type of mathematical operation in the code. It is the fundamental package for scientific computation in Python, and it supports large, multi-dimensional arrays and matrices. In Python, we can import it as:
import numpy as nm
Here we have used nm, which is a short name for Numpy, and it will be used in the
whole program.
Matplotlib: The matplotlib library is a Python 2D plotting library; we will use its pyplot sub-library to plot graphs and can import it as: import matplotlib.pyplot as mtp
Pandas: The Pandas library is used for importing and managing the datasets and can be imported as: import pandas as pd
Here, we have used pd as a short name for this library. Consider the below image:
3) Importing the Datasets
Note: We can set any directory as a working directory, but it must contain the required dataset.
Here, in the below image, we can see the Python file along with required dataset.
Now, the current folder is set as a working directory.
read_csv() function:
Now to import the dataset, we will use the read_csv() function of the pandas library, which is used to read a csv file and perform various operations on it. Using this function, we can read a csv file locally as well as through a URL.
data_set= pd.read_csv('Dataset.csv')
Here, data_set is a name of the variable to store our dataset, and inside the function,
we have passed the name of our dataset. Once we execute the above line of code, it
will successfully import the dataset in our code. We can also check the imported
dataset by clicking on the section variable explorer, and then double click
on data_set. Consider the below image:
As in the above image, indexing starts from 0, which is the default indexing in Python. We can also change the format of our dataset by clicking on the format option.
Extracting independent variables:
x= data_set.iloc[:,:-1].values
In the above code, the first colon(:) is used to take all the rows, and the second
colon(:) is for all the columns. Here we have used :-1, because we don't want to take
the last column as it contains the dependent variable. So by doing this, we will get
the matrix of features.
As we can see in the above output, there are only three variables.
Extracting dependent variable:
y= data_set.iloc[:,3].values
Here we have taken all the rows with the last column only. It will give the array of
dependent variables.
Output:
array(['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'],
dtype=object)
Note: If you are using Python language for machine learning, then extraction is
mandatory, but for R language it is not required.
There are mainly two ways to handle missing data, which are:
By deleting the particular row: The first way is commonly used to deal with null values: we just delete the specific row or column which consists of null values. But this way is not very efficient, and removing data may lead to a loss of information, which will not give an accurate output.
By calculating the mean: In this way, we calculate the mean of the column or row which contains a missing value and put it in the place of the missing value.
This strategy is useful for the features which have numeric data such as age, salary,
year, etc. Here, we will use this approach.
To handle missing values, we will use Scikit-learn library in our code, which contains
various libraries for building machine learning models. Here we will
use Imputer class of sklearn.preprocessing library. Below is the code for it:
#handling missing data (Replacing missing data with the mean value)
from sklearn.preprocessing import Imputer
imputer= Imputer(missing_values ='NaN', strategy='mean', axis = 0)
#Fitting imputer object to the independent variables x
imputer= imputer.fit(x[:, 1:3])
#Replacing missing data with the calculated mean value
x[:, 1:3]= imputer.transform(x[:, 1:3])
(The Imputer class belongs to older scikit-learn releases; in current versions it has been replaced by sklearn.impute.SimpleImputer, as shown at the end of this section.)
Output:
As we can see in the above output, the missing values have been replaced with the means of the rest of the column values.
5) Encoding Categorical Data
Since machine learning models work entirely on mathematics and numbers, a categorical variable in our dataset may create trouble while building the model. So it is necessary to encode these categorical variables into numbers.
Firstly, we will convert the country variables into categorical data. So to do this, we
will use LabelEncoder() class from preprocessing library.
#Categorical data
#for Country Variable
from sklearn.preprocessing import LabelEncoder
label_encoder_x= LabelEncoder()
x[:, 0]= label_encoder_x.fit_transform(x[:, 0])
Output:
Out[15]:
array([[2, 38.0, 68000.0],
[0, 43.0, 45000.0],
[1, 30.0, 54000.0],
[0, 48.0, 65000.0],
[1, 40.0, 65222.22222222222],
[2, 35.0, 58000.0],
[1, 41.111111111111114, 53000.0],
[0, 49.0, 79000.0],
[2, 50.0, 88000.0],
[0, 37.0, 77000.0]], dtype=object)
Explanation:
In the above code, we have imported the LabelEncoder class of the sklearn library. This class has successfully encoded the categories into digits.
But in our case, there are three country categories, and as we can see in the above output, they are encoded into 0, 1, and 2. From these values, the machine learning model may assume that there is some ordering or correlation between these categories, which would produce wrong output. To remove this issue, we will use dummy encoding.
Dummy Variables:
Dummy variables are variables which take the values 0 or 1. The value 1 indicates the presence of that category in a particular column, and the rest of the variables become 0. With dummy encoding, we have a number of columns equal to the number of categories.
In our dataset, we have 3 categories so it will produce three columns having 0 and 1
values. For Dummy Encoding, we will use OneHotEncoder class
of preprocessing library.
Below is the code for it (it also appears in the combined program at the end of this section):
from sklearn.preprocessing import OneHotEncoder
onehot_encoder= OneHotEncoder(categorical_features= [0])
x= onehot_encoder.fit_transform(x).toarray()
Output:
As we can see in the above output, all the variables are encoded into numbers 0 and
1 and divided into three columns.
It can be seen more clearly in the variables explorer section, by clicking on x option
as:
For Purchased Variable:
labelencoder_y= LabelEncoder()
y= labelencoder_y.fit_transform(y)
For the second categorical variable, we only use the labelencoder object of the LabelEncoder class. Here we are not using the OneHotEncoder class because the purchased variable has only two categories, yes or no, which are automatically encoded into 0 and 1.
Output:
6) Splitting the Dataset into Training and Test Set
Suppose we train our machine learning model on one dataset and then test it on a completely different dataset. This will create difficulties for the model, because the correlations it has learned may not hold on the new data.
If we train our model very well so that its training accuracy is very high, but its performance drops when we provide a new dataset to it, then the model does not generalize. So we always try to make a machine learning model which performs well with the training set and also with the test dataset. Here, we can define these datasets as:
Training Set: A subset of dataset to train the machine learning model, and we
already know the output.
Test set: A subset of dataset to test the machine learning model, and by using the
test set, model predicts the output.
For splitting the dataset, we will use the below lines of code (the same lines appear in the combined program at the end of this section):
# Splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state=0)
Explanation:
o In the above code, the first line is used for splitting arrays of the dataset into
random train and test subsets.
o In the second line, we have used four variables for our output, which are
o x_train: features for the training data
o x_test: features for the testing data
o y_train: dependent variable for the training data
o y_test: dependent variable for the testing data
o In the train_test_split() function, we have passed four parameters, the first two of which are the arrays of data. test_size specifies the size of the test set; it may be .5, .3, or .2, and it gives the dividing ratio of the training and testing sets.
o The last parameter random_state is used to set a seed for a random
generator so that you always get the same result, and the most used value for
this is 42.
Output:
By executing the above code, we will get 4 different variables, which can be seen
under the variable explorer section.
As we can see in the above image, the x and y variables are divided into 4 different
variables with corresponding values.
7) Feature Scaling
Feature scaling is the final step of data preprocessing in machine learning. It is a
technique to standardize the independent variables of the dataset in a specific range.
In feature scaling, we put our variables in the same range and on the same scale so that no variable dominates the others. There are two common techniques for it:
o Standardization
o Normalization
Here, we will use the standardization method for our dataset.
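For reference, the two formulas are as follows: standardization rescales each feature as x' = (x − μ) / σ, where μ is the feature's mean and σ its standard deviation, while normalization (min-max scaling) rescales it as x' = (x − min(x)) / (max(x) − min(x)), which maps all values into the range [0, 1].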
Now, we will create the object of StandardScaler class for independent variables or
features. And then we will fit and transform the training dataset.
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
For the test dataset, we only apply the transform() function, because the scaler has already been fitted on the training data:
x_test= st_x.transform(x_test)
Output:
By executing the above lines of code, we will get the scaled values for x_train and
x_test as:
x_train:
x_test:
As we can see in the above output, all the variables are scaled to values between -1 and 1.
Note: Here, we have not scaled the dependent variable because it has only two values, 0 and 1. But if a dependent variable has a wider range of values, we will also need to scale it.
Now, in the end, we can combine all the steps together to make our complete code
more understandable.
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('Dataset.csv')

#Extracting Independent Variable
x= data_set.iloc[:, :-1].values

#Extracting Dependent variable
y= data_set.iloc[:, 3].values

#handling missing data (Replacing missing data with the mean value)
from sklearn.preprocessing import Imputer
imputer= Imputer(missing_values ='NaN', strategy='mean', axis = 0)

#Fitting imputer object to the independent variables x
imputer= imputer.fit(x[:, 1:3])

#Replacing missing data with the calculated mean value
x[:, 1:3]= imputer.transform(x[:, 1:3])

#for Country Variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
label_encoder_x= LabelEncoder()
x[:, 0]= label_encoder_x.fit_transform(x[:, 0])

#Encoding for dummy variables
onehot_encoder= OneHotEncoder(categorical_features= [0])
x= onehot_encoder.fit_transform(x).toarray()

#Encoding for purchased variable
labelencoder_y= LabelEncoder()
y= labelencoder_y.fit_transform(y)

# Splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state=0)

#Feature Scaling of datasets
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
In the above code, we have included all the data preprocessing steps together. But
there are some steps or lines of code which are not necessary for all machine
learning models. So we can exclude them from our code to make it reusable for all
models.
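A note on newer scikit-learn versions: the Imputer class and the categorical_features argument used above belong to older releases (both were removed in scikit-learn 0.22). On current versions, an equivalent preprocessing script, sketched here under the assumption of the same Dataset.csv layout, uses SimpleImputer and ColumnTransformer:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# Importing the dataset and extracting variables, as before
data_set = pd.read_csv('Dataset.csv')
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 3].values

# Replacing missing numeric data (Age, Salary: columns 1 and 2) with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
x[:, 1:3] = imputer.fit_transform(x[:, 1:3])

# One-hot encoding the Country column; the remaining columns pass through unchanged
ct = ColumnTransformer(
    [('country', OneHotEncoder(sparse_output=False), [0])],  # use sparse=False on scikit-learn < 1.2
    remainder='passthrough')
x = ct.fit_transform(x)

# Encoding the Purchased (Yes/No) variable
y = LabelEncoder().fit_transform(y)

# Splitting and feature scaling, unchanged in spirit from the code above
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)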
In supervised learning, the training data provided to the machines works as a supervisor that teaches the machines to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.
In the real-world, supervised learning can be used for Risk Assessment, Image
classification, Fraud Detection, spam filtering, etc.
The working of Supervised learning can be easily understood by the below example
and diagram:
Suppose we have a dataset of different types of shapes, including squares, rectangles, triangles, and polygons. The first step is to train the model on each shape:
o If the given shape has four sides, and all the sides are equal, then it will be
labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the model is
to identify the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of its number of sides and predicts the output.
1. Regression
Regression algorithms are used if there is a relationship between the input variable
and the output variable. It is used for the prediction of continuous variables, such as
Weather forecasting, Market Trends, etc. Below are some popular Regression
algorithms which come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means there are discrete classes such as Yes-No, Male-Female, True-False, etc. Spam filtering is a common use case. Below are some popular classification algorithms which come under supervised learning (a minimal code sketch follows the list):
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines
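As a quick, hedged illustration of supervised classification (using the bundled Iris dataset rather than the shapes example above):
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled data: feature matrix x and known class labels y
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Train on labeled examples, then test on unseen data
model = LogisticRegression(max_iter=1000).fit(x_train, y_train)
print(model.score(x_test, y_test))   # fraction of correct test predictions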
Unsupervised learning is a type of machine learning in which models are trained using
unlabeled dataset and are allowed to act on that data without any supervision.
o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is much like how a human learns to think through their own experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
o In real-world, we do not always have input data with the corresponding
output so to solve such cases, we need unsupervised learning.
Here, we have taken unlabeled input data, which means it is not categorized and corresponding outputs are also not given. This unlabeled input data is fed to the machine learning model in order to train it. First, it will interpret the raw data to find hidden patterns, and then it will apply a suitable algorithm such as k-means clustering, a decision tree, etc.
Once it applies the suitable algorithm, the algorithm divides the data objects into groups according to the similarities and differences between the objects.
Below is a list of some popular unsupervised learning algorithms (a minimal clustering sketch follows the list):
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
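As a quick, hedged illustration of one of these algorithms, the sketch below clusters a handful of made-up 2-D points with k-means (the data and parameter values are invented for the example):
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: two loose groups of points (made up for illustration)
points = np.array([[1, 2], [1, 4], [2, 3],
                   [8, 8], [9, 10], [10, 9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)            # cluster index assigned to each point
print(kmeans.cluster_centers_)   # coordinates of the two cluster centers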
Example: Suppose we have images of different types of fruits. The task of our supervised learning model is to identify the fruits and classify them accordingly. To identify the images in supervised learning, we give the model the input data as well as the output for it, which means we train the model on the shape, size, color, and taste of each fruit. Once training is completed, we test the model by giving it a new set of fruits. The model identifies each fruit and predicts the output using a suitable algorithm.
Example: To understand the unsupervised learning, we will use the example given
above. So unlike supervised learning, here we will not provide any supervision to the
model. We will just provide the input dataset to the model and allow the model to
find the patterns from the data. With the help of a suitable algorithm, the model will
train itself and divide the fruits into different groups according to the most similar
features between them.
The main differences between Supervised and Unsupervised learning are given
below:
o Supervised learning model takes direct feedback to check if it is predicting the correct output or not. Unsupervised learning model does not take any feedback.
o Supervised learning model predicts the output. Unsupervised learning model finds the hidden patterns in data.
o Supervised learning can be used for cases where we know the input as well as the corresponding outputs. Unsupervised learning can be used for cases where we have only input data and no corresponding output data.
Note: Supervised and unsupervised learning are both machine learning methods, and the choice between them depends on factors related to the structure and volume of your dataset and the use case of the problem.
We can understand the concept of regression analysis using the below example:
Now, the company wants to spend $200 on advertising in the year 2019 and wants to know the predicted sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
Regression is a supervised learning technique which helps in finding the correlation
between variables and enables us to predict the continuous output variable based on
the one or more predictor variables. It is mainly used for prediction, forecasting,
time series modeling, and determining the causal-effect relationship between
variables.
In Regression, we plot a graph between the variables which best fits the given datapoints; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through all the datapoints on a target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells whether the model has captured a strong relationship or not.
Types of Regression
There are various types of regression which are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all regression methods analyze the effect of the independent variables on a dependent variable. Here we discuss some important types of regression, which are given below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression:
Linear Regression:
Linear regression models the linear relationship between a dependent variable Y and an independent variable X with a straight line:
Y = aX + b
Logistic Regression:
When we provide the input values (data) to the function, it gives the S-curve as
follows:
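The underlying function is the logistic (sigmoid) function, which squashes any real-valued input into the range (0, 1): f(x) = 1 / (1 + e^(-x)). Plotting f(x) gives the S-curve referred to here.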
o It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.
There are three types of logistic regression:
o Binary (0/1, pass/fail)
o Multi (cats, dogs, lions)
o Ordinal (low, medium, high)
Polynomial Regression:
o Polynomial Regression is a type of regression which models the non-linear
dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between
the value of x and corresponding conditional values of y.
o Suppose there is a dataset which consists of datapoints which are present in a
non-linear fashion, so for such case, linear regression will not best fit to those
datapoints. To cover such datapoints, we need Polynomial regression.
o In Polynomial regression, the original features are transformed into polynomial features of a given degree and then modeled using a linear model, which means the datapoints are best fitted using a polynomial line.
o The equation for polynomial regression is also derived from the linear regression equation: the linear equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n.
o Here Y is the predicted/target output, and b0, b1, ..., bn are the regression coefficients; x is our independent/input variable.
o The model is still linear because the coefficients are still linear, even though the features are quadratic or of higher degree.
Note: This is different from Multiple Linear regression in such a way that in Polynomial
regression, a single element has different degrees instead of multiple variables with the
same degree.
Support Vector Regression:
Support Vector Regression applies the Support Vector Machine idea to regression with a continuous output. Here, the blue line is called the hyperplane, and the other two lines are known as boundary lines.
Decision Tree Regression:
o Decision Tree is a supervised learning algorithm which can be used for solving
both classification and regression problems.
o It can solve problems for both categorical and numerical data
o Decision Tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents a result of the test, and each leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (the dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own child nodes, thereby becoming parent nodes of those nodes. Consider the below image:
The above image shows an example of Decision Tree regression; here, the model is trying to predict the choice of a person between a sports car and a luxury car.
Ridge Regression:
Ridge regression is linear regression with an added L2 regularization penalty: a small amount of bias, proportional to the squared magnitude of the coefficients, is added to the cost function to reduce overfitting and handle multicollinearity.
Lasso Regression:
Lasso regression is similar, but adds an L1 penalty based on the absolute magnitude of the coefficients; this can shrink some coefficients all the way to zero, so it also performs feature selection.
The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence it is called linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the
relationship between the variables. Consider the below image:
y = a0 + a1x + ε
Here, a0 is the intercept, a1 is the linear regression coefficient (the slope of the line), and ε is the random error.
The values for the x and y variables are training datasets for the Linear Regression model representation.
Different values for the weights or line coefficients (a0, a1) give different lines of regression, so we need to calculate the best values for a0 and a1 to find the best fit line; to calculate this, we use the cost function.
Cost function-
o Different values for the weights or line coefficients (a0, a1) give different lines of regression, and the cost function is used to estimate the values of the coefficients for the best fit line.
o Cost function optimizes the regression coefficients or weights. It measures
how a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function,
which maps the input variable to the output variable. This mapping function is
also known as Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted and actual values. It can be written as:
MSE = (1/N) * Σ (Yi − (a1xi + a0))², with the sum running over all N observations
Where,
N = total number of observations
Yi = actual value of the i-th observation
(a1xi + a0) = predicted value for the i-th observation
Residuals: The distance between an actual value and the predicted value is called a residual. If the observed points are far from the regression line, the residuals and hence the cost function will be high. If the scatter points are close to the regression line, the residuals will be small, and hence the cost function will be small.
Gradient Descent:
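Gradient descent is used to minimize the MSE by iteratively updating the coefficients of the regression line. Starting from initial values, each coefficient is repeatedly moved a small step in the direction of the negative gradient of the cost function, aj := aj − α · ∂J/∂aj, where α is the learning rate that controls the step size; this standard update rule gradually drives the line toward the best fit.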
Model Performance:
The Goodness of fit determines how the line of regression fits the set of
observations. The process of finding the best model out of various models is
called optimization. It can be achieved by below method:
1. R-squared method:
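R-squared measures the goodness of fit on a scale from 0 to 1 (often quoted as a percentage): it is the proportion of the variation in the output that the model explains, R² = Explained variation / Total variation = 1 − (sum of squared residuals) / (total sum of squares). A high R² value means the model fits the observations well.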
The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on
continuous or categorical values.
The key uses of Simple Linear Regression are:
o Modeling the relationship between two variables, such as the relationship between income and expenditure, or experience and salary.
o Forecasting new observations, such as weather forecasting according to temperature, or the revenue of a company according to its investments in a year.
Where,
a0 = the intercept of the regression line (can be obtained by putting x = 0)
a1 = the slope of the regression line, which tells whether the line is increasing or decreasing
ε = the error term (for a good model it will be negligible)
Implementation of Simple Linear
Regression Algorithm using Python
Problem Statement example for Simple Linear Regression:
Here we are taking a dataset that has two variables: salary (dependent variable) and
experience (Independent variable). The goals of this problem is:
In this section, we will create a Simple Linear Regression model to find out the best
fitting line for representing the relationship between these two variables.
To implement the Simple Linear regression model in machine learning using Python,
we need to follow the below steps:
The first step for creating the Simple Linear Regression model is data pre-processing.
We have already done it earlier in this tutorial. But there will be some changes, which
are given in the below steps:
o First, we will import the three important libraries, which will help us for loading
the dataset, plotting the graphs, and creating the Simple Linear Regression
model.
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
data_set= pd.read_csv('Salary_Data.csv')
By executing the above line of code (ctrl+ENTER), we can read the dataset on our
Spyder IDE screen by clicking on the variable explorer option.
The above output shows the dataset, which has two variables: Salary and Experience.
Note: In Spyder IDE, the folder containing the code file must be saved as a working
directory, and the dataset or csv file should be in the same folder.
o After that, we need to extract the dependent and independent variables from
the given dataset. The independent variable is years of experience, and the
dependent variable is salary. Below is code for it:
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 1].values
In the above lines of code, for the x variable we have used -1, since we want to exclude the last column from the dataset. For the y variable we have used 1 as the parameter, since we want to extract the second column and indexing starts from zero.
By executing the above line of code, we will get the output for X and Y variable as:
In the above output image, we can see the X (independent) variable and Y
(dependent) variable has been extracted from the given dataset.
o Next, we will split both variables into the test set and training set. We have 30
observations, so we will take 20 observations for the training set and 10
observations for the test set. We are splitting our dataset so that we can train
our model using a training dataset and then test the model using a test
dataset. The code for this is given below:
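A minimal sketch of the splitting code (test_size = 1/3 reproduces the 20/10 split described above; random_state follows the convention used earlier in this tutorial):
# Splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= 1/3, random_state=0)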
By executing the above code, we will get x-test, x-train and y-test, y-train dataset.
Consider the below images:
Test-dataset:
Training Dataset:
o For Simple Linear Regression, we will not use feature scaling, because the Python libraries take care of it in some cases, so we don't need to perform it here. Now our dataset is well prepared, and we can start building the Simple Linear Regression model for the given problem.
Step-2: Fitting the Simple Linear Regression to the Training Set:
Now the second step is to fit our model to the training dataset. To do so, we will
import the LinearRegression class of the linear_model library from the scikit learn.
After importing the class, we are going to create an object of the class named as
a regressor. The code for this is given below:
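Reconstructed from the description above, the fitting code would look like this sketch:
# Fitting the Simple Linear Regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)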
In the above code, we have used the fit() method to fit our Simple Linear Regression object to the training set. In the fit() function, we have passed x_train and y_train, which are our training datasets for the independent and dependent variables. We
have fitted our regressor object to the training set so that the model can easily learn
the correlations between the predictor and target variables. After executing the
above lines of code, we will get the below output.
Output:
Step-3: Prediction of test set result:
Our model has now learned the correlation between the dependent (salary) and independent (experience) variables, so it is ready to predict the output for new observations. In this step, we will provide the test dataset (new observations) to the model to check whether it can predict the correct output or not.
We will create a prediction vector y_pred, and x_pred, which will contain predictions
of test dataset, and prediction of training set respectively.
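A sketch of these lines, following the naming just described:
# Prediction of test set and training set results
y_pred = regressor.predict(x_test)    # salary predictions for the test set
x_pred = regressor.predict(x_train)   # salary predictions for the training set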
On executing the above lines of code, two variables named y_pred and x_pred will be generated in the variable explorer options, containing the salary predictions for the test set and the training set, respectively.
Output:
You can check the variable by clicking on the variable explorer option in the IDE, and
also compare the result by comparing values from y_pred and y_test. By comparing
these values, we can check how good our model is performing.
Now in this step, we will visualize the training set result. To do so, we will use the scatter() function of the pyplot library, which we already imported in the pre-processing step. The scatter() function will create a scatter plot of the observations.
On the x-axis we will plot the years of experience of the employees, and on the y-axis their salaries. To the function we will pass the real values of the training set, which means the years of experience (x_train), the training-set salaries (y_train), and the color of the observations. Here we are taking green for the observations, but it can be any color of your choice.
Now, we need to plot the regression line, so for this we will use the plot() function of the pyplot library. To this function we will pass the years of experience for the training set (x_train), the predicted salaries for the training set (x_pred), and the color of the line.
Next, we will give the plot a title. Here we will use the title() function of the pyplot library and pass the name "Salary vs Experience (Training Dataset)". After that, we will assign labels to the x-axis and y-axis using the xlabel() and ylabel() functions.
Finally, we will represent all above things in a graph using show(). The code is given
below:
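Assembling the steps just described, the plotting code would look like the following sketch (the axis labels are assumed from the description):
mtp.scatter(x_train, y_train, color="green")           # real training observations
mtp.plot(x_train, x_pred, color="red")                 # regression line (training predictions)
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary")
mtp.show()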
Output:
By executing the above lines of code, we will get the below graph plot as an output.
In the above plot, we can see the real observed values as green dots, while the predicted values are covered by the red regression line. The regression line shows a correlation between the dependent and independent variables.
The good fit of the line can be observed by calculating the difference between actual
values and predicted values. But as we can see in the above plot, most of the
observations are close to the regression line, hence our model is good for the
training set.
In the previous step, we have visualized the performance of our model on the
training set. Now, we will do the same for the Test set. The complete code will remain
the same as the above code, except in this, we will use x_test, and y_test instead of
x_train and y_train.
Here we are also changing the color of observations and regression line to
differentiate between the two plots, but it is optional.
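A sketch of the test-set plot, using the colors mentioned below:
mtp.scatter(x_test, y_test, color="blue")              # real test observations
mtp.plot(x_train, x_pred, color="red")                 # same regression line as before
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary")
mtp.show()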
Output:
By executing the above line of code, we will get the output as:
In the above plot, the observations are shown in blue, and the prediction is given by the red regression line. As we can see, most of the observations are close to the regression line; hence we can say that our Simple Linear Regression model is a good model and is able to make good predictions.
Example:
Prediction of CO2 emission based on engine size and number of cylinders in a car.
Some key points about MLR:
o For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be of continuous or categorical form.
o Each feature variable must model the linear relationship with the dependent
variable.
o MLR tries to fit a regression line through a multidimensional space of data-
points.
MLR equation:
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is an enhancement of Simple Linear Regression, the same idea is applied, and the multiple linear regression equation becomes:
Y = b0 + b1x1 + b2x2 + b3x3 + ...... + bnxn ............... (a)
Where,
Y = Output/Response variable
b0, b1, b2, b3, ..., bn = Coefficients of the model
x1, x2, x3, ..., xn = Various independent/feature variables
Problem Description:
Since we need to find the Profit, it is the dependent variable, and the other four variables are independent variables. Below are the main steps of deploying the MLR model:
The very first step is data pre-processing, which we have already discussed in this
tutorial. This process contains the below steps:
o Importing libraries: Firstly, we will import the libraries that will help in building the model. Below is the code for it:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
In the above output, we can clearly see that there are five variables, of which four are continuous and one is categorical.
Output:
Out[5]:
As we can see in the above output, the last column contains a categorical variable, which is not suitable to be applied directly for fitting the model, so we need to encode it.
As we have one categorical variable (State), which cannot be directly applied to the model, we will encode it. To encode the categorical variable into numbers, we will use the LabelEncoder class. But this is not sufficient, because the encoded numbers still imply an ordinal relationship, which may produce a wrong model. So to remove this problem, we will use OneHotEncoder, which will create dummy variables. Below is the code for it:
#Categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x= LabelEncoder()
x[:, 3]= labelencoder_x.fit_transform(x[:,3])
onehotencoder= OneHotEncoder(categorical_features= [3])
x= onehotencoder.fit_transform(x).toarray()
Here we are only encoding one independent variable (State), as the other variables are continuous.
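Note that the categorical_features argument was removed from OneHotEncoder in scikit-learn 0.22 and later. If the above code raises an error on a newer version, the same encoding can be done with a ColumnTransformer; a minimal sketch under that assumption:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# one-hot encode column 3 (State); remainder='passthrough' keeps the other columns
# the dummy columns are placed first, matching the layout described below
ct= ColumnTransformer([('state', OneHotEncoder(), [3])], remainder= 'passthrough')
x= ct.fit_transform(x)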
Output:
As we can see in the above output, the state column has been converted into dummy
variables (0 and 1). Here each dummy variable column is corresponding to the
one State. We can check by comparing it with the original dataset. The first column
corresponds to the California State, the second column corresponds to the Florida
State, and the third column corresponds to the New York State.
Note: We should not use all the dummy variables at the same time; we must use 1 less than the total number of dummy variables, else it will create a dummy variable trap.
o Now, we will write a single line of code just to avoid the dummy variable trap, as shown below:
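(The line below matches the full code listing later in this section.)

x= x[:, 1:]   # drop the first dummy column to avoid the dummy variable trap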
If we do not remove the first dummy variable, it may introduce multicollinearity in the model.
As we can see in the above output image, the first column has been removed.
o Now we will split the dataset into training and test set. The code for this is
given below:
# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state= 0)

Output: The above code will split the dataset into a training set and a test set. You can check the output through the variable explorer option in the Spyder IDE. The test set and training set will look like the below image:
Test set:
Training set:
Note: In MLR, we will not do feature scaling as it is taken care by the library, so we don't
need to do it manually.
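Now we will fit our Multiple Linear Regression model to the training set. The following matches the fitting step of the full code listing later in this section:

#Fitting the MLR model to the training set:
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)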
Output:
Now, we have successfully trained our model using the training dataset. In the next
step, we will test the performance of the model using the test dataset.
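The prediction step, matching the full code listing later in this section:

#Predicting the Test set result
y_pred= regressor.predict(x_test)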
By executing the above lines of code, a new vector will be generated under the
variable explorer option. We can test our model by comparing the predicted values
and test set values.
Output:
In the above output, we have the predicted result set and the test set. We can check model performance by comparing these two values index by index. For example, the first index has a predicted value of $103,015 profit and a test/real value of $103,282 profit. The difference is only $267, which is a good prediction, so, finally, our model is complete.
o We can also check the score for training dataset and test dataset. Below is the
code for it:
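(These lines match the score-checking step of the full code listing below.)

print('Train Score: ', regressor.score(x_train, y_train))
print('Test Score: ', regressor.score(x_test, y_test))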
The above score tells us that our model is 95% accurate on the training dataset and 93% accurate on the test dataset.
Note: In the next topic, we will see how we can improve the performance of the model
using the Backward Elimination process.
1. All-in
2. Backward Elimination
3. Forward Selection
4. Bidirectional Elimination
5. Score Comparison
Above are the possible methods for building the model in Machine Learning, but here we will use only the Backward Elimination process, as it is the fastest method.
Step-1: Firstly, we need to select a significance level to stay in the model (SL = 0.05).
Step-2: Fit the complete model with all possible predictors/independent variables.
Step-3: Choose the predictor which has the highest P-value; if P-value > SL, go to Step-4, otherwise the model is finished.
Step-4: Remove that predictor.
Step-5: Rebuild and fit the model with the remaining variables.
Unnecessary features increase the complexity of the model. Hence, it is good to keep only the most significant features and keep our model simple to get a better result.
So, in order to optimize the performance of the model, we will use the Backward
Elimination method. This process is used to optimize the performance of the MLR
model as it will only include the most affecting feature and remove the least affecting
feature. Let's start to apply it to our MLR model.
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('50_CompList.csv')

#Extracting Independent and dependent Variable
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 4].values

#Categorical data
# note: categorical_features was removed in newer scikit-learn; see the ColumnTransformer note above
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x= LabelEncoder()
x[:, 3]= labelencoder_x.fit_transform(x[:,3])
onehotencoder= OneHotEncoder(categorical_features= [3])
x= onehotencoder.fit_transform(x).toarray()

#Avoiding the dummy variable trap:
x= x[:, 1:]

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state= 0)

#Fitting the MLR model to the training set:
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)

#Predicting the Test set result:
y_pred= regressor.predict(x_test)

#Checking the score
print('Train Score: ', regressor.score(x_train, y_train))
print('Test Score: ', regressor.score(x_test, y_test))
From the above code, we got training and test set result as:
Note: On the basis of this score, we will estimate the effect of features on our model
after using the Backward elimination process.
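Backward Elimination is done with the statsmodels library, which does not add a constant term on its own, so we first append a column of ones to the matrix of features. A sketch of this step, assuming the dataset's 50 rows:

import statsmodels.api as sm
# add a column of ones as the first column, for the constant b0 of the MLR equation
x= nm.append(arr= nm.ones((50,1)).astype(int), values= x, axis= 1)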
Here we have used axis =1, as we wanted to add a column. For adding a row, we can
use axis =0.
Output: By executing the above line of code, a new column will be added into our
matrix of features, which will have all values equal to 1. We can check it by clicking
on the x dataset under the variable explorer option.
As we can see in the above output image, the first column is added successfully,
which corresponds to the constant term of the MLR equation.
Step-2:
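First we fit the full model with all possible predictors; a sketch following the pattern of the iterations below (column 0 is the constant column added above):

x_opt= x[:, [0,1,2,3,4,5]]
regressor_OLS= sm.OLS(endog= y, exog= x_opt).fit()
regressor_OLS.summary()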
Output: By executing the above lines of code, we will get a summary table. Consider
the below image:
In the above image, we can clearly see the p-values of all the variables. Here x1, x2
are dummy variables, x3 is R&D spend, x4 is Administration spend, and x5 is
Marketing spend.
From the table, we will choose the highest p-value, which is 0.953 for x1. Since this highest p-value is greater than the SL value, we will remove the x1 variable (a dummy variable) from the table and refit the model. Below is the code for it:
x_opt= x[:, [0,2,3,4,5]]
regressor_OLS= sm.OLS(endog= y, exog= x_opt).fit()
regressor_OLS.summary()
Output:
As we can see in the output image, five variables now remain. Among these, the highest p-value is 0.961, for the x1 variable, which is another dummy variable. So we will remove it and refit the model. Below is the code for it:
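(Reconstructed from the pattern of the surrounding iterations: the second dummy variable's column is dropped.)

x_opt= x[:, [0,3,4,5]]
regressor_OLS= sm.OLS(endog= y, exog= x_opt).fit()
regressor_OLS.summary()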
Output:
In the above output image, we can see that the second dummy variable has been removed. The next highest p-value is 0.602, which is still greater than the significance level of 0.05, so we need to remove that variable too.
o Now we will remove Administration spend, which has a 0.602 p-value, and again refit the model.
x_opt= x[:, [0,3,5]]
regressor_OLS= sm.OLS(endog= y, exog= x_opt).fit()
regressor_OLS.summary()
Output:
As we can see in the above output image, the variable (Administration spend) has been removed. But one variable, Marketing spend, is still above the significance level, with a p-value of 0.60, so we need to remove it as well.
o Finally, we will remove the Marketing spend variable, whose 0.60 p-value is above the significance level.
Below is the code for it:
x_opt= x[:, [0,3]]
regressor_OLS= sm.OLS(endog= y, exog= x_opt).fit()
regressor_OLS.summary()
Output:
As we can see in the above output image, only two columns are left: the constant and R&D spend. So only the R&D spend independent variable is significant for the prediction, and we can now predict efficiently using this variable.
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('50_CompList1.csv')

#Extracting Independent and dependent Variable
x_BE= data_set.iloc[:, :-1].values
y_BE= data_set.iloc[:, 1].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_BE_train, x_BE_test, y_BE_train, y_BE_test= train_test_split(x_BE, y_BE, test_size= 0.2, random_state= 0)

#Fitting the MLR model to the training set:
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(nm.array(x_BE_train).reshape(-1,1), y_BE_train)

#Predicting the Test set result:
y_pred= regressor.predict(x_BE_test)

#Checking the score
print('Train Score: ', regressor.score(x_BE_train, y_BE_train))
print('Test Score: ', regressor.score(x_BE_test, y_BE_test))
Output:
After executing the above code, we will get the Training and test scores as:
As we can see, the training score is 94% accurate, and the test score is also 94% accurate. The difference between the two scores is only 0.00149, very close to the previous difference of 0.0154, which we obtained when all the variables were included.
We got this result by using one independent variable (R&D spend) only instead
of four variables. Hence, now, our model is simple and accurate.
ML Polynomial Regression
o Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial. The Polynomial Regression equation is given below:
y = b0 + b1x1 + b2x1^2 + b3x1^3 + ... + bnx1^n
o It is also called a special case of Multiple Linear Regression in ML, because we add some polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.
o It is a linear model with some modification in order to increase the accuracy.
o The dataset used in Polynomial regression for training is of non-linear nature.
o It makes use of a linear regression model to fit the complicated and non-linear
functions and datasets.
o Hence, "In Polynomial regression, the original features are converted
into Polynomial features of required degree (2,3,..,n) and then modeled
using a linear model."
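For reference, the three equations being compared are (coefficient notation as in the earlier sections):

Simple Linear Regression equation:     y = b0 + b1x
Multiple Linear Regression equation:   y = b0 + b1x1 + b2x2 + ... + bnxn
Polynomial Regression equation:        y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n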
When we compare the above three equations, we can clearly see that all three are polynomial equations that differ only in the degree of their variables. The Simple and Multiple Linear equations are polynomial equations of degree one, and the Polynomial Regression equation is a linear equation of degree n. So if we add a degree to our linear equations, they are converted into Polynomial Linear equations.
Note: To better understand Polynomial Regression, you must have knowledge of Simple
Linear Regression.
Implementation of Polynomial Regression
using Python:
Here we will implement the Polynomial Regression using Python. We will understand
it by comparing Polynomial Regression model with the Simple Linear Regression
model. So first, let's understand the problem for which we are going to build the
model.
o Data Pre-processing
o Build a Linear Regression model and fit it to the dataset
o Build a Polynomial Regression model and fit it to the dataset
o Visualize the result for Linear Regression and Polynomial Regression model.
o Predicting the output.
Note: Here, we will build both the Linear Regression model and the Polynomial Regression model to compare their predictions. The Linear Regression model is for reference.
The data pre-processing step will remain the same as in previous regression models,
except for some changes. In the Polynomial Regression model, we will not use
feature scaling, and also we will not split our dataset into training and test set. It has
two reasons:
o The dataset contains very few observations, so it is not suitable to divide it into a test and training set; otherwise our model will not be able to find the correlations between the salaries and levels.
o In this model, we want very accurate predictions for salary, so the model should have all the available information.
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('Position_Salaries.csv')

#Extracting Independent and dependent Variable
x= data_set.iloc[:, 1:2].values
y= data_set.iloc[:, 2].values
Explanation:
o In the above lines of code, we have imported the important Python libraries to import the dataset and operate on it.
o Next, we have imported the dataset 'Position_Salaries.csv', which contains three columns (Position, Level, and Salary), but we will consider only two columns (Level and Salary).
o After that, we have extracted the dependent variable (y) and the independent variable (x) from the dataset. For the x variable, we have taken the slice [:, 1:2], because we want index 1 (Level), and :2 is included so that x is a matrix rather than a vector.
Output:
As we can see in the above output, there are three columns (Position, Level, and Salary). But we consider only two columns, because the Level column is essentially the encoded form of the Position column.
Here we will predict the output for level 6.5, because the candidate has 4+ years' experience as a regional manager, so he must be somewhere between levels 6 and 7.
Now, we will build and fit the Linear regression model to the dataset. In building
polynomial regression, we will take the Linear regression model as reference and
compare both the results. The code is given below:
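A sketch of this step, using the lin_regs object name referenced in the explanation below:

#Fitting the Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin_regs= LinearRegression()
lin_regs.fit(x, y)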
In the above code, we have created the Simple Linear model using lin_regs object
of LinearRegression class and fitted it to the dataset variables (x and y).
Output:
Now we will build the Polynomial Regression model, but it will be a little different from the Simple Linear model, because here we will use the PolynomialFeatures class of the sklearn.preprocessing library. We are using this class to add polynomial features (extra powers of the original feature) to our dataset.
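A sketch of this step, starting with degree 2; the names poly_regs, x_poly, and lin_reg_2 match the explanation below:

#Fitting the Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_regs= PolynomialFeatures(degree= 2)
x_poly= poly_regs.fit_transform(x)
lin_reg_2= LinearRegression()
lin_reg_2.fit(x_poly, y)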
After executing the code, we will get another matrix x_poly, which can be seen under
the variable explorer option:
Next, we have used another LinearRegression object, namely lin_reg_2, to fit
our x_poly vector to the linear model.
Output:
Now we will visualize the result for Linear regression model as we did in Simple
Linear Regression. Below is the code for it:
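A sketch, following the same scatter/plot pattern used in the earlier visualizations (blue points, red line, as described below):

#Visualizing the result for the Linear Regression model
mtp.scatter(x, y, color="blue")
mtp.plot(x, lin_regs.predict(x), color="red")
mtp.title("Bluff detection model (Linear Regression)")
mtp.xlabel("Position Levels")
mtp.ylabel("Salary")
mtp.show()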
Output:
In the above output image, we can clearly see that the regression line is far from the data points. The predictions lie on a red straight line, and the blue points are the actual values. If we used this output to predict the salary at CEO level, it would give approximately $600,000, which is far from the real value.
So we need a curved model to fit the dataset, rather than a straight line.
Here we will visualize the result of the Polynomial Regression model, whose code differs only slightly from the above model:
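A sketch; note that the polynomial model must be given the transformed features:

#Visualizing the result for the Polynomial Regression model
mtp.scatter(x, y, color="blue")
mtp.plot(x, lin_reg_2.predict(poly_regs.fit_transform(x)), color="red")
mtp.title("Bluff detection model (Polynomial Regression)")
mtp.xlabel("Position Levels")
mtp.ylabel("Salary")
mtp.show()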
Output:
As we can see in the above output image, the predictions are close to the real values.
The above plot will vary as we will change the degree.
For degree = 3:
If we change to degree = 3, we will get a more accurate plot, as shown in the below image.
As we can see in the above output image, the predicted salary for level 6.5 is around $170K-$190K, which suggests that the future employee is telling the truth about his salary.
Degree = 4: Let's again change the degree to 4; now we will get the most accurate plot. Hence we can get more accurate results by increasing the degree of the polynomial.
Now, we will predict the final output using the Linear regression model to see
whether an employee is saying truth or bluff. So, for this, we will use
the predict() method and will pass the value 6.5. Below is the code for it:
lin_pred= lin_regs.predict([[6.5]])
print(lin_pred)
Output:
[330378.78787879]
Now, we will predict the final output using the Polynomial Regression model to
compare with Linear model. Below is the code for it:
poly_pred= lin_reg_2.predict(poly_regs.fit_transform([[6.5]]))
print(poly_pred)
Output:
[158862.45265153]
As we can see, the predicted output for Polynomial Regression is [158862.45265153], which is much closer to the real value; hence we can say that the future employee is telling the truth.
Unlike regression, the output variable of Classification is a category, not a value, such as "Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised learning technique, it takes labeled input data, which means the data contains inputs with the corresponding outputs.
The main goal of the Classification algorithm is to identify the category of a given
dataset, and these algorithms are mainly used to predict the output for the
categorical data.
Classification algorithms can be better understood using the below diagram. In the
below diagram, there are two classes, class A and Class B. These classes have features
that are similar to each other and dissimilar to other classes.
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In the lazy learner case, classification is done on the basis of the most related data stored in the training dataset. It takes less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model based on a training dataset before receiving a test dataset. Opposite to lazy learners, an eager learner takes more time in learning, and less time in prediction. Example: Decision Trees, Naïve Bayes, ANN.
o Linear Models
o Logistic Regression
o Support Vector Machines
o Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
1. Log Loss or Cross-Entropy Loss: −(y log(p) + (1 − y) log(1 − p))
2. Confusion Matrix:
3. AUC-ROC curve:
o ROC curve stands for Receiver Operating Characteristics Curve and AUC
stands for Area Under the Curve.
o It is a graph that shows the performance of the classification model at
different thresholds.
o To visualize the performance of the multi-class classification model, we use
the AUC-ROC Curve.
o The ROC curve is plotted with TPR and FPR, where TPR (True Positive Rate) on
Y-axis and FPR(False Positive Rate) on X-axis.
o In Logistic Regression, y can be between 0 and 1 only, so let's divide the above equation by (1 − y):
y / (1 − y); 0 for y = 0, and infinity for y = 1
o But we need a range between −∞ and +∞; taking the logarithm of the equation, it becomes:
log[y / (1 − y)] = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
Example: There is a dataset that contains the information of various users obtained from social networking sites. A car manufacturer has recently launched a new SUV car, and the company wants to check how many users from the dataset want to purchase the car.
For this problem, we will build a Machine Learning model using the Logistic
regression algorithm. The dataset is shown in the below image. In this problem, we
will predict the purchased variable (Dependent Variable) by using age and salary
(Independent variables).
Steps in Logistic Regression: To implement the Logistic Regression using Python,
we will use the same steps as we have done in previous topics of Regression. Below
are the steps:
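The first step is data pre-processing. A minimal sketch, assuming the same user_data.csv dataset used throughout these classification examples:

#Data Pre-processing Step
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing the dataset
data_set= pd.read_csv('user_data.csv')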
By executing the above lines of code, we will get the dataset as the output. Consider
the given image:
Now, we will extract the dependent and independent variables from the given
dataset. Below is the code for it:
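(The index choices below match the explanation that follows.)

#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values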
In the above code, we have taken [2, 3] for x because our independent variables, age and salary, are at indexes 2 and 3. And we have taken 4 for the y variable because our dependent variable is at index 4. The output will be:
Now we will split the dataset into a training set and test set. Below is the code for it:
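A sketch of the split, assuming the 75/25 split implied by the 100 test observations counted in the confusion matrix later:

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state= 0)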
#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
We have well prepared our dataset, and now we will train the dataset using the
training set. For providing training or fitting the model to the training set, we will
import the LogisticRegression class of the sklearn library.
After importing the class, we will create a classifier object and use it to fit the model
to the logistic regression. Below is the code for it:
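A sketch of the fitting step; classifier is the object name used in the following steps:

#Fitting Logistic Regression to the training set
from sklearn.linear_model import LogisticRegression
classifier= LogisticRegression(random_state= 0)
classifier.fit(x_train, y_train)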
Output: By executing the above code, we will get the below output:
Out[5]:
Our model is well trained on the training set, so we will now predict the result by
using test set data. Below is the code for it:
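A minimal sketch of this step:

#Predicting the test set result
y_pred= classifier.predict(x_test)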
In the above code, we have created a y_pred vector to predict the test set result.
Output: By executing the above code, a new vector (y_pred) will be created under
the variable explorer option. It can be seen as:
The above output image shows the corresponding predicted users who want to
purchase or not purchase the car.
Now we will create the confusion matrix here to check the accuracy of the
classification. To create it, we need to import the confusion_matrix function of the
sklearn library. After importing the function, we will call it using a new variable cm.
The function takes two parameters, mainly y_true (the actual values) and y_pred (the values predicted by the classifier). Below is the code for it:
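A minimal sketch of this step:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)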
Output:
By executing the above code, a new confusion matrix will be created. Consider the
below image:
We can find the accuracy of the predicted result by interpreting the confusion matrix.
By above output, we can interpret that 65+24= 89 (Correct Output) and 8+3=
11(Incorrect Output).
Finally, we will visualize the training set result. To visualize the result, we will
use ListedColormap class of matplotlib library. Below is the code for it:
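A sketch of the visualization code, assuming the purple/green color scheme described below:

#Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
# build a fine grid over the feature space, extended 1 unit beyond the data on each side
x1, x2 = nm.meshgrid(nm.arange(start= x_set[:, 0].min() - 1, stop= x_set[:, 0].max() + 1, step= 0.01),
                     nm.arange(start= x_set[:, 1].min() - 1, stop= x_set[:, 1].max() + 1, step= 0.01))
# color each grid point by the class the classifier predicts for it
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha= 0.75, cmap= ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
# overlay the real training observations, colored by their actual class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c= ListedColormap(('purple', 'green'))(i), label= j)
mtp.title('Logistic Regression (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()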
In the above code, we have imported the ListedColormap class of the matplotlib library to create the colormap for visualizing the result. We have created two new variables, x_set and y_set, to replace x_train and y_train. After that, we have used the nm.meshgrid command to create a rectangular grid, which extends from 1 below the minimum to 1 above the maximum of each feature. The pixel points we have taken have a resolution of 0.01.
Output: By executing the above code, we will get the below output:
o In the above graph, we can see that there are some Green points within the
green region and Purple points within the purple region.
o All these data points are the observation points from the training set, which
shows the result for purchased variables.
o This graph is made by using two independent variables i.e., Age on the x-
axis and Estimated salary on the y-axis.
o The purple point observations are for which purchased (dependent variable)
is probably 0, i.e., users who did not purchase the SUV car.
o The green point observations are for which purchased (dependent variable)
is probably 1 means user who purchased the SUV car.
o We can also estimate from the graph that younger users with low salaries mostly did not purchase the car, whereas older users with high estimated salaries mostly purchased the car.
o But there are some purple points in the green region (buying the car) and some green points in the purple region (not buying the car): exceptions such as younger users with a high estimated salary who purchased the car, and older users with a low estimated salary who did not.
We have successfully visualized the training set result for the logistic regression, and
our goal for this classification is to divide the users who purchased the SUV car and
who did not purchase the car. So from the output graph, we can clearly see the two
regions (Purple and Green) with the observation points. The Purple region is for
those users who didn't buy the car, and Green Region is for those users who
purchased the car.
Linear Classifier:
As we can see from the graph, the classifier is a straight line, i.e., linear in nature, as we have used the linear model for Logistic Regression. In further topics, we will learn about non-linear classifiers.
Our model is well trained using the training dataset. Now, we will visualize the result for new observations (the test set). The code for the test set will remain the same as above, except that here we will use x_test and y_test instead of x_train and y_train. Below is the code for it:
Output:
The above graph shows the test set result. As we can see, the graph is divided into
two regions (Purple and Green). And Green observations are in the green region, and
Purple observations are in the purple region. So we can say it is a good prediction
and model. Some of the green and purple data points are in different regions, which
can be ignored as we have already calculated this error using the confusion matrix
(11 Incorrect output).
Hence our model is pretty good and ready to make new predictions for this
classification problem.
K-Nearest Neighbor(KNN) Algorithm
for Machine Learning
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms
based on Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm.
o K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
o It is also called a lazy learner algorithm because it does not learn from the
training set immediately instead it stores the dataset and at the time of
classification, it performs an action on the dataset.
o KNN algorithm at the training phase just stores the dataset and when it gets
new data, then it classifies that data into a category that is much similar to the
new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, but we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image that are most similar to the cat and dog images, and based on the most similar features it will put the image in either the cat or the dog category.
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will the data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:
Suppose we have a new data point and we need to put it in the required category.
Consider the below image:
o Firstly, we will choose the number of neighbors; here we will choose k=5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. Between points A(x1, y1) and B(x2, y2) it can be calculated as:
d = √((x2 − x1)² + (y2 − y1)²)
o By calculating the Euclidean distance we got the nearest neighbors, as three
nearest neighbors in category A and two nearest neighbors in category B.
Consider the below image:
o As we can see the 3 nearest neighbors are from category A, hence this new
data point must belong to category A.
o There is no particular way to determine the best value for "K", so we need to try some values to find the best among them. The most preferred value for K is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and expose the model to the effects of outliers.
o Large values for K smooth out noise, but they may blur the boundary between categories.
o We always need to determine the value of K, which may sometimes be complex.
o The computation cost is high, because the distance between the new point and all the training samples must be calculated.
Python implementation of the KNN
algorithm
To do the Python implementation of the K-NN algorithm, we will use the same
problem and dataset which we have used in Logistic Regression. But here we will
improve the performance of the model. Below is the problem description:
Problem for K-NN Algorithm: There is a car manufacturer company that has manufactured a new SUV car. The company wants to show ads to the users who are interested in buying that SUV. For this problem, we have a dataset that contains information about multiple users collected from a social network. The dataset contains lots of information, but we will consider Estimated Salary and Age as the independent variables and Purchased as the dependent variable. Below is the dataset:
The Data Pre-processing step will remain exactly the same as Logistic Regression.
Below is the code for it:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('user_data.csv')

#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state= 0)

#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
By executing the above code, our dataset is imported to our program and well pre-
processed. After feature scaling our test dataset will look like:
From the above output image, we can see that our data is successfully scaled.
And then we will fit the classifier to the training data. Below is the code for it:
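A sketch of the fitting step; the parameter values match the Out[10] shown below:

#Fitting K-NN classifier to the training set
from sklearn.neighbors import KNeighborsClassifier
classifier= KNeighborsClassifier(n_neighbors= 5, metric= 'minkowski', p= 2)
classifier.fit(x_train, y_train)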
Output: By executing the above code, we will get the output as:
Out[10]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
o Predicting the Test Result: To predict the test set result, we will create
a y_pred vector as we did in Logistic Regression. Below is the code for it:
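A minimal sketch of this step:

#Predicting the test set result
y_pred= classifier.predict(x_test)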
Output:
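Next, we create the confusion matrix, as in the Logistic Regression example; a minimal sketch:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)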
In the above code, we have imported the confusion_matrix function and called it, storing the result in the variable cm.
Output: By executing the above code, we will get the matrix as below:
In the above image, we can see there are 64+29= 93 correct predictions and 3+4= 7
incorrect predictions, whereas, in Logistic Regression, there were 11 incorrect
predictions. So we can say that the performance of the model is improved by using
the K-NN algorithm.
Output:
The output graph is different from the graph we obtained in Logistic Regression. It can be understood in the below points:
o As we can see, the graph shows red points and green points. The green points are for the Purchased (1) variable and the red points for the Not Purchased (0) variable.
o The graph shows an irregular boundary instead of a straight line or a curve, because the K-NN algorithm classifies by finding the nearest neighbors.
o The graph has classified users into the correct categories, as most of the users who didn't buy the SUV are in the red region and users who bought the SUV are in the green region.
o The graph shows a good result, but there are still some green points in the red region and red points in the green region. This is no big issue, as it prevents the model from overfitting.
o Hence our model is well trained.
o Visualizing the Test set result:
After training the model, we will now test the result on a new dataset, i.e., the test dataset. The code remains the same except for some minor changes: x_train and y_train are replaced by x_test and y_test.
Below is the code for it:
Output:
The above graph shows the output for the test dataset. As we can see in the graph, the predicted output is good, as most of the red points are in the red region and most of the green points are in the green region.
However, there are a few green points in the red region and a few red points in the green region. These are the incorrect observations that we counted in the confusion matrix (7 incorrect outputs).
The goal of the SVM algorithm is to create the best line or decision boundary that
can segregate n-dimensional space into classes so that we can easily put the new
data point in the correct category in the future. This best decision boundary is called
a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram, in which there are two different categories that are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm. We will first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. The support vector machine creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors); it will see the extreme cases of cats and dogs, and on the basis of the support vectors, it will classify the creature as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data: if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data: if a dataset cannot be classified by using a straight line, then such data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
We always create the hyperplane that has the maximum margin, i.e., the maximum distance between the hyperplane and the nearest data points of each class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect the position of the hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
The working of the SVM algorithm can be understood by using an example. Suppose
we have a dataset that has two tags (green and blue), and the dataset has two
features x1 and x2. We want a classifier that can classify the pair(x1, x2) of
coordinates in either green or blue. Consider the below image:
As it is a 2-D space, just by using a straight line we can easily separate these two classes. But there can be multiple lines that can separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for
non-linear data, we cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear
data, we have used two dimensions x and y, so for non-linear data, we will add a
third dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:
So now, SVM will divide the datasets into classes in the following way. Consider the
below image:
Since we are in 3-D space, it looks like a plane parallel to the x-axis. If we convert it to 2-D space with z=1, it becomes:
Now we will implement the SVM algorithm using Python. Here we will use the same dataset user_data that we used in Logistic Regression and KNN classification. Up to the data pre-processing step, the code remains the same as the K-NN block shown above.
After executing the above code, we will pre-process the data. The code will give the
dataset as:
The scaled output for the test set will be:
Fitting the SVM classifier to the training set:
Now the training set will be fitted to the SVM classifier. To create the SVM classifier, we will import the SVC class from the sklearn.svm library. Below is the code for it:
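A sketch of this step; the parameters match the Out[8] shown below:

from sklearn.svm import SVC
classifier= SVC(kernel= 'linear', random_state= 0)
classifier.fit(x_train, y_train)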
In the above code, we have used kernel='linear', as here we are creating an SVM for linearly separable data. However, we can change it for non-linear data. We then fitted the classifier to the training dataset (x_train, y_train).
Output:
Out[8]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
kernel='linear', max_iter=-1, probability=False, random_state=0,
shrinking=True, tol=0.001, verbose=False)
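A minimal sketch of the prediction step:

#Predicting the test set result
y_pred= classifier.predict(x_test)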
After getting the y_pred vector, we can compare the result of y_pred and y_test to
check the difference between the actual value and predicted value.
Output: Below is the output for the prediction of the test set:
Output:
As we can see in the above output image, there are 66+24= 90 correct predictions and 8+2= 10 incorrect predictions. Therefore we can say that our SVM model improved compared to the Logistic Regression model.
Output:
As we can see, the above output is appearing similar to the Logistic regression
output. In the output, we got the straight line as hyperplane because we have used a
linear kernel in the classifier. And we have also discussed above that for the 2d
space, the hyperplane in SVM is a straight line.
Output:
As we can see in the above output image, the SVM classifier has divided the users
into two regions (Purchased or Not purchased). Users who purchased the SUV are in
the red region with the red scatter points. And users who did not purchase the SUV
are in the green region with green scatter points. The hyperplane has divided the two
classes into Purchased and not purchased variable.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on
the conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) P(A) / P(B)
Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood probability: the probability of the evidence given that the hypothesis is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.
Problem: If the weather is sunny, then the Player should play or not?
Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table of the Weather conditions:

Weather     Yes   No
Overcast     5     0
Rainy        2     2
Sunny        3     2
Total       10     4

Likelihood table of the Weather conditions:

Weather     No            Yes
Overcast    0             5             5/14 = 0.35
Rainy       2             2             4/14 = 0.29
Sunny       2             3             5/14 = 0.35
All         4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.30 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.50
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.50 * 0.29 / 0.35 = 0.41
Since P(Yes|Sunny) > P(No|Sunny), on a sunny day the player can play the game.
o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in Multi-class predictions as compared to the other
Algorithms.
o It is the most popular choice for text classification problems.
In the above code, we have loaded the dataset into our program using data_set = pd.read_csv('user_data.csv'). The loaded dataset is divided into a training set and a test set, and then we have scaled the feature variables.
The output for the dataset is given as:
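A sketch of the fitting step referenced in the next paragraph:

#Fitting Naive Bayes to the training set
from sklearn.naive_bayes import GaussianNB
classifier= GaussianNB()
classifier.fit(x_train, y_train)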
In the above code, we have used the GaussianNB classifier to fit it to the training
dataset. We can also use other classifiers as per our requirement.
Output:
Output:
The above output shows the result for the prediction vector y_pred and the real vector y_test. We can see that some predictions differ from the real values; these are the incorrect predictions.
4) Creating Confusion Matrix:
Now we will check the accuracy of the Naive Bayes classifier using the Confusion
matrix. Below is the code for it:
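A minimal sketch of this step:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)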
Output:
As we can see in the above confusion matrix output, there are 7+3= 10 incorrect
predictions, and 65+25=90 correct predictions.
Output:
In the above output, we can see that the Naïve Bayes classifier has segregated the data points with a fine boundary. The boundary is a Gaussian curve, as we have used the GaussianNB classifier in our code.
Output:
The above output is the final output for the test set data. As we can see, the classifier has created a Gaussian curve to divide the "purchased" and "not purchased" variables. There are some wrong predictions, which we have counted in the confusion matrix, but it is still a pretty good classifier.
Regression vs. Classification in
Machine Learning
Regression and Classification algorithms are Supervised Learning algorithms. Both
the algorithms are used for prediction in Machine learning and work with the labeled
datasets. But the difference between both is how they are used for different machine
learning problems.
Classification:
Classification is a process of finding a function which helps in dividing the dataset
into classes based on different parameters. In Classification, a computer program is
trained on the training dataset and based on that training, it categorizes the data
into different classes.
The task of the classification algorithm is to find the mapping function to map the
input(x) to the discrete output(y).
Example: The best example to understand the Classification problem is Email Spam
Detection. The model is trained on the basis of millions of emails on different
parameters, and whenever it receives a new email, it identifies whether the email is
spam or not. If the email is spam, then it is moved to the Spam folder.
o Logistic Regression
o K-Nearest Neighbours
o Support Vector Machines
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
Regression:
Regression is a process of finding the correlations between dependent and
independent variables. It helps in predicting the continuous variables such as
prediction of Market Trends, prediction of House prices, etc.
The task of the Regression algorithm is to find the mapping function to map the
input variable(x) to the continuous output variable(y).
Example: Suppose we want to do weather forecasting, so for this, we will use the
Regression algorithm. In weather prediction, the model is trained on the past data,
and once the training is completed, it can easily predict the weather for future days.
Types of Regression Algorithm:
In Regression, we try to find the best-fit line, which can predict the output more accurately. In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
Linear Regression:
o Linear Regression is one of the simplest Machine Learning algorithms. It comes under the Supervised Learning technique and is used for solving regression problems.
o It is used for predicting the continuous dependent variable with the help of independent variables.
o The goal of Linear Regression is to find the best-fit line that can accurately predict the output for the continuous dependent variable.
o If a single independent variable is used for prediction, it is called Simple Linear Regression, and if more than one independent variable is used, such regression is called Multiple Linear Regression.
o By finding the best-fit line, the algorithm establishes the relationship between the dependent variable and the independent variable(s). The relationship should be of a linear nature.
o The output of Linear Regression should only be continuous values, such as price, age, salary, etc. The relationship between the dependent variable and the independent variable can be shown in the below image:
In the above image, the dependent variable is on the y-axis (salary) and the independent variable is on the x-axis (experience). The regression line can be written as:
y = a0 + a1x + ε
Logistic Regression:
Linear Regression vs. Logistic Regression:
o Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
o In Linear Regression, we find the best-fit line, by which we can easily predict the output; in Logistic Regression, we find the S-curve, by which we can classify the samples.
o The output of Linear Regression must be a continuous value, such as price or age, whereas the output of Logistic Regression must be a categorical value, such as 0 or 1, Yes or No, etc.
Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
Why use Decision Trees?
There are various algorithms in Machine learning, so choosing the best algorithm for
the given dataset and problem is the main point to remember while creating a
machine learning model. Below are the two reasons for using the Decision tree:
o Decision Trees usually mimic human thinking ability while making a decision,
so it is easy to understand.
o The logic behind the decision tree can be easily understood because it shows
a tree-like structure.
In a decision tree, for predicting the class of the given dataset, the algorithm starts
from the root node of the tree. This algorithm compares the values of root attribute
with the record (real dataset) attribute and, based on the comparison, follows the
branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm:
o Step-1: Begin the tree with the root node, says S, which contains the complete
dataset.
o Step-2: Find the best attribute in the dataset using Attribute Selection
Measure (ASM).
o Step-3: Divide S into subsets that contain possible values for the best attributes.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final node is then called a leaf node.
Example: Suppose there is a candidate who has a job offer and wants to decide
whether he should accept the offer or Not. So, to solve this problem, the decision
tree starts with the root node (Salary attribute by ASM). The root node splits further
into the next decision node (distance from the office) and one leaf node based on
the corresponding labels. The next decision node further gets split into one decision
node (Cab facility) and one leaf node. Finally, the decision node splits into two leaf
nodes (Accepted offers and Declined offer). Consider the below diagram:
Attribute Selection Measures
While implementing a Decision tree, the main issue arises that how to select the best
attribute for the root node and for sub-nodes. So, to solve such problems there is a
technique which is called as Attribute selection measure or ASM. By this
measurement, we can easily select the best attribute for the nodes of the tree. There
are two popular techniques for ASM, which are:
o Information Gain
o Gini Index
1. Information Gain:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
Where,
S = total number of samples
P(yes) = probability of yes
P(no) = probability of no
2. Gini Index:
Gini Index = 1 − Σj Pj²
A too-large tree increases the risk of overfitting, while a small tree may not capture all the important features of the dataset. A technique that decreases the size of the learning tree without reducing accuracy is known as Pruning. There are mainly two types of tree pruning technique used:
o Cost Complexity Pruning
o Reduced Error Pruning
Steps will also remain the same, which are given below:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('user_data.csv')

#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state= 0)

#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
In the above code, we have pre-processed the data and loaded the dataset, which is given as:
2. Fitting a Decision-Tree algorithm to the Training
set
Now we will fit the model to the training set. For this, we will import
the DecisionTreeClassifier class from sklearn.tree library. Below is the code for it:
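A sketch of this step; the two parameters match the Out[8] shown below:

#Fitting a Decision Tree classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier= DecisionTreeClassifier(criterion= 'entropy', random_state= 0)
classifier.fit(x_train, y_train)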
In the above code, we have created a classifier object, in which we have passed two main parameters: criterion='entropy', to use information gain as the attribute selection measure, and random_state=0, for reproducible results.
Out[8]:
DecisionTreeClassifier(class_weight=None, criterion='entropy',
max_depth=None,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False,
random_state=0, splitter='best')
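Next, we predict the test set result, as in the previous models; a minimal sketch:

#Predicting the test set result
y_pred= classifier.predict(x_test)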
Output:
In the below output image, the predicted output and real test output are given. We
can clearly see that there are some values in the prediction vector, which are different
from the real vector values. These are prediction errors.
4. Test accuracy of the result (Creation of
Confusion matrix)
In the above output, we have seen that there were some incorrect predictions, so if
we want to know the number of correct and incorrect predictions, we need to use
the confusion matrix. Below is the code for it:
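A minimal sketch of this step:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)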
Output:
In the above output image, we can see the confusion matrix, which has 6+3= 9 incorrect predictions and 62+29= 91 correct predictions. Therefore, we can say that, compared to the other classification models, the Decision Tree classifier made a good prediction.
Output:
The above output is completely different from the other classification models. It has both vertical and horizontal lines that split the dataset according to the age and estimated salary variables.
As we can see, the tree is trying to capture each data point, which is a case of overfitting.
6. Visualizing the test set result:
Visualization of test set result will be similar to the visualization of the training set
except that the training set will be replaced with the test set.
Output:
As we can see in the above image that there are some green data points within the
purple region and vice versa. So, these are the incorrect predictions which we have
discussed in the confusion matrix.
A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
The below diagram explains the working of the Random Forest algorithm:
Note: To better understand the Random Forest Algorithm, you should have knowledge
of the Decision Tree Algorithm.
o There should be some actual values in the feature variable of the dataset so
that the classifier can predict accurate results rather than a guessed result.
o The predictions from each tree must have very low correlations.
o It takes less training time as compared to other algorithms.
o It predicts output with high accuracy, even for the large dataset it runs
efficiently.
o It can also maintain accuracy when a large proportion of data is missing.
The working process can be explained in the below steps and diagram:
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step-1 and Step-2.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data point to the category that wins the majority of votes.
The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. So, this
dataset is given to the Random forest classifier. The dataset is divided into subsets
and given to each decision tree. During the training phase, each decision tree
produces a prediction result, and when a new data point occurs, then based on the
majority of results, the Random Forest classifier predicts the final decision. Consider
the below image:
Applications of Random Forest
There are mainly four sectors where Random Forest is mostly used:
1. Banking: Banking sector mostly uses this algorithm for the identification of
loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the
disease can be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('user_data.csv')

#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state= 0)

#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
In the above code, we have pre-processed the data and loaded the dataset, which is given as:
2. Fitting the Random Forest algorithm to the
training set:
Now we will fit the Random forest algorithm to the training set. To fit it, we will
import the RandomForestClassifier class from the sklearn.ensemble library. The
code is given below:
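A sketch of this step; the parameter values match the output shown below:

#Fitting Random Forest classifier to the training set
from sklearn.ensemble import RandomForestClassifier
classifier= RandomForestClassifier(n_estimators= 10, criterion= 'entropy')
classifier.fit(x_train, y_train)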
Output:
RandomForestClassifier(bootstrap=True, class_weight=None,
criterion='entropy',
max_depth=None, max_features='auto',
max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10,
n_jobs=None, oob_score=False, random_state=None,
verbose=0, warm_start=False)
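Next, we predict the test set result, as in the previous models; a minimal sketch:

#Predicting the test set result
y_pred= classifier.predict(x_test)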
Output:
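The confusion matrix is created in the same way as before; a minimal sketch:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)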
Output:
As we can see in the above matrix, there are 4+4= 8 incorrect
predictions and 64+28= 92 correct predictions.
Output:
The above image is the visualization result for the Random Forest classifier working
with the training set result. It is very much similar to the Decision tree classifier. Each
data point corresponds to each user of the user_data, and the purple and green
regions are the prediction regions. The purple region is classified for the users who
did not purchase the SUV car, and the green region is for the users who purchased
the SUV.
So, in the Random Forest classifier, we have taken 10 trees that have predicted Yes or No for the Purchased variable. The classifier took the majority of the predictions and provided the result.
Output:
The above image is the visualization result for the test set. We can check that there is a minimal number of incorrect predictions (8) without the overfitting issue. We will get different results by changing the number of trees in the classifier.
After applying this clustering technique, each cluster or group is assigned a cluster-ID, which the ML system can use to simplify the processing of large and complex datasets.
Note: Clustering is somewhere similar to the classification algorithm, but the difference is
the type of dataset that we are using. In classification, we work with the labeled data set,
whereas in clustering, we work with the unlabelled dataset.
Example: Let's understand the clustering technique with the real-world example of a shopping mall: when we visit a mall, we can observe that items with similar usage are grouped together, such as t-shirts in one section and trousers in another; similarly, in the fruit and vegetable section, apples, bananas, mangoes, etc., are kept in separate sections so that we can easily find things. The clustering technique works in the same way. Another example of clustering is grouping documents according to their topic.
The clustering technique can be widely used in various tasks. Some most common
uses of this technique are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general usages, clustering is used by Amazon in its recommendation system to provide recommendations based on past searches of products. Netflix also uses this technique to recommend movies and web series to its users based on their watch history.
The below diagram explains the working of the clustering algorithm. We can see the
different fruits are divided into several groups with similar properties.
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also
known as the centroid-based method. The most common example of partitioning
clustering is the K-Means Clustering algorithm.
In this type, the dataset is divided into a set of k groups, where k defines the number of pre-defined groups. The cluster centers are created in such a way that the distance between the data points of one cluster is minimal compared with their distance to another cluster centroid.
Density-Based Clustering
The density-based clustering method connects highly dense areas into clusters, and arbitrarily shaped clusters are formed as long as the dense regions can be connected. The algorithm does this by identifying different clusters in the dataset and connecting the areas of high density into clusters. The dense areas in data space are separated from each other by sparser areas.
These algorithms can face difficulty in clustering the data points if the dataset has
varying densities and high dimensions.
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based on the probability of how a dataset belongs to a particular distribution. The grouping is done by assuming certain distributions, most commonly the Gaussian distribution.
Hierarchical Clustering
Hierarchical clustering can be used as an alternative to partitioned clustering, as there is no requirement to pre-specify the number of clusters to be created. In this technique, the dataset is divided into clusters to create a tree-like structure, which is also called a dendrogram. Any number of clusters can then be selected by cutting the tree at the appropriate level. The most common example of this method is the Agglomerative Hierarchical algorithm.
Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more than one group or cluster. Each data point has a set of membership coefficients, which depend on its degree of membership in each cluster. The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.
Clustering Algorithms
Clustering algorithms can be divided based on the models explained above. Many different clustering algorithms have been published, but only a few are commonly used. The choice of algorithm depends on the kind of data that we are using: some algorithms need to guess the number of clusters in the given dataset, whereas others require finding the minimum distance between the observations of the dataset.
Here we are discussing mainly popular Clustering algorithms that are widely used in
machine learning:
1. K-Means algorithm: The k-means algorithm is one of the most popular
clustering algorithms. It classifies the dataset by dividing the samples into
different clusters of equal variances. The number of clusters must be specified
in this algorithm. It is fast, requires fewer computations, and has linear complexity O(n).
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in
the smooth density of data points. It is an example of a centroid-based model,
that works on updating the candidates for centroid to be the center of the
points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of
Applications with Noise. It is an example of a density-based model similar to
the mean-shift, but with some remarkable advantages. In this algorithm, the
areas of high density are separated by the areas of low density. Because of
this, the clusters can be found in any arbitrary shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an alternative to the k-means algorithm, or for cases where k-means fails. In GMM, it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical
algorithm performs the bottom-up hierarchical clustering. In this, each data
point is treated as a single cluster at the outset and then successively merged.
The cluster hierarchy can be represented as a tree-structure.
6. Affinity Propagation: It is different from other clustering algorithms as it does not require specifying the number of clusters. In this, each pair of data points exchanges messages until convergence. It has O(N²T) time complexity, which is the main drawback of this algorithm.
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this
tree-shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but they differ in how they work: in hierarchical clustering, there is no requirement to predetermine the number of clusters, as there is in the K-means algorithm.
o Step-1: Treat each data point as a single cluster. If there are N data points, there will be N clusters.
o Step-2: Take the two closest data points or clusters and merge them to form one cluster. So, there will now be N-1 clusters.
o Step-3: Again, take the two closest clusters and merge them together to form
one cluster. There will be N-2 clusters.
o Step-4: Repeat Step 3 until only one cluster is left. So, we will get the following clusters. Consider the below images:
o Step-5: Once all the clusters are combined into one big cluster, develop the
dendrogram to divide the clusters as per the problem.
1. Single Linkage: It is the shortest distance between the closest points of two different clusters.
2. Complete Linkage: It is the farthest distance between the two points of two different clusters. It is one of the popular linkage methods, as it forms tighter clusters than single-linkage.
From the above-given approaches, we can apply any of them according to the type
of problem or business requirement.
The working of the dendrogram can be explained using the below diagram:
In the above diagram, the left part is showing how clusters are created in
agglomerative clustering, and the right part is showing the corresponding
dendrogram.
We can cut the dendrogram tree structure at any level as per our requirement.
The dataset contains the information of customers that have visited a mall for shopping. The mall owner wants to find some patterns or some particular behaviors of his customers using this dataset.
1. Data Pre-processing
2. Finding the optimal number of clusters using the Dendrogram
3. Training the hierarchical clustering model
4. Visualizing the clusters
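A sketch of the pre-processing code (the file name Mall_Customers_data.csv is an assumed placeholder for the mall customers dataset described above):

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset (file name assumed for illustration)
data_set = pd.read_csv('Mall_Customers_data.csv')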
The above lines of code import the libraries needed for specific tasks: numpy for mathematical operations, matplotlib for drawing graphs or scatter plots, and pandas for importing the dataset.
Here we will extract only the matrix of features as we don't have any further
information about the dependent variable. Code is given below:
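A one-line sketch of this step (columns 3 and 4 hold Annual Income and Spending Score, as explained below):

# extracting the matrix of features (Annual Income and Spending Score)
x = data_set.iloc[:, [3, 4]].values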
Here we have extracted only columns 3 and 4, as we will use a 2D plot to see the clusters. So, we are considering the Annual Income and Spending Score as the matrix of features.
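A sketch of the dendrogram step, using the scipy calls named in the explanation that follows:

# finding the optimal number of clusters using the dendrogram
import scipy.cluster.hierarchy as shc

dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()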
In the above lines of code, we have imported the hierarchy module of the scipy library. This module provides us the method shc.dendrogram(), which takes linkage() as a parameter. The linkage function is used to define the distance between two clusters, so here we have passed x (the matrix of features) and the method "ward," a popular linkage method in hierarchical clustering.
The remaining lines of code describe the labels for the dendrogram plot.
Output:
By executing the above lines of code, we will get the below output:
Using this Dendrogram, we will now determine the optimal number of clusters for
our model. For this, we will find the maximum vertical distance that does not cut
any horizontal bar. Consider the below diagram:
In the above diagram, we have shown the vertical distances that do not cut any horizontal bar. As we can visualize, the 4th distance looks the maximum, so according to this, the number of clusters will be 5 (the vertical lines in this range). We could also take the 2nd number, as it is approximately equal to the 4th distance, but we will consider 5 clusters because that is the same number we calculated in the K-means algorithm.
So, the optimal number of clusters will be 5, and we will train the model in the
next step, using the same.
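A sketch of the training step described below (the affinity='euclidean' and linkage='ward' settings are assumptions; note that in recent scikit-learn versions the affinity parameter is named metric):

# training the hierarchical clustering model on the dataset
from sklearn.cluster import AgglomerativeClustering

hc = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')
y_pred = hc.fit_predict(x)   # trains the model and returns the cluster of each point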
Then we have created an object of this class named hc. The AgglomerativeClustering class takes the parameters shown in the code above.
In the last line, we have created the dependent variable y_pred to fit or train the model. It not only trains the model but also returns the clusters to which each data point belongs.
After executing the above lines of code, if we go to the variable explorer option in our Spyder IDE, we can check the y_pred variable and compare it with the original dataset. Consider the below image:
As we can see in the above image, y_pred shows the cluster values, which means that customer id 1 belongs to the 5th cluster (as indexing starts from 0, the value 4 means the 5th cluster), customer id 2 belongs to the 4th cluster, and so on.
Step-4: Visualizing the clusters
As we have trained our model successfully, now we can visualize the clusters
corresponding to the dataset.
Here we will use the same lines of code as we did in k-means clustering, except one
change. Here we will not plot the centroid that we did in k-means, because here we
have used dendrogram to determine the optimal number of clusters. The code is
given below:
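A sketch of the visualization code, reusing the k-means plotting pattern with y_pred in place of y_predict and no centroid plot (the cluster colors are arbitrary choices):

mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s=100, c='blue', label='Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s=100, c='green', label='Cluster 2')
mtp.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s=100, c='red', label='Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s=100, c='cyan', label='Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s=100, c='magenta', label='Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()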
Output: By executing the above lines of code, we will get the below output:
K-Means Clustering Algorithm
K-Means Clustering is an unsupervised learning algorithm that is used to solve the
clustering problems in machine learning or data science. In this topic, we will learn
what is K-means clustering algorithm, how the algorithm works, along with the
Python implementation of k-means clustering.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group with similar properties.
It allows us to cluster the data into different groups and provides a convenient way to discover the categories of groups in the unlabeled dataset on its own, without the need for any training.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k should be predetermined in this algorithm.
Hence each cluster has datapoints with some commonalities, and it is away from
other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They need not be from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid in each cluster.
Step-5: Repeat the third step: reassign each data point to the new closest centroid of its cluster.
Step-6: If any reassignment occurs, go to Step-4; otherwise, the model is ready.
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two
variables is given below:
o Let's take number k of clusters, i.e., K=2, to identify the dataset and to put
them into different clusters. It means here we will try to group these datasets
into two different clusters.
o We need to choose some random k points or centroid to form the cluster.
These points can be either the points from the dataset or any other point. So,
here we are selecting the below two points as k points, which are not the part
of our dataset. Consider the below image:
o Now we will assign each data point of the scatter plot to its closest K-point or
centroid. We will compute it by applying some mathematics that we have
studied to calculate the distance between two points. So, we will draw a
median between both the centroids. Consider the below image:
From the above image, it is clear that points on the left side of the line are near the K1 or blue centroid, and points to the right of the line are close to the yellow centroid. Let's color them blue and yellow for clear visualization.
o As we need to find the closest cluster, we will repeat the process by choosing new centroids. To choose the new centroids, we will compute the center of gravity of each cluster and find the new centroids as below:
o Next, we will reassign each datapoint to the new centroid. For this, we will
repeat the same process of finding a median line. The median will be like
below image:
From the above image, we can see, one yellow point is on the left side of the line,
and two blue points are right to the line. So, these three points will be assigned to
new centroids.
Since reassignment has taken place, we will again go to step-4, which is finding new centroids or K-points.
o We will repeat the process by finding the center of gravity of centroids, so the
new centroids will be as shown in the below image:
o As we got the new centroids so again will draw the median line and reassign
the data points. So, the image will be:
o As we can see in the above image, there are no dissimilar data points on either side of the line, which means our model is formed. Consider the below image:
As our model is ready, so we can now remove the assumed centroids, and the two
final clusters will be as shown in the below image:
How to choose the value of "K number of
clusters" in K-means Clustering?
The performance of the K-means clustering algorithm depends on the efficiency of the clusters that it forms, but choosing the optimal number of clusters is a big task. There are several ways to find the optimal number of clusters; here we discuss the most appropriate method to find the number of clusters, or the value of K. The method is given below:
Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of
clusters. This method uses the concept of WCSS value. WCSS stands for Within
Cluster Sum of Squares, which defines the total variations within a cluster. The
formula to calculate the value of WCSS (for 3 clusters) is given below:
WCSS = ∑Pi in Cluster1 distance(Pi, C1)² + ∑Pi in Cluster2 distance(Pi, C2)² + ∑Pi in Cluster3 distance(Pi, C3)²
Here, ∑Pi in Cluster1 distance(Pi, C1)² is the sum of the squared distances between each data point and its centroid within Cluster1, and the same holds for the other two terms.
To measure the distance between data points and centroid, we can use any method
such as Euclidean distance or Manhattan distance.
To find the optimal value of clusters, the elbow method follows the below steps:
Since the graph shows a sharp bend, which looks like an elbow, it is known as the elbow method. The graph for the elbow method looks like the below image:
Note: We can choose the number of clusters equal to the number of data points. If we do so, the value of WCSS becomes zero, and that will be the endpoint of the plot.
In the given dataset, we have Customer_Id, Gender, Age, Annual Income ($), and Spending Score (a calculated value indicating how much a customer has spent in the mall; the higher the value, the more he has spent). From this dataset, we need to find some patterns; as it is an unsupervised method, we don't know exactly what to compute.
o Data Pre-processing
o Finding the optimal number of clusters using the elbow method
o Training the K-means algorithm on the training dataset
o Visualizing the clusters
o Importing Libraries
As we did in previous topics, firstly, we will import the libraries for our model,
which is part of data pre-processing. The code is given below:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
In the above code, numpy is imported for performing mathematical calculations, matplotlib for plotting the graph, and pandas for managing the dataset.
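Next, the dataset is loaded; a sketch (the file name is an assumed placeholder):

# importing the mall customers dataset (file name assumed)
data_set = pd.read_csv('Mall_Customers_data.csv')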
By executing the above lines of code, we will get our dataset in the Spyder IDE. The
dataset looks like the below image:
Here we don't need any dependent variable for the data pre-processing step, as it is a clustering problem and we have no idea about what to determine. So we will just add a line of code for the matrix of features, as shown below.
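A sketch of that line, selecting the Annual Income and Spending Score columns:

# extracting the matrix of features
x = data_set.iloc[:, [3, 4]].values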
As we can see, we are extracting only the 3rd and 4th features. It is because we need a 2D plot to visualize the model, and some features, such as customer_id, are not required.
Step-2: Finding the optimal number of clusters
using the elbow method
In the second step, we will try to find the optimal number of clusters for our
clustering problem. So, as discussed above, here we are going to use the elbow
method for this purpose.
As we know, the elbow method uses the WCSS concept to draw the plot by plotting
WCSS values on the Y-axis and the number of clusters on the X-axis. So we are going
to calculate the value for WCSS for different k values ranging from 1 to 10. Below is
the code for it:
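A sketch of the elbow-method loop (the init='k-means++' and random_state=42 settings are assumptions; kmeans.inertia_ is scikit-learn's WCSS value):

# finding the optimal number of clusters using the elbow method
from sklearn.cluster import KMeans

wcss_list = []   # initializing the list for the WCSS values

for i in range(1, 11):   # k from 1 to 10; 11 is excluded by range()
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)

mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters (k)')
mtp.ylabel('wcss_list')
mtp.show()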
As we can see in the above code, we have used the KMeans class of the sklearn.cluster library to form the clusters.
Next, we have created the wcss_list variable to initialize an empty list, which is used
to contain the value of wcss computed for different values of k ranging from 1 to 10.
After that, we have initialized the for loop for iterating over different values of k ranging from 1 to 10; since Python's range() excludes the upper bound, it is taken as 11 to include the 10th value.
The rest of the code is similar to what we did in earlier topics: we have fitted the model on the matrix of features and then plotted the graph between the number of clusters and WCSS.
Output: After executing the above code, we will get the below output:
From the above plot, we can see the elbow point is at 5. So the number of clusters
here will be 5.
Step- 3: Training the K-means algorithm on the
training dataset
As we have got the number of clusters, so we can now train the model on the
dataset.
To train the model, we will use the same two lines of code as we have used in the
above section, but here instead of using i, we will use 5, as we know there are 5
clusters that need to be formed. The code is given below:
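A sketch of those two lines, under the same assumed settings as above:

# training the K-means model on the dataset with 5 clusters
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)   # trains the model and returns the cluster of each point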
The first line is the same as above for creating the object of KMeans class.
In the second line of code, we have created the dependent variable y_predict to train
the model.
By executing the above lines of code, we will get the y_predict variable. We can check
it under the variable explorer option in the Spyder IDE. We can now compare the
values of y_predict with our original dataset. Consider the below image:
From the above image, we can now see that CustomerID 1 belongs to cluster 3 (as indexing starts from 0, the value 2 corresponds to the 3rd cluster), CustomerID 2 belongs to cluster 4, and so on.
Step-4: Visualizing the Clusters
The last step is to visualize the clusters. As we have 5 clusters for our model, so we
will visualize each cluster one by one.
To visualize the clusters, we will use a scatter plot drawn with the mtp.scatter() function of matplotlib, as shown below.
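A sketch of the plotting code (the colors and marker sizes are arbitrary choices):

mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s=100, c='blue', label='Cluster 1')
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s=100, c='green', label='Cluster 2')
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s=100, c='red', label='Cluster 3')
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s=100, c='cyan', label='Cluster 4')
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s=100, c='magenta', label='Cluster 5')
# plotting the centroids of the clusters
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroid')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()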
In the above lines of code, we have written code for each of the 5 clusters. The first argument of mtp.scatter, i.e., x[y_predict == 0, 0], selects the first feature of the data points assigned to cluster 0; y_predict contains the cluster labels, which range from 0 to 4.
Output:
The output image is clearly showing the five different clusters with different colors.
The clusters are formed between two parameters of the dataset: Annual Income and Spending Score. We can change the colors and labels as per the requirement
or choice. We can also observe some points from the above patterns, which are given
below:
o Cluster1 shows the customers with average salary and average spending, so we can categorize these customers as standard.
o Cluster2 shows the customer has a high income but low spending, so we can
categorize them as careful.
o Cluster3 shows the low income and also low spending so they can be
categorized as sensible.
o Cluster4 shows the customers with low income with very high spending so
they can be categorized as careless.
o Cluster5 shows the customers with high income and high spending so they
can be categorized as target, and these customers can be the most profitable
customers for the mall owner.
This algorithm was given by R. Agrawal and R. Srikant in the year 1994. It is mainly used for market basket analysis and helps to find the products that can be bought together. It can also be used in the healthcare field to find drug reactions for patients.
Frequent itemsets are those items whose support is greater than the threshold value or user-specified minimum support. The Apriori property says that if {A, B} is a frequent itemset, then A and B individually must also be frequent itemsets.
Suppose there are the two transactions: A= {1,2,3,4,5}, and B= {2,3,7}, in these two
transactions, 2 and 3 are the frequent itemsets.
Note: To better understand the apriori algorithm, and related term such as support and
confidence, it is recommended to understand the association rule learning.
Step-1: Determine the support of itemsets in the transactional database, and select
the minimum support and confidence.
Step-2: Take all the itemsets in the transactional database with a support value higher than the minimum or selected support value.
Step-3: Find all the rules of these subsets that have higher confidence value than the
threshold or minimum confidence.
Example: Suppose we have the following dataset that has various transactions, and
from this dataset, we need to find the frequent itemsets and generate the association
rules using the Apriori algorithm:
Solution:
o In the first step, we will create a table that contains support count (The
frequency of each itemset individually in the dataset) of each itemset in the
given dataset. This table is called the Candidate set or C1.
o Now, we will take out all the itemsets that have a support count greater than the Minimum Support (2). This will give us the table for the frequent itemset L1. Since all the itemsets have a support count greater than or equal to the minimum support, except E, the E itemset will be removed.
o Next, we will generate the candidate set C2 with the help of L1 by creating pairs of the itemsets of L1 and finding their support counts from the dataset.
o Again, we need to compare the C2 support count with the minimum support count; after comparing, the itemsets with a lower support count will be eliminated from table C2. This will give us the below table for L2.
o For C3, we will repeat the same two processes, but now we will form the C3
table with subsets of three itemsets together, and will calculate the support
count from the dataset. It will give the below table:
o Now we will create the L3 table. As we can see from the above C3 table, there
is only one combination of itemset that has support count equal to the
minimum support count. So, the L3 will have only one combination, i.e., {A, B,
C}.
Step-4: Finding the association rules for the
subsets:
To generate the association rules, first, we will create a new table with the possible rules from the combination {A, B, C} found above. For each rule, we will calculate the confidence using the formula sup(A ∧ B)/sup(A) for a rule A → B. After calculating the confidence value for all rules, we will exclude the rules that have a confidence lower than the minimum threshold (50%).
As the given threshold or minimum confidence is 50%, so the first three rules A ^B
→ C, B^C → A, and A^C → B can be considered as the strong association rules for
the given problem.
The retailer has a dataset information that contains a list of transactions made by his
customer. In the dataset, each row shows the products purchased by customers or
transactions made by the customer. To solve this problem, we will perform the below
steps:
o Data Pre-processing
o Training the Apriori model on the dataset
o Visualizing the results
The first step is data pre-processing step. Under this, first, we will perform the
importing of the libraries. The code for this is given below:
Before importing the libraries, we will use the below line of code to install the apyori package, as the Spyder IDE does not contain it:
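pip install apyori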
Below is the code to implement the libraries that will be used for different tasks of
the model:
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
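A sketch of the loading step described below (the file name Market_Basket_data1.csv, the missing header, and the 20-column width are assumptions about the transactions file):

# importing the dataset (file name assumed); the file is assumed to have no header row
dataset = pd.read_csv('Market_Basket_data1.csv', header=None)

# converting the DataFrame into a list of transactions for apriori()
transactions = []
for i in range(0, 7501):
    transactions.append([str(dataset.values[i, j]) for j in range(0, 20)])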
In the above code, the first line imports the dataset in pandas format. The remaining lines are needed because the apriori() function that we will use for training our model takes the dataset as a list of transactions. So, we have created an empty list of transactions, which will contain the transactions with indices 0 to 7500. The upper bound is taken as 7501 because, in Python, the last index is not included.
To train the model, we will use the apriori function that will be imported from the apyori package. This function will return the rules to train the model on the dataset. Consider the below code:
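A sketch of the training step (the minimum support, confidence, lift, and length values are assumptions chosen to be consistent with the rule output shown later):

# training the Apriori model on the list of transactions
from apyori import apriori

rules = apriori(transactions, min_support=0.003, min_confidence=0.2,
                min_lift=3, min_length=2)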
In the above code, the first line is to import the apriori function. In the second line,
the apriori function returns the output as the rules. It takes the following parameters:
Now we will visualize the output for our apriori model. Here we will follow some
more steps, which are given below:
o Displaying the result of the rules occurred from the apriori function
results = list(rules)
results
By executing the above lines of code, we will get the 9 rules. Consider the below
output:
Output:
As we can see, the above output is in a form that is not easily understandable. So, we will print all the rules in a more readable format.
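A sketch of a formatting loop that matches the output below, using the structure of apyori's result records (two-item rules are assumed):

for item in results:
    pair = item[0]                 # the frozenset of items in the rule
    items = [str(p) for p in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    print("Support: " + str(item[1]))
    print("Confidence: " + str(item[2][0][2]))   # confidence of the first ordered statistic
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")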
Output:
By executing the above lines of code, we will get the below output:
Rule: chicken -> light cream
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
=====================================
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
=====================================
Rule: escalope -> pasta
Support: 0.005866666666666667
Confidence: 0.37288135593220345
Lift: 4.700185158809287
=====================================
Rule: fromage blanc -> honey
Support: 0.0033333333333333335
Confidence: 0.2450980392156863
Lift: 5.178127589063795
=====================================
Rule: ground beef -> herb & pepper
Support: 0.016
Confidence: 0.3234501347708895
Lift: 3.2915549671393096
=====================================
Rule: tomato sauce -> ground beef
Support: 0.005333333333333333
Confidence: 0.37735849056603776
Lift: 3.840147461662528
=====================================
Rule: olive oil -> light cream
Support: 0.0032
Confidence: 0.20512820512820515
Lift: 3.120611639881417
=====================================
Rule: olive oil -> whole wheat pasta
Support: 0.008
Confidence: 0.2714932126696833
Lift: 4.130221288078346
=====================================
Rule: pasta -> shrimp
Support: 0.005066666666666666
Confidence: 0.3220338983050848
Lift: 4.514493901473151
=====================================
From the above output, we can analyze each rule. The first rule, light cream → chicken, states that light cream and chicken are frequently bought together by many customers. The support for this rule is 0.0045, and the confidence is 29%: if a customer buys light cream, there is a 29% chance that he also buys chicken, and the combination appears in about 0.45% of the transactions. We can check all these things in the other rules as well.
Association rule learning is one of the very important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc. Here, market basket analysis is a technique used by various big retailers to discover the associations between items. We can understand it by taking the example of a supermarket, where all products that are frequently purchased together are placed together.
For example, if a customer buys bread, he most likely can also buy butter, eggs, or
milk, so these products are stored within a shelf or mostly nearby. Consider the
below diagram:
1. Apriori
2. Eclat
3. F-P Growth Algorithm
o Support
o Confidence
o Lift
Support
Support is the frequency of A, or how frequently an item appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X. For an itemset X and transactions T, it can be written as:
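Support(X) = (Number of transactions containing X) / (Total number of transactions T)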
Confidence
Confidence indicates how often the rule has been found to be true. Or how often the
items X and Y occur together in the dataset when the occurrence of X is already
given. It is the ratio of the transaction that contains X and Y to the number of records
that contain X.
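In formula form:
Confidence(X → Y) = (Transactions containing both X and Y) / (Transactions containing X)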
Lift
It is the strength of any rule, which can be defined as below formula:
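Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))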
It is the ratio of the observed support measure and expected support if X and Y are
independent of each other. It has three possible values:
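o Lift = 1: The occurrence of X and Y is independent of each other.
o Lift > 1: X and Y are positively correlated, i.e., they are likely to be bought together.
o Lift < 1: X and Y are negatively correlated, i.e., one item tends to substitute for the other.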
Apriori Algorithm
This algorithm uses frequent datasets to generate association rules. It is designed to
work on the databases that contain transactions. This algorithm uses a breadth-first
search and Hash Tree to calculate the itemset efficiently.
It is mainly used for market basket analysis and helps to understand the products
that can be bought together. It can also be used in the healthcare field to find drug
reactions for patients.
Eclat Algorithm
Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a depth-first search technique to find frequent itemsets in a transaction database. It executes faster than the Apriori algorithm.
o For a classifier with 2 prediction classes, the matrix is a 2×2 table; for 3 classes, it is a 3×3 table, and so on.
o The matrix is divided into two dimensions, predicted values and actual values, along with the total number of predictions.
o Predicted values are those values, which are predicted by the model, and
actual values are the true values for the given observations.
o It looks like the below table:
o True Negative: The model predicted No, and the real or actual value was also No.
o True Positive: The model predicted Yes, and the actual value was also Yes.
o False Negative: The model predicted No, but the actual value was Yes; it is also called a Type-II error.
o False Positive: The model predicted Yes, but the actual value was No; it is also called a Type-I error.
Suppose we are trying to create a model that can predict the result for the disease
that is either a person has that disease or not. So, the confusion matrix for this is
given as:
o The table is given for a two-class classifier, which has two predictions, "Yes" and "No." Here, Yes defines that the patient has the disease, and No defines that the patient does not have the disease.
o The classifier has made a total of 100 predictions. Out of 100 predictions, 89
are true predictions, and 11 are incorrect predictions.
o The model has given prediction "yes" for 32 times, and "No" for 68 times.
Whereas the actual "Yes" was 27, and actual "No" was 73 times.
o Misclassification rate: It is also termed the error rate, and it defines how often the model gives wrong predictions. The error rate is calculated as the number of incorrect predictions divided by the total number of predictions made by the classifier (see the formulas after this list).
o Recall: It is defined as, out of the total positive classes, how many our model predicted correctly. The recall should be as high as possible.
o F-measure: If two models have low precision and high recall or vice versa, it is difficult to compare them. So, for this purpose, we can use the F-score, which helps us to evaluate recall and precision at the same time. The F-score is maximum when recall equals precision (see the formulas after this list).
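For reference, the standard formulas for these measures in terms of the confusion-matrix cells are:
Error rate = (FP + FN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = 2 × (Recall × Precision) / (Recall + Precision)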
In machine learning, there is always a need to test the stability of the model; we cannot judge a model by fitting it on the training dataset alone. For this purpose, we reserve a particular sample of the dataset which was not part of the training dataset. After that, we test our model on that sample before deployment, and this complete process comes under cross-validation. This is somewhat different from the general train-test split.
But it has one big disadvantage: we are using only 50% of the dataset to train our model, so the model may fail to capture important information in the dataset. It also tends to give an underfitted model.
Leave-P-out cross-validation
In this approach, p data points are left out of the training data. If there are n total data points in the original input dataset, then n-p data points are used as the training dataset and the p data points as the validation set. This complete process is repeated for all possible samples, and the average error is calculated to know the effectiveness of the model.
o In this approach, the bias is minimum as all the data points are used.
o The process is executed for n times; hence execution time is high.
o This approach leads to high variation in testing the effectiveness of the model
as we iteratively check against one data point.
K-Fold Cross-Validation
K-fold cross-validation approach divides the input dataset into K groups of samples
of equal sizes. These samples are called folds. For each learning set, the prediction
function uses k-1 folds, and the rest of the folds are used for the test set. This
approach is a very popular CV approach because it is easy to understand, and the
output is less biased than other methods.
Let's take an example of 5-fold cross-validation. The dataset is grouped into 5 folds. In the 1st iteration, the first fold is reserved to test the model, and the rest are used to train it. In the 2nd iteration, the second fold is used to test the model, and the rest are used to train it. This process continues until each fold has been used as the test fold.
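A minimal scikit-learn sketch of k-fold cross-validation, using the built-in iris dataset purely for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

features, labels = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold serves once as the test set
scores = cross_val_score(model, features, labels, cv=5)
print(scores)          # accuracy of each of the 5 folds
print(scores.mean())   # average accuracy across folds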
It can be understood with the example of housing prices: the price of some houses can be much higher than that of other houses. To tackle such situations, a stratified k-fold cross-validation technique is useful.
Holdout Method
This method is the simplest cross-validation technique of all. In this method, we remove a subset of the training data and use it to get prediction results after training the model on the rest of the dataset.
The error that occurs in this process tells how well our model will perform with the
unknown dataset. Although this approach is simple to perform, it still faces the issue
of high variance, and it also produces misleading results sometimes.
Comparison of Cross-validation to
train/test split in Machine Learning
o Train/test split: The input data is divided into two parts, a training set and a test set, in a ratio such as 70:30 or 80:20. It provides high variance, which is one of its biggest disadvantages.
o Training Data: The training data is used to train the model, and the
dependent variable is known.
o Test Data: The test data is used to make predictions from the model that has already been trained on the training data. It has the same features as the training data but is not part of it.
o Cross-Validation dataset: It is used to overcome the disadvantage of
train/test split by splitting the dataset into groups of train/test splits, and
averaging the result. It can be used if we want to optimize our model that has
been trained on the training dataset for the best performance. It is more
efficient as compared to train/test split as every observation is used for the
training and testing both.
Limitations of Cross-Validation
There are some limitations of the cross-validation technique, which are given below:
o Under ideal conditions, it provides the optimum output, but for inconsistent data it may produce drastically different results. This is one of the big disadvantages of cross-validation, as there is no certainty about the type of data in machine learning.
o In predictive modeling, the data evolves over a period of time, due to which differences may arise between the training and validation sets. For example, if we create a model for the prediction of stock market values and train it on the previous 5 years' stock values, the realistic values for the next 5 years may be drastically different, so it is difficult to expect the correct output in such situations.
Applications of Cross-Validation
o This technique can be used to compare the performance of different
predictive modeling methods.
o It has great scope in the medical research field.
o It can also be used for the meta-analysis, as it is already being used by the
data scientists in the field of medical statistics.
Data Science and Machine Learning are closely related to each other but have
different functionalities and different goals. At a glance, Data Science is a field to
study the approaches to find insights from the raw data, whereas Machine Learning is a technique used by data scientists to enable machines to learn automatically from past data. To understand the difference in-depth, let's first
have a brief introduction to these two technologies.
Note: Data Science and Machine Learning are closely related to each other but cannot
be treated as synonyms.
A data scientist collects the raw data from various sources, prepares and pre-processes the data, and applies machine learning algorithms and predictive analysis to extract useful insights from the collected data.
For example, Netflix uses data science techniques to understand user interest by
mining the data and viewing patterns of its users.
Machine Learning allows computers to learn from past experiences on their own. It uses statistical methods to improve performance and predict output without being explicitly programmed.
Data Science: It is a broad term that includes various steps to create a model for a given problem and deploy the model.
Machine Learning: It is used in the data modeling step of data science as a complete process.

Data Science: A data scientist needs to have skills to use big data tools like Hadoop, Hive, and Pig, along with statistics and programming in Python, R, or Scala.
Machine Learning: A machine learning engineer needs to have skills such as computer science fundamentals, programming skills in Python or R, and statistics and probability concepts.

Data Science: It can work with raw, structured, and unstructured data.
Machine Learning: It mostly requires structured data to work on.

Data Science: Data scientists spend lots of time handling the data, cleansing the data, and understanding its patterns.
Machine Learning: ML engineers spend a lot of time managing the complexities that occur during the implementation of algorithms and the mathematical concepts behind them.
In this topic, we will learn how machine learning is different from deep learning. But before learning the differences, let's first have a brief introduction to machine learning and deep learning.
Machine Learning allows computers to learn from experiences on their own, using statistical methods to improve performance and predict output without being explicitly programmed.
In deep learning, models use different layers to learn and discover insights from the
data.
Type of data: Machine learning models mostly require data in a structured form, whereas deep learning models can work with both structured and unstructured data, as they rely on the layers of an artificial neural network.

Suitable for: Machine learning models are suitable for solving simple or moderately complex problems, whereas deep learning models are suitable for solving complex problems.
Hence, if you have lots of data and high hardware capabilities, go with deep learning. But if you don't have either of them, choose the ML model to solve your problem.
Conclusion: In conclusion, we can say that deep learning is machine learning with more capabilities and a different working approach, and selecting either of them to solve a particular problem depends on the amount of data and the complexity of the problem.
Introduction to Dimensionality
Reduction Technique
What is Dimensionality Reduction?
The number of input features, variables, or columns present in a given dataset is
known as dimensionality, and the process to reduce these features is called
dimensionality reduction.
In various cases, a dataset contains a huge number of input features, which makes the predictive modeling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required in such cases.
It is commonly used in the fields that deal with high-dimensional data, such
as speech recognition, signal processing, bioinformatics, etc. It can also be used
for data visualization, noise reduction, cluster analysis, etc.
The Curse of Dimensionality
Handling high-dimensional data is very difficult in practice, a problem commonly known as the curse of dimensionality. If the dimensionality of the input dataset increases, any machine learning algorithm and model becomes more complex. As the number of features increases, the number of samples required also increases proportionally, and the chance of overfitting increases. If a machine learning model is trained on high-dimensional data, it becomes overfitted and results in poor performance.
Hence, it is often required to reduce the number of features, which can be done with
dimensionality reduction.
o By reducing the dimensions of the features, the space required to store the
dataset also gets reduced.
o Less computation time is required for training on reduced dimensions of features.
o Reduced dimensions of features of the dataset help in visualizing the data
quickly.
o It removes the redundant features (if present) by taking care of
multicollinearity.
Feature Selection
Feature selection is the process of selecting the subset of the relevant features and
leaving out the irrelevant features present in a dataset to build a model of high
accuracy. In other words, it is a way of selecting the optimal features from the input
dataset.
1. Filters Methods
In this method, the dataset is filtered, and a subset that contains only the relevant
features is taken. Some common techniques of filters method are:
o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
2. Wrappers Methods
The wrapper method has the same goal as the filter method, but it uses a machine learning model for its evaluation. In this method, some features are fed to the ML model, and the performance is evaluated. The performance decides whether to add or remove those features to increase the accuracy of the model. This method is more accurate than the filtering method but more complex to work with. Some common techniques of wrapper methods are:
o Forward Selection
o Backward Selection
o Bi-directional Elimination
o LASSO
o Elastic Net
o Ridge Regression, etc.
Feature Extraction:
Feature extraction is the process of transforming the space containing many
dimensions into space with fewer dimensions. This approach is useful when we want
to keep the whole information but use fewer resources while processing the
information.
Common dimensionality reduction techniques covered below include:
o Principal Component Analysis (PCA) and Kernel PCA
o Backward Feature Elimination and score comparison
o Forward Feature Selection
o Random Forest
o Factor Analysis
o Auto-Encoder
PCA works by considering the variance of each attribute, because high variance shows a good split between the classes, and hence it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels.
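A minimal scikit-learn sketch of PCA, using the built-in iris dataset purely for illustration:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

features, _ = load_iris(return_X_y=True)

# project the 4-dimensional data onto the 2 principal components with the highest variance
pca = PCA(n_components=2)
reduced = pca.fit_transform(features)
print(pca.explained_variance_ratio_)   # fraction of variance captured by each component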
o In this technique, firstly, all the n variables of the given dataset are taken to
train the model.
o The performance of the model is checked.
o Now we will remove one feature each time and train the model on n-1
features for n times, and will compute the performance of the model.
o We will check the variable that has made the smallest or no change in the performance of the model and then drop that variable or feature; after that, we will be left with n-1 features.
o Repeat the complete process until no feature can be dropped.
In this technique, by selecting the optimum performance of the model and the maximum tolerable error rate, we can define the optimal number of features required for the machine learning algorithm.
o We start with a single feature only, and progressively we will add each feature
at a time.
o Here we will train the model on each feature separately.
o The feature with the best performance is selected.
o The process will be repeated until we get a significant increase in the
performance of the model.
Random Forest
Random Forest is a popular and very useful feature selection algorithm in machine
learning. This algorithm contains an in-built feature importance package, so we do
not need to program it separately. In this technique, we need to generate a large set
of trees against the target variable, and with the help of usage statistics of each
attribute, we need to find the subset of features.
The random forest algorithm takes only numerical variables, so we need to convert the input data into numeric data using one-hot encoding.
Factor Analysis
Factor analysis is a technique in which each variable is kept within a group according to its correlation with other variables; variables within a group can have a high correlation among themselves, but they have a low correlation with variables of other groups.
Auto-encoders
One of the popular methods of dimensionality reduction is auto-encoder, which is a
type of ANN or artificial neural network, and its main aim is to copy the inputs to
their outputs. In this, the input is compressed into latent-space representation, and
output is occurred using this representation. It has mainly two parts:
o Encoder: The function of the encoder is to compress the input to form the
latent-space representation.
o Decoder: The function of the decoder is to recreate the output from the
latent-space representation.
Machine Learning Algorithms
Machine Learning algorithms are the programs that can learn the hidden patterns
from the data, predict the output, and improve the performance from experiences on
their own. Different algorithms can be used in machine learning for different tasks,
such as simple linear regression that can be used for prediction problems like stock
market prediction, and the KNN algorithm can be used for classification
problems.
In this topic, we will see the overview of some popular and most commonly
used machine learning algorithms along with their use cases and categories.
The below diagram illustrates the different ML algorithms, along with their categories:
1) Supervised Learning Algorithm
Supervised learning is a type of Machine learning in which the machine needs
external supervision to learn. The supervised learning models are trained using the
labeled dataset. Once the training and processing are done, the model is tested by
providing a sample test data to check whether it predicts the correct output.
The goal of supervised learning is to map input data with the output data.
Supervised learning is based on supervision, and it is the same as when a student
learns things in the teacher's supervision. The example of supervised learning
is spam filtering.
o Classification
o Regression
2) Unsupervised Learning Algorithm
o Clustering
o Association
3) Reinforcement Learning
In reinforcement learning, an agent interacts with its environment by producing actions and learns with the help of feedback. The feedback is given to the agent in the form of rewards: for each good action, it gets a positive reward, and for each bad action, it gets a negative reward. There is no supervision provided to the agent. The Q-Learning algorithm is used in reinforcement learning. Read more…
1. Linear Regression
Linear regression is one of the most popular and simple machine learning algorithms
that is used for predictive analysis. Here, predictive analysis defines prediction of
something, and linear regression makes predictions for continuous numbers such
as salary, age, etc.
It shows the linear relationship between the dependent and independent variables,
and shows how the dependent variable(y) changes according to the independent
variable (x).
It tries to fit the best line between the dependent and independent variables, and this best-fit line is known as the regression line.
y = a0 + a1·x + ε
Here, y = dependent (target) variable, x = independent variable, a0 = intercept of the line, a1 = linear regression coefficient, and ε = random error.
The below diagram shows the linear regression for prediction of weight according to
height: Read more..
2. Logistic Regression
Logistic regression is a supervised learning algorithm used to predict categorical variables or discrete values. It can be used for classification problems in machine learning, and the output of the logistic regression algorithm can be either Yes or No, 0 or 1, Red or Blue, etc.
Logistic regression is similar to the linear regression except how they are used, such
as Linear regression is used to solve the regression problem and predict continuous
values, whereas Logistic regression is used to solve the Classification problem and
used to predict the discrete values.
Instead of fitting the best fit line, it forms an S-shaped curve that lies between 0 and
1. The S-shaped curve is also known as a logistic function that uses the concept of
the threshold. Any value above the threshold will tend to 1, and below the threshold
will tend to 0. Read more..
The data points that help to define the hyperplane are known as support vectors,
and hence it is named as support vector machine algorithm.
Some real-life applications of SVM are face detection, image classification, Drug
discovery, etc. Consider the below diagram:
As we can see in the above diagram, the hyperplane has classified datasets into two
different classes. Read more..
Naïve Bayes classifier is one of the best classifiers that provides a good result for a given problem. It is easy to build a naïve Bayesian model, and it is well suited for huge datasets. It is mostly used for text classification. Read more..
7. K-Means Clustering
K-means clustering is one of the simplest unsupervised learning algorithms, which is used to solve clustering problems. The data points are grouped into K different clusters based on similarities and dissimilarities; that is, data points with the most commonalities remain in one cluster, which has very few or no commonalities with the other clusters. In K-means, K refers to the number of clusters, and means refers to averaging the dataset to find the centroid.
This algorithm starts with a group of randomly selected centroids that form the
clusters at starting and then perform the iterative process to optimize these
centroids' positions.
It can be used for spam detection and filtering, identification of fake news, etc. Read
more..
It contains multiple decision trees for subsets of the given dataset and finds the average to improve the predictive accuracy of the model. A random forest should contain 64-128 trees; a greater number of trees leads to higher accuracy of the algorithm.
To classify a new dataset or object, each tree gives a classification result, and based on the majority votes, the algorithm predicts the final output.
Random forest is a fast algorithm and can efficiently deal with missing and incorrect data. Read more..
9. Apriori Algorithm
The Apriori algorithm is an unsupervised learning algorithm that is used to solve association problems. It uses frequent itemsets to generate association rules and is designed to work on databases that contain transactions. With the help of these association rules, it determines how strongly or how weakly two objects are connected to each other. This algorithm uses a breadth-first search and a Hash Tree to calculate the itemsets efficiently.
The algorithm works iteratively to find the frequent itemsets from a large dataset.
The Apriori algorithm was given by R. Agrawal and R. Srikant in the year 1994. It is mainly used for market basket analysis and helps to understand the products that can be bought together. It can also be used in the healthcare field to find drug reactions in patients. Read more..
10. Principal Component Analysis
PCA works by considering the variance of each attribute, because high variance shows a good split between the classes, and hence it reduces the dimensionality.
Before understanding overfitting and underfitting, let's understand some basic terms that will help to understand this topic well:
o Signal: It refers to the true underlying pattern of the data that helps the
machine learning model to learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance
of the model.
o Bias: Bias is a prediction error that is introduced in the model due to
oversimplifying the machine learning algorithms. Or it is the difference
between the predicted values and the actual values.
o Variance: If the machine learning model performs well with the training
dataset, but does not perform well with the test dataset, then variance occurs.
Overfitting
Overfitting occurs when our machine learning model tries to cover all the data points, or more than the required data points, present in the given dataset. Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce its efficiency and accuracy. The overfitted model has low bias and high variance.
Example: The concept of the overfitting can be understood by the below graph of
the linear regression output:
As we can see from the above graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality, it is not. Because the goal of the regression model is to find the best-fit line, and here we have not got any best fit, it will generate prediction errors.
How to avoid Overfitting in the Model
Both overfitting and underfitting degrade the performance of the machine
learning model. But since overfitting is the more common cause, there are several
ways by which we can reduce its occurrence in our model.
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
Underfitting
Underfitting occurs when our machine learning model is not able to capture the
underlying trend of the data. If the feeding of training data is stopped too early
(for example, to avoid overfitting), the model may not learn enough from the
training data and may fail to find the best fit of the dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from the
training data, which reduces the accuracy and produces unreliable predictions.
Example: We can understand underfitting using the output of the linear
regression model below:
As we can see from the diagram, the model is unable to capture the data points
present in the plot.
Goodness of Fit
The "Goodness of fit" term is taken from the statistics, and the goal of the machine
learning models to achieve the goodness of fit. In statistics modeling, it defines how
closely the result or predicted values match the true values of the dataset.
The model with a good fit is between the underfitted and overfitted model, and
ideally, it makes predictions with 0 errors, but in practice, it is difficult to achieve it.
As when we train our model for a time, the errors in the training data go down, and
the same happens with test data. But if we train the model for a long duration, then
the performance of the model may decrease due to the overfitting, as the model also
learn the noise present in the dataset. The errors in the test dataset start
increasing, so the point, just before the raising of errors, is the good point, and we can
stop here for achieving a good model.
There are two other methods by which we can get a good point for our model, which
are the resampling method to estimate model accuracy and validation dataset.
PCA generally tries to find a lower-dimensional surface onto which to project the
high-dimensional data.
PCA works by considering the variance of each attribute, because high variance
indicates a good split between the classes; in this way it reduces the dimensionality.
Some real-world applications of PCA are image processing, movie recommendation
systems, and optimizing the power allocation in various communication channels. It
is a feature extraction technique, so it keeps the important variables and drops
the least important ones.
In statistics, our main goal is to determine the statistical significance of our
result, and this statistical significance rests on the three concepts below:
o Hypothesis Testing
o Normal Distribution
o Statistical Significance
Hypothesis Testing
Hypothesis testing is defined in terms of two hypotheses: the null hypothesis and
the alternative hypothesis. It is used to check the validity of the null
hypothesis, or of a claim made using the sample data. Here, the null hypothesis
(H0) is defined as the hypothesis of no statistically significant relationship
between two variables, while the alternative hypothesis is defined as the
hypothesis of a statistically significant relationship between the two variables.
No significant relationship between the two variables means that one variable will
not affect the other variable. Thus, the null hypothesis states that what you are
trying to prove doesn't actually happen. If the independent variable does affect
the dependent variable, that supports the alternative hypothesis.
Put simply, in hypothesis testing we first make a claim that is assumed to be the
null hypothesis, using the sample data. This assumption or claim is then validated
using the p-value to see whether it is statistically significant given the
evidence. If the evidence supports the alternative hypothesis, the null hypothesis
is rejected.
Normal Distribution
The normal distribution, also known as the Gaussian distribution, is a
probability distribution. It is symmetric about the mean and is used to visualize
the distribution of data with a graph plot. It shows that data near the mean
occurs more frequently than data far from the mean, and it looks like a
bell-shaped curve. The two main parameters of the normal distribution are the
mean (μ) and the standard deviation (σ). For the standard normal distribution, the
mean is zero and the standard deviation is 1.
In hypothesis testing, we need to calculate the z-score. The z-score is the number
of standard deviations a data point lies from the mean.
The z-score therefore tells us where a data point lies compared to the population
average.
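Concretely, the z-score of a value x is (x − μ) / σ. A minimal sketch in Python using the standard library (the sample values here are made up for illustration):

import statistics

# Hypothetical sample data (for illustration only)
data = [62, 70, 74, 78, 80, 85, 91]
x = 85   # data point of interest

mu = statistics.mean(data)       # sample mean
sigma = statistics.stdev(data)   # sample standard deviation

# z-score: number of standard deviations x lies from the mean
z = (x - mu) / sigma
print(f"z-score of {x}: {z:.2f}")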
Statistical significance:
The goal of calculating the p-value is to determine the statistical significance
of the hypothesis test. To do this, we first need to set a threshold, called
alpha. We should always set the value of alpha before the experiment; it is
usually set to either 0.05 or 0.01, depending on the type of problem.
The result is concluded to be significant if the observed p-value is lower than
alpha.
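For instance, a one-sample t-test produces such a p-value. A minimal sketch using SciPy (assumed installed; the sample values and the hypothesized mean of 100 are invented for illustration):

from scipy import stats

# Hypothetical sample: measured values from an experiment
sample = [102, 98, 110, 105, 99, 104, 108, 101]
alpha = 0.05   # significance threshold, set before the experiment

# H0: the population mean equals 100
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject the null hypothesis")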
Errors in P-value
Two types of errors are defined for the p-value; these errors are given below:
1. Type I error
2. Type II error
Type I Error:
It is defined as the incorrect or false rejection of the null hypothesis. The
maximum probability of this error is alpha, which is set in advance. This error is
not affected by the sample size of the dataset, but it increases as we increase
the number of tests or endpoints.
Type II error
Type II error is defined as the wrong acceptance of the null hypothesis. The
probability of a type II error is beta, and beta depends on the sample size and
the value of alpha. Beta can only be determined as a function of the true
population effect. The value of beta is inversely related to the sample size:
beta decreases as the sample size increases.
The value of beta also decreases when we increase the number of tests or endpoints.
We can understand the relationship between hypothesis testing and the decision
taken with a simple decision table: accepting a true null hypothesis or rejecting
a false one is a correct decision; rejecting a true null hypothesis is a type I
error, and accepting a false null hypothesis is a type II error.
Importance of P-value
The importance of p-value can be understood in two aspects:
Sometimes a machine learning model performs well with the training data but does
not perform well with the test data. This means the model is not able to predict
the output for unseen data, because it has introduced noise into the output; such
a model is called overfitted. This problem can be dealt with using a
regularization technique.
Regularization can be applied in such a way that all the variables or features
are kept in the model while the magnitudes of the variables are reduced. Hence, it
maintains accuracy as well as the generalization of the model.
It mainly regularizes, or shrinks, the coefficients of the features toward zero.
In simple words, "in the regularization technique, we reduce the magnitude of the
features while keeping the same number of features."
y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b
β1, β2, ….., βn are the weights or magnitudes attached to the features,
respectively. Here, β0 represents the bias of the model, and b represents the
intercept.
Linear regression models try to optimize the coefficients and b to minimize the
cost function.
Now, we will add a loss function and optimize the parameters to make a model that
can predict accurate values of y. The loss function for linear regression is
called the RSS, or residual sum of squares.
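Written out, the residual sum of squares is the total squared error over the
training examples:
RSS = Σi (yi − ŷi)², where ŷi = β0 + β1xi1 + β2xi2 + ⋯ + βnxin + b and the sum
runs over all training examples i.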
Techniques of Regularization
There are mainly two types of regularization techniques, which are given below:
o Ridge Regression
o Lasso Regression
Ridge Regression
o Ridge regression adds a penalty term, λ times the sum of the squared
coefficients, to the cost function; this regularizes the coefficients of the
model, and hence ridge regression shrinks the amplitudes of the coefficients,
which decreases the complexity of the model.
o If the value of λ tends to zero, the cost function reduces to that of the plain
linear regression model. Hence, for a minimal value of λ, the model will resemble
the linear regression model.
o A general linear or polynomial regression will fail if there is high
collinearity between the independent variables; to solve such problems, ridge
regression can be used.
o It also helps to solve problems where we have more parameters than samples.
Lasso Regression:
o Lasso regression instead adds a penalty of λ times the sum of the absolute
values of the coefficients, so some of the features are completely neglected for
model evaluation, as their coefficients can shrink to exactly zero.
o Hence, lasso regression can help us to reduce overfitting in the model and also
performs feature selection.
o Ridge regression, by contrast, is mostly used to reduce overfitting and includes
all the features present in the model; it reduces the complexity of the model by
shrinking the coefficients. Both techniques are sketched in code after this list.
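A minimal sketch of both techniques using scikit-learn (assumed installed; the synthetic data and the alpha values are arbitrary illustration choices):

import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: 20 samples, 5 features (values are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=20)

# Ridge adds an L2 penalty (alpha * sum of squared coefficients) to the RSS
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso adds an L1 penalty (alpha * sum of absolute coefficients),
# which can shrink some coefficients exactly to zero (feature selection)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge coefficients:", ridge.coef_)
print("lasso coefficients:", lasso.coef_)   # expect some exact zeros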
Voice search, voice dialing, and appliance control are some real-world examples
of speech recognition. Alexa and Google Home are the most widely used speech
recognition software.
Similar to speech recognition, image recognition is another widely used example of
machine learning technology; it helps identify objects in digital images. Some
real-world examples of image recognition are:
Tagging a name to a photo, as we have seen on Facebook. It is also used in
recognizing handwriting by segmenting a single letter into smaller images.
Further, the biggest example of image recognition is facial recognition. We all
use new-generation mobile phones, where facial recognition techniques are used to
unlock our devices. Hence, it also helps to increase the security of the system.
4. Google Translation
Suppose you work on an international banking project where documents are in
French, German, etc., but you only know English. In that case, this would be a
very stressful situation, because you can't proceed further without reviewing the
documents. Google Translator software helps to translate any language into the
desired language. So, in this way, you can convert French, German, etc., into
English, Hindi, or any other language. This makes the job of different sectors
very easy, as a user can work on any country's project hassle-free.
Google uses the Google Neural Machine Translation to detect any language and
translate it into any desired language.
5. Prediction
Prediction system also uses Machine learning algorithms for making predictions.
There are various sectors where predictions are used. For example, in bank loan
systems, error probability can be determined using predictions with machine
learning. For this, the available data are classified into different groups with the set of
rules provided by analysts, and once the classification is done, the error probability is
predicted.
6. Extraction
One of the best examples of machine learning is the extraction of information. In this
process, structured data is extracted from unstructured data, and which is used in
predictive analytics tools. The data is usually found in a raw or unstructured form that
is not useful, and to make it useful, the extraction process is used. Some real-world
examples of extraction are:
7. Statistical Arbitrage
Statistical arbitrage is an automated trading process used in the finance industry
to manage a large volume of securities. The process uses a trading algorithm to
analyze a set of securities using economic variables and correlations. Some
examples of statistical arbitrage are as follows:
9. Self-driving cars
The future of the automobile industry is self-driving cars. These are driverless cars,
which are based on concepts of deep learning and machine learning. Some
commonly used machine learning algorithms in self-driving cars are Scale-invariant
feature transform (SIFT), AdaBoost, TextonBoost, YOLO(You only look once).
o Facility protections
o Operation monitoring
o Parking lots
o Traffic monitoring
o Shopping patterns
Some machine learning algorithms that are used in email spam filtering and malware
detection are Multi-Layer Perceptron, Decision tree, and Naïve Bayes classifier.
Machine Learning technology also helps in finding discounted prices, best prices,
promotional prices, etc., for each customer.
It collects data from the user's answers and creates a statistical model to
determine how long a person can remember a word, and it provides that information
before a refresher is required.
Introduction to Semi-Supervised
Learning
Semi-Supervised learning is a type of Machine Learning algorithm that
represents the intermediate ground between Supervised and Unsupervised
learning algorithms. It uses the combination of labeled and unlabeled datasets
during the training period.
Before understanding semi-supervised learning, you should know the main categories
of machine learning algorithms: supervised learning, unsupervised learning, and
reinforcement learning. The basic difference between supervised and unsupervised
learning is that supervised datasets include an output label associated with each
tuple of training data, while unsupervised datasets do not. Semi-supervised
learning is an important category that lies between supervised and unsupervised
machine learning. Although it is the middle ground between the two and operates on
data that has a few labels, the data mostly consists of unlabeled examples. Labels
are costly to obtain, but for corporate purposes a dataset may have a few of them.
o Firstly, it trains the model with a small amount of labeled training data,
similar to supervised learning models. The training continues until the model
gives accurate results.
o Next, the algorithm applies the model to the unlabeled dataset to generate
pseudo-labels; at this stage the results may not be accurate.
o Now, the labels from the labeled training data and the pseudo-labels are linked
together.
o The input data in the labeled training data and the unlabeled training data are
also linked.
o In the end, the model is trained again with the new combined input, as in the
first step. This reduces errors and improves the accuracy of the model (a sketch
follows below).
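A minimal sketch of this pseudo-labeling loop using scikit-learn (assumed installed; the data split and the choice of logistic regression are illustrative, not prescribed by the text):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative data: pretend only the first 50 samples are labeled
X, y = make_classification(n_samples=500, random_state=0)
X_lab, y_lab = X[:50], y[:50]
X_unlab = X[50:]

# Step 1: train on the small labeled set
model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# Step 2: generate pseudo-labels for the unlabeled data
pseudo = model.predict(X_unlab)

# Steps 3-5: combine real labels with pseudo-labels and retrain
X_all = np.vstack([X_lab, X_unlab])
y_all = np.concatenate([y_lab, pseudo])
model = LogisticRegression(max_iter=1000).fit(X_all, y_all)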
Mathematics has always been a good friend for some people and a source of phobia
or anxiety for others. Many students around the globe don't find interest in
mathematics because they think the topics it covers are barely relevant to
practical or real-world problems. But with the growth of machine learning, people
are getting motivated to learn mathematics, as it is directly used in designing ML
algorithms, and it is also very helpful for learning the concepts behind them. In
this topic, we will cover the essential concepts of mathematics that are used in
machine learning.
Note: It is not required to go deep in learning Mathematics for working with simple
machine learning models; rather, knowing essential Maths concepts is enough to
understand how it is applied in ML.
o Linear algebra
o Multivariate Calculus
o Probability Theory
o Discrete Mathematics
o Statistics
o Algorithm & Optimization
o Others
The graph below shows the importance of different maths concepts in machine
learning. As shown in the graph, the most important part of mathematics is linear
algebra, which is widely used in ML.
Besides these uses, linear algebra is also widely used in neural networks and the
data science field. In short, linear algebra provides a platform or base for all
ML algorithms to show their results.
o Partial Derivatives
o Vector-Valued Functions
o Directional Gradient
o Hessian, Jacobian
o Laplacian and Lagrangian.
Some important Probability concepts that one needs to know are given below:
o It is a collection of tools that helps to identify the goal from the available data
and information.
o Statistics helps to understand the data and transform the sample observations
into meaningful information.
o No system in the world has perfect data stored and readily available as
needed. Every system has data anomalies like incomplete, corrupted data, etc.
Statistical concepts will be your best friend to help in such complex situations.
o It helps in finding answers to the questions such as, "Who scored the
maximum & minimum in a cricket tournament?" "Which technology is on-
trend in 2021?", and many more.
o Combinatorics
o Axioms
o Bayes' Theorem
o Variance and Expectation
o Random Variables
o Conditional and Joint Distributions.
There are many cases in machine learning & AI where discrete mathematics is
required. For example, a neural network contains an integer number of nodes and
interconnections; it cannot have 0.56 nodes. For such cases a discrete element is
needed, and hence discrete mathematics is required. Graph structures and graph
algorithms are some important topics of discrete mathematics for machine learning.
For normal ML projects, only the fundamentals of discrete mathematics are enough.
At the same time, if you want to work with graphical models, relational domains,
structured prediction, etc., you will need to refer to a discrete mathematics
book. However, for science graduates, most of these concepts are covered in
college.
o Khan Academy
Khan Academy is a popular online resource that provides well-explained maths and
science courses, and it is free. From these videos, you can easily learn different
concepts of mathematics such as linear algebra, probability & statistics,
multivariable calculus, and optimization.
o Udacity
A statistical model is said to be overfitted if it can’t generalize well with unseen data.
Before understanding overfitting, we need to know some basic terms, which are:
Noise: Noise is meaningless or irrelevant data present in the dataset. It affects the
performance of the model if it is not removed.
Bias: Bias is a prediction error that is introduced in the model due to oversimplifying
the machine learning algorithms. Or it is the difference between the predicted values
and the actual values.
Variance: If the machine learning model performs well with the training dataset, but
does not perform well with the test dataset, then variance occurs.
What is Overfitting?
o Overfitting & underfitting are the two main errors/problems in the machine
learning model, which cause poor performance in Machine Learning.
o Overfitting occurs when the model fits more data than required, and it tries to
capture each and every datapoint fed to it. Hence it starts capturing noise and
inaccurate data from the dataset, which degrades the performance of the
model.
o An overfitted model doesn't perform accurately with the test/unseen dataset
and can’t generalize well.
o An overfitted model is said to have low bias and high variance.
Suppose the model memorizes the training dataset, like student Y in the example
above. It performs very well on the seen dataset but performs badly on unseen data
or unknown instances. In such cases, the model is said to be overfitting.
And if the model performs well with the training dataset and also with the
test/unseen dataset, similar to student Z, it is said to be a good fit.
In the train-test split of the dataset, we divide our dataset into random test and
training datasets. We train the model with the training dataset, which is about
80% of the total dataset. After training the model, we test it with the test
dataset, which is the remaining 20% of the total dataset.
Now, if the model performs well with the training dataset but not with the test
dataset, it is likely to have an overfitting issue.
For example, if the model shows 85% accuracy with the training data but only 50%
accuracy with the test dataset, the model is not performing well.
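A minimal sketch of this check with scikit-learn (assumed installed; the dataset and the decision tree classifier are placeholders):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# 80% for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)

# A large gap between these two scores suggests overfitting
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))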
Ways to Prevent Overfitting
Although overfitting is an error in machine learning that reduces the performance
of the model, we can prevent it in several ways. With a linear model we can often
avoid overfitting, but many real-world problems are non-linear, so it is important
to prevent overfitting in other ways as well. Below are several ways that can be
used to prevent overfitting:
1. Early Stopping
2. Train with more data
3. Feature Selection
4. Cross-Validation
5. Data Augmentation
6. Regularization
Early Stopping
In this technique, the training is paused before the model starts learning the
noise within the data. While training the model iteratively, measure the
performance of the model after each iteration, and continue only as long as a new
iteration improves the performance of the model.
After that point, the model begins to overfit the training data; hence we need to
stop the process before the learner passes that point.
Stopping the training process before the model starts capturing noise from the
data is known as early stopping (see the sketch below).
However, this technique may lead to the underfitting problem if training is paused
too early. So it is very important to find the "sweet spot" between underfitting
and overfitting.
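A minimal sketch of an early-stopping loop, written framework-agnostically in Python; train_one_epoch and validation_loss stand in for whatever your training framework provides and are assumptions, not a fixed API:

def early_stopping_train(model, train_one_epoch, validation_loss,
                         max_epochs=100, patience=5):
    """Stop when validation loss hasn't improved for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)          # one pass over the training data
        loss = validation_loss(model)   # measure on held-out data
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"early stopping at epoch {epoch}")
            break
    return model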
Train with More Data
Training with more data may not always work to prevent overfitting, but it helps
the algorithm to detect the signal better and minimize errors.
When a model is fed with more training data, it becomes unable to overfit all the
samples of data and is forced to generalize well.
But in some cases, the additional data may add more noise to the model; hence we
need to be sure that the data is clean and free from inconsistencies before
feeding it to the model.
Feature Selection
While building the ML model, we have a number of parameters or features that are
used to predict the outcome. However, sometimes some of these features are
redundant or less important for the prediction, and the feature selection process
is applied to handle this. In the feature selection process, we identify the most
important features in the training data, and the other features are removed.
Further, this process helps to simplify the model and reduces noise in the data.
Some algorithms have automatic feature selection; if not, we can perform this
process manually.
Cross-Validation
Cross-validation is one of the powerful techniques to prevent overfitting.
In the general k-fold cross-validation technique, we divide the dataset into k
equal-sized subsets of data; these subsets are known as folds.
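A minimal sketch of k-fold cross-validation with scikit-learn (assumed installed; k=5 is just a common choice, and the dataset is synthetic):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Train and evaluate on 5 different train/validation folds
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:  ", scores.mean())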
Data Augmentation
Data augmentation is a data analysis technique that is an alternative to adding
more data in order to prevent overfitting. In this technique, instead of adding
more training data, slightly modified copies of already existing data are added to
the dataset.
The data augmentation technique makes each data sample appear slightly different
every time it is processed by the model. Hence, each sample appears unique to the
model, which prevents overfitting.
Regularization
If overfitting occurs when a model is complex, we can reduce the number of features.
However, overfitting may also occur with a simpler model, more specifically the
Linear model, and for such cases, regularization techniques are much helpful.
Ensemble Methods
In ensemble methods, predictions from different machine learning models are
combined to identify the most popular result.
The most commonly used ensemble methods are bagging and boosting.
In bagging, individual data points can be selected more than once. After the
collection of several sample datasets, a model is trained on each of them
independently, and, depending on the type of task (i.e., regression or
classification), the average or majority vote of those predictions is used to
produce a more accurate result. Moreover, bagging reduces the chances of
overfitting in complex models.
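A minimal sketch of bagging with scikit-learn (assumed installed; the base estimator and the number of models are illustrative choices):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Train 50 decision trees on bootstrap samples (points can repeat),
# then combine their predictions by majority vote
bagger = BaggingClassifier(DecisionTreeClassifier(),
                           n_estimators=50, random_state=0)
bagger.fit(X, y)
print("training accuracy:", bagger.score(X, y))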
Note: Encoding is different from encryption as its main purpose is not to hide the data
but to convert it into a format so that it can be properly consumed.
In this topic, we are going to discuss the different types of encoding techniques that
are used in computing.
Types of Encoding Techniques
o Character Encoding
o Image & Audio and Video Encoding
Character Encoding
Character encoding encodes characters into bytes. It informs the computer how to
interpret zeros and ones as real characters, numbers, and symbols. The computer
understands only binary data; hence it is required to convert these characters
into numeric codes. To achieve this, each character is converted into binary code,
and for this, text documents are saved with an encoding type. It is done by
pairing numbers with characters. If we don't apply character encoding, our website
will not display the characters and text in a proper format, which decreases
readability, and the machine would not be able to process the data correctly.
Further, character encoding makes sure that each character has a proper
representation in computer or binary format.
There are different types of Character Encoding techniques, which are given below:
1. HTML Encoding
2. URL Encoding
3. Unicode Encoding
4. Base64 Encoding
5. Hex Encoding
6. ASCII Encoding
HTML Encoding
HTML encoding is used to display an HTML page in a proper format. With encoding, a
web browser gets to know which character set is to be used.
In HTML markup, various characters such as < and > have special meanings. To
display these characters as content, we need to encode them.
URL Encoding
URL (Uniform Resource Locator) encoding is used to convert characters into a
format in which they can be transmitted over the internet. It is also known as
percent-encoding. URL encoding is performed to send a URL over the internet using
the ASCII character set. Unsafe or non-ASCII characters are replaced with a %,
followed by hexadecimal digits.
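A small illustration using Python's standard library (urllib.parse); the string is arbitrary:

from urllib.parse import quote, unquote

url_fragment = "machine learning & AI"

encoded = quote(url_fragment)   # percent-encode unsafe characters
print(encoded)                  # machine%20learning%20%26%20AI

print(unquote(encoded))         # machine learning & AI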
UNICODE Encoding
Unicode is an encoding standard for a universal character set. It allows encoding,
representing, and handling the text of most of the languages and writing systems
available worldwide. It provides a code point, or number, for each character in
every supported language, so it can represent approximately all the possible
characters of all languages. A particular sequence of bits is known as a coding
unit.
The Unicode standard defines Unicode Transformation Format (UTF) to encode the
code points.
Base64 Encoding
Base64 encoding is used to encode binary data into equivalent ASCII characters.
Base64 encoding is used in mail systems, because protocols such as SMTP can't work
with binary data; they accept ASCII textual data only. It is also used in simple
HTTP authentication to encode the credentials. Moreover, it is used to transfer
binary data in cookies and other parameters, and to make data unreadable in order
to prevent casual tampering. If an image or another file is transferred without
Base64 encoding, it can get corrupted, as a mail system may not be able to deal
with binary data.
Base64 represents data in blocks of 3 bytes, where each byte contains 8 bits;
hence each block represents 24 bits. These 24 bits are divided into four groups of
6 bits, and each of these groups, or chunks, is converted into an equivalent
Base64 value.
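A small illustration with Python's standard library (the sample string is arbitrary):

import base64

data = b"Machine Learning"          # raw bytes to encode

encoded = base64.b64encode(data)    # 3-byte blocks become 4 Base64 characters
print(encoded)                      # b'TWFjaGluZSBMZWFybmluZw=='

print(base64.b64decode(encoded))    # b'Machine Learning'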
ASCII Encoding
American Standard Code for Information Interchange (ASCII) is a type of
character encoding. It was the first character encoding standard, released in the
year 1963.
The ASCII code is used to represent English characters as numbers, where each
letter is assigned a number from 0 to 127. Most modern character-encoding schemes
are based on ASCII, though they support many additional characters. It is a
single-byte encoding that uses only the bottom 7 bits. In an ASCII file, each
alphabetic, numeric, or special character is represented with a 7-bit binary
number, and each character on the keyboard has an equivalent ASCII value.
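A quick illustration in Python of this character-to-number mapping:

# Each ASCII character maps to a number in the range 0-127
for ch in "ML!":
    print(ch, "->", ord(ch))   # M -> 77, L -> 76, ! -> 33

print(chr(65))                 # number back to character: 'A'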
These encoded files contain the same content, usually at similar quality, but in a
compressed size, so that they can be saved in less space, transferred easily via
mail, or downloaded onto a system.
We can understand it with an example: a .WAV audio file is converted into a .MP3
file to reduce its size to about one-tenth of the original.
While developing a machine learning model, only a few variables in the dataset are
useful for building the model; the rest of the features are either redundant or
irrelevant. If we input the dataset with all these redundant and irrelevant
features, it may negatively impact and reduce the overall performance and accuracy
of the model. Hence it is very important to identify and select the most
appropriate features from the data and remove the irrelevant or less important
features, which is done with the help of feature selection in machine learning.
Feature selection is one of the important concepts of machine learning, which highly
impacts the performance of the model. As machine learning works on the concept of
"Garbage In Garbage Out", so we always need to input the most appropriate and
relevant dataset to the model in order to get a better result.
In this topic, we will discuss different feature selection techniques for machine
learning. But before that, let's first understand some basics of feature selection.
Selecting the best features helps the model to perform well. For example, suppose
we want to create a model that automatically decides which car should be crushed
for spare parts, and to do this we have a dataset containing the model of the car,
the year, the owner's name, and the miles driven. In this dataset, the name of the
owner does not contribute to the model performance, as it does not decide whether
the car should be crushed or not; so we can remove this column and select the rest
of the features (columns) for model building.
1. Wrapper Methods
In the wrapper methodology, the selection of features is treated as a search
problem, in which different combinations are made, evaluated, and compared with
other combinations. It trains the algorithm iteratively, using a subset of
features. On the basis of the output of the model, features are added or
subtracted, and the model is trained again with the new feature set.
2. Filter Methods
In the filter method, features are selected on the basis of statistical measures.
This method does not depend on the learning algorithm and chooses the features as
a pre-processing step.
The filter method filters out the irrelevant features and redundant columns from
the model by ranking them with different metrics.
The advantage of using filter methods is that they need low computational time and
do not overfit the data.
o Information Gain
o Chi-square Test
o Fisher's Score
o Missing Value Ratio
Information Gain: Information gain measures the reduction in entropy when
transforming the dataset. It can be used as a feature selection technique by
calculating the information gain of each variable with respect to the target
variable.
Fisher's Score:
Missing Value Ratio: The missing value ratio can be used to evaluate a feature
against a threshold value. It is obtained by dividing the number of missing values
in each column by the total number of observations. A variable whose ratio is
higher than the threshold value can be dropped (a small sketch follows below).
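A minimal sketch of the missing value ratio with pandas (assumed installed; the data and the 0.3 threshold are made up for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, np.nan, 40, np.nan, 31],
    "income": [50_000, 60_000, np.nan, 58_000, 52_000],
    "city":   ["A", "B", "A", "C", "B"],
})

# Missing values per column divided by total observations
missing_ratio = df.isnull().sum() / len(df)
print(missing_ratio)

threshold = 0.3
keep = missing_ratio[missing_ratio <= threshold].index
df_selected = df[keep]   # drop columns above the threshold ("age" here)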
3. Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods by
considering the interaction of features along with low computational cost. They
are fast processing methods, similar to the filter method, but more accurate than
it.
These methods are also iterative: they evaluate each iteration and optimally find
the most important features that contribute the most to training in that
particular iteration. Some techniques of embedded methods are:
To know this, we need to first identify the type of input and output variables. In
machine learning, variables are of mainly two types:
Below are some univariate statistical measures, which can be used for filter-based
feature selection:
Numerical input with numerical output is the case for predictive regression
modelling. The common method used for such a case is the correlation coefficient.
Numerical input with categorical output is the case for classification predictive
modelling problems. In this case too, correlation-based techniques should be used,
but with a categorical output.
For categorical variables, the commonly used technique is the chi-squared test. We
can also use information gain in this case.
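A minimal sketch of filter-based selection with the chi-squared test in scikit-learn (assumed installed; chi2 requires non-negative features, and keeping k=2 features is an arbitrary choice):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)   # 4 non-negative numeric features

# Score each feature against the categorical target, keep the best 2
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print("chi2 scores:   ", selector.scores_)
print("selected shape:", X_selected.shape)   # (150, 2)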
We can summarise the above cases in a table that pairs each input/output type
combination with its appropriate measure (correlation coefficient, chi-squared
test, or information gain).
Conclusion
Feature selection is a very complicated and vast field of machine learning, and
many studies have already been made to discover the best methods. There is no
fixed rule for the best feature selection method; rather, choosing a method
depends on the machine learning engineer, who can combine and innovate approaches
to find the best method for a specific problem. One should try a variety of model
fits on different subsets of features selected through different statistical
measures.
regardless of which algorithm has been used. They are caused by unknown variables,
and their value can't be reduced.
What is Bias?
In general, a machine learning model analyses the data, finds patterns in it, and
makes predictions. While training, the model learns these patterns in the dataset
and applies them to test data for prediction. While making predictions, a
difference occurs between the values predicted by the model and the actual or
expected values, and this difference is known as bias error, or error due to bias.
It can be defined as the inability of machine learning algorithms such as linear
regression to capture the true relationship between the data points. Every
algorithm begins with some amount of bias, because bias arises from the
assumptions in the model that make the target function simple to learn. A model
has either:
o Low Bias: A low bias model will make fewer assumptions about the form of
the target function.
o High Bias: A model with a high bias makes more assumptions, and the model
becomes unable to capture the important features of our dataset. A high bias
model also cannot perform well on new data.
Generally, a linear algorithm has a high bias, as that is what makes it learn
fast. The simpler the algorithm, the more bias it is likely to introduce. A
nonlinear algorithm, by contrast, often has low bias.
Some examples of machine learning algorithms with low bias are Decision Trees, k-
Nearest Neighbours and Support Vector Machines. At the same time, an
algorithm with high bias is Linear Regression, Linear Discriminant Analysis and
Logistic Regression.
Low variance means there is a small variation in the prediction of the target function
with changes in the training data set. At the same time, High variance shows a large
variation in the prediction of the target function with changes in the training dataset.
A model that shows high variance learns a lot and performs well with the training
dataset but does not generalize well to an unseen dataset. As a result, such a
model gives good results with the training dataset but shows high error rates on
the test dataset.
Since, with high variance, the model learns too much from the dataset, it leads to
overfitting of the model. A model with high variance has the below problems:
Usually, nonlinear algorithms, which have a lot of flexibility to fit the model,
have high variance.
Some examples of machine learning algorithms with low variance are, Linear
Regression, Logistic Regression, and Linear discriminant analysis. At the same
time, algorithms with high variance are decision tree, Support Vector Machine,
and K-nearest neighbours.
1. Low-Bias, Low-Variance: The combination of low bias and low variance is the
ideal machine learning model. However, it is not practically possible.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions are
inconsistent but accurate on average. This case occurs when the model learns a
large number of parameters, which leads to overfitting.
3. High-Bias, Low-Variance: With high bias and low variance, predictions are
consistent but inaccurate on average. This case occurs when a model does not learn
well from the training dataset or uses few parameters. It leads to underfitting
problems in the model.
4. High-Bias, High-Variance: With high bias and high variance, predictions are
inconsistent and also inaccurate on average.
A model with high bias shows high training error, and its test error is almost
similar to the training error.
Bias-Variance Trade-Off
While building a machine learning model, it is really important to take care of
bias and variance in order to avoid overfitting and underfitting. If the model is
very simple, with fewer parameters, it may have low variance and high bias;
whereas if the model has a large number of parameters, it will have high variance
and low bias. So, it is necessary to strike a balance between bias and variance
errors, and this balance between the bias error and the variance error is known as
the bias-variance trade-off.
For accurate predictions, an algorithm needs both low variance and low bias. But
this is not possible, because bias and variance are related to each other:
Hence, the bias-variance trade-off is about finding the sweet spot that balances
bias and variance errors.
There are different tools, software packages, and platforms available for machine
learning, and new software and tools are evolving day by day. Although there are
many machine learning tools available, choosing the best tool for your model is a
challenging task. If you choose the right tool, you can make your work faster and
more efficient. In this topic, we will discuss some popular and commonly used
machine learning tools and their features.
1. TensorFlow
TensorFlow is one of the most popular open-source libraries used to train and
build both machine learning and deep learning models. It also provides a
JavaScript library and was developed by the Google Brain team. It is very popular
among machine learning enthusiasts, who use it for building different ML
applications. It offers a powerful library, tools, and resources for numerical
computation, specifically for large-scale machine learning and deep learning
projects. It enables data scientists and ML developers to build and deploy machine
learning applications efficiently. For training and building ML models, TensorFlow
provides the high-level Keras API, which lets users easily get started with
TensorFlow and machine learning.
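As a hedged illustration of the Keras API mentioned above (TensorFlow assumed installed; the tiny architecture and the input size of 20 features are arbitrary toy choices):

import tensorflow as tf

# A tiny fully-connected classifier for 10 classes (toy architecture)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()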
Features:
Below are some top features:
2. PyTorch
It provides a Tensor class that holds an n-dimensional array and can perform
tensor computations with GPU support.
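A small illustration of that Tensor class (PyTorch assumed installed; the values are arbitrary):

import torch

# An n-dimensional tensor with gradient tracking
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

y = (x ** 2).sum()   # a simple tensor computation
y.backward()         # autograd computes dy/dx
print(x.grad)        # gradient: 2*x

# Tensors can be moved to a GPU if one is available
if torch.cuda.is_available():
    x = x.to("cuda")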
Features:
Below are some top features:
While training a classifier with a huge amount of data, a computer system might
not perform well. Various machine learning and deep learning projects require
millions or billions of training examples, or the algorithm being used takes a
long time to execute. In such cases, one should go for the Google Cloud ML Engine.
It is a hosted platform where ML developers and data scientists build and run
high-quality machine learning models. It provides a managed service that allows
developers to easily create ML models with any type and size of data.
Features:
Below are the top features:
5. Accord.Net
Accord.Net is a .Net-based machine learning framework used for scientific
computing. It is combined with audio and image processing libraries written in C#.
The framework provides different libraries for various ML applications, such as
pattern recognition, linear algebra, and statistical data processing. Popular
packages of the Accord.Net framework are Accord.Statistics, Accord.Math, and
Accord.MachineLearning.
Features
Below are some top features:
6. Apache Mahout
Apache Mahout is an open-source project of the Apache Software Foundation, used
for developing machine learning applications mainly focused on linear algebra. It
is a distributed linear algebra framework with a mathematically expressive Scala
DSL, which enables developers to implement their own algorithms promptly. It also
provides Java/Scala libraries to perform mathematical operations, mainly based on
linear algebra and statistics.
Features:
Below are some top features:
7. Shogun
Shogun is a free and open-source machine learning software library, created by
Gunnar Raetsch and Soeren Sonnenburg in the year 1999. The library is written in
C++ and supports interfaces for different languages such as Python, R, Scala, C#,
Ruby, etc., using SWIG (Simplified Wrapper and Interface Generator). The main
focus of Shogun is on kernel-based algorithms such as Support Vector Machines
(SVM) and K-Means Clustering for regression and classification problems. It also
provides a complete implementation of Hidden Markov Models.
Features:
Below are some top features:
8. Oryx2
It is a realization of the lambda architecture, built on Apache Kafka and Apache
Spark, and is widely used for real-time large-scale machine learning projects. It
is a framework for building apps, including end-to-end applications for filtering,
packaging, regression, classification, and clustering. It is written in Java and
uses technologies including Apache Spark, Hadoop, Tomcat, and Kafka. The latest
version of Oryx2 is 2.8.0.
Features:
Below are some top features:
Apache Spark MLlib is a scalable machine learning library that runs on Apache
Mesos, Hadoop, Kubernetes, standalone, or in the cloud. Moreover, it can access
data from different data sources. It is an open-source cluster-computing framework
that offers an interface for complete clusters along with data parallelism and fault
tolerance.
For optimized numerical processing of data, MLlib provides linear algebra packages
such as Breeze and netlib-java. It uses a query optimizer and a physical execution
engine to achieve high performance with both batch and streaming data.
Features
Below are some top features:
Conclusion
In this topic, we have discussed some popular machine learning tools. However,
there are many other ML tools; choosing one depends entirely on the requirements
of your project, your skills, and the price of the tool. Most of these tools are
freely available, with some exceptions such as RapidMiner. Each tool works with
particular languages and offers its own specializations.
In this topic, we will discuss the prerequisites for machine learning so that you
can build a better base for learning its advanced concepts.
1. Statistics
2. Linear Algebra
3. Calculus
4. Probability
5. Programming Languages
This is one of the most common questions regarding educational qualifications
among aspirants who want to learn machine learning and make a career in it. The
answer is NO: it is not necessary to have a master's or Ph.D. degree to learn and
make a career in machine learning. There are lots of people who have made a career
in this field without having such a degree. However, having a Ph.D. or master's
degree will definitely give you additional benefits and make the path smoother.
The master's/Ph.D. certificate works as a way to showcase your skills, but in the
end, your practical knowledge & skills will help you either build a project or
make a career in machine learning. So, if you have enough time and funds for a
master's or Ph.D. degree, you can pursue one, and it will surely benefit you. But
even without a degree, if you have good ML skills, you can make the transition
into machine learning.
1. Statistics
Machine learning and statistics are two tightly coupled fields, as most of the
concepts of machine learning are either taken from statistics or depend on it.
Machine learning techniques and algorithms depend widely on statistical concepts
and theories; hence statistics is a crucial prerequisite for ML.
Statistics is a field of mathematics that allows us to draw logical conclusions
from data. Every machine learning enthusiast must understand statistical concepts
in order to learn the workings of methods such as logistic regression,
distributions, hypothesis testing, etc. Statistics helps in performing the
following tasks:
o It contains various tools that allow us to get some outcomes from the
available data and information.
o It finds outcomes from the data and transforms sample observations into
meaningful information.
o No raw data is perfect; it contains different impurities, such as incomplete or
corrupted data. In such cases, statistical concepts help to identify these
impurities.
o It helps in obtaining answers to different questions, such as "who scored the
maximum & minimum in the cricket tournament?" or "which technology is on trend in
2021?"
o Statistical hypothesis tests enable us to select the best model for any kind of
predictive modeling problem.
o Combinatorics
o Axioms
o Bayes' Theorem
o Variance and Expectation
o Random Variables
o Conditional and Joint Distributions.
2. Linear Algebra
Linear algebra deals with the study of vectors, some rules for manipulating these
vectors, matrices, and linear transforms. It is one of the integral parts of
machine learning and helps ML algorithms run on huge, multi-dimensional datasets.
The concepts of linear algebra are widely used in developing algorithms in machine
learning. It can perform the following task:
3. Probability
In the real world, there are various scenarios where the behavior or output can
vary for the same input. Probability has always been an essential part of
mathematics; it measures the uncertainty of an event. The higher the probability
of an event, the more likely that event is to occur. In machine learning,
probability helps to make predictions with incomplete information and helps in
predicting the likelihood of future events. With the help of probability, we can
model elements of uncertainty, such as risk, in a business process or transaction,
i.e., we can work with non-deterministic problems; whereas in traditional
programming, we deal with deterministic problems whose output is not affected by
uncertainty. Probability also helps in hypothesis testing and with distributions
such as the Gaussian distribution and the probability density function.
Probability theory and statistics are related fields; probability deals with future
events, whereas statistics deal with the analysis of past events.
4. Calculus
Calculus is also an integral part of machine learning, but it is not required to
go in-depth at the beginner level; knowledge of the basic concepts is enough. In
machine learning, the process of finding the best parameters is known as
optimization, and multivariate calculus helps in solving optimization problems in
the ML model. It helps in optimization and in getting good results from the model.
We don't need to solve complex derivatives manually; rather, we must understand
how differentiation works and how it is applied in vector calculus. Multivariate
calculus is used not only for algorithm training but also for gradient descent.
Some crucial concepts of multivariate calculus are derivatives, divergence,
curvature, quadratic approximations, the Laplacian and the Lagrangian, and the
directional gradient.
5. Programming Languages
Apart from the mathematical concepts, it is very important to have a good
knowledge of a programming language and coding capabilities for machine learning.
Some of the most popular programming languages for machine learning are as
follows:
Python
Python is a powerful and easy language that anyone can learn. Python was initially
released in early 1991. Most developers and programmers choose Python as their
favorite programming language for developing machine learning & AI solutions. The
best part about Python is that it is very easy to learn compared to other
programming languages, and it also offers great career opportunities for
programmers and data scientists.
Python provides excellent community support and an extensive set of libraries, along
with the flexibility of programming languages. Python is a platform-independent
language as well as it provides an extensive framework for Deep Learning and
Machine Learning.
R is one of the great languages for statistical processing in programming. It may
not be the perfect language for machine learning, but it provides great
performance while dealing with large numbers. Some built-in features, such as
functional programming, object-oriented nature, and vectorized computation, make
it a worthwhile programming language for machine learning.
R contains several packages that are specially designed for ML, which are:
o gmodels - This package provides different tools for the model fitting task.
o TM - It is a great framework that is used for text mining applications.
o RODBC - It is an ODBC interface.
o OneR - This package is used to implement the One Rule Machine Learning
classification algorithm.
Java:
Java is one of the most widely used programming languages among developers and
programmers in the world. Java can easily be run on various platforms thanks to
the JVM (Java Virtual Machine). The best thing about Java is that once it is
written and compiled on one platform, you do not need to compile it again and
again; this is known as the WORA (Write Once, Run Anywhere) principle. Java has
many features which make it a good fit for machine learning. These are as follows:
o Portable
o Memory manager
o Cross-platform.
o Easy to learn and use.
o Easy-to-code Algorithms.
o Built-in garbage collector.
o Swing and Standard Widget Toolkit.
o Simplified work with large-scale projects.
o Better user interaction.
o Easy to debug
Apart from the above programming and mathematics skills, awareness of some basic
concepts of machine learning is required to learn advanced concepts. These concepts
include machine learning types (Supervised, unsupervised, Reinforcement learning),
techniques, model building, etc.
In this tutorial on Gradient Descent in Machine Learning, we will learn in detail about
gradient descent, the role of cost functions specifically as a barometer within
Machine Learning, types of gradient descents, learning rates, etc.
The best way to define the local minimum or local maximum of a function using
gradient descent is as follows:
o If we move towards a negative gradient, i.e., away from the gradient of the
function at the current point, we will reach the local minimum of that function.
This procedure is called gradient descent, also known as steepest descent.
o Whenever we move towards a positive gradient, i.e., towards the gradient of the
function at the current point, we will reach the local maximum of that function.
This procedure is known as gradient ascent.
The main objective of using a gradient descent algorithm is to minimize the cost
function by iteration. To achieve this goal, it performs two steps iteratively:
computing the gradient of the cost function at the current point, and taking a
step in the direction opposite to the gradient.
What is Cost-function?
The cost function is defined as the measurement of the difference, or error,
between the actual values and the predicted values at the current position,
expressed as a single real number. It helps to increase and improve machine
learning efficiency by providing feedback to the model so that it can minimize the
error and find the local or global minimum. Gradient descent continuously iterates
along the direction of the negative gradient until the cost function approaches
its minimum; at that point the model stops learning further. Although the terms
cost function and loss function are often considered synonymous, there is a minor
difference between them: the loss function refers to the error of one training
example, while the cost function calculates the average error across the entire
training set.
The cost function is calculated after making a hypothesis with initial parameters;
these parameters are then modified, using gradient descent algorithms over known
data, to reduce the cost function.
For linear regression with a single variable, these pieces can be written as:
Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) Σi (hθ(x(i)) − y(i))², summed over the m
training examples
Goal: minimize J(θ0, θ1)
In slope-intercept form, the hypothesis is the equation of a straight line:
Y = mX + c
Where 'm' represents the slope of the line, and 'c' represents the intercept on
the y-axis.
The starting point (shown in the figure above) is used to evaluate performance, as
it is considered just an arbitrary point. At this starting point, we take the
first derivative, or slope, and use a tangent line to calculate the steepness of
this slope. Further, this slope informs the updates to the parameters (weights and
bias).
The slope is steep at the starting point, but whenever new parameters are
generated, the steepness gradually reduces, until it approaches the lowest point,
which is called the point of convergence.
The main objective of gradient descent is to minimize the cost function, or the
error between the expected and actual values. To minimize the cost function, two
factors are required: the direction (given by the gradient) and the learning rate.
These two factors determine the partial derivative calculations of future
iterations and allow the algorithm to reach the point of convergence, i.e., the
local or global minimum. Let's discuss the learning rate factor briefly:
Learning Rate:
It is defined as the size of the step taken to reach the minimum, or lowest,
point. It is typically a small value that is evaluated and updated based on the
behavior of the cost function. A high learning rate results in larger steps but
risks overshooting the minimum; a low learning rate gives small step sizes, which
compromises overall efficiency but offers more precision.
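A minimal sketch of gradient descent for the straight-line model Y = mX + c, minimizing the mean squared error (the toy data, learning rate, and iteration count are arbitrary illustration choices):

# Toy data roughly following y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

m, c = 0.0, 0.0          # initial parameters (arbitrary starting point)
learning_rate = 0.02     # step size
n = len(xs)

for _ in range(2000):
    # Partial derivatives of MSE = (1/n) * sum((m*x + c - y)^2)
    grad_m = (2 / n) * sum((m * x + c - y) * x for x, y in zip(xs, ys))
    grad_c = (2 / n) * sum((m * x + c - y) for x, y in zip(xs, ys))
    # Step in the direction of the negative gradient
    m -= learning_rate * grad_m
    c -= learning_rate * grad_c

print(f"m = {m:.2f}, c = {c:.2f}")   # expect roughly m = 2, c = 1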
Whenever the slope of the cost function is at or close to zero, the model stops
learning. Apart from the global minimum, there are other scenarios that can
produce this slope: saddle points and local minima. A local minimum has a shape
similar to the global minimum, where the slope of the cost function increases on
both sides of the current point.
In contrast, at a saddle point the negative gradient only occurs on one side of
the point; the point is a local maximum on one side and a local minimum on the
other. The name saddle point comes from a horse's saddle.
A local minimum is so named because the value of the loss function is minimal at
that point within a local region. In contrast, the global minimum is so named
because the value of the loss function is minimal there globally, across the
entire domain of the loss function.
Vanishing Gradients:
A vanishing gradient occurs when the gradient is smaller than expected. During
backpropagation, this gradient becomes smaller and smaller, which decreases the
learning of the earlier layers relative to the later layers of the network. Once
this happens, the weight updates become so small that they are insignificant, and
the parameters stop changing meaningfully.
Exploding Gradient:
An exploding gradient is the opposite of a vanishing gradient: it occurs when the
gradient is too large and creates an unstable model. In this scenario, the model
weights grow so large that they may be represented as NaN. This problem can be
addressed with dimensionality reduction techniques, which help to minimize
complexity within the model.
Further, machine learning experts are also responsible for customizing data for analysis purposes, improving web and app user experiences, and identifying and predicting business requirements. Moreover, machine learning experts are also involved in robotics, web development, developing chatbots, data analytics, intelligent application development, etc.
Experience:
Like all other fields, total years of relevant domain working experience also matter for
deciding an employee's salary. It helps you understand the problems and give an
appropriate production-ready solution. Hence, experience is one of the most
important deciding factors in total compensation.
Company:
Other than the experience of candidates, the company is also one of the most
important factors, which decides the salary of the machine learning experts in the
industry. It directly affects the salary and perks of the candidates.
Professional Skills:
Professional skills are a major factor in deciding how much machine learning experts earn in the industry. Every hiring process is based on the appropriate skill sets of the candidates. Good skillsets that match industry demand help a candidate clear interviews and perform well in a production environment. Hence, based on their professional skillsets, employees get higher salary and compensation according to company policies and terms & conditions.
Location:
In earlier days, location was undoubtedly one of the most important factors deciding an employee's salary in the industry. Nowadays, with remote working culture, location no longer plays a vital role in compensation. However, it can still affect salary and compensation in terms of house rent (cost of living in urban, rural, or metro cities) and travel allowances (cost of employee pick-up and drop). These types of compensation also attract lots of candidates.
Moreover, as data science and machine learning are completely based on data, experience with RDBMS and NoSQL databases is necessary to extract and process data efficiently. Hadoop, Spark, and Hive are a few important data processing ecosystems in the computer science world.
There are some other skills that help to become a machine learning expert:
o Linear Regression
o Logistic Regression
o K Nearest Neighbours (KNN)
o Decision Tree
o Random Forest Algorithm
o Support Vector Machine (SVM)
o K Means Clustering
o Cross-Validation and Bias-Variance Trade-off
Country       Salary (Annual)
Canada        $93,684
Australia     $106,532
Conclusion
Machine Learning is a very powerful technology that offers high salary packages to its experts in India as well as in other countries. Salary is highly impacted by total years of experience in the corresponding industry, the scale of the company, the skills of the candidate, and the location. Generally, you can earn 5-6 lakhs per annum (LPA) at the beginner level, and after gaining expertise, you can earn significantly higher salary packages in IT hubs in India such as Bangalore, Delhi, Noida, Pune, Hyderabad, and Kolkata.
Hence, in simple words, we can say that a machine learning model is a simplified
representation of something or a process. In this topic, we will discuss different
machine learning models and their techniques and algorithms.
What is a Machine Learning Model?
A machine learning model can be understood as a program that has been trained to find patterns within new data and make predictions. Such a model is represented as a mathematical function that takes input data, makes predictions on that data, and provides an output in response. First, the model is trained over a set of data using an algorithm that reasons over the data, extracts patterns from it, and learns from it. Once trained, the model can be used to make predictions on unseen data.
There are various types of machine learning models available based on different
business goals and data sets.
o Supervised Learning
o Unsupervised Learning
o Reinforcement Learning
o Classification
o Regression
Regression
In regression problems, the output is a continuous variable. Some commonly used
Regression models are as follows:
a) Linear Regression
Linear regression is the simplest machine learning model, in which we try to predict one output variable using one or more input variables. The representation of linear regression is a linear equation, which combines a set of input values (x) with a predicted output (y) for those inputs. It is represented in the form of a line:
Y = bX + c
The main aim of the linear regression model is to find the best-fit line through the data points.
Linear regression is extended to multiple linear regression (find a plane of best fit)
and polynomial regression (find the best fit curve).
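As an illustrative sketch (the data and the choice of scikit-learn are assumptions, not part of this tutorial), here is how a best-fit line can be found for a tiny dataset:

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: y is roughly 2x + 1 with a little noise.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # learned slope b and intercept c
print(model.predict([[6.0]]))            # prediction for an unseen input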
b) Decision Tree
Decision trees are the popular machine learning models that can be used for both
regression and classification problems.
A decision tree uses a tree-like structure of decisions along with their possible consequences and outcomes. Each internal node represents a test on an attribute, and each branch represents the outcome of that test. More nodes allow a decision tree to fit the data more closely, although very deep trees can overfit.
The advantage of decision trees is that they are intuitive and easy to implement, but they often lack accuracy compared to ensemble methods.
c) Random Forest
Random Forest is an ensemble learning method that consists of a large number of decision trees. Each decision tree in a random forest predicts an outcome, and the prediction with the majority of votes is taken as the final outcome.
A random forest model can be used for both regression and classification problems.
For the classification task, the outcome of the random forest is taken from the
majority of votes. Whereas in the regression task, the outcome is taken from the
mean or average of the predictions generated by each tree.
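A minimal sketch of the majority-vote idea, assuming scikit-learn and its bundled iris dataset purely for illustration:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Majority voting across 100 decision trees on the iris dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print(forest.score(X_test, y_test))  # accuracy of the majority-vote predictions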
d) Neural Networks
Neural networks are the subset of machine learning and are also known as artificial
neural networks. Neural networks are made up of artificial neurons and are designed in a way that resembles the structure and working of the human brain. Each artificial neuron connects with many other neurons in the network, and millions of such connected neurons create a sophisticated cognitive structure.
Neural networks have a multilayer structure, containing one input layer, one or more hidden layers, and one output layer. As neurons are connected to one another, data is passed from each layer on to the neurons of the next layer. Finally, data reaches the output layer of the neural network and generates the output.
Neural networks depend on training data to learn and improve their accuracy.
However, a perfectly trained & accurate neural network can cluster data quickly and
become a powerful machine learning and AI tool. One of the best-known neural
networks is Google's search algorithm.
Classification
Classification models are the second type of supervised learning technique, used to draw conclusions from observed values in categorical form. For example, a classification model can identify whether an email is spam or not, or whether a buyer will purchase a product or not. Classification algorithms predict the categories present in the dataset and categorize the output into different groups.
In classification, a classifier model is designed that classifies the dataset into different categories, and each category is assigned a label.
o Binary classification: If the problem has only two possible classes, it is called binary classification. For example, cat or dog, yes or no.
o Multi-class classification: If the problem has more than two possible classes, it is called multi-class classification.
a) Logistic Regression
Logistic regression is a supervised learning algorithm used to predict a categorical (typically binary) output, such as yes or no, by estimating class probabilities with the sigmoid function.
b) Support Vector Machine
Support vector machine or SVM is a popular machine learning algorithm widely used for classification and regression tasks, although it is mainly used to solve classification problems. The main aim of SVM is to find the best decision boundary in an N-dimensional space that can segregate the data points into classes; this best decision boundary is known as the hyperplane. SVM selects the extreme vectors to find the hyperplane, and these vectors are known as support vectors.
c) Naïve Bayes
A naïve Bayes classifier assumes that the value of a specific feature is independent of the value of any other feature. For example, if a fruit needs to be classified based on color, shape, and taste, then a fruit that is yellow, oval, and sweet will be recognized as a mango; each feature contributes to the prediction independently of the others.
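A hedged sketch of this idea, using scikit-learn's CategoricalNB with a made-up fruit encoding (the data and encodings are invented for illustration):

from sklearn.naive_bayes import CategoricalNB

# Encoded features: [color, shape, taste], e.g. 0=yellow, 1=red; 0=oval, 1=round; 0=sweet, 1=sour.
X = [[0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 0]]
y = ["mango", "apple", "orange", "apple"]  # illustrative labels

clf = CategoricalNB().fit(X, y)
print(clf.predict([[0, 0, 0]]))  # a yellow, oval, sweet fruit -> likely "mango"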
Unsupervised learning models are mainly used to perform three tasks, which are as follows:
o Clustering
Clustering is an unsupervised learning technique that involves grouping the data points into different clusters based on similarities and differences. The objects with the most similarities remain in the same group, and they have no or very few similarities with other groups.
Clustering algorithms are widely used in different tasks such as image segmentation, statistical data analysis, market segmentation, etc.
Some commonly used clustering algorithms are K-means clustering, hierarchical clustering, DBSCAN, etc. (see the sketch below).
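A minimal sketch, assuming scikit-learn and a toy 2-D dataset invented for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of 2-D points.
X = np.array([[1, 1], [1.5, 2], [1, 0.5],
              [8, 8], [8.5, 9], [9, 8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # centroid of each cluster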
Reinforcement Learning
In reinforcement learning, the algorithm learns actions for a given set of states that lead to a goal state. It is a feedback-based learning model that takes feedback signals after each state or action by interacting with the environment. This feedback works as a reward (positive for each good action and negative for each bad action), and the agent's goal is to maximize the positive rewards to improve its performance.
A popular algorithm that comes under reinforcement learning is Q-learning:
It aims to learn a policy that helps the AI agent take the best action to maximize the reward under a specific circumstance. It maintains a Q-value for each state-action pair, indicating the expected reward for following a given state path, and it tries to maximize this Q-value, as the sketch below illustrates.
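A minimal sketch of the tabular Q-learning update (the states, actions, and reward are invented for illustration):

import numpy as np

# Q-table for a toy problem with 4 states and 2 actions, initialized to zero.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9  # learning rate and discount factor

def q_update(state, action, reward, next_state):
    """One Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])

# Example transition: taking action 1 in state 0 gives reward 1 and leads to state 2.
q_update(state=0, action=1, reward=1.0, next_state=2)
print(Q)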
Is a machine learning model the same as an algorithm? The answer is no. In simple terms, an ML algorithm is a procedure or method that runs on data to discover patterns and generate a model, while a machine learning model is like a computer program that generates output or makes predictions. More specifically, when we train an algorithm on data, it becomes a model.
Why should you read this book?
This book is available in various categories such as Machine Learning, Reinforcement Learning, Deep Learning, Deep Reinforcement Learning, and Artificial Intelligence.
This book was written by Richard S. Sutton and Andrew G. Barto. If the Deep Learning book (mentioned above) is considered the Bible of Deep Learning, then this book is considered the Bible of Reinforcement Learning. If you really want to start a career in the Reinforcement Learning field, this book can be very helpful for you.
In this book, the authors clearly explain their ideas on Artificial Intelligence algorithms. Like the first edition, the second edition focuses on core learning algorithms such as UCB, Expected Sarsa, and Double Learning. Further, the book is organized into several parts, covering topics such as artificial neural networks, the Fourier basis, policy gradient methods, reinforcement learning's relationships to psychology and neuroscience, AlphaGo, AlphaGo Zero, Atari game playing, and IBM Watson's wagering strategy.
Where to buy: You can purchase this book on the Amazon marketplace and also read it for free online at the link given below.
Amazon link: https://www.amazon.com/dp/0262039249/
Read here free PDF: https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf
Why should you read this book?
This book is written by Maxim Lapan and helps you understand the practical approaches of Reinforcement Learning by balancing theory with coding practice. As per various reviews, if you really want to gain hands-on experience along with theoretical knowledge of reinforcement learning, this book is the best fit. It is also available in various categories such as Machine Learning, Reinforcement Learning, Deep Learning, Deep Reinforcement Learning, and Artificial Intelligence.
Where to buy: You can purchase this book on Amazon or the Packt website.
Amazon link: https://www.amazon.com/Deep-Reinforcement-Learning-Hands-optimization/dp/1838826998
Packt Link: https://www.packtpub.com/product/deep-reinforcement-learning-hands-on/9781788834247
The term Linear Algebra was initially introduced in the early 18th century to find the unknowns in linear equations and solve them easily; hence it is an important branch of mathematics for studying data. Linear algebra is undoubtedly a primary requirement for working with machine learning applications, and it is a prerequisite for starting to learn machine learning and data science.
Linear algebra plays a vital role as a key foundation of machine learning, and it enables ML algorithms to run on huge datasets.
The concepts of linear algebra are widely used in developing machine learning algorithms. Although it is used in almost every concept of machine learning, it specifically helps perform the following tasks:
o Optimization of data.
o Applicable in loss functions, regularisation, covariance matrices, Singular Value
Decomposition (SVD), Matrix Operations, and support vector machine
classification.
o Implementation of Linear Regression in Machine Learning.
Besides the above uses, linear algebra is also used in neural networks and the data
science field.
Basic mathematics principles and concepts like Linear algebra are the foundation of
Machine Learning and Deep Learning systems. To learn and understand Machine
Learning or Data Science, one needs to be familiar with linear algebra and
optimization theory. In this topic, we will explain all the Linear algebra concepts
required for machine learning.
Note: Although linear algebra is a must-know part of mathematics for machine learning, it is not required to be intimately familiar with all of it. That is, you do not need to be an expert in linear algebra; a good working knowledge of these concepts is more than enough for machine learning.
Below are some benefits of learning Linear Algebra before Machine learning:
Moreover, linear algebra helps solve and compute large and complex data sets through a family of techniques known as Matrix Decomposition Techniques. The two most popular matrix decomposition techniques are as follows:
o QR decomposition
o LU decomposition
Improved Statistics:
Statistics is an important concept to organize and integrate data in Machine
Learning. Also, linear Algebra helps to understand the concept of statistics in a better
manner. Advanced statistical topics can be integrated using methods, operations,
and notations of linear algebra.
A few supervised learning algorithms that can be created using linear algebra are as follows:
o Logistic Regression
o Linear Regression
o Decision Trees
o Support Vector Machines (SVM)
Further, some unsupervised learning algorithms that can also be created with the help of linear algebra are:
o Clustering (e.g., K-means)
o Principal Component Analysis (PCA)
o Singular-Value Decomposition (SVD)
Easy to Learn:
Linear algebra is an important branch of mathematics that is easy to understand. It comes into play whenever there is a requirement for advanced mathematics and its applications.
Operations:
Working with an advanced level of abstractions in vectors and matrices can make
concepts clearer, and it can also help in the description, coding, and even thinking
capability. In linear algebra, it is required to learn the basic operations such as
addition, multiplication, inversion, transposing of matrices, vectors, etc.
Matrix Factorization:
One of the most recommended areas of linear algebra is matrix factorization, specifically matrix decomposition methods such as SVD and QR.
Examples of Linear Algebra in Machine
Learning
Below are some popular examples of linear algebra in Machine learning:
1. Dataset and Data Files
Each dataset resembles a table-like structure consisting of rows and columns, where each row represents an observation and each column represents a feature/variable. This dataset is handled as a matrix, which is a key data structure in linear algebra. Further, when the dataset is divided into input and output for a supervised learning model, it is represented as a matrix (X) and a vector (y), where the vector is another important concept of linear algebra.
3. One-Hot Encoding
In the one-hot encoding technique, a table is created that shows a variable with one column for each category and one row for each example in the dataset. Each row is then encoded as a binary vector containing zero or one values. This is an example of sparse representation, a subfield of linear algebra, as the sketch below shows.
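A minimal sketch using pandas (an assumption; any one-hot encoder would do), with an invented "color" variable:

import pandas as pd

# A categorical "color" variable becomes one binary column per category.
df = pd.DataFrame({"color": ["red", "green", "yellow", "red"]})
print(pd.get_dummies(df, columns=["color"]))
# Each row is now a binary vector such as [0, 1, 0] for "red"
# (columns are ordered green, red, yellow).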
4. Linear Regression
Linear regression is a popular machine learning technique borrowed from statistics. It describes the relationship between input and output variables and is used in machine learning to predict numerical values. The most common way to solve linear regression problems is least-squares optimization, which is implemented using matrix factorization methods such as LU decomposition or singular-value decomposition, both concepts from linear algebra.
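A minimal sketch, assuming NumPy and made-up data, of solving least squares through a matrix factorization routine:

import numpy as np

# Solve linear regression by least squares: find w minimizing ||Xw - y||^2.
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]], dtype=float)  # first column is the intercept
y = np.array([3.0, 5.1, 6.9, 9.2])

# np.linalg.lstsq uses an SVD-based factorization internally.
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(w)  # [intercept, slope]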
5. Regularization
In machine learning, we usually look for the simplest possible model that achieves the best outcome for the specific problem. Simpler models generalize better from specific examples to unseen data, and simpler models are often those with smaller coefficient values.
A technique used to minimize the size of coefficients of a model while it is being fit
on data is known as regularization. Common regularization techniques are L1 and L2
regularization. Both of these forms of regularization are, in fact, a measure of the
magnitude or length of the coefficients as a vector and are methods lifted directly
from linear algebra called the vector norm.
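A minimal sketch, with illustrative coefficients, of the L1 and L2 vector norms that underlie these penalties:

import numpy as np

coef = np.array([0.5, -1.2, 3.0])  # illustrative model coefficients

l1 = np.sum(np.abs(coef))      # L1 norm, used by Lasso regularization
l2 = np.sqrt(np.sum(coef**2))  # L2 norm, used by Ridge regularization
print(l1, l2)

# A regularized cost adds a penalty on the coefficient norm, e.g.:
# cost = mse + lam * l1     (L1 / Lasso)
# cost = mse + lam * l2**2  (L2 / Ridge)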
7. Singular-Value Decomposition
Singular-Value decomposition is also one of the popular dimensionality reduction
techniques and is also written as SVD in short form.
8. Latent Semantic Analysis
NLP represents a text document as large matrices with the occurrence of words. For
example, the matrix column may contain the known vocabulary words, and rows may
contain sentences, paragraphs, pages, etc., with cells in the matrix marked as the
count or frequency of the number of times the word occurred. It is a sparse matrix
representation of text. Documents processed in this way are much easier to compare,
query, and use as the basis for a supervised machine learning model.
This form of data preparation is called Latent Semantic Analysis, or LSA for short, and
is also known by the name Latent Semantic Indexing or LSI.
9. Recommender System
A recommender system is a sub-field of machine learning, a predictive modelling
problem that provides recommendations of products. For example, online
recommendation of books based on the customer's previous purchase history,
recommendation of movies and TV series, as we see in Amazon & Netflix.
Deep learning studies these neural networks, using newer and faster hardware to train and develop larger networks on huge datasets. Deep learning methods achieve great results on challenging tasks such as machine translation and speech recognition. At its core, the processing of neural networks is based on linear algebra data structures that are multiplied and added together. Deep learning algorithms work with vectors, matrices, and tensors (matrices with more than two dimensions) of inputs and coefficients across multiple dimensions.
Conclusion
In this topic, we have discussed Linear algebra, its role and its importance in machine
learning. For each machine learning enthusiast, it is very important to learn the basic
concepts of linear algebra to understand the working of ML algorithms and choose
the best algorithm for a specific problem.
Based on the methods and way of learning, machine learning is divided into mainly four types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
In this topic, we will provide a detailed description of the types of Machine Learning along with their respective algorithms:
The main goal of the supervised learning technique is to map the input
variable(x) with the output variable(y). Some real-world applications of supervised
learning are Risk Assessment, Fraud Detection, Spam filtering, etc.
o Classification
o Regression
a) Classification
Classification algorithms are used to solve classification problems in which the output variable is categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc. Classification algorithms predict the categories present in the dataset. Some real-world examples of classification algorithms are spam detection, email filtering, etc.
b) Regression
Regression algorithms are used to solve regression problems in which the output variable is continuous, such as predicting prices or temperatures.
Advantages:
o Since supervised learning works with labelled datasets, we can have an exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior
experience.
Some common applications of Supervised Learning are given below:
o Image Segmentation:
Supervised Learning algorithms are used in image segmentation. In this
process, image classification is performed on different image data with pre-
defined labels.
o Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis purposes. This is done by using medical images and past data labelled with disease conditions. With such a process, the machine can identify the disease for new patients.
o Fraud Detection - Supervised Learning classification algorithms are used for
identifying fraud transactions, fraud customers, etc. It is done by using historic
data to identify the patterns that can lead to possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are
used. These algorithms classify an email as spam or not spam. The spam
emails are sent to the spam folder.
o Speech Recognition - Supervised learning algorithms are also used in speech
recognition. The algorithm is trained with voice data, and various
identifications can be done using the same, such as voice-activated
passwords, voice commands, etc.
In unsupervised learning, the models are trained with data that is neither classified nor labelled, and the model acts on that data without any supervision.
The machine then discovers patterns and differences on its own, such as differences in colour and shape, and predicts the output when it is tested with the test dataset.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which are given
below:
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the
data. It is a way to group the objects into a cluster such that the objects with the
most similarities remain in one group and have fewer or no similarities with the
objects of other groups. An example of the clustering algorithm is grouping the
customers by their purchasing behaviour.
2) Association
Association rule learning finds interesting relationships among variables within a large dataset, such as items that are frequently bought together. Some popular algorithms of association rule learning are the Apriori algorithm, Eclat, and the FP-growth algorithm.
3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies
between Supervised and Unsupervised machine learning. It represents the
intermediate ground between Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.
4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by trial and error, taking actions, learning from experience, and improving its performance. The agent gets rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data like supervised learning, and
agents learn from their experiences only.
The reinforcement learning process is similar to the way a human learns; for example, a child learns various things through experience in day-to-day life. An example of reinforcement learning is playing a game, where the game is the environment, the agent's moves at each step define states, and the goal of the agent is to get a high score. The agent receives feedback in terms of punishments and rewards.
Due to its way of working, reinforcement learning is employed in different fields such as game theory, operations research, information theory, and multi-agent systems.
o Video Games:
RL algorithms are very popular in gaming applications, where they are used to achieve superhuman performance. Some popular systems built on RL algorithms are AlphaGO and AlphaGO Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed how to use RL to automatically learn to schedule computer resources across waiting jobs in order to minimize average job slowdown.
o Robotics:
RL is widely used in robotics applications. Robots are used in industrial and manufacturing areas, and these robots are made more capable with reinforcement learning. Different industries have their vision of building intelligent robots using AI and machine learning technology.
o Text Mining
Text mining, one of the great applications of NLP, is now being implemented with the help of reinforcement learning by Salesforce.
Disadvantage
The curse of dimensionality limits reinforcement learning for real physical systems.
What is a feature?
Generally, all machine learning algorithms take input data to generate the output.
The input data is usually in tabular form, consisting of rows (instances or observations) and columns (variables or attributes), and these attributes are often
known as features. For example, an image is an instance in computer vision, but a
line in the image could be the feature. Similarly, in NLP, a document can be an
observation, and the word count could be the feature. So, we can say a feature is an
attribute that impacts a problem or is useful for the problem.
o Data Preparation: The first step is data preparation. In this step, raw data
acquired from different resources are prepared to make it in a suitable format
so that it can be used in the ML model. The data preparation may contain
cleaning of data, delivery, data augmentation, fusion, ingestion, or loading.
o Exploratory Analysis: Exploratory analysis or exploratory data analysis (EDA) is an important step of feature engineering, mainly used by data scientists. This step involves analyzing and investigating the data set and summarizing its main characteristics. Different data visualization techniques are used to better understand the manipulation of data sources, to find the most appropriate statistical technique for data analysis, and to select the best features for the data.
o Benchmark: Benchmarking is a process of setting a standard baseline for
accuracy to compare all the variables from this baseline. The benchmarking
process is used to improve the predictability of the model and reduce the
error rate.
1. Imputation
Feature engineering deals with inappropriate data, missing values, human interruption, general errors, insufficient data sources, etc. Missing values within the dataset highly affect the performance of the algorithm, and the "imputation" technique is used to deal with them. Imputation is responsible for handling irregularities within the dataset.
One option is to drop rows or columns that have a large percentage of missing values. But to maintain the data size, it is usually better to impute the missing data, which can be done as follows:
o For numerical data, replace missing values with the mean or median of the column.
o For categorical data, replace missing values with the most frequent category (the mode).
A small sketch of both options follows.
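A minimal sketch with pandas, using made-up data, showing mean imputation for a numerical column and mode imputation for a categorical one:

import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 32, 40, np.nan],
                   "city": ["Pune", "Delhi", None, "Delhi", "Delhi"]})

# Numerical column: impute with the mean; categorical column: impute with the mode.
df["age"] = df["age"].fillna(df["age"].mean())
df["city"] = df["city"].fillna(df["city"].mode()[0])
print(df)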
2. Handling Outliers
Outliers are deviated values or data points that lie so far away from the other data points that they badly affect the performance of the model. This feature engineering technique first identifies the outliers and then removes them.
Standard deviation can be used to identify outliers: each value has some distance from the average, and if a value lies farther away than a chosen threshold, it can be considered an outlier. The Z-score can also be used to detect outliers, as in the sketch below.
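A minimal sketch with NumPy and invented data:

import numpy as np

data = np.array([10, 12, 11, 13, 12, 95, 11, 10])  # 95 is the suspicious value

z_scores = (data - data.mean()) / data.std()
outliers = data[np.abs(z_scores) > 2]  # common thresholds are 2 or 3
print(outliers)  # -> [95]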
3. Log transform
Logarithm transformation, or log transform, is one of the commonly used mathematical techniques in machine learning. Log transform helps in handling skewed data, making the distribution closer to normal after transformation. It also reduces the effect of outliers on the data, because the normalization of magnitude differences makes a model more robust.
Note: Log transformation is only applicable for the positive values; else, it will give an
error. To avoid this, we can add 1 to the data before transformation, which ensures
transformation to be positive.
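A minimal sketch with NumPy on invented values (np.log1p computes log(1 + x), which matches the note's advice to add 1 and so also handles zeros safely):

import numpy as np

skewed = np.array([1, 2, 3, 10, 100, 1000])

transformed = np.log1p(skewed)
print(transformed)  # magnitudes are compressed, reducing the influence of outliers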
4. Binning
In machine learning, overfitting is one of the main issues that degrade the
performance of the model and which occurs due to a greater number of parameters
and noisy data. However, one of the popular techniques of feature engineering,
"binning", can be used to normalize the noisy data. This process involves segmenting
different features into bins.
5. Feature Split
As the name suggests, feature split is the process of splitting a single feature into two or more parts to form new features. This technique helps the algorithms better understand and learn the patterns in the dataset.
The feature splitting process enables the new features to be clustered and binned, which results in extracting useful information and improving the performance of the data models.
Conclusion
In this topic, we have explained a detailed description of feature engineering in
machine learning, working of feature engineering, techniques, etc.
In this topic, we are providing a list of the best machine learning courses. Some of
these courses are easy to start, while some may need some advanced aspects of
learning.
This course helps to understand and learn the theory behind effective machine
learning techniques with practical implementation.
This course not only provides theoretical knowledge of machine learning techniques but also teaches how to apply these techniques practically by yourself.
In this course, you will be tested after completing each topic, and after completion, a final course score will be given. You will also get a detailed description of the mathematics behind each ML algorithm.
Level: Beginner
Pre-requisites:
Ratings: 4.9/5
Cost: Free to Audit, Paid Certification.
o You will get Silicon Valley's best practices in innovation in the field of Machine
Learning and AI.
o The course structure contains topics from basic to advanced, starting from the introduction (supervised and unsupervised learning) and covering Linear Regression with One Variable, Linear Algebra Review, Logistic Regression, Regularization, Neural Networks: Representation, Machine Learning System Design, etc.
o The major skills to learn are logistic regression and artificial neural networks. You will also learn to implement your own neural network for digit recognition.
o Practical Implementation of different algorithms, and learn how to apply these
algorithms for building smart robots (perception, control), text understanding
(web search, anti-spam), computer vision, medical informatics, audio, database
mining, and other areas.
o You can learn complete courses from anywhere at any time online.
The course structure and way of presentation make machine learning even more interesting to learn. Along with the concepts of ML, it also provides programming knowledge of Python.
o The course involves interactive quizzes that enable you to enhance your
knowledge of the topics covered.
o Join the big student support community to exchange ideas, clarify doubts, and help each other.
o Anyone can learn it from anywhere at their convenience.
o Each enrolled student can get a one-on-one mentor, which means personal
career coaching is provided along with access to the student community.
Level: Beginner
Pre-requisites:
Ratings: 4.5/5
o Great tutorial to get started with the topic with little or no prior experience.
o The Course structure contains different topics that start from Data
Preprocessing, Regression, Clustering, Association Rule Learning, Natural
Language Processing, Artificial Neural Networks, Dimensionality Reduction,
and other important concepts.
o You will get lifetime access to the course once purchased and accessible on
mobile & tv.
o A detailed explanation of each topic with theory as well as practical.
o This course is available in both Python and R programming languages. You
can also download templates and use them in your ML projects.
4. Machine Learning Crash Course - Google
AI
Machine Learning Crash Course is provided by Google AI education, a free platform to learn about AI and machine learning key concepts. This course is the best fit for those who want to learn ML concepts at a fast pace and cover the basics of key ML concepts, which may take several hours. But if you are a beginner without any prior understanding of ML concepts, linear algebra, statistics, etc., then this course may be a little difficult for you.
This crash course includes theoretical video lectures, practical exercises, real-world
examples, and hands-on practical implementation of examples. This course is taught
by Google experts who explain different key concepts of Machine learning.
Cost: Free
Provider: Google AI
The course structure has topics that start from machine learning basics and cover Generalization, Training and Test Sets, Representation, Logistic Regression, Classification, Neural Networks, Embeddings, and ML Engineering.
If you search for a machine learning program, you will get search results for different courses, most of which are free to audit, but to gain a certificate you need to pay. Some popular courses on data science and machine learning are Data Science from Harvard, Artificial Intelligence from Columbia, Python Data Science from IBM, Machine Learning from Texas, and Data Science from Microsoft, among a host of other courses. Timing differs for each course, and all are delivered online.
o One can freely audit the course on Machine learning and also on other
technology from renowned institutions.
o Explore the different courses and build a strong and deep understanding of them.
o Video lectures with theory and practical implementations and knowledge
check.
o Also, get subtitles for each lecture.
o A course may be archived after some time if you don't upgrade it.
Each course is structured in such a way that it covers all concepts from scratch and focuses on learning by doing. You can choose the course best suited for you as per your level of learning (beginner or experienced). So, if you are serious about getting started in this area, the easiest way is to select a course from here.
"Introduction to machine learning for coders" focuses on the practical
implementation of each algorithm from scratch.
Course Highlights
o Each topic is explained in detail with the help of screenshots and examples.
o You will get the complete guide for the configuration of software and getting
started with the course.
o It allows you to join the forum, where you can communicate with other
learners and professionals and can help each other.
o Models are trained with the fast.ai library.
o One of the great things about this course is that it is available for free, and
other courses on this platform are also free.
o Duration: Self-paced
One important prerequisite for this course is that you need to have knowledge of the
R language. The course mainly focuses on providing useful knowledge on different
machine learning techniques.
The course is designed in an interactive and interesting way; some content is free, but beyond that you need to pay.
The course mainly covers how machine learning works, where to use ML algorithms, the difference between AI and machine learning, etc. It also includes information about machine learning models, deep learning, and so on.
Course Highlights
As this is an advanced specialization course, you must have basic or intermediate knowledge of machine learning, probability theory, linear algebra, calculus, and Python programming to enrol in and understand this course. So, if you are a beginner in machine learning, first brush up on your maths and programming skills, and then move on to this course to complete your learning.
Course Highlights
Each notebook will enhance your knowledge and give you an understanding of how to use these algorithms with real-world data.
Course Highlights
This course is not only focused on machine learning concepts but also helps you start your career in the data science field.
Ratings: 4.7/5
Course Highlights
o The course starts with a Python crash course, so anyone can easily learn and understand each concept of this course.
o Each concept is explained in depth throughout the complete course.
o You will be provided with written notes that are very helpful for learning.
o It contains different exercises for practising each concept and also provides solutions so you can check your knowledge and build your confidence.
Usually, when a machine learning model is trained, it requires a number of epochs. An epoch is often mixed up with an iteration.
What is an Iteration?
An iteration is one pass over a single batch; the number of iterations needed to complete one epoch equals the total number of batches in the training data.
Let's understand the iteration and epoch with an example, where we have 3000
training examples that we are going to use to train a machine learning model.
In the above scenario, we can break up the training dataset into sizeable batches. So
let's suppose we have considered the batches of 500 examples in each batch, then it
will take 6 iterations to complete 1 Epoch.
Batch size is defined as the total number of training examples that exist in a single
batch. You can understand batch with the above-mentioned example also, where we
have divided the entire training dataset/examples into different batches or sets or
parts.
Let's understand how an epoch and an iteration can be mixed up with the example below, where we have considered 1000 training examples, as shown in the image.
In the above figure, we can understand this concept as follows:
o If the Batch size is 1000, then an epoch will complete in one iteration.
o If the Batch size is 500, then an epoch will complete in 2 iterations.
Similarly, if the batch size is smaller, such as 100, then the epoch will complete in 10 iterations. So we can conclude that, for each epoch, the required number of iterations times the batch size gives the total number of data points. We can also use multiple epochs for training the machine learning model.
Keep in mind that to optimize the learning we use gradient descent, an iterative process, so it is usually not enough to update the weights with a single pass or one epoch. The arithmetic is sketched below.
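A minimal sketch of this arithmetic, using the numbers from the example above (the epoch count is illustrative):

import math

n_examples = 3000
batch_size = 500
n_epochs = 10

iterations_per_epoch = math.ceil(n_examples / batch_size)  # -> 6
total_iterations = iterations_per_epoch * n_epochs          # -> 60
print(iterations_per_epoch, total_iterations)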
Finding an anomaly requires the ability to define what is normal. For example, in the image below, the yellow vehicle is an anomaly among all the red vehicles.
1. Point Anomaly
A tuple within the dataset can be called a point anomaly if it is far away from the rest of the data.
2. Contextual Anomaly
A contextual anomaly is a data point that is anomalous only in a specific context, such as a temperature reading that is normal in summer but anomalous in winter.
3. Collective Anomaly
Collective anomalies occur when a collection of data points within a set is anomalous with respect to the whole dataset; such values are known as collective outliers. In this kind of anomaly, the individual values are not anomalous on their own, either globally or contextually.
K-nearest neighbor is one of the popular nonparametric techniques; it finds the approximate distances between different points in the input vectors and is one of the best anomaly detection methods. Another popular model is the Bayesian network, which is used for anomaly detection when combined with statistical schemes. This model encodes probabilistic relationships among the variables of interest.
o Most of the network connections are from normal traffic, and only a small
amount of data is abnormal.
o Malicious traffic is statistically different from normal traffic.
On the basis of these assumptions, data clusters of similar data points that occur
frequently are assumed to be normal traffic, and those data groups that are
infrequent are considered abnormal or malicious.
Anomaly detection can effectively help in catching fraud and discovering strange activity in large and complex big data sets. This can prove useful in areas such as banking security, natural sciences, medicine, and marketing, which are prone to malicious activities. With machine learning, an organization can intensify search and increase the effectiveness of its digital business initiatives.
Anomaly detection using machine learning algorithms can correlate data with corresponding application performance metrics to build a complete picture of an issue. Different industries, such as telco and adtech, also employ anomaly detection techniques in their businesses.
Proactively streamlining and improving user experiences will help improve customer
satisfaction in a variety of industries, including Gaming, online business, etc.
Conclusion
In this topic, we have provided a detailed description of anomaly detection and its
use cases in business. Anomaly detection is very helpful in different business
applications such as Credit Card Fraud detection systems, Intrusion detection, etc.
The cost function also plays a crucial role in understanding how well your model estimates the relationship between the input and output parameters.
In this topic, we will explain the cost function in Machine Learning, Gradient descent,
and types of cost functions.
In machine learning, once we train our model, we want to see how well it is performing. Although there are various accuracy functions that tell you how your model is performing, they do not give insights into how to improve it. So, we need a function that can find when the model is most accurate, by finding the spot between the undertrained and overtrained model.
The main aim of each ML model is to determine parameters or weights that can
minimize the cost function.
In the above image, the green dots are cats, and the yellow dots are dogs. Below are
the three possible solutions for this classification problem.
In the above solutions, all three classifiers have high accuracy, but the third solution is the best because it correctly classifies every data point. It classifies best because its boundary lies midway between the two classes, neither too close to nor too far from either of them.
To get such results, we need a cost function: to obtain the optimal solution, we need a cost function. It calculates the difference between the actual values and the predicted values and measures how wrong our model was in its predictions. By minimizing the value of the cost function, we can get the optimal solution.
There are three commonly used Regression cost functions, which are as follows:
a. Mean Error
In this type of cost function, the error is calculated for each training example, and then the mean of all the error values is taken.
The errors on the training data can be either negative or positive, so when taking the mean they can cancel each other out and produce a zero mean error for the model. Hence this is not a recommended cost function for a model.
b. Mean Squared Error (MSE)
Mean squared error is one of the most commonly used cost functions. It improves on the drawbacks of the mean error cost function, as it calculates the square of the difference between the actual value and the predicted value. Because of the squaring, there is no possibility of negative errors cancelling out.
In MSE, each error is squared, which penalizes larger deviations in prediction more heavily than MAE. But if the dataset has outliers that generate large prediction errors, squaring those errors will amplify them many times over. Hence, MSE is less robust to outliers.
c. Mean Absolute Error (MAE)
Mean absolute error also overcomes the issue of the mean error cost function, by taking the absolute difference between the actual value and the predicted value.
The mean absolute error cost function is also known as L1 loss. It is not much affected by noise or outliers, hence giving better results if the dataset has noise or outliers. A sketch of both MSE and MAE follows.
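A minimal sketch of both cost functions with NumPy, on invented values:

import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)   # Mean Squared Error (squares each error)
mae = np.mean(np.abs(y_true - y_pred))  # Mean Absolute Error (L1 loss)
print(mse, mae)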
One of the commonly used loss functions for classification is cross-entropy loss.
Binary cross-entropy is a special case of categorical cross-entropy where there are only two output classes, for example, classification between red and blue. To understand it, suppose there is a single output variable Y taking the value 0 or 1. The error in binary classification is then calculated as the mean of the cross-entropy over all N training examples.
Categorical cross-entropy is designed for multi-class classification, with target classes labelled 0, 1, 2, …, n−1 for n classes.
For a perfect model, the cross-entropy value is zero; training minimizes the score toward zero, as the sketch below shows.
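A minimal sketch of binary cross-entropy with NumPy, on invented probabilities:

import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy over N examples for binary targets (0 or 1)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.6])  # predicted probabilities
print(binary_cross_entropy(y_true, y_pred))  # close to 0 for a good model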
Bayes' theorem is also known by other names, such as Bayes' rule or Bayes' law. Bayes' theorem helps to determine the probability of an event under uncertain knowledge. It is used to calculate the probability of one event occurring given that another event has already occurred, and it is the standard method for relating conditional probability and marginal probability.
In simple words, we can say that Bayes' theorem helps to produce more accurate results.
Bayes' theorem is used to estimate the precision of values and provides a method for calculating conditional probability. Although it is, in principle, a simple calculation, it makes it easy to calculate the conditional probability of events where intuition often fails. Some data scientists assume that Bayes' theorem is most widely used in the financial industries, but that is not the case. Beyond finance, Bayes' theorem is also extensively applied in health and medicine, the research and survey industry, the aeronautical sector, etc.
What is Bayes Theorem?
Bayes' theorem is one of the most popular concepts in machine learning; it calculates the probability of one event occurring, under uncertain knowledge, given that another event has already occurred.
Bayes' theorem can be derived using the product rule and the conditional probability of event X given event Y:
P(X|Y) = P(Y|X) * P(X) / P(Y)
Here, P(X|Y) is the posterior probability of X given Y, P(Y|X) is the likelihood, P(X) is the prior probability of X, and P(Y) is the marginal probability of Y, as the sketch below shows.
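A minimal sketch of the formula on a classic (hypothetical) medical-test example; all probabilities are invented for illustration:

# Hypothetical example: P(disease) = 0.01, test sensitivity
# P(positive | disease) = 0.95, false-positive rate P(positive | no disease) = 0.05.
p_x = 0.01                  # prior P(X)
p_y_given_x = 0.95          # likelihood P(Y|X)
p_y_given_not_x = 0.05

# Evidence P(Y) via the law of total probability.
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes' theorem: posterior P(X|Y).
p_x_given_y = p_y_given_x * p_x / p_y
print(p_x_given_y)  # ~0.16: a positive test alone is weak evidence when the disease is rare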
1. Experiment
An experiment is a planned operation carried out under controlled conditions, such as tossing a coin or rolling a die.
2. Sample Space
The results we get from an experiment are called possible outcomes, and the set of all possible outcomes of an event is known as the sample space. For example, if we are rolling a die, the sample space will be:
S1 = {1, 2, 3, 4, 5, 6}
Similarly, if our experiment is related to toss a coin and recording its outcomes, then
sample space will be:
S2 = {Head, Tail}
3. Event
An event is a subset of the sample space, i.e., a set of outcomes of an experiment.
o Disjoint Event: If the intersection of events A and B is the empty set, then such events are known as disjoint or mutually exclusive events.
4. Random Variable:
A random variable is a real-valued function that maps the sample space of an experiment onto the real line. A random variable takes on random values, each with some probability. It is neither random nor a variable in the everyday sense; it behaves as a function, and it can be discrete, continuous, or a combination of both.
5. Exhaustive Event:
As the name suggests, a set of events in which at least one event must occur at a time is called an exhaustive set of events for an experiment.
Thus, two events A and B are exhaustive if either A or B definitely occurs and both are mutually exclusive; for example, while tossing a coin, the result is either a head or a tail.
6. Independent Event:
Two events are said to be independent when the occurrence of one event does not affect the occurrence of the other. In simple words, the probability of the outcome of one event does not depend on the other.
7. Conditional Probability:
Conditional probability is defined as the probability of an event A given that another event B has already occurred (i.e., A conditional on B). It is represented by P(A|B) and defined as:
P(A|B) = P(A ∩ B) / P(B), where P(B) ≠ 0
8. Marginal Probability:
Marginal probability is the probability of an event occurring irrespective of the outcome of any other variable, written simply as P(A).
The naïve Bayes classifier is one of the simplest applications of Bayes' theorem; it is used in classification algorithms to separate data by class with good accuracy and speed.
Let's understand the use of Bayes' theorem in machine learning with the example below.
Given two conditions, our classifier, which works on machine learning, has to predict the class of A, and the first thing it has to choose is the best possible class. With the help of Bayes' theorem, we can write it as:
Here;
P(A) remains constant across all classes, meaning it does not change its value from class to class. Therefore, to maximize P(Ci|A), we have to maximize the term P(A|Ci) * P(Ci).
With n classes on the probability list, let's assume that the possibility of any class being the right answer is equally likely. Considering this factor, we can say that:
P(C1) = P(C2) = P(C3) = P(C4) = ….. = P(Cn)
This process helps reduce both computation cost and time. This is how Bayes' theorem plays a significant role in machine learning, and the naïve Bayes approach simplifies conditional probability tasks without affecting precision.
Hence, we can conclude that:
Hence, by using Bayes theorem in Machine Learning we can easily describe the
possibilities of smaller events.
o It is one of the simplest and most effective methods for calculating conditional probability and for text classification problems.
o A naïve Bayes classifier performs better than other models wherever the assumption of independent predictors holds true.
o It is easier to implement than other models.
o It requires only a small amount of training data, which minimizes the training time.
o It can be used for binary as well as multi-class classification.
Conclusion
We live in a technology-driven world where everything is based on new technologies still in the development phase, yet these remain incomplete in the absence of classical theorems and algorithms. Bayes' theorem is among the most popular examples used in machine learning, and it has many applications there. For classification problems, it underlies some of the most preferred methods of all. Hence, we can say that machine learning depends heavily on Bayes' theorem. In this article, we have discussed Bayes' theorem, how to apply Bayes' theorem in machine learning, the naïve Bayes classifier, etc.
The perceptron model is also regarded as one of the best and simplest types of artificial neural networks. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters, i.e., input values, weights and bias, net sum, and an activation function.
o Input Nodes or Input Layer:
This is the primary component of the perceptron, which accepts the initial data into the system for further processing. Each input node contains a real numerical value.
o Activation Function:
These are the final and important components that help to determine whether the
neuron will fire or not. Activation Function can be considered primarily as a step
function.
o Sign function
o Step function, and
o Sigmoid function
Data scientists choose the activation function subjectively, based on the problem statement and the desired outputs. The activation function used in a perceptron model (e.g., sign, step, or sigmoid) may be changed after checking whether the learning process is slow or suffers from vanishing or exploding gradients.
Step-1
In the first step, multiply all input values by their corresponding weight values and then add the products to determine the weighted sum:
∑wi*xi
Add a special term called the bias 'b' to this weighted sum to improve the model's performance:
∑wi*xi + b
Step-2
In the second step, an activation function f is applied to the weighted sum obtained in step 1, which gives the output:
Y = f(∑wi*xi + b)
A small sketch of both steps follows.
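A minimal sketch of both steps and the classic perceptron learning rule, training the AND gate (the data, learning rate, and epoch count are illustrative):

import numpy as np

def perceptron_output(x, w, b):
    """Step activation applied to the weighted sum: f(x) = 1 if w.x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Train on the AND gate with the perceptron learning rule.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(10):  # a few passes over the data are enough here
    for xi, target in zip(X, y):
        error = target - perceptron_output(xi, w, b)
        w += lr * error * xi  # adjust weights in proportion to the error
        b += lr * error

print([perceptron_output(xi, w, b) for xi in X])  # -> [0, 0, 0, 1]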
A single-layer perceptron model has no prior recorded data, so it begins with randomly allocated weight parameters. It then sums up all the weighted inputs; if the total sum is more than a pre-determined threshold, the model gets activated and shows the output value as +1.
o Forward Stage: Activation functions start from the input layer in the forward
stage and terminate on the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. The error between the actual and the desired output is propagated backward, starting at the output layer and ending at the input layer.
Hence, a multi-layered perceptron model can be considered an artificial neural network with multiple layers in which the activation function does not have to remain linear, unlike in a single-layer perceptron model. Instead of a linear function, activation functions such as sigmoid, TanH, and ReLU can be deployed.
A multi-layer perceptron model has greater processing power and can process linear
and non-linear patterns. Further, it can also implement logic gates such as AND, OR,
XOR, NAND, NOT, XNOR, NOR.
Perceptron Function
The perceptron function f(x) is obtained by multiplying the input x with the learned weight coefficients w and applying a threshold:
f(x) = 1 if w·x + b > 0
f(x) = 0 otherwise
Characteristics of Perceptron
The perceptron model has the following characteristics:
o It is a supervised learning algorithm for binary classifiers.
o The output is determined by applying an activation function to the weighted sum of the inputs.
o A single-layer perceptron can only learn linearly separable patterns; for example, it cannot solve the XOR problem.
Future of Perceptron
The future of the perceptron model is bright and significant, as it helps to interpret data by building intuitive patterns and applying them in the future. Machine learning is a rapidly growing technology of artificial intelligence that is continuously evolving; hence perceptron technology will continue to support and facilitate analytical behavior in machines, which will, in turn, add to the efficiency of computers.
Conclusion:
In this article, you have learned that perceptron models are the simplest type of artificial neural network, carrying inputs and their weights, the sum of all weighted inputs, and an activation function. Perceptron models continuously contribute to artificial intelligence and machine learning, and these models are becoming more advanced. The perceptron enables computers to work more efficiently on complex problems using various machine learning technologies. Perceptrons are the fundamentals of artificial neural networks, and everyone should have in-depth knowledge of perceptron models before studying deep neural networks.
Also, machine learning is in such demand in the IT world that most companies want highly skilled machine learning engineers and data scientists for their business.
Machine Learning contains lots of algorithms and concepts that solve complex
problems easily, and one of them is entropy in Machine Learning. Almost everyone
must have heard the Entropy word once during their school or college days in
physics and chemistry. The base of entropy comes from physics, where it is defined
as the measurement of disorder, randomness, unpredictability, or impurity in the
system. In this article, we will discuss what entropy is in Machine Learning and why
entropy is needed in Machine Learning. So let's start with a quick introduction to the
entropy in Machine Learning.
When information is processed in a system, every piece of information has a
specific value and can be used to draw conclusions. If it is easy to draw a valuable
conclusion from a piece of information, then its entropy is lower in Machine
Learning; if entropy is higher, then it is difficult to draw any conclusion from that
piece of information.
Mathematically, the entropy of a dataset S can be written as:
E(S) = −∑ pi*log2(pi)
Where pi is the probability of class i in the dataset.
For binary classification, entropy lies between 0 and 1; depending on the number of
classes in the dataset, it can be greater than 1 (up to log2 of the number of classes).
A high value of entropy simply indicates a high level of randomness or impurity in
the data.
Let's understand it with an example where we have a dataset having three colors of
fruits as red, green, and yellow. Suppose we have 2 red, 2 green, and 4 yellow
observations throughout the dataset. Then as per the above equation:
E = −(pr*log2(pr) + pg*log2(pg) + py*log2(py))
Where pr, pg, and py are the probabilities of picking a red, green, and yellow fruit,
respectively (here 2/8, 2/8, and 4/8).
Let's consider a case when all observations belong to the same class; then entropy
will always be 0.
E = −(1 * log2(1)) = 0
When entropy becomes 0, the dataset has no impurity. A dataset with zero impurity
offers nothing new to learn. Further, if the entropy is 1, this kind of dataset is
good for learning, as it is maximally mixed.
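As a quick illustration, here is a small Python sketch (using NumPy) that computes entropy from class counts; the counts are the ones from the fruit example above:

import numpy as np

def entropy(counts):
    # Shannon entropy: E = -sum(pi * log2(pi)) over the class probabilities
    probs = np.array(counts) / sum(counts)
    probs = probs[probs > 0]      # zero-count classes contribute nothing
    return -np.sum(probs * np.log2(probs))

print(entropy([2, 2, 4]))   # fruit example: 2 red, 2 green, 4 yellow -> 1.5
print(entropy([8]))         # all observations in one class -> 0.0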
What is a Decision Tree in Machine
Learning?
A decision tree is defined as the supervised learning algorithm used for classification
as well as regression problems. However, it is primarily used for solving classification
problems. Its structure is similar to a tree, where the internal nodes represent the
features of the dataset, the branches represent the decision rules, and the leaf
nodes represent the outcome.
Decision trees are used to predict an outcome based on historical data. The decision
tree works on the sequence of 'if-then-else' statements and a root which is our
initial problem to solve.
Root Node: As the name suggests, a root node is the origin point of any decision
tree. It contains the entire data set, which gets divided further into two or more sub-
sets. This node includes multiple branches and is used to make any decision in
classification problems.
Splitting: It is a process that divides the root node into multiple sub-nodes under
some defined conditions.
Branches: Branches are formed by splitting the root node or decision node.
Pruning: Pruning is defined as the process of removing unwanted branches from the
tree.
Parent Node: The root node in a decision tree is called the parent node.
Child Node: Except for the root node, all other nodes are called child nodes in the
decision tree.
Let's say we have a tree with a total of four values at the root node that is split into
the first level having one value in one branch (say, Branch 1) and three values in the
other branch (Branch 2). The entropy at the root node is 1.
Now, to compute the entropy at the child nodes, the weights are taken as 1/4 for
Branch 1 and 3/4 for Branch 2 and are calculated using Shannon's entropy formula. As
we had seen above, the entropy for child node 2 is zero because there is only one
value in that child node, meaning there is no uncertainty, and hence, the
heterogeneity is not present.
The information gain for the above case is the reduction in the weighted average of
the entropy.
The more the entropy is removed, the greater the information gain. The higher the
information gain, the better the split.
1. The attribute with the highest information gain from a set should be selected
as the parent (root) node. From the image below, it is attribute A.
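The following Python sketch illustrates this calculation; the class counts are hypothetical, chosen only to demonstrate the formula:

import numpy as np

def entropy(counts):
    probs = np.array(counts) / sum(counts)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

def information_gain(parent_counts, child_counts_list):
    # Information gain = parent entropy minus the weighted average
    # of the child-node entropies after the split
    total = sum(parent_counts)
    weighted = sum(sum(c) / total * entropy(c) for c in child_counts_list)
    return entropy(parent_counts) - weighted

# Hypothetical split: a root with 2 samples of each class (entropy 1)
# split into one pure child and one mixed child
print(information_gain([2, 2], [[1, 0], [1, 2]]))   # about 0.31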
This article will discuss some major practical issues and their business
implementation, and how we can overcome them. So let's start with a quick
introduction to Machine Learning.
It is a branch of Artificial Intelligence and computer science that helps build a model
based on training data and make predictions and decisions without being constantly
programmed. Machine Learning is used in various applications such as email
filtering, speech recognition, computer vision, self-driven cars, Amazon product
recommendation, etc.
o Linear Regression
o Logistic Regression
o Decision Tree
o Bayes Theorem and Naïve Bayes Classification
o Support Vector Machine (SVM) Algorithm
o K-Nearest Neighbor (KNN) Algorithm
o K-Means
o Gradient Boosting algorithms
o Dimensionality Reduction Algorithms
o Random Forest
Common issues in Machine Learning
Although machine learning is being used in every industry and helps organizations
make more informed and data-driven choices that are more effective than classical
methodologies, it still has many problems that cannot be ignored. Here are some
common issues in Machine Learning that professionals face while building ML skills
and creating an application from scratch.
Hence, we should use representative data in training to protect against being biased
and make accurate predictions without any drift.
Overfitting is one of the most common issues faced by Machine Learning engineers
and data scientists. Whenever a machine learning model is trained with a huge
amount of data, it starts capturing noise and inaccurate values from the training
data set, which negatively affects the performance of the model. Let's understand it
with a simple example where the training data contains 1,000 mangoes, 1,000
apples, 1,000 bananas, and 5,000 papayas. Then there is a considerable probability of
an apple being identified as a papaya because we have a massive amount of biased
data in the training set; hence, predictions get negatively affected. The main
reason behind overfitting is using highly flexible non-linear methods in machine
learning algorithms, as they can build unrealistic data models. We can mitigate
overfitting by using simpler linear and parametric algorithms in machine learning models.
Underfitting occurs when our model is too simple to understand the base structure
of the data, just like an undersized pant. This generally happens when we have
limited data into the data set, and we try to build a linear model with non-linear data.
In such scenarios, the model cannot capture the complexity of the data, the rules of
the machine learning model become too simple to apply to this data set, and the
model starts making wrong predictions as well.
8. Customer Segmentation
Customer segmentation is also an important issue while developing a machine
learning algorithm. It is hard to identify which customers act on the recommendations
shown by the model and which don't even check them. Hence, an algorithm is
necessary to recognize customer behavior and trigger a relevant
recommendation for the user based on past experience.
Conclusion
An ML system doesn't perform well if the training set is too small or if the data is not
generalized, noisy, and corrupted with irrelevant features. We went through some of
the basic challenges faced by beginners while practicing machine learning. Machine
learning is all set to bring a big bang transformation in technology. It is one of the
most rapidly growing technologies used in medical diagnosis, speech recognition,
robotic training, product recommendations, video surveillance, and this list goes on.
This continuously evolving domain offers immense job satisfaction, excellent
opportunities, global exposure, and exorbitant salary. It is high risk and a high return
technology. Before starting your machine learning journey, ensure that you carefully
examine the challenges mentioned above. To learn this fantastic technology, you
need to plan carefully, stay patient, and maximize your efforts. Once you win this
battle, you can conquer the Future of work and land your dream job!
Precision and Recall in Machine
Learning
While building any machine learning model, the first thing that comes to our mind is
how we can build an accurate & 'good fit' model and what the challenges are that
will come during the entire procedure. Precision and Recall are the two most
important but confusing concepts in Machine Learning. Precision and recall are
performance metrics used for pattern recognition and classification in machine
learning. These concepts are essential to build a perfect machine learning model
which gives more precise and accurate results. Some of the models in machine
learning require more precision, and some models require more recall. So, it is
important to know the balance between Precision and recall or, simply, precision-
recall trade-off.
In this article, we will understand Precision and recall, the most confusing but
important concepts in machine learning that lots of professionals face during their
entire data science & machine learning career. But before starting, first, we need to
understand the confusion matrix concept. So, let's start with the quick introduction
of Confusion Matrix in Machine Learning.
Confusion Matrix helps us to visualize the point where our model gets confused in
discriminating two classes. It can be understood well through a 2×2 matrix where the
row represents the actual truth labels, and the column represents the predicted
labels.
This matrix consists of 4 main elements that count the numbers of correct and
incorrect predictions. Each element is named with two words, one from each of the
following pairs:
o True or False
o Positive or Negative
If the predicted and truth labels match, then the prediction is said to be correct, but
when the predicted and truth labels are mismatched, then the prediction is said to be
incorrect. Further, positive and negative represents the predicted labels in the matrix.
There are four metrics combinations in the confusion matrix, which are as follows:
o True Positive: This combination tells us how many times a model correctly
classifies a positive sample as Positive.
o False Negative: This combination tells us how many times a model incorrectly
classifies a positive sample as Negative.
o False Positive: This combination tells us how many times a model incorrectly
classifies a negative sample as Positive.
o True Negative: This combination tells us how many times a model correctly
classifies a negative sample as Negative.
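As an illustration, the sketch below derives these four counts from hypothetical labels using scikit-learn's confusion_matrix; the label values are made up for demonstration:

from sklearn.metrics import confusion_matrix

# Hypothetical truth and predicted labels (1 = Positive, 0 = Negative)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# For binary 0/1 labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)   # 3 1 1 3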
What is Precision?
Precision is defined as the ratio of correctly classified positive samples (True Positive)
to a total number of classified positive samples (either correctly or incorrectly).
o The precision of a machine learning model will be low when TP + FP (denominator)
is much larger than TP (numerator), i.e., when there are many False Positives.
o The precision of the machine learning model will be high when TP (numerator) is
close to TP + FP (denominator), i.e., when there are few False Positives.
Precision = TP/(TP+FP)
Case 2- In this scenario, we have three Positive samples that are correctly classified,
and one Negative sample is incorrectly classified.
Case 3- In this scenario, we have three Positive samples that are correctly classified
but no Negative sample which is incorrectly classified.
Precision = TP/(TP+FP)
Hence, in the last scenario, we have a precision value of 1, or 100%, when all positive
samples are classified as positive and no Negative sample is incorrectly classified
as positive.
What is Recall?
The recall is calculated as the ratio between the numbers of Positive samples
correctly classified as Positive to the total number of Positive samples. The recall
measures the model's ability to detect positive samples. The higher the recall, the
more positive samples detected.
1. Recall = True Positive/(True Positive + False Negative)
2. Recall = TP/(TP+FN)
o Recall of a machine learning model will be low when TP + FN (denominator) is
much larger than TP (numerator), i.e., when there are many False Negatives.
o Recall of a machine learning model will be high when TP (numerator) is close to
TP + FN (denominator), i.e., when there are few False Negatives.
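A minimal Python sketch of both formulas, using the TP/FN counts from Example 1 below plus a hypothetical precision case, might look like this:

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

print(round(recall(2, 1), 3))      # 0.667, matching Example 1 below
print(round(precision(3, 1), 3))   # 0.75 for a hypothetical 3 TP / 1 FP case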
Example 1- Let's understand the calculation of Recall with four different cases where
each case has the same Recall as 0.667 but differs in the classification of negative
samples. See how:
In this scenario, the classification of the negative sample is different in each case.
Case A has two negative samples classified as negative, and Case B also has two
negative samples classified as negative; Case C has only one negative sample
classified as negative, while Case D does not classify any negative sample as negative.
However, recall is independent of how the negative samples are classified; hence,
we can neglect the negative samples and only count how the actual positive samples
are classified.
In the above image, we have only two positive samples that are correctly classified as
positive while only 1 negative sample that is correctly classified as negative.
Hence, the True Positive count is 2, while the False Negative count is 1. Then recall will be:
Recall = TP/(TP+FN) = 2/(2+1) = 2/3 = 0.667
Note: This means the model has correctly classified only 66.7% of the Positive Samples.
Example-2
Now, we have another scenario where all positive samples are classified correctly as
positive. Hence, the True Positive rate is 3 while the False Negative rate is 0.
Recall = TP/(TP+FN) = 3/(3+0) = 3/3 = 1
If the recall is 100%, it tells us the model has detected all positive samples as
positive, while neglecting how the negative samples are classified. However, the
model could still misclassify many negative samples as positive; recall simply
ignores those samples, so 100% recall can coexist with a high False Positive count
in the model.
Note: This means the model has correctly classified 100% of Positive Samples.
Example-3
In this scenario, the model does not identify any positive sample correctly. All
positive samples are incorrectly classified as Negative. Hence, the True Positive
count is 0, and the False Negative count is 3. Then Recall will be:
Recall = TP/(TP+FN) = 0/(0+3) = 0
This means the model has not correctly classified any Positive Samples.
Precision vs. Recall
o Precision measures the model's ability to classify positive samples correctly among
all the samples it labels as positive, whereas Recall measures how many of the actual
positive samples were correctly classified by the ML model.
o While calculating the Precision of a model, we consider both Positive and Negative
samples that are classified, whereas while calculating Recall, we need only the
positive samples, and all negative samples are neglected.
o When a model classifies most of the positive samples correctly but also produces
many false positives, it is said to be a high recall, low precision model. When a
model is very reliable whenever it classifies a sample as Positive but can identify
only a few of the positive samples, it is said to be a high precision, low recall model.
o The precision of a machine learning model depends on both the negative and
positive samples, whereas Recall depends only on the positive samples and is
independent of the negative samples.
o In Precision, we consider all samples classified as positive, whether correctly or
incorrectly, whereas Recall cares only about correctly classifying all positive samples
and does not consider whether any negative sample is classified as positive.
Conclusion:
In this tutorial, we have discussed various performance metrics such as confusion
matrix, Precision, and Recall for binary classification problems of a machine learning
model. Also, we have seen various examples to calculate Precision and Recall of a
machine learning model and when we should use precision, and when to use Recall.
Genetic Algorithm in Machine
Learning
A genetic algorithm is an adaptive heuristic search algorithm inspired by
"Darwin's theory of evolution in Nature." It is used to solve optimization problems
in machine learning. It is one of the important algorithms as it helps solve complex
problems that would take a long time to solve.
Genetic Algorithms are being widely used in different real-world applications, for
example, Designing electronic circuits, code-breaking, image processing, and
artificial creativity.
In this topic, we will explain Genetic algorithm in detail, including basic terminologies
used in Genetic algorithm, how it works, advantages and limitations of genetic
algorithm, etc.
After calculating the fitness of every individual in the population, a selection process
is used to determine which of the individuals in the population will get to reproduce
and create the offspring that will form the next generation.
So, now we can define a genetic algorithm as a heuristic search algorithm to solve
optimization problems. It is a subset of evolutionary algorithms, which is used in
computing. A genetic algorithm uses genetic and natural selection concepts to solve
optimization problems.
How Does the Genetic Algorithm Work?
The genetic algorithm works on the evolutionary generational cycle to generate
high-quality solutions. These algorithms use different operations that either enhance
or replace the population to give an improved fit solution.
It basically involves five phases to solve the complex optimization problems, which
are given as below:
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
The process of a genetic algorithm starts by generating the set of individuals, which
is called population. Here each individual is the solution for the given problem. An
individual contains or is characterized by a set of parameters called Genes. Genes are
combined into a string and generate chromosomes, which is the solution to the
problem. One of the most popular techniques for initialization is the use of random
binary strings.
2. Fitness Assignment
The fitness function is used to determine how fit an individual is, i.e., its ability to
compete with other individuals. In every iteration, individuals are evaluated based
on their fitness function. The fitness function provides a fitness score to each
individual, and this score further determines the probability of being selected for
reproduction. The higher the fitness score, the greater the chances of getting
selected for reproduction.
3. Selection
The selection phase involves the selection of individuals for the reproduction of
offspring. All the selected individuals are then arranged in a pair of two to increase
reproduction. Then these individuals transfer their genes to the next generation.
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In
this step, the genetic algorithm uses two variation operators that are applied to the
parent population. The two operators involved in the reproduction phase are given
below:
o Crossover
The genes of the parents are exchanged among themselves until the crossover
point is met. These newly generated offspring are added to the population.
This process is also known as mating or crossover. Types of crossover styles available:
o One point crossover
o Two-point crossover
o Uniform crossover
o Whole arithmetic crossover
o Mutation
The mutation operator inserts random genes in the offspring (new child) to
maintain the diversity in the population. It can be done by flipping some bits
in the chromosomes.
Mutation helps in solving the issue of premature convergence and enhances
diversification. The below image shows the mutation process:
Types of mutation styles available,
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
5. Termination
After the reproduction phase, a stopping criterion is applied as a base for
termination. The algorithm terminates after the threshold fitness solution is reached.
It will identify the final solution as the best solution in the population.
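To tie the five phases together, here is a minimal, self-contained Python sketch of a genetic algorithm on a toy "OneMax" problem (evolve a bit string of all 1s); the population size, mutation rate, and fitness function are illustrative choices, not part of the original discussion:

import random

GENES, POP, GENERATIONS, MUTATION_RATE = 16, 20, 50, 0.05

def fitness(chrom):
    return sum(chrom)                      # fitness = number of 1-bits

def select(pop):
    # Tournament selection: the fitter of two random individuals reproduces
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    point = random.randint(1, GENES - 1)   # one-point crossover
    return p1[:point] + p2[point:]

def mutate(chrom):
    # Flip-bit mutation maintains diversity in the population
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

# Initialization: a population of random binary strings
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    if fitness(max(population, key=fitness)) == GENES:
        break                              # termination: threshold fitness reached
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]
print("best fitness:", fitness(max(population, key=fitness)))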
Xn = (X − Xminimum) / (Xmaximum − Xminimum)
Where,
o Xn = Value of Normalization
o Xmaximum = Maximum value of a feature
o Xminimum = Minimum value of a feature
Example: Let's assume we have a model dataset having maximum and minimum
values of feature as mentioned above. To normalize the machine learning model,
values are shifted and rescaled so their range can vary between 0 and 1. This
technique is also known as Min-Max scaling. In this scaling technique, we will
change the feature values as follows:
Case1- If the value of X is minimum, then the numerator will be 0; hence
Normalization will also be 0.
Xn = 0
Case2- If the value of X is maximum, then the value of the numerator is equal to the
denominator; hence Normalization will be 1.
Xn = 1
Case3- On the other hand, if the value of X is neither maximum nor minimum, then
values of normalization will also be between 0 and 1.
Hence, Normalization can be defined as a scaling method where values are shifted
and rescaled to maintain their ranges between 0 and 1, or in other words; it can be
referred to as Min-Max scaling technique.
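A small Python sketch of this Min-Max formula, applied to hypothetical feature values, might look like this:

import numpy as np

def min_max_scale(x):
    # Xn = (X - Xminimum) / (Xmaximum - Xminimum), so values fall in [0, 1]
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

values = np.array([10, 20, 35, 50])   # hypothetical feature values
print(min_max_scale(values))          # [0.    0.25  0.625 1.   ]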
Standardization is another scaling technique, calculated as:
X' = (X − µ) / σ
Here, µ represents the mean of the feature values, and σ represents the standard
deviation of the feature values.
However, unlike Min-Max scaling technique, feature values are not restricted to a
specific range in the standardization technique.
This technique is helpful for various machine learning algorithms that use distance
measures such as KNN, K-means clustering, and Principal component analysis,
etc. Further, it is particularly appropriate when the model is built on the assumption
that the data is normally distributed.
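For comparison, here is the analogous standardization sketch on the same hypothetical values; note the results are centered on 0 rather than squeezed into [0, 1]:

import numpy as np

def standardize(x):
    # X' = (X - mean) / std: the result has mean 0 and standard deviation 1,
    # but is not restricted to a fixed range like Min-Max scaling
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

values = np.array([10, 20, 35, 50])   # the same hypothetical feature values
print(standardize(values).round(3))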
Normalization vs. Standardization
o Normalization uses the minimum and maximum values of a feature for scaling the
model, whereas Standardization uses the mean and standard deviation.
o Normalization is helpful when features are on different scales, whereas
Standardization is helpful when we want the mean of a variable set to 0 and the
standard deviation set to 1.
o Normalization scales values into the range [0, 1] or [-1, 1], whereas standardized
values are not restricted to a specific range.
1. Normalization is useful when you do not know the feature distribution of the data,
or when the data does not follow a Gaussian distribution. It is also useful for
algorithms that do not assume any distribution of the data, such as KNN and
artificial neural networks.
2. Standardization in the machine learning model is useful when you are exactly
aware of the feature distribution of data or, in other words, your data follows a
Gaussian distribution. However, this does not have to be necessarily true. Unlike
Normalization, Standardization does not necessarily have a bounding range, so if you
have outliers in your data, they will not be affected by Standardization.
Further, it is also useful for techniques such as linear regression, logistic
regression, and linear discriminant analysis.
Because of its bigger values, the income attribute will naturally influence the
conclusion more when we undertake further analysis, such as multivariate linear
regression. However, this does not necessarily imply that it is a better predictor. As a
result, we normalize the data so that all of the variables are in the same range.
Further, it is also helpful for the prediction of credit risk scores where normalization is
applied to all numeric data except the class column. It uses the tanh
transformation technique, which converts all numeric features into values in the
range between 0 and 1.
Conclusion
Normalization avoids various problems of raw datasets by creating new values
while maintaining the general distribution and the ratios in the data. Further, it also
improves the performance and accuracy of machine learning models using various
techniques and algorithms. Hence, the concept of Normalization and Standardization
is a bit confusing but has a lot of importance to build a better machine learning
model.
However, the third image is only slightly different from the first; it is a modified
version of it. The right-side image is created by introducing a small perturbation
into the original image.
The first image is predicted by the model to be a panda, as expected, while the right
side image is recognized as a gibbon with high confidence.
Hence, introducing an adversarial input on top of a typical image can cause a
classifier to misidentify a panda as a gibbon.
Now, take another example that shows different views of a 3D turtle the authors
printed and the misclassifications by the Google Inception v3 model.
Adversarial machine learning has yielded results that range from the funny, benign,
and embarrassing - such as the following turtle being mistaken for a rifle - to
potentially harmful examples, such as a self-driving car mistaking a stop sign for a
speed limit sign.
What do you mean by adversarial Whitebox
and Blackbox attacks?
There are two ways in which attacks are categorized in machine learning. These are
as follows:
Black Box Attack: Black-box attacks are scenarios where attackers have no
information about the targeted model and no access to its architecture,
parameters, or gradients.
White Box Attack: These attacks are the opposite of black-box attacks: attackers
have full access to the targeted model, including its architecture, parameters,
and gradients.
Black box attacks and white box attacks are further categorized into two types as
follows:
o Targeted Attacks: In this type of attack, attackers disrupt the input in such a
way that the model predicts a specific target class.
o Un-targeted Attacks: In this type of attack, attackers disrupt the inputs in
such a way that the model predicts some incorrect class, i.e., any class other
than the true one.
Poisoning Attack:
Poisoning attacks take place whenever the machine learning model is under training
or during deployment. It is also referred to as contaminating attacks.
In poisoning attacks, attackers influence the data or its labels while a model is in the
training phase, which causes the system to become skewed or to generate inaccurate
decisions in the future. It reduces the accuracy and performance of the machine
learning system.
Evasion Attacks:
These attacks are the opposite of poisoning attacks: they take place after a machine
learning system has already been trained. They are the most commonly used type of
attack in machine learning. Evasion occurs when the ML model computes a probability
for a new sample; such attacks are often developed by trial-and-error methods. The
attackers manipulate the data during deployment, but it is unknown in advance when
the machine learning model will break.
Let's understand with an example. Suppose the attacker wants to investigate the
algorithm of the machine learning model that is designed to filter the spam email
content. Then attackers may do various experiments on different emails to bypass
the spam filter, introducing a new email that includes enough extraneous words to
"tip" the algorithm into classifying it as not spam instead of spam.
These attacks may affect the integrity and confidentiality of a machine learning
model, leading it to provide the malicious output intended by an attacker.
These attacks can also be used to reveal private or sensitive information. One of the
most prevalent examples of evasion attacks is spoofing attacks against biometric
verification systems.
Model Extraction:
Model Extraction is an attack on a black-box machine learning system in which the
attacker reconstructs the model by extracting the data on which it was trained. For
example, it can be used to steal a stock market prediction model, which attackers
then reconstruct into a new model similar to the original for their own financial
benefit. Model Extraction attacks matter most when either the training data or the
model itself is sensitive and confidential.
Techniques/Methods used in generating
Adversarial Attack
C&W Attack: It stands for the Carlini & Wagner Attack. This technique is similar to
the L-BFGS attack, but it does not contain box constraints and uses different
objective functions, which makes the method more effective for generating
adversarial examples. It is considered the most effective method for generating
adversarial examples in machine learning and can also mislead adversarial defense
technologies. However, it is more computationally intensive in comparison to the
Deepfool, FGSM, and JSMA methods, so it is not always appropriate.
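The C&W attack itself involves an optimization loop, but the simpler FGSM idea mentioned above can be sketched in a few lines: perturb the input in the direction of the sign of the loss gradient. The image and gradient below are random placeholders; in practice the gradient would come from your ML framework:

import numpy as np

def fgsm_perturb(image, grad, epsilon=0.01):
    # Step in the direction of the sign of the loss gradient, then
    # clip back to the valid pixel range [0, 1]
    adversarial = image + epsilon * np.sign(grad)
    return np.clip(adversarial, 0.0, 1.0)

# Random placeholders; in practice, grad is the gradient of the model's
# loss with respect to the input image, computed by your ML framework
image = np.random.rand(28, 28)
grad = np.random.randn(28, 28)
adv = fgsm_perturb(image, grad)
print(np.abs(adv - image).max())   # the perturbation is at most epsilon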
Conclusion
In this way, we have understood why adversarial machine learning examples are so
important from a security perspective in machine learning and Artificial
Intelligence. Hopefully, you now have the basic information about adversarial
machine learning after reading this tutorial.
Machine Learning enables computers to behave like human beings by training them
with the help of past experience and predicted data.
There are three key aspects of Machine Learning, which are as follows:
o Task: A task is defined as the main problem in which we are interested. This
task/problem can be related to the predictions and recommendations and
estimations, etc.
o Experience: It is defined as learning from historical or past data and used to
estimate and resolve future tasks.
o Performance: It is defined as the capacity of any machine to resolve any
machine learning task or problem and provide the best outcome for the same.
However, performance is dependent on the type of machine learning
problems.
1. Supervised Learning
Supervised learning is applicable when a machine has sample data, i.e., input as well
as output data with correct labels. Correct labels are used to check the correctness of
the model using some labels and tags. Supervised learning technique helps us to
predict future events with the help of past experience and labeled examples. Initially,
it analyses the known training dataset, and later it introduces an inferred function
that makes predictions about output values. Further, it also predicts errors during this
entire learning process and also corrects those errors through algorithms.
Example: Let's assume we have a set of images tagged as ''dog''. A machine learning
algorithm is trained with these dog images so it can easily distinguish whether an
image is a dog or not.
2. Unsupervised Learning
In unsupervised learning, a machine is trained with unlabeled input samples only,
while the output is not known. The training information is neither classified nor
labeled; hence, a machine may not always provide correct output compared to
supervised learning.
Example: Let's assume a machine is trained with some set of documents having
different categories (Type A, B, and C), and we have to organize them into
appropriate groups. Because the machine is provided only with input samples and no
outputs, it can organize these datasets into type A, type B, and type C
categories, but it is not guaranteed that the grouping is correct.
3. Reinforcement Learning
Reinforcement Learning is a feedback-based machine learning technique. In such
type of learning, agents (computer programs) need to explore the environment,
perform actions, and on the basis of their actions, they get rewards as feedback. For
each good action, they get a positive reward, and for each bad action, they get a
negative reward. The goal of a Reinforcement learning agent is to maximize the
positive rewards. Since there is no labeled data, the agent is bound to learn by its
experience only.
4. Semi-supervised Learning
Semi-supervised Learning is an intermediate technique of both supervised and
unsupervised learning. It performs actions on datasets having few labels as well as
unlabeled data, though the data is generally mostly unlabeled. Hence, it also reduces
the cost of building the machine learning model, as labels are costly to obtain,
while for corporate purposes a few labels may still be available. Further, it also
increases the accuracy and performance of the machine learning model.
Marketing:
Machine learning helps marketers create hypotheses, test and evaluate them, and
analyze datasets. It helps us to quickly make predictions based on the concept of
big data. It is also helpful for stock marketing as most of the trading is done through
bots and based on calculations from machine learning algorithms. Various Deep
Learning Neural network helps to build trading models such as Convolutional Neural
Network, Recurrent Neural Network, Long-short term memory, etc.
Self-driving cars:
This is one of the most exciting applications of machine learning in today's world. It
plays a vital role in developing self-driving cars. Various automobile companies like
Tesla, Tata, etc., are continuously working on the development of self-driving cars.
This is made possible by supervised machine learning, in which a machine is trained
to detect people and objects while driving.
Speech Recognition:
Speech Recognition is one of the most popular applications of machine learning.
Nowadays, almost every mobile application comes with a voice search facility. This
'Search By Voice' facility is also a part of speech recognition. In this method, voice
instructions are converted into text, which is known as 'Speech to Text' or
'Computer Speech Recognition'.
Google assistant, SIRI, Alexa, Cortana, etc., are some famous applications of speech
recognition.
Traffic Prediction:
Machine Learning also helps us to find the shortest route to reach our destination by
using Google Maps. It also helps us in predicting traffic conditions, whether it is
cleared or congested, through the real-time location of the Google Maps app and
sensor.
Image Recognition:
Image recognition is also an important application of machine learning for
identifying objects, persons, places, etc. Face detection and auto friend tagging
suggestion is the most famous application of image recognition used by Facebook,
Instagram, etc. Whenever we upload photos with our Facebook friends, it
automatically suggests their names through image recognition technology.
Product Recommendations:
Machine Learning is widely used in business industries for the marketing of various
products. Almost all big and small companies like Amazon, Alibaba, Walmart, Netflix,
etc., are using machine learning techniques for products recommendation to their
users. Whenever we search for any products on their websites, we automatically get
started with lots of advertisements for similar products. This is also possible by
Machine Learning algorithms that learn users' interests and, based on past data,
suggest products to the user.
Automatic Translation:
Automatic language translation is also one of the most significant applications of
machine learning that is based on sequence algorithms by translating text of one
language into other desired languages. Google GNMT (Google Neural Machine
Translation) provides this feature using neural machine learning. Further, you can
also translate the selected text on images as well as complete documents through
Google Lens.
Virtual Assistant:
A virtual personal assistant is also one of the most popular applications of machine
learning. First, it records our voice and sends it to a cloud-based server, then
decodes it with the help of machine learning algorithms. All big companies like
Amazon, Google, etc., are using these features for playing music, calling someone,
opening an app, searching data on the internet, etc.
Linear Regression
Linear Regression is a Supervised Learning technique that models the linear
relationship between a dependent variable and an independent variable:
y = a0 + a1x + ε
Where,
Y = Dependent Variable
X = Independent Variable
a0 = Intercept of the line
a1 = Linear regression coefficient
ε = Random error
The values for the x and y variables are training datasets used for the Linear
Regression model representation.
Linear Regression is helpful for evaluating the business trends and forecasts such as
prediction of salary of a person based on their experience, prediction of crop
production based on the amount of rainfall, etc.
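A minimal scikit-learn sketch of the salary example might look like this; the experience and salary numbers are hypothetical:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience (x) vs. salary (y)
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([30000, 35000, 41000, 46000, 52000])

model = LinearRegression().fit(x, y)   # learns a0 (intercept) and a1 (slope)
print(model.intercept_, model.coef_)
print(model.predict([[6]]))            # predicted salary for 6 years of experience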
Logistic Regression
Logistic Regression is a subset of the Supervised learning technique. It helps us to
predict the output of categorical dependent variables using a given set of
independent variables. The output can be Binary (0 or 1) or Boolean (true/false),
but instead of giving an exact value, it gives a probabilistic value between 0 and 1.
Logistic Regression is quite similar to Linear Regression in how it is used in a
machine learning model: as Linear Regression is used for solving regression
problems, Logistic Regression is helpful for solving classification problems. It can
be of three types:
o Binomial
o Multinomial
o Ordinal
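A minimal binomial Logistic Regression sketch in scikit-learn, with hypothetical pass/fail data, might look like this:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary data: hours studied vs. pass (1) / fail (0)
x = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(x, y)
print(model.predict_proba([[3.5]]))   # class probabilities, each between 0 and 1
print(model.predict([[3.5]]))         # predicted class label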
K-Nearest Neighbour (KNN)
Let's understand the KNN algorithm with the below screenshot, where we have to
assign a new data point based on its similarity with the available data points.
KNN algorithms are used in many fields, within Machine Learning and beyond; a
minimal classification sketch is shown below:
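import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D points belonging to two classes
X = np.array([[1, 1], [1, 2], [2, 2], [6, 6], [7, 7], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 1]]))   # assigned to class 0, like its nearest neighbours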
K-Means Clustering
K-Means Clustering is a subset of unsupervised learning techniques. It helps us to
solve clustering problems by means of grouping the unlabeled datasets into different
clusters. Here K defines the number of pre-defined clusters that need to be created
in the process, as if K=2, there will be two clusters, and for K=3, there will be three
clusters, and so on.
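A minimal K-Means sketch with scikit-learn, on hypothetical unlabeled points, might look like this:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled 2-D points forming two natural groups
X = np.array([[1, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignment for each point
print(kmeans.cluster_centers_)   # the K=2 learned cluster centers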
Decision Tree
Decision Tree is also another type of Machine Learning technique that comes under
Supervised Learning. Similar to KNN, the decision tree also helps us to solve
classification as well as regression problems, but it is mostly preferred to solve
classification problems. The name decision tree is because it consists of a tree-
structured classifier in which attributes are represented by internal nodes, decision
rules are represented by branches, and the outcome of the model is represented by
each leaf of a tree. The tree starts from the decision node, also known as the root
node, and ends with the leaf node.
Decision nodes help us to make any decision, whereas leaves are used to determine
the output of those decisions.
A Decision Tree is a graphical representation for getting all the possible outcomes to
a problem or decision depending on certain given conditions.
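A minimal decision tree sketch using scikit-learn, with hypothetical [age, income] features, might look like this:

from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: [age, income] -> buys (1) / does not buy (0)
X = [[25, 30000], [35, 60000], [45, 80000], [20, 20000], [50, 90000]]
y = [0, 1, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree.predict([[30, 55000]]))   # follows the learned if-then-else rules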
Random Forest
Random Forest is also one of the most preferred machine learning algorithms that
come under the Supervised Learning technique. Similar to KNN and Decision Tree, It
also allows us to solve classification as well as regression problems, but it is preferred
whenever we have a requirement to solve a complex problem and to improve the
performance of the model.
Naïve Bayes
The naïve Bayes algorithm is one of the simplest and most effective machine learning
algorithms that come under the supervised learning technique. It is based on the
concept of the Bayes Theorem, used to solve classification-related problems. It helps
to build fast machine learning models that can make quick predictions with greater
accuracy and performance. It is mostly preferred for text classification having high-
dimensional training datasets.
It is based on the concept of Bayes' Theorem, which is also known as Bayes' Rule
or Bayes' Law. Mathematically, Bayes' Theorem can be expressed as follows:
P(A|B) = (P(B|A) * P(A)) / P(B)
Where,
o P(A|B) is the posterior probability of hypothesis A given the observed event B
o P(B|A) is the likelihood of event B given that hypothesis A is true
o P(A) and P(B) are the prior probabilities of A and B, respectively
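A minimal Naïve Bayes sketch with scikit-learn's GaussianNB, on hypothetical numeric features, might look like this:

from sklearn.naive_bayes import GaussianNB

# Hypothetical data: two numeric features per sample and a binary label
X = [[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]]
y = [0, 0, 1, 1]

nb = GaussianNB().fit(X, y)
print(nb.predict([[1.1, 2.0]]))        # predicted class
print(nb.predict_proba([[1.1, 2.0]]))  # posterior probabilities via Bayes' rule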
Conclusion
This article has introduced you to a few important basic concepts of Machine
Learning. Now we can say that machine learning helps to build smart machines that
learn from past experience and work faster. There are many game-playing programs
available on the internet, for games such as Chess and Ludo, and systems like
AlphaGo, that play much faster than a real player. Machine learning is a broad
field, but you can learn each concept in a few hours of study. If you are preparing
yourself to become a data scientist or machine learning engineer, then you must
have in-depth knowledge of each concept of machine learning.
As the number of samples available for learning increases, the algorithm adapts to
improve performance. Deep learning is a special form of machine learning.
Supervised learning
Supervised machine learning creates a model that makes predictions based on
evidence in the presence of uncertainty. A supervised learning algorithm takes a
known set of input data and known responses to the data (output) and trains a
model to generate reasonable predictions for the response to the new data. Use
supervised learning if you have known data for the output you are trying to estimate.
Classification models classify the input data. Classification techniques predict discrete
responses. For example, the email is genuine, or spam, or the tumor is cancerous or
benign. Typical applications include medical imaging, speech recognition, and credit
scoring.
Use classification if your data can be tagged, categorized, or divided into specific
groups or classes. For example, applications for handwriting recognition use
classification to
recognize letters and numbers. In image processing and computer vision,
unsupervised pattern recognition techniques are used for object detection and
image segmentation.
If you are working with a data range or if the nature of your response is a real
number, such as temperature or the time until a piece of equipment fails, use
regression techniques.
Unsupervised Learning
Unsupervised learning detects hidden patterns or internal structures in data. It is
used to draw inferences from datasets containing input data without labeled
responses.
For example, if a cell phone company wants to optimize the locations where they
build towers, they can use machine learning to estimate the number of people
relying on their towers.
A phone can only talk to one tower at a time, so the team uses clustering algorithms
to design the best placement of cell towers and optimize signal reception for their
clusters or groups of customers.
Ten methods are described below, and together they form a foundation you can
build on to improve your machine learning knowledge and skills:
o Regression
o Classification
o Clustering
o Dimensionality Reduction
o Ensemble Methods
o Neural Nets and Deep Learning
o Transfer Learning
o Reinforcement Learning
o Natural Language Processing
o Word Embeddings
For example, you can use supervised ML techniques to help a service business that
wants to estimate the number of new users that will sign up for the service in the
next month. In contrast, unsupervised ML looks at ways of connecting and grouping
data points without the use of a target variable to make predictions.
In other words, it evaluates data in terms of traits and uses traits to group objects
that are similar to each other. For example, you can use unsupervised learning
techniques to help a retailer who wants to segment products with similar
characteristics-without specifying in advance which features to use.
1. Regression
Regression methods fall under the category of supervised ML. They help predict or
interpret a particular numerical value based on prior data, such as predicting an
asset's price based on past pricing data for similar properties.
The simplest method is linear regression, where we use the mathematical equation of
the line (y = m * x + b) to model the data set. We train a linear regression model
with multiple data pairs (x, y) by computing the position and slope of a line that
minimizes the total distance between all data points and the line. In other words, we
calculate the slope (M) and the y-intercept (B) for a line that best approximates the
observations in the data.
Let us consider a more concrete example of linear regression. I once used linear
regression to predict the energy consumption (in kW) of some buildings by
gathering together the age of the building, the number of stories, square feet, and
the number of wall devices plugged in.
Since there was more than one input (age, square feet, etc.), I used a multivariable
linear regression. The principle was similar to a one-to-one linear regression. Still, in
this case, the "line" I created occurred in a multi-dimensional space depending on
the number of variables.
Now imagine that you have access to the characteristics of a building (age, square
feet, etc.), but you do not know the energy consumption. In this case, we can use the
fitted line to estimate the energy consumption of the particular building. The plot
below shows how well the linear regression model fits the actual energy
consumption of the building.
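A sketch of this multivariable case with scikit-learn, using made-up building records (the feature values and kWh targets are purely illustrative, not the data from this story), might look like this:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up building records: [age (years), stories, square feet, plugged-in devices]
X = np.array([[10, 2, 1500, 20],
              [30, 1, 900, 12],
              [5, 3, 2200, 35],
              [50, 2, 1300, 18]])
y = np.array([120, 80, 190, 110])   # energy consumption in kWh (illustrative)

model = LinearRegression().fit(X, y)
print(model.coef_)                        # the weight of each factor
print(model.predict([[20, 2, 1600, 25]])) # estimate for an unseen building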
Note that you can also use linear regression to estimate the weight of each factor
that contributes to the final prediction of energy consumed. For example, once you
have a formula, you can determine whether age, size, or height are most important.
Regression techniques run the gamut from simple (linear regression) to complex
(regular linear regression, polynomial regression, decision trees, random forest
regression, and neural nets). But don't get confused: start by studying simple linear
regression, master the techniques, and move on.
2. Classification
In another class of supervised ML, classification methods predict or explain a class
value. For example, they can help predict whether an online customer will purchase a
product. Output can be yes or no: buyer or no buyer. But the methods of
classification are not limited to two classes. For example, a classification method can
help assess whether a given image contains a car or a truck. The simplest
classification algorithm is logistic regression, which sounds like a regression method,
but it is not. Logistic regression estimates the probability of occurrence of an event
based on one or more inputs.
For example, logistic regression can take two test scores for a student to predict
whether the student will get admission to a particular college. Because the estimate
is a probability, the output is a number between 0 and 1, where 1 represents
absolute certainty. For the student, if the predicted probability is greater than 0.5,
we estimate that they will be admitted. If the predicted probability is less than 0.5,
we estimate that they will be rejected.
The chart below shows the marks of past students and whether they were admitted.
Logistic regression allows us to draw a line that represents the decision boundary.
3. Clustering
The most popular clustering method is K-Means, where "K" represents the number of
clusters selected by the user. (Note that there are several techniques for selecting the
value of K, such as the elbow method.) Roughly, K-Means proceeds as follows:
1. Choose K cluster centers at random.
2. Assign each data point to its closest center.
3. Recompute each center as the mean of the points assigned to it.
The process is over if the centers do not change (or change very little). Otherwise, we
return to step 2. (To prevent ending in an infinite loop if the centers continue to
change, set the maximum number of iterations in advance.)
The next plot applies the K-means to the building's data set. The four measurements
pertain to air conditioning, plug-in appliances (microwave, refrigerator, etc.),
household gas, and heating gas. Each column of the plot represents the efficiency of
each building.
As you explore clustering, you will come across very useful algorithms such as
Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Mean Shift Clustering,
Agglomerative Hierarchical Clustering, and Expectation-Maximization Clustering
using the Gaussian Mixture Model, among others.
4. Dimensionality Reduction
We use dimensionality reduction to remove the least important information
(sometimes unnecessary columns) from a data set. For example, images may
consist of thousands of pixels, not all of which matter to your analysis. Or, when
testing microchips within the manufacturing process, you may have thousands of
measurements and tests applied to each chip, many of which provide redundant
information. In these cases, you need a dimensionality reduction algorithm to make
the data set manageable.
The next plot shows the analysis of the MNIST database of handwritten digits. MNIST
contains thousands of images of numbers 0 to 9, which the researchers use to test
their clustering and classification algorithms. Each row of the data set is a vector
version of the original image (size 28 x 28 = 784) and a label for each image (zero,
one, two, three, …, nine). Therefore, we are reducing the dimensionality from 784
(pixels) to 2 (the dimensions in our visualization). Projecting to two dimensions allows
us to visualize higher-dimensional original data sets.
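A comparable sketch with scikit-learn, using its small built-in digits dataset (8x8 = 64 pixels rather than MNIST's 784) and PCA as the reduction method, might look like this:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# The scikit-learn digits images are 8x8 = 64 pixels; project them to 2 dimensions
digits = load_digits()
reduced = PCA(n_components=2).fit_transform(digits.data)
print(digits.data.shape, "->", reduced.shape)   # (1797, 64) -> (1797, 2)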
5. Ensemble Methods
Imagine that you have decided to build a bicycle because you are not happy with the
options available in stores and online. You might begin by finding the best of each
part you need. Once you've assembled all these great parts, the resulting bike will
outshine all the other options.
Ensemble methods use the same idea of combining multiple predictive models
(supervised ML) to obtain higher-quality predictions than each of the models could
provide on its own.
For example, the Random Forest algorithm is an ensemble method that combines
multiple decision trees trained with different samples from a data set. As a result, the
quality of predictions of a random forest exceeds the quality of predictions predicted
with a single decision tree.
You can think of ensemble methods as a way to reduce the variance and bias of a
single machine learning model. Any given model may be accurate under some
conditions but inaccurate under other conditions; with another model, the relative
accuracy may be reversed. By combining the two models, the quality of the
predictions becomes balanced out.
Most of the top winners of Kaggle competitions use some ensemble method. The
most popular ensemble algorithms are Random Forest, XGBoost, and LightGBM.
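A minimal Random Forest sketch on synthetic data, using scikit-learn, might look like this:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic data, split into training and test sets
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees, each trained on a different bootstrap sample
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(accuracy_score(y_te, forest.predict(X_te)))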
6. Neural Nets and Deep Learning
The neural network structure is flexible enough to reproduce our familiar linear and
logistic regressions as special cases. The term deep learning comes from a neural
net with many hidden layers and encompasses a variety of architectures.
It is especially difficult to keep up with development in deep learning as the research
and industry communities redouble their deep learning efforts, spawning whole new
methods every day.
Deep learning techniques require a lot of data and computation power for best
performance as this method is self-tuning many parameters within vast architectures.
It quickly becomes clear why deep learning practitioners need powerful computers
with GPUs (Graphical Processing Units).
7. Transfer learning
Let's say you are a data scientist working in the retail industry. You've spent months
training a high-quality model to classify images as shirts, t-shirts, and polos. Your
new task is to create a similar model to classify clothing images like jeans, cargo,
casual, and dress pants.
Transfer learning refers to reusing part of an already trained neural net and adapting
it to a new but similar task. Specifically, once you train a neural net using the data for
a task, you can move a fraction of the trained layers and combine them with some
new layers that you can use for the new task. The new neural net can learn and adapt
quickly to a new task by adding a few layers.
The main advantage of transfer learning is that you need fewer data to train a neural
net, which is especially important because training for deep learning algorithms is
expensive both in terms of time and money (computational resources). Of course, it
isn't easy to find enough labeled data for training.
Let's come back to your example and assume that you use a neural net with 20
hidden layers for the shirt model. After running a few experiments, you realize that
you can move the 18 layers of the shirt model and combine them with a new layer of
parameters to train on the pant images.
So the Pants model will have 19 hidden layers. The inputs and outputs of the two
functions are different but reusable layers can summarize information relevant to
both, for example, fabric aspects.
Transfer learning has become more and more popular, and there are many concrete
pre-trained models now available for common deep learning tasks such as image
and text classification.
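A minimal transfer learning sketch in Keras might look like the following; MobileNetV2, the input size, and the four output classes are illustrative choices, not the article's model:

import tensorflow as tf

# Reuse a network pre-trained on ImageNet as a frozen feature extractor
base = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False   # keep the transferred layers fixed

# Add a small new head for the new task (here, 4 hypothetical pant classes)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(...) now needs far less labeled data than training from scratch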
8. Reinforcement Learning
Imagine a mouse in a maze trying to find hidden pieces of cheese. At first, the Mouse
may move randomly, but after a while, the Mouse's experience helps it sense which
actions bring it closer to the cheese. The more times we expose the Mouse to the
maze, the better it becomes at finding the cheese.
The process for the Mouse mirrors what we do with Reinforcement Learning (RL) to
train a system or a game agent. Generally speaking, RL is a method of machine
learning that helps an agent to learn from experience.
You can use RL when you have little or no historical data about a problem, as it does
not require prior information (unlike traditional machine learning methods). In the RL
framework, you learn from the data as you go. Not surprisingly, RL is particularly
successful with games, especially games of "perfect information" such as chess and
Go. With games, feedback from the agent and the environment comes quickly,
allowing the model to learn faster. The downside of RL is that it can take a very long
time to train if the problem is complex.
Just as IBM's Deep Blue beat the best human chess player in 1997, the RL-based
AlphaGo beat the best Go player in 2016. The current front-runners of RL are the
DeepMind teams in the UK.
In April 2019, the OpenAI Five team was the first AI to defeat the world champion
team of e-sport Dota 2, a very complex video game that the OpenAI Five team chose
because there were no RL algorithms capable of winning it. You can tell that
reinforcement learning is a particularly powerful form of AI, and we certainly want to
see more progress from these teams. Still, it's also worth remembering the
limitations of the method.
9. Natural Language Processing
Natural Language Processing (NLP) is not a machine learning method per se, but a
widely used technique for preparing text for machine learning. Think of many text
documents in different formats (Word, online blogs). Most of these text documents
will be full of typos, missing characters, and other words that need to be filtered out.
At the moment, the most popular package for processing text is NLTK (Natural
Language Toolkit), created by researchers at the University of Pennsylvania.
The easiest way to map text to a numerical representation is to count the frequency
of each word in each text document. Think of a matrix of integers where each row
represents a text document, and each column represents a word. This matrix
representation of the term frequency is usually called the term frequency matrix
(TFM). We can create a more popular matrix representation of a text document by
dividing each entry in the matrix by a weight reflecting how important each word is
within the entire corpus of documents. We call this method Term Frequency Inverse
Document Frequency (TFIDF), and it generally works better for machine learning
tasks.
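A minimal TF-IDF sketch with scikit-learn's TfidfVectorizer, on three tiny made-up documents, might look like this:

from sklearn.feature_extraction.text import TfidfVectorizer

# Three tiny made-up "documents"
docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)       # rows = documents, columns = words
print(vectorizer.get_feature_names_out())
print(tfidf.shape)                           # (3, number of distinct words)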
10. Word Embeddings
Let's say vector('word') is the numeric vector that represents the word 'word'. We
can perform arithmetic operations with such vectors to approximate related words;
the classic example is:
vector('king') − vector('man') + vector('woman') ≈ vector('queen')
The word representation allows finding the similarity between words by computing
the cosine similarity between the vector representations of two words. The cosine
similarity measures the angle between two vectors.
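A minimal cosine-similarity sketch with NumPy, using tiny made-up word vectors (real embeddings have hundreds of dimensions), might look like this:

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1 means the same direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Tiny made-up word vectors (real embeddings have hundreds of dimensions)
king = np.array([0.8, 0.6, 0.1, 0.0])
queen = np.array([0.7, 0.7, 0.2, 0.1])
print(round(cosine_similarity(king, queen), 3))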
We calculate word embeddings using machine learning methods, but this is often a
pre-stage of implementing machine learning algorithms on top. For example, let's
say we have access to the tweets of several thousand Twitter users. Let's also assume
that we know which of these Twitter users bought a house. To estimate the
probability of a new Twitter user buying a home, we can combine Word2Vec with
logistic regression. You can train the word embeddings yourself or get a pre-trained
(transfer learning) set of word vectors. To download pre-trained word vectors in 157
different languages, look at fastText.
Summary
Studying these methods thoroughly and fully understanding the basics of each can
serve as a solid starting point for further study of more advanced algorithms and
methods.
There is no best way or one size fits all. Finding the right algorithm is partly just trial
and error - even highly experienced data scientists can't tell whether an algorithm
will work without trying it out. But algorithmic selection also depends on the size and
type of data you're working with, the insights you want to derive from the data, and
how those insights will be used.
In recent years, Machine Learning has evolved very rapidly and has become one of
the most popular and in-demand technologies of current times. It is now being
used in every field, making it ever more valuable. But there are two big barriers to
making efficient use of machine learning (classical & deep learning): skills
and computing resources. Computing resources can be obtained by spending a
good amount of money, but the skills to solve a machine learning problem remain
hard to find. This means machine learning is not accessible to those with limited
knowledge of it. To solve this problem, Automated Machine Learning (AutoML) came
into existence. In this topic, we will understand what AutoML is and how it affects
the world.
What is AutoML?
Automated Machine Learning or AutoML is a way to automate the time-consuming
and iterative tasks involved in the machine learning model development process. It
provides various methods to make machine learning available for people with limited
knowledge of Machine Learning. It aims to reduce the need for skilled people to
build the ML model. It also helps to improve efficiency and to accelerate the research
on Machine learning.
To better understand automated machine learning, we must know the life cycle of a
data science or ML project. A typical lifecycle of a data science project contains the
following phases:
o Data Cleaning
o Feature Selection/Feature Engineering
o Model Selection
o Parameter Optimization
o Model Validation.
Although the technology has become very advanced, all these phases still require
manual effort, which is time-consuming and demands many skilled data scientists.
Completing these tasks is very difficult for non-ML experts. The rapid growth of ML
applications has generated demand for automating these processes so that they
can be used easily without expert knowledge. Hence, to automate the entire process
from data cleaning to parameter optimization, Automated Machine Learning came
into existence. It not only saves time but also delivers strong performance.
AutoML Platforms
AutoML has been evolving for many years, but it has gained popularity only in the
last few years, and several platforms and frameworks have emerged. These
platforms enable users to train models using drag & drop design tools.
1. Google Cloud AutoML
Google has launched several AutoML products for building our own custom machine
learning models as per the business needs, and it also allows us to integrate these
models into our applications or websites. These AutoML products provide various
tools to train models for specific use cases with limited machine learning expertise.
With Cloud AutoML, we don't need to know about transfer learning or how to create
a neural network, as it provides out-of-the-box deep learning models.
2. Microsoft Azure AutoML
Microsoft Azure AutoML was released in 2018. It offers a transparent model
selection process that lets non-ML experts build ML models.
3. H2O.ai
H2O is an open-source platform that enables users to create ML models. It can be
used to automate the machine learning workflow, such as the automatic training and
tuning of many models within a user-specified time limit. Although H2O AutoML can
make the development of ML models easy for non-experts, a good knowledge of
data science is still required to build high-performing ML models.
4. TPOT
5. DataRobot
DataRobot is one of the best AutoML platforms. It provides complete automation of
the ML pipeline and supports all the steps required for preparing, building,
deploying, monitoring, and maintaining powerful AI applications.
6. Auto-Sklearn
7. MLBox
MLBox is another powerful Python library for automated machine learning. With
AutoML, a machine learning enthusiast can use machine learning or deep learning
models through the Python language. The steps of the machine learning lifecycle
that AutoML automates are the same phases listed earlier, from data cleaning
through model validation.
Pros of AutoML
o Performance: AutoML performs most of the steps automatically and gives a
great performance.
o Efficiency: It provides good efficiency by speeding up the machine learning
process and by reducing the training time required to train the models.
o Cost Savings: Since it saves time and speeds up the learning process of
machine learning models, it also reduces the cost of developing an ML model.
Cons of AutoML
o One of the main challenges of AutoML is that it is currently viewed as a
replacement for, or alternative to, human knowledge and intervention. Like
other automation, AutoML is designed to perform routine tasks automatically,
with efficiency and accuracy, so that humans can focus on complex tasks.
Some routine tasks, such as monitoring, analysis, and problem detection, are
much faster when done automatically. However, humans should still be
involved to supervise the model, though they need not take part in every
step of the process. AutoML is meant to help humans by enhancing their
efficiency, not to replace them.
o AutoML is a comparatively new & developing field, and most of the popular
tools are not yet fully developed.
Applications of AutoML
AutoML shares its common use cases with traditional machine learning.
It's not only that. Machine Learning and Data Science generally are everywhere. Why?
Because data is everywhere!
Therefore, it's only natural that anyone with an analytical mind who can distinguish
between programming paradigms by looking at code is enthralled at the prospect of
Machine Learning.
What do we mean by Machine Learning? And how big is Machine Learning? Let's
explore Machine Learning, once and for all. Instead of presenting the technical specs,
we'll use the "Understand by Example" approach.
This area of Computer Science and Artificial Intelligence "learns" by studying data
without human intervention.
However, this notion is not without flaws. Because of this belief, when the term
Machine Learning is thrown around, it is usually thought of as "Artificial Intelligence",
"neural networks that can emulate human brains" (which currently isn't possible),
self-driving cars, and so on. But Machine Learning goes far beyond that. We will
explore some typical, and some less commonly considered, aspects of modern
computing where Machine Learning is at work.
As we may have noticed by now, Machine Learning is everywhere, from research and
development to improving the business of small companies. This makes for a great
career opportunity, since the field is growing and that growth will not end anytime
soon.
Technological Singularity:
Although this topic attracts lots of public attention, scientists are not expecting AI to
exceed human intelligence anytime in the immediate future. AI of that kind is often
referred to as superintelligence or strong AI, which Nick Bostrom defines as "any
intelligence that far surpasses the top human brains in virtually every field, which
includes general wisdom, scientific creativity and social abilities." Even though
superintelligence and strong AI are not yet a reality, the concept poses some
interesting questions as we contemplate the potential use of autonomous systems,
such as self-driving vehicles. It's unrealistic to expect that a driverless car would
never be involved in an accident, but who would be responsible and liable in those
situations? Should we continue to pursue fully autonomous vehicles, or should we
restrict this technology to semi-autonomous cars that promote driver safety? The
jury is still out on this issue, but these kinds of ethical debates are being held as new
and genuine AI technology is developed.
AI Impact on Jobs:
While much of the public opinion about artificial intelligence centers on job loss, the
concern should probably be reframed. With every new, disruptive technology, we see
shifts in demand for particular job roles. For instance, in the automotive industry,
many manufacturers such as GM are shifting their focus to electric vehicles to align
with green policies. The energy sector isn't going away, but the primary source that
fuels it is changing from a fuel-based economy to an electric one. Artificial
intelligence should be viewed in a similar way: it is expected to shift the demand for
jobs to other areas. There will need to be people who can manage these systems as
data grows and changes every day. Resources will still be needed to address more
complex problems within industries that are most likely to be affected by demand
shifts, such as customer service. The most important aspect of artificial intelligence
and its impact on the employment market will be in helping individuals adapt to the
new realms of demand created by the market.
Privacy:
Privacy is frequently discussed in the context of data privacy, data protection, and
data security, and these concerns have allowed policymakers to make progress in
recent years. For instance, in 2016, GDPR legislation was introduced to safeguard the
personal information of individuals within the European Union and European
Economic Area, giving individuals more control over their data. In the United States,
individual states are developing policies, such as the California Consumer Privacy Act
(CCPA), which requires companies to inform their customers about the processing of
their data. This legislation is forcing companies to rethink how they handle and store
personally identifiable information (PII). As a result, security investments have
become a business priority, as companies seek to remove any potential
vulnerabilities and opportunities for hacking, surveillance, and cyberattacks.
Discrimination and bias aren't just limited to the human resources function. They are
present in a variety of applications, ranging from facial recognition software to social
media algorithms.
Accountability:
There is no significant law to regulate AI practices, and no enforcement mechanism
to ensure that ethical AI is being used. Companies' primary motivations to adhere to
these standards are the negative effects of an untrustworthy AI system on their
bottom lines. To address the issue, ethical frameworks have been developed in
partnerships between researchers and ethicists to regulate the creation and use of
AI models. But, for the time being, these frameworks serve only as guidance for the
development of AI models. Research has shown that the combination of distributed
responsibility and insufficient awareness of potential effects isn't ideal for protecting
society from harm.
Difference between Model Parameter
and Hyperparameter
For a machine learning beginner, many terms can seem confusing, and it is
important to clear up this confusion to become proficient in the field. Two such
terms are "Model Parameters" and "Hyperparameters". Not having a clear
understanding of the two is a common struggle for beginners, so let's understand
the difference between a parameter and a hyperparameter and how they relate to
each other.
For example, consider a simple regression line:

y = mx + c

Here m is the slope of the line and c is the intercept. These two values are learned by
fitting the line to the data (for instance, by minimizing the RMSE), and they are
known as model parameters.
Some examples of Hyperparameters are the learning rate for training a neural
network, K in the KNN algorithm, etc.
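As a concrete illustration of the distinction, in the sketch below (using scikit-learn on made-up data) the slope and intercept are model parameters learned by fitting, while K in KNN is a hyperparameter chosen before training:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # toy data for illustration
y = np.array([2.1, 3.9, 6.2, 8.1])

# Model parameters: m and c are estimated from the data by fitting.
line = LinearRegression().fit(X, y)
print("m =", line.coef_[0], "c =", line.intercept_)

# Hyperparameter: K (n_neighbors) is set manually before training.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, [0, 0, 1, 1])
print(knn.predict([[2.5]]))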
o Model parameters are specified or estimated while training the model;
hyperparameters are set before the training of the model begins.
o Model parameters are learned and set by the model itself; hyperparameters
are set manually by a machine learning engineer/practitioner.
o Model parameters depend on the dataset used for training; hyperparameters
are independent of the dataset.
Conclusion
In this article, we have seen clear definitions of model parameters and
hyperparameters and the difference between the two. In brief, model parameters
are internal to the model and estimated from data automatically, whereas
hyperparameters are set manually, are used in the optimization of the model, and
help in estimating the model parameters.
Hyperparameters in Machine
Learning
Hyperparameters in Machine learning are those parameters that are explicitly
defined by the user to control the learning process. These hyperparameters are
used to improve the learning of the model, and their values are set before starting
the learning process of the model.
In this topic, we are going to discuss one of the most important concepts of machine
learning, i.e., hyperparameters: their examples, hyperparameter tuning, the
categories of hyperparameters, and how hyperparameters differ from parameters in
machine learning. But before starting, let's first understand what a hyperparameter is.
Here the prefix "hyper" suggests that the parameters are top-level parameters that
are used in controlling the learning process. The value of the Hyperparameter is
selected and set by the machine learning engineer before the learning algorithm
begins training the model. Hence, these are external to the model, and their
values cannot be changed during the training process.
Model Parameters:
Model parameters are configuration variables that are internal to the model, and a
model learns them on its own. Examples include the weights or coefficients of the
independent variables in a linear regression model or an SVM, the weights and
biases of a neural network, and the cluster centroids in clustering. Some key points
about model parameters are as follows:
Model Hyperparameters:
Hyperparameters are those parameters that are explicitly defined by the user to
control the learning process. Some key points about model hyperparameters are as
follows:
o These are usually defined manually by the machine learning engineer.
o One cannot know the exact best value for hyperparameters for the given
problem. The best value can be determined either by the rule of thumb or by
trial and error.
o Some examples of hyperparameters are the learning rate for training a
neural network, K in the KNN algorithm, etc.
Categories of Hyperparameters
Broadly, hyperparameters can be divided into two categories, which are given below:
o Hyperparameters for optimization, such as the learning rate and batch size
o Hyperparameters for specific models, such as the number of hidden layers or
hidden units
Note: Learning rate is a crucial hyperparameter for optimizing the model, so if there is a
requirement of tuning only a single hyperparameter, it is suggested to tune the learning
rate.
o Batch Size: To enhance the speed of the learning process, the training set is
divided into different subsets, which are known as batches.
o Number of Epochs: An epoch can be defined as one complete cycle through
the training data. It represents an iterative learning process. The number of
epochs varies from model to model, and various models are trained with
more than one epoch. To determine the right number of epochs, the
validation error is taken into account. The number of epochs is increased as
long as the validation error keeps decreasing; once the validation error stops
improving over consecutive epochs, it is a signal to stop increasing the
number of epochs.
o Number of Hidden Units: Hidden units are part of neural networks; they
refer to the components comprising the layers of processors between the
input and output units of a neural network.
It is important to specify the number of hidden units as a hyperparameter for a
neural network. It should lie between the size of the input layer and the size of the
output layer. A common rule of thumb is that the number of hidden units should be
about 2/3 of the size of the input layer plus the size of the output layer; for example,
with 9 inputs and 2 outputs this gives roughly 6 + 2 = 8 hidden units.
For complex functions, more hidden units may be necessary, but the number should
not be so large that the model overfits.
Conclusion
Hyperparameters are the parameters that are explicitly defined to control the
learning process before applying a machine-learning algorithm to a dataset. These
are used to specify the learning capacity and complexity of the model. Some of the
hyperparameters are used for the optimization of the models, such as Batch size,
learning rate, etc., and some are specific to the models, such as Number of Hidden
layers, etc.
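To make tuning concrete, here is a minimal sketch using scikit-learn's GridSearchCV to search over the hyperparameter K of a KNN classifier; the data and parameter grid are assumptions for illustration:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # toy features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy labels

# Try several values of the hyperparameter K with cross-validation.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)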
Machine learning involves data exploration and pattern matching with minimal
human intervention. There are mainly four types of techniques through which
machine learning works:
1. Supervised Learning:
Supervised Learning is a machine learning method that needs supervision similar to
the student-teacher relationship. In supervised Learning, a machine is trained with
well-labeled data, which means some data is already tagged with correct outputs. So,
whenever new data is introduced into the system, supervised learning algorithms
analyze this sample data and predict correct outputs with the help of that labeled
data.
It is classified into two different categories of algorithms: classification and regression.
This technique allows us to produce outputs from experience. It works the same way
humans learn, using labeled data points from a training set. It helps optimize the
performance of models using experience and solves various complex computational
problems.
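A minimal sketch of supervised learning with scikit-learn, using a tiny labeled dataset invented for illustration; the model trains on labeled examples and then predicts the label of unseen data:

from sklearn.tree import DecisionTreeClassifier

# Labeled training data (features -> tags), made up for illustration:
# [weight_in_grams, surface_roughness], label 0 = apple, 1 = orange.
X_train = [[150, 0.1], [170, 0.2], [140, 0.8], [130, 0.9]]
y_train = [0, 0, 1, 1]

model = DecisionTreeClassifier().fit(X_train, y_train)

# Predict the label of a new, unseen example.
print(model.predict([[160, 0.15]]))  # expected: [0] (apple-like)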
2. Unsupervised Learning:
Unlike supervised learning, unsupervised learning does not require classified or well-
labeled data to train a machine. It aims to group unsorted information based on
patterns and differences, even without any labeled training data. In unsupervised
learning, no supervision is provided, so no sample outputs are given to the machine.
Hence, machines are restricted to finding hidden structures in unlabeled data on
their own.
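A minimal sketch of unsupervised learning, assuming a handful of toy 2-D points: K-Means groups the unlabeled points into clusters based only on their similarity:

from sklearn.cluster import KMeans

# Unlabeled data: two loose groups of 2-D points, made up for illustration.
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # which cluster each point was assigned to
print(kmeans.cluster_centers_)  # the discovered group centers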
3. Semi-supervised learning:
Semi-supervised Learning is defined as the combination of both supervised and
unsupervised learning methods. It is used to overcome the drawbacks of both
supervised and unsupervised learning methods.
Speech analysis, web content classification, protein sequence classification, and text
document classification are some of the most popular real-world applications of
semi-supervised learning.
4. Reinforcement learning:
Reinforcement learning is defined as a feedback-based machine learning method
that does not require labeled data. In this learning method, an agent learns to
behave in an environment by performing actions and seeing the results of those
actions. The agent receives positive feedback for each good action and negative
feedback for each bad action. Since there is no labeled training data in
reinforcement learning, agents are restricted to learning from their experience only.
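A minimal sketch of this feedback loop, assuming a made-up one-dimensional world where the agent is rewarded only at the rightmost state; the tabular Q-learning update used here is one standard reinforcement learning method:

import numpy as np

n_states, n_actions = 5, 2       # toy world: states 0..4, actions left/right
alpha, gamma = 0.5, 0.9          # learning rate and discount factor
Q = np.zeros((n_states, n_actions))

rng = np.random.default_rng(0)
for _ in range(500):
    s = rng.integers(n_states)               # start somewhere
    a = rng.integers(n_actions)              # explore a random action
    s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    r = 1.0 if s_next == n_states - 1 else 0.0   # reward at the goal only
    # Q-learning update: move Q(s, a) toward reward + discounted future value.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

print(Q.argmax(axis=1))  # learned policy: action 1 ("right") should dominate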
Machine learning has several practical applications that drive the kind of real
business results - such as time and money savings - that have the potential to
dramatically impact the future of your organization. In particular, we see tremendous
impact occurring within the customer care industry, whereby machine learning is
allowing people to get things done more quickly and efficiently. Through Virtual
Assistant solutions, machine learning automates tasks that would otherwise need to
be performed by a live agent - such as changing a password or checking an account
balance. This frees up valuable agent time that can be used to focus on the kind of
customer care that humans perform best: high touch, complicated decision-making
that is not as easily handled by a machine. At Interactions, we further improve the
process by eliminating the decision of whether a request should be sent to a human
or a machine: with unique Adaptive Understanding technology, the machine learns
to be aware of its limitations and to bail out to humans when it has low confidence in
providing the correct solution.
Conclusion:
Machine Learning is directly or indirectly involved in our daily routine. We have seen
various machine learning applications that are very useful for surviving in this
technical world. Although machine learning is still in a developing phase, it is
evolving rapidly. The best thing about machine learning is its high-value predictions,
which can guide better decisions and smart actions in real-time without human
intervention. Hence, at the end of this article, we can say that the
machine learning field is very vast, and its importance is not limited to a specific
industry or sector; it is applicable everywhere for analyzing or predicting future
events.
Machine Learning helps users make predictions and develop algorithms that can
automatically learn by using historical data. However, various machine learning
algorithms, such as Linear Regression, Logistic Regression, SVM, Decision Tree,
Naïve Bayes, K-Means, Random Forest, Gradient Boosting, etc., require a massive
amount of storage, which becomes quite challenging for data scientists as well as
machine learning professionals. Cloud computing becomes a game-changer for
deploying machine learning models in such situations. Cloud computing helps to
enhance and expand machine learning applications. The combination of machine
learning and cloud computing is also known as the Intelligent Cloud.
This article will discuss machine learning and cloud computing, the advantages of ML
using the Cloud, applications of ML algorithms using Cloud, and much more. So, let's
start with a quick introduction to Machine Learning and Cloud computing.
o Supervised
o Unsupervised
o Semi-supervised
o Reinforcement
The primary aim of Machine Learning is to give computers the ability to learn
automatically, without human intervention or assistance, and to adjust their actions
accordingly.
1. Cloud works on the principle of 'pay for what you need'. The Cloud's pay-per-
use model is good for companies who wish to leverage ML capabilities for
their business without much expenditure.
2. It provides the flexibility to work with machine learning functionalities without
having advanced data science skills.
3. It makes it easy to experiment with various ML technologies and to scale up
as projects go into production and demand increases.
There are many cloud service providers that offer ML technologies to everyone, even
without prior knowledge of AI and ML.
1. Amazon Web Services (AWS):
There are a few AWS products available for machine learning, as follows:
o Amazon SageMaker: This product primarily helps to create and train machine
learning models.
o Amazon Forecast: This product helps increase the forecast accuracy of ML
models.
o Amazon Translate: It is used to translate languages in NLP and ML.
o Amazon Personalize: This product creates various personal
recommendations in the ML system.
o Amazon Polly: It is used to convert text into a speech format.
o AWS Deep Learning AMIs: This product is primarily used to solve deep
learning problems in ML.
o Amazon Augmented AI: It implements human review in ML models.
2. Microsoft Azure:
Microsoft Azure is also a popular cloud computing platform, launched by Microsoft
in 2010. It is popular among data scientists and machine learning professionals for
data analytics requirements.
There are some Microsoft Azure products available for machine learning as follows:
3. Google Cloud
Google Cloud, or Google Cloud Platform, is a cloud computing platform offered by
the tech giant Google, launched in 2008. It provides its infrastructure to customers
for developing machine learning models over the cloud.
There are a few Google Cloud products available for machine learning as follows:
o Google Cloud Vision AI: This product allows machine learning applications to
easily integrate vision detection features such as image labeling, text
detection, face detection, tagging, etc.
o Google Cloud AI Platform: This product helps develop and manage machine
learning models.
o Google Cloud Text-to-Speech: This product converts text data into speech
for machine learning applications.
o Google Cloud Speech-to-Text: This is also one of the important products; it
supports 120+ languages for converting speech data into text format.
o Google Cloud AutoML: It helps train custom machine learning models and
automates much of the model-building process.
o Google Cloud Natural Language: This product is used in NLP to analyze and
classify text.
4. IBM Cloud:
IBM Cloud (formerly known as Bluemix) is one of IBM's most popular cloud
computing platforms. It includes various cloud delivery models: public, private, and
hybrid.
There are a few IBM Cloud products available for machine learning as follows:
o IBM Watson Studio: This product helps develop, run, and manage machine
learning and Artificial Intelligence models.
o IBM Watson Natural Language Understanding: It helps us analyze and
classify text in NLP.
o IBM Watson Speech-to-Text: As the name suggests, this product is
responsible for converting speech or voice instructions into text format.
o IBM Watson Assistant: This product is used for creating and managing
personal virtual assistants.
o IBM Watson Visual Recognition: It helps machine learning systems search
visual images and classify them.
Machine learning models in the cloud typically deliver one of the following types of
predictions:
o Binary Prediction
o Category Prediction
o Value Prediction
Binary Prediction:
In this type of machine learning prediction, we get responses as either true or false.
Binary predictions are useful for credit card fraud detection, order processing,
recommendation systems, etc.
Category Prediction:
These machine learning predictions are responsible for categorizing a dataset based
on experience. For instance, insurance companies use category prediction to
categorize different types of claims.
Value Prediction:
This type of prediction finds patterns within the accumulated data by using learning
models to show a quantitative measure of all the likely outcomes. For example, it
helps predict the future sales of products in the manufacturing industry.
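A minimal sketch of the three prediction types with scikit-learn, on made-up data: a binary classifier (true/false), a multi-class classifier (categories), and a regressor (quantitative values):

import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))  # toy features, invented for illustration

# Binary prediction: e.g., fraudulent vs. legitimate transaction.
y_binary = (X[:, 0] > 0).astype(int)
print(LogisticRegression(max_iter=1000).fit(X, y_binary).predict(X[:2]))

# Category prediction: e.g., one of three claim types.
y_category = rng.integers(0, 3, size=60)
print(LogisticRegression(max_iter=1000).fit(X, y_category).predict(X[:2]))

# Value prediction: e.g., next month's sales figure.
y_value = 3 * X[:, 0] + rng.normal(scale=0.1, size=60)
print(LinearRegression().fit(X, y_value).predict(X[:2]))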
Business intelligence:
Business intelligence primarily focuses on improving decision-making for businesses.
Machine learning is a process of automated decision-making, while business
intelligence is used to understand, organize, and improve that decision-making.
Cloud computing deals with the large amounts of data used to train machine
learning models, so business intelligence becomes important for storing raw data.
This unstructured data is then transformed into a structured format using
manipulation, transformation, and classification techniques; these structured data
sets are referred to as data warehouses.
Business analysts explore structured data sets using data visualization techniques.
These techniques are used to create visual dashboards, which help communicate
information to others. The dashboards help analyze and understand past
performance and are used to adapt future strategies to improve KPIs (Key
Performance Indicators).
Personal Assistant:
Personal virtual assistants have become essential for developing an organization's
business, as they support customers much like a human would. Nowadays, industries
of all kinds, such as banking, healthcare, education, and infrastructure, are
implementing chatbots or personal virtual assistants in their businesses to perform
multiple tasks.
Although they are still in a developing phase and require more improvement, they
already reduce the burden of resolving common customer problems based on
frequently asked questions. Cortana, Siri, and Alexa are among the most popular
such assistants.
AI-as-a-Service:
Nowadays, all the big cloud companies provide AI facilities through AI-as-a-Service
platforms. Open-source AI functionality is considerably cheaper when deployed in
the cloud. These services provide artificial intelligence and machine learning
functionality, build capacity for cognitive computation, and make systems more
intelligent, which helps make them relatively fast and efficient.
Conclusion
Machine learning with cloud computing is crucial for next-generation technologies.
The demand for machine learning keeps increasing along with cloud computing, as
the cloud offers an ideal environment for machine learning models that work with
large amounts of data. Further, it can be used to train new systems, identify patterns,
and make predictions. The cloud offers a scalable, on-demand environment to
collect, store, curate, and process data.
Moreover, all cloud service providers realize the importance of machine learning in
the cloud, which is increasing the demand for cloud-based ML models among small,
mid-size, and large organizations. Machine learning and cloud computing are
complementary to one another: machine learning helps make cloud computing
more enhanced, efficient, and scalable, while cloud computing expands the horizons
for machine learning applications. Hence, we can say that ML and cloud computing
are intricately interrelated, and used together, they can give tremendous results.
Money laundering is typically carried out in three steps:
o Placement: This is the step where money obtained from illegal sources is put
into financial institutions for the first time.
o Layering: In this step, money launderers create multiple layers by dividing
the money across multiple bank accounts to confuse banking analysts and ML
algorithms, so that they cannot identify the actual source of the laundered
money.
o Integration: This is the final step, in which the layered money is sent to the
money launderer's account.
Anti Money Laundering (AML) using
Machine Learning Applications:
Machine Learning plays a significant role in preventing money laundering activities in
financial industries. To prevent money laundering, a supervised machine learning
technique is used in which an ML model is trained with various types of data and
trends to identify the alerts and suspicious transactions flagged by the internal
banking system. These machine learning models help identify suspicious
transactions, the financial records of senders and beneficiaries, their transaction
patterns based on transaction history, etc.
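A minimal sketch of the supervised approach described above, assuming a made-up table of transaction features and hypothetical labels from past investigations; real AML systems use far richer features and human review:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy transaction features: [amount, n_transfers_24h, new_beneficiary(0/1)].
X = np.column_stack([
    rng.exponential(1000, 500),   # transaction amount
    rng.integers(0, 20, 500),     # transfers in the last 24 hours
    rng.integers(0, 2, 500),      # first time sending to this beneficiary
])
# Hypothetical labels from past investigations: 1 = suspicious, 0 = clean.
y = ((X[:, 0] > 3000) & (X[:, 1] > 10)).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score a new transaction: probability that it should be flagged.
print(model.predict_proba([[5000, 15, 1]])[0, 1])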
Machine Learning algorithms help in AML and reduce human error to a great extent.
Machine learning models use a few techniques to prevent money laundering.
Natural Language Processing (NLP) helps machines process human language and
identify alerts, process mortgage loans, negative news screening, payments
screening, etc. Further, these machine learning technologies help monitor various
suspicious activities and transaction monitoring. ML teaches machines to detect and
identify the transaction patterns, behavior, associated suspicious users/accounts, and
classification of alerts based on their risk categories such as High risk, medium risk,
and low risk. Further, it checks alerts, automatically clears some alerts and makes
accounts fully operational based on their account behavior and required documents.
Machines can be taught to recognize, score, triage, enrich, close, or hibernate alerts.
However, these processes are very complex for humans and time-consuming, but
with the help of machine learning technologies, they become relatively easier than
the classical approach. Natural Language Generation (NLG) helps fill in Suspicious
Activity Reports (SARs) and provides narratives for them. In this way, ML can reduce
dependence on human operators for routine tasks, reduce the total time it takes to
triage alerts, and allow personnel to focus on more valuable and complex activities.
With the introduction of ML into AML TM alert triage, SAR conversion rates should
improve from the current unacceptable rate of ~1% in the banking sector.
Why use Machine Learning in Anti Money
laundering (AML)
Machine Learning is widely used in the banking and finance industry, and AML is one
of the best examples of using machine learning. There are a few important reasons
that show machine learning plays a vital role as follows:
Machine learning helps identify and detect up to 98% of the false positives in the
AML process, while compliance teams estimate that only 1% to 2% of AML alerts
turn out to be true positives. In the AML process, some alerts are generated wrongly
and place restrictions on a customer's account even though they should never have
been triggered. Machine learning helps reduce the rate of false positives by using
semantic analysis and statistical analysis to identify the risk factors that lead to true
positive results, and ML algorithms help eliminate these false positives during the
transaction monitoring process.
Machine learning teaches computers about past transactions and customer profiles,
which helps detect customer behavior. These machines first learn from old data and
then analyze it against the customer's transaction history. Based on transaction
behavior and patterns, they detect suspicious activities and the users who were
associated with any suspicious activity in the past. Traditional approaches to profiling
customer behavior are inaccurate and time-consuming; machine learning technology
has reduced the chance of human error, and it also reduces investigation time by
monitoring customer transactions using rule engines.
Hence, machine learning makes this process relatively faster because money
launderers are generally one step ahead.
Banking and financial institutions analyze customer data such as KYC, screening,
residence country, professions, politically exposed person (PEP) status, social status,
etc., to check their behavior. These all are the main factors that affect the business of
any financial institution. To reduce the financial risk, financial institutions use many
external datasets such as LinkedIn, Bloomberg, BBL, Norkom, social networks,
company houses, and other open-source data.
BBL and Norkom are software tools that help find matches or perform name
searches using external data and tell computers whether a customer is associated
with any fraud or suspicious activity, is a PEP, or is a high-risk entity. NLP replaces
these classical approaches and helps analyze this unstructured data and establish
connections.
Hence, machine learning technologies help analyze unstructured and external data
far more effectively, and with greater accuracy, than classical methods.
RPA plays a significant role in the banking and finance sectors, and many banks are
still adopting RPA to automate their business processes. When RPA is combined
with machine learning, it becomes even more powerful, providing intelligent
automation techniques for different banking operations such as Know Your
Customer (KYC), transaction monitoring, screening, alert elimination, etc.
However, implementing machine learning in AML also comes with challenges. These
include data quality management (poor data quality), profile refresh, the lack of a
360-degree view of the customer, insufficient knowledge of banking, finance, and
AML processes such as Know Your Customer (KYC), limited regulatory appetite, and
the lack of straightforward processes to follow for machine learning
implementations.
Data quality management is one of the most important factors for implementing
machine learning applications in AML. It is required for both monitoring as well as for
analytics purposes. Lack of data traceability and data lineage is also found in both
static and dynamic customer profile records. Static data can be like KYC documents,
and dynamic data may be their incoming and outgoing transactions.
Sometimes a few alerts are generated wrongly on customer accounts, i.e., false
positives that should never have been raised. These can lead to various restrictions
on customers' accounts and affect the entire business, so reducing the recurrence of
such noise or false positives is important. Other techniques can also be applied, such
as large-scale, one-off data reconciliation or refresh exercises. Over the last few
years, many financial institutions have undertaken large and costly data remediation
projects to improve their data and have implemented frameworks to manage data
quality. Even so, financial specialists still consider data quality a major issue.
On the other hand, profile refresh can also be a significant solution for managing
quality data. Relationship managers and back-end associates can use profile refresh
within a certain duration by reaching out to customers and validating their
documents.
Machine learning is a very new technology in the market, and there are very few ML
engineers and professionals in the industry. Furthermore, analysts often lack
knowledge of banking and financial operations, which leads to major problems at
both start-ups and established vendors. This is one of the most common issues
found while implementing machine learning in AML and other banking operations.
Regulators need an ideal ML model that documents all of its choices, limitations, and
results before it is implemented in the AML process. Some ML algorithms do not
reproduce exactly the same results for a given input, yet regulators expect results to
be reproducible in the AML process. Some regulators also want intelligent, adaptive
solutions for transaction monitoring, which is a complex scenario for ML
applications.
Machine learning is a very new technology that is still under development; hence,
there are few established, straightforward processes to follow when implementing it.
Teaching systems to detect certain types of financial crime can be tricky without
knowing what to look for. For example, how does one teach a system to recognize
terrorist financing? There is a carousel process for fraud, but nothing similar for
terrorist financing (nothing, that is, other than name matching against terrorist lists).
While some of these problems are better suited to unsupervised learning, model
validators should be clear about the desired outcomes.
Conclusion
Anti-money laundering is a broad field in the banking and financial industry, and this
is one of the most important key factors in preventing the illegal flow of money.
Machine Learning plays a significant role in the AML process to get better results
with greater efficiency and effectiveness. Although many financial institutions adopt
automation like Robotic Process Automation (RPA) in their business processes, some
put their faith in machine learning and artificial intelligence to run their business.
Moreover, robotics can help train ML models, and ML models can help robotics
build strong decision-making (in the form of NLP) or reading abilities (via optical
character recognition).
Data Science Vs. Machine Learning
Vs. Big Data
Data Science, Machine Learning, and Big Data are all buzzwords in today's time. Data
science is a method for preparing, organizing, and manipulating data to perform
data analysis. After analyzing data, we need to extract the structured data, which is
used in various machine learning algorithms to train ML models later. Hence, these
three technologies are interrelated with each other, and together they provide
unexpected outcomes. Data is the most important key player in this IT world, and all
these technologies are based on data.
Data Science, Machine Learning, and Big Data are among the hottest technologies in
the world and are growing exponentially. Companies big and small are now looking
for IT professionals who can sift through the goldmine of data and help them drive
smooth business decisions efficiently. Data science, Big Data, and machine learning
are crucial terms that help businesses grow and develop in today's competitive
landscape. In this topic, "Data Science vs. Machine Learning vs. Big Data", we will
discuss the basic definitions of each and the skills required to learn them. We will
also see the basic differences between Data Science, ML, and Big Data. So, let's start
with a quick introduction to each, one by one.
It helps systems learn from sample/training data and predict results by teaching
themselves with various algorithms. An ideal machine learning model would require
no human intervention at all; however, such models do not yet exist.
The use of Machine Learning can be seen in various sectors such as healthcare,
infrastructure, science, education, banking, finance, marketing, etc.
Below are a few skills sets that you should have to build a career in this domain:
o In-depth knowledge of computer science and fundamentals.
o Strong programming skills in languages such as Python, Java, R, etc.
o Basic Mathematical knowledge like probability and statistics
o Knowledge of Data Modelling.
Big Data is used to store, analyze and organize the huge volume of structured as well
as unstructured datasets. Big Data can be described mainly with 5 V's as follows:
o Volume
o Variety
o Velocity
o Value
o Veracity
When it comes to the difference between data science and machine learning
technologies, Drew Conway's Venn diagram is the best way to understand it.
Conway's diagram has three primary circles that everyone should look at. These are
as follows:
Hacking Skills: These are the skills, such as organizing data, learning vectorized
operations, and thinking algorithmically like a computer, that make a skilled data
hacker.
Maths and Statistics Knowledge: After storing and cleaning data, we must know the
appropriate mathematical and statistical methods; a good understanding of ordinary
least squares regression is a must.
Substantive Expertise: This is domain knowledge of the field the data comes from,
which is needed to ask the right questions and interpret the results.
Below is the difference table between data science and machine learning.
Data Science | Machine Learning
Data science is a field of computer science that extracts useful insights from structured, unstructured, and semi-structured data. | Machine Learning is a subset of Artificial Intelligence that helps make computers capable of predicting outcomes based on training from old data/experience.
It primarily deals with data. | Machine Learning uses data to learn from it and predict insights or results.
Data in data science may or may not have evolved from a machine or mechanical process. | It includes various techniques like supervised, unsupervised, semi-supervised and reinforcement learning, regression, clustering, etc.
It includes various data operations such as cleaning, collection, manipulation, etc. | It includes operations such as data preparation, data wrangling, data analysis, training the model, etc.
Below is the table to understand the difference between Machine Learning and Big
Data.
Machine Learning | Big Data
It deals with using data as input and algorithms to predict future outcomes based on trends. | It deals with the extraction and analysis of data from a large number of datasets.
It uses tools such as NumPy, Pandas, Scikit-learn, TensorFlow, Keras, etc., to analyze datasets. | It requires tools like Apache Hadoop and MongoDB.
Machine learning can learn from training data and act intelligently, making effective predictions by teaching itself using algorithms. | Big Data analytics pulls raw data and looks for patterns to help firms make stronger decisions.
Machine learning is helpful for providing virtual assistance, product recommendations, email spam filtering, etc. | Big Data is helpful for handling different purposes, including stock analysis, market analysis, etc.
The scope of machine learning is vast, covering improved prediction quality, strong decision-making capability, cognitive analysis, improved healthcare services, speech and text recognition, etc. | The scope of big data is not limited to collecting huge amounts of data only, but also includes optimizing that data for analysis.
Big Data: Big Data is used to store, analyze, and organize huge volumes of structured
as well as unstructured datasets. It can be described mainly by the 5 V's: Volume,
Variety, Velocity, Value, and Veracity.
Data Science: Data science is the study of working with huge volumes of data and
preparing that data for predictive, prescriptive, and descriptive analytical models. It
helps separate useful insights from raw data across vast data sets using various
scientific methods, algorithms, tools, and processes. It includes capturing, digging
into, analyzing, and utilizing data from vast volumes of datasets.
Let's discuss some major differences between Data Science and Big Data in the
below table.
Data Science | Big Data
The main aim of data science is to build data-based products for firms. | The main goal of big data is to extract useful information from huge volumes of data and use it for building products for firms.
It broadly focuses on the science of the data. | It is more involved with the processes of handling voluminous data.
Conclusion:
Machine learning, data science, and Big Data are all popular technologies that are
widely used across the world. Although each has its own significance individually,
combining them makes them more powerful for working on models and projects.
Big Data technology is a huge source of data; data science extracts useful insights
from Big Data; and this useful information is used in machine learning to teach
machines or computers to predict future results based on past experience and to
build strong decision-making capability.
Have you ever thought about why you get product recommendations from various
online platforms such as Amazon, Netflix, Flipkart, etc.? The short answer is Machine
Learning. It has become the most popular buzzword in technology today, and the
entire 21st century, as well as the coming generations, are going to use machine
learning technology in their businesses. All small and big companies, including
Facebook, Google, Amazon, IBM, Oracle, etc., employ machine learning technologies
to run and grow their business. So, don't worry! You are exactly in the right place.
Although machine learning is used everywhere, the main problem is the platforms
that support machine learning services. This article will discuss some of the most
popular machine learning platforms that'll help you manage your experiments at
every stage, such as preparing data for deployment, monitoring, and managing
machine learning models. So let's start with a quick introduction to Machine learning
first.
A machine learning platform provides building blocks for solving various ML and
data science problems, along with a suitable environment in which users have
complete freedom to deploy their products. We will discuss a few of the most
popular machine learning platforms for deploying ML models.
o Amazon Sagemaker
o TIBCO Software
o Alteryx Analytics
o SAS
o H2O.ai
o DataRobot
o RapidMiner
1. Amazon SageMaker
Amazon SageMaker is an Amazon Web Services (AWS) service that helps data
scientists and ML experts prepare, build, train, and deploy high-quality ML models. It
provides one-click deployment support for various open-source models for tasks
such as NLP, object detection, image classification, etc.
2. Alteryx Analytics
Alteryx is a leading data science platform that accelerates digital transformation. It
offers data accessibility and streamlined data science processes, and it enables you
to do complex things with data without prior experience in coding or data mining
techniques.
3. TIBCO Software
TIBCO Data Science allows users to prepare data and to build, deploy, and monitor
models. It is widely known for use cases such as product refinement and business
exploration.
Features of TIBCO:
o It enables users to easily and quickly connect applications and APIs using the
browser.
o It provides the services like metadata management, data catalog, data
governance, etc.
o It facilitates users' actionable intelligence in real-time.
o It helps to build smart apps with a single click.
o It supports cloud messaging for reliable and secure data distribution.
o It reduces decision latency to a greater extent and acts in real-time.
4. SAS
SAS provides advanced data science and data analytics software that gives easy
access to data irrespective of its source and format.
Features of SAS:
o It offers a visual interface for data analytics. It allows users to explore data
within the model studio.
o You can access training data within the model studio from each node.
5. H2O.ai
H2O.ai offers various facilities and functionalities of Artificial Intelligence and data
science. It supports a highly scalable elastic environment for the AI life cycle.
It is an open-source platform that provides distributed in-memory machine learning
with linear scalability.
Make: It helps build ML models and applications with more accuracy, speed, and
transparency.
Features of H2O.ai
o H2O is the open source leader in AI, which aims to democratize AI.
o It supports the facility of building responsible AI models and applications.
o It also helps build explainable AI models with greater transparency,
accountability, and trustworthiness in AI.
o It provides automatic feature recommendation, drift, insights, versioning,
metadata, rank and bias identification, etc.
6. DataRobot
DataRobot is an AI cloud platform that helps build, prepare, deploy, predict, monitor,
and optimize industry data models.
7. RapidMiner
RapidMiner is one of the most popular multimodal predictive analytics, machine
learning, and end-to-end data science platforms. It is used to optimize decision-
making and offers a variety of sophisticated, flexible approaches that turn data into
insights, which can be used to overcome challenges and achieve unique goals. It has
extensive experience across all major industries, such as manufacturing, energy,
utilities, automotive, healthcare, financial services, insurance, life sciences,
communication, travel, transport, and logistics.
Conclusion
With data science and big data, machine learning has become more powerful among
data scientists and professionals. These machine learning platforms play a significant
role in developing and deploying ML models, and they are key players in growing
your business and improving customer satisfaction and support. If you want to
upskill your organization, you can choose any of the machine learning platforms
above to smooth the running of your business.
As these technologies look similar, many people have the misconception that Deep
Learning, Machine Learning, and Artificial Intelligence are all the same. In reality,
although all three are used to build intelligent machines or applications that behave
like humans, they differ in their functionality and scope.
These three terms are often used interchangeably, but they do not quite refer to the
same things. Their relationship is one of nesting: Artificial Intelligence is a branch of
computer science that helps us create smart, intelligent machines. ML is a subfield of
AI that helps teach machines and build AI-driven applications. Deep learning, in turn,
is a sub-branch of ML that trains models on huge amounts of input using complex
algorithms, working mainly with neural networks.
In this article, "Deep Learning vs. Machine Learning vs. Artificial Intelligence", we will
help you to gain a clear understanding of concepts related to these technologies and
how they differ from each other. So, let's start this topic with each technology
individually.
John McCarthy is widely regarded as the father of this amazing invention. There are
some popular definitions of AI, as follows:
"A computer system able to perform tasks that normally require human intelligence,
such as visual perception, speech recognition, decision-making, and translation
between languages."
1. Reactive machine
2. Limited memory
3. Theory of Mind
4. Self-awareness
o Language Translations
o AI in healthcare
o Speech recognition, text recognition, and image recognition
o AI in astronomy
o AI in gaming
o AI in finance
o AI in data security
o AI in social media
o AI in travel and transport
o AI in Automotive Industry
o AI in robots
o AI in Entertainment, agriculture, E-commerce, education, etc.
We have taken a basic knowledge of Artificial Intelligence. Now, let's discuss the
basic understanding of Machine Learning.
First, machine learning accesses a huge amount of data through data pre-processing.
This data can be structured, semi-structured, or unstructured. This data is then fed to
machines through various techniques and algorithms, and, based on previous
trends, the machine predicts outputs automatically.
After understanding the working of machine learning models, it's time to move on to
types of machine learning.
o Data gathering
o Data pre-processing
o Choose model
o Train model
o Test model
o Tune model
o Prediction
We have discussed machine learning and artificial intelligence basics, and it's time to
move towards the basics of deep learning.
Deep Learning is a set of algorithms inspired by the structure and function of the
human brain. It uses huge amounts of structured as well as unstructured data to
teach computers and predict accurate results. The main difference between machine
learning and deep learning lies in how data is represented: classical machine
learning typically learns from prepared features of structured or unstructured data,
while deep learning learns its own representations through layers of neural networks.
Deep learning can be useful to solve many complex problems with more accurate
predictions such as image recognition, voice recognition, product
recommendations systems, natural language processing (NLP), etc.
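A minimal sketch of a small neural network using scikit-learn's MLPClassifier on made-up data; real deep learning systems use many more layers and specialized frameworks, but the idea of stacked layers of units is the same:

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # toy 2-D inputs
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # a nonlinear (XOR-like) target

# Two hidden layers of 16 units each: a very small "deep" network.
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
net.fit(X, y)
print(net.score(X, y))  # training accuracy on the toy problem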
Conclusion
Artificial intelligence is one of the most popular fifth-generation technologies, and it
is changing the world through its subdomains, machine learning and deep learning.
AI helps us create intelligent systems and gives cognitive abilities to machines.
Machine learning enables machines to learn from experience, without human
intervention, and makes them capable of learning and predicting results from given
data. Deep learning, in turn, is the breakthrough in the field of AI that uses multiple
layers of artificial neural networks to achieve impressive results on problems such as
image recognition and text recognition. Hence, after reading this topic, the
confusion most people face in differentiating these terms should be gone, and you
should feel confident about the basic differences between artificial intelligence (AI),
machine learning (ML), and deep learning (DL).
o Supervised learning
o Unsupervised learning
o Reinforcement learning
1. ML in Warfare Platforms
2. ML in Cybersecurity
3. ML in Logistics and Transportation
4. ML in Target Recognition and Tracking
5. ML in Battlefield Healthcare
6. ML in Defense Combat Training
7. ML in Threat Monitoring
8. ML in Maritime Situational Awareness
9. ML in Unmanned Sensor Systems: UAVs, UGVs, UUVs
10. ML in Unattended Sensors and Systems
11. ML in Compound Security and Force Protection
12. Border Protection
13. Route Planning and Clearance
14. Reconnaissance and Surveillance
15. Vehicle Situational Awareness
16. Improved Visualization
o Warfare Platforms
Machine Learning and Artificial Intelligence are being embedded into
Weapons and other military systems of different countries across the globe,
used on land, naval, airborne, and space platforms.
The application of AI-enabled systems on these platforms helps develop
efficient warfare systems, which require less human intervention. It also helps
increase synergy and enhances the performance of warfare systems while
requiring less maintenance. AI and ML are expected to empower autonomous
and high-speed weapons to perform collaborative attacks.
o Defense Cyber security
The military system of any country is one of the most important elements for
maintaining the security of the whole nation. Hence, military and defense systems
are highly sensitive to cyberattacks, which can lead to the loss of crucial military
information and can damage the entire system.
However, AI- and ML-embedded systems can automatically protect networks,
computer programs, and data from any kind of unauthorized access. Furthermore,
ML-enabled web security systems can record the patterns of cyberattacks and
develop counter-attack tools to tackle them.
o Logistics & Transportation
Machine learning plays a crucial role in defense logistics and transportation
systems. Every successful military operation requires the effective transportation of
essential military components such as goods, weapons, and ammunition.
Embedding AI/ML in a military transportation system can reduce transportation
costs as well as human operational effort.
Recently, the US Army collaborated with IBM to use its Watson artificial
intelligence platform to help pre-identify maintenance problems in Stryker
combat vehicles.
o Target Recognition and Tracking
Machine learning and artificial intelligence are also involved in enhancing the
accuracy of target recognition in complex combat environments. These
techniques allow defense forces to gain an in-depth understanding of
potential operation areas by analyzing reports, documents, news feeds, and
other forms of unstructured information.
o Battlefield Healthcare
Machine learning and artificial intelligence help in battlefield healthcare, such as evacuation activities, remote surgical systems, etc. In war zones, various robotic surgical systems and robotic ground platforms equipped with ML technologies help with difficult medical diagnoses and the handling of injuries in combat situations.
o Defense Combat Training
Machine learning enables computers or machines to train troops on the various combat systems deployed in military operations in war zones. It provides simulation and training, along with various software engineering skills, that help during difficult situations. The USA is investing heavily in simulation and training applications. Further, various countries use these ML-equipped combat training systems to train their soldiers instead of the classical approach, which requires more money and time. These modern approaches are more efficient and also more adaptive.
Reinforcement learning helps in building combat training systems in which agents learn from reward and punishment as feedback. This approach is significant in maintaining an enhanced training system for individuals.
o Threat Monitoring
Threat monitoring is defined as a network monitoring solution/system dedicated to analyzing, evaluating, and monitoring an organization's network and endpoints to prevent various security threats such as network intrusion, ransomware, and other malware attacks.
Machine learning helps in threat detection through various detection categories such as Configuration, Modeling, Indicator, and Threat Behavior. By using sophisticated ML algorithms, computer systems are being trained to detect malware, run pattern recognition, and detect malware behaviors or ransomware attacks before they enter the system. AI also plays a vital role in developing intelligent systems for threat awareness, such as drones. These drones are equipped with intelligent software and algorithms that enable them to detect threats, analyze them, and prevent them from entering the system. Major countries such as the USA, Russia, China, France, Britain, Japan, and India are investing huge amounts of money in building drones that detect threats and targets, which is especially useful in remote areas.
o Anomaly Detection
Anomaly detection is the process of identifying suspicious events, items, and observations that deviate from a dataset's normal behavior. It is also significant for identifying patterns of abnormality in data and then discriminating those patterns from the normal state, i.e., flagging outliers. ML and AI help anomaly detection find outlier data points in a series of data, and supervised machine learning plays an important role in pattern recognition for anomaly detection. A minimal sketch is given below.
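The following sketch illustrates the idea with an unsupervised detector, assuming scikit-learn is available; the data, the contamination rate, and the feature layout are all synthetic and purely illustrative, not a real defense workload.

```python
# Minimal anomaly-detection sketch: flag points that deviate from
# the dataset's normal behaviour (assumed scikit-learn API).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # normal behaviour
outliers = rng.uniform(low=-6, high=6, size=(10, 2))    # deviating events
data = np.vstack([normal, outliers])

# contamination = assumed fraction of outliers in the data
model = IsolationForest(contamination=0.05, random_state=42)
labels = model.fit_predict(data)  # -1 = outlier, 1 = normal

print("Flagged outliers:", int(np.sum(labels == -1)))
```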
o Surveillance Applications
Reconnaissance and surveillance systems have become a crucial part of any country's ability to collect and manage huge amounts of defense data. These applications use various sensors and continuously transmit streams of information through data networks to data centers. Data scientists analyze that data and extract useful information from it. In this entire procedure, machine learning (ML) helps data analysts detect, analyze, organize and manage data automatically.
o Decision-Support Systems
Decision-support systems are helpful in many industries and applications, such as medical treatment, manufacturing, marketing, and self-driven equipment (drones). Similarly, ML helps build enhanced decision-support systems for the defense sector, such as intelligent drones, automatic cruise missiles, and automatic weapons that take decisions in response to suspicious objects. ML helps machines make decisions by analyzing data and proposing the best course of action.
o Border Protection
The main goal of the defense sector is to protect the country from border attacks by patrolling the border region. Although soldiers are always positioned to watch the border, nowadays various smart sensors and intelligent machines such as drones play a crucial role in border security systems. These drones are equipped with ML algorithms and software that detect, analyze, and report any suspicious activity by sending information to data centers. Hence, they are most useful in dangerous situations where human intervention is not feasible.
Conclusion
To conclude, machine learning has become an essential part of the modern defense system in comparison to conventional systems. Machine learning and artificial intelligence enable military systems to handle huge volumes of data more efficiently and improve combat systems with enhanced computing and decision-making capabilities. AI and ML are being deployed across the entire defense industry. Governments and tech industries are continuously investing money and effort to increase ML involvement in the defense sector and ensure better security of their countries inside and outside the borders.
As per the information published by Statista, the value of the global entertainment
and media market from 2011 to 2025 has increased to a great extent.
As per the reports, the value of the worldwide entertainment and media market fell to two trillion U.S. dollars in the year 2020. However, the forecast for 2021 suggests revenue will begin to rise once more and surpass pre-COVID levels, reaching 2.2 trillion dollars. The rapid growth in the media industry is primarily because most people use online platforms like YouTube, Facebook, and Netflix instead of classical channels such as cable and FM radio.
There are several important applications of machine learning in the media industry, each with its own examples.
Conclusion
In this topic, we have understood how AI and ML are useful for the media and entertainment industries. Every media and entertainment company is using AI/ML applications to enhance its business and maximize profit. Big data also supports AI and ML by supplying the vast amounts of data needed to train ML models. The more effective the data an ML model is given, the more efficient the results it will generate.
A blockchain can store different types of information, but mainly this technology is
used behind cryptocurrencies such as Bitcoin.
Components of Blockchain
o Blocks: Each blockchain is made up of several blocks, where each block has
three elements:
o Data
o Nonce
o Hash
o Miners: Miners are used to create new blocks through mining.
o Nodes: A node can be understood as a device that contains a copy of the blockchain. For a complete transaction, there are different nodes, and each node owns a copy of the blockchain.
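To make the block elements above concrete, here is a tiny sketch using Python's standard hashlib; the fields, the difficulty, and the toy mining loop are illustrative only and omit almost everything a real blockchain does.

```python
# Toy block: data + nonce hashed together with the previous block's hash.
import hashlib

def block_hash(data: str, nonce: int, prev_hash: str) -> str:
    """SHA-256 over the block contents, chained to the previous hash."""
    payload = f"{prev_hash}{data}{nonce}".encode()
    return hashlib.sha256(payload).hexdigest()

def mine(data: str, prev_hash: str, difficulty: int = 4):
    """Toy mining: find a nonce whose hash starts with `difficulty` zeros."""
    nonce = 0
    while True:
        h = block_hash(data, nonce, prev_hash)
        if h.startswith("0" * difficulty):
            return nonce, h
        nonce += 1

nonce, h = mine("10 coins from A to B", prev_hash="0" * 64)
print(nonce, h)  # the found nonce and the block's resulting hash
```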
By using ML to govern the blockchain, the security of the chain can be enhanced to a great extent. Moreover, as machine learning works better with lots of data, the decentralised nature of blockchains creates a great opportunity to build better models.
2. Surveillance System
Security is an important concern of the people because of the increasing crime rate
in the present scenario. Machine learning and Blockchain technology can be used for
surveillance, where blockchain can be used for managing continuous data, and ML
can be used for analyzing the data.
3. Smart Cities
Nowadays, smart cities are evolving day by day and helping people enhance their living standards by making their lives easier. Machine learning and blockchain technologies also play a crucial role in a smart city. For example, a smart home enabled with blockchain and machine learning algorithms can be monitored easily and can provide device personalization to each individual.
Taotao Wang, Soung Chang Liew, and Shengli Zhang authored a research paper in which they presented how reinforcement learning can be used to optimize the blockchain mining strategy for cryptocurrencies such as Bitcoin. In this paper, the authors show a way to use a multidimensional RL algorithm based on Q-learning for optimising cryptocurrency mining.
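For readers unfamiliar with Q-learning, the sketch below shows the generic tabular update rule the technique is built on; the state and action sizes here are hypothetical toy values, and the paper's multidimensional design for mining is far more elaborate than this.

```python
# Generic tabular Q-learning update (illustrative, not the paper's algorithm).
import numpy as np

n_states, n_actions = 5, 2   # hypothetical toy environment
alpha, gamma = 0.1, 0.9      # learning rate and discount factor
Q = np.zeros((n_states, n_actions))

def q_update(state, action, reward, next_state):
    """Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))"""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])

# One transition: in state 0, action 1 earned reward 1.0 and led to state 2.
q_update(0, 1, reward=1.0, next_state=2)
print(Q[0, 1])  # 0.1 after a single update
```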
SiCaGCN is a system created by researchers to identify the similarities between a pair of code samples. It combines neural-network components with different techniques from the deep learning and ML domains.
o Enhancing Security
Data in a blockchain is much more secure because of the implicit encryption of the system. It is a suitable system for storing highly sensitive personal data, such as personalized recommendations.
Although blockchain is secure at its base, some applications or additional layers built on top of it can be vulnerable. In such cases, we can take advantage of machine learning: ML can help predict possible breaches or security threats in blockchain apps.
o Managing the Data Market
Big companies such as Google, Facebook, LinkedIn, etc., have huge amounts of data, or large data pools, and this data can be very useful for AI processes. However, such data is not available to others.
By using blockchain, various start-ups and small companies can access the same data pools and the same AI processes.
o Optimizing Energy Consumption
Data mining is a high-energy-consuming process, and this is one of the major struggles for different industries. However, Google has largely solved this issue with the help of machine learning: by training its DeepMind AI, it reduced the energy consumption used for cooling its data centres by approximately 40%.
o Implementing a Trustable Real-time Payment Process
By combining blockchain and ML, a highly trustworthy real-time payment process can be implemented in the blockchain environment.
Conclusion
With the above description, we can conclude that both Machine Learning and
Blockchain perfectly complement each other. Both these technologies can be used as
the pillars of future innovation.
Artificial Intelligence (AI) is a field of computer science that deals with developing intelligent machines that can behave like humans, performing tasks such as speech recognition, learning and planning, text recognition, etc. On the other hand, machine learning is a subset of artificial intelligence that enables machines to use past data or experience to make predictions and learn more accurately. Hence, both technologies are very important for growing your skills and career in the current era. To do so, you must know the primary requirements, or prerequisites, for entering the AI and ML fields. Let's start with a quick introduction to AI and ML along with these important prerequisites.
Now, we will discuss some important prerequisites to learn Artificial Intelligence (AI).
Here is a list of some prerequisites as follows:
o Strong Knowledge of Mathematics: Before getting started with AI, you must
have sound knowledge of various mathematical concepts such as probability,
statistics, algebra, matrix, calculus, etc. Mathematics is very important to
build logical capability that is widely used in developing software and systems.
o Good Programming Knowledge: To learn the fundamentals of writing code, you must have sound knowledge of programming languages like Python, R, LISP, Java, C++, Prolog, etc.
o Strong Analytical Skills: Analytical skills refer to the ability to think critically, analyze data, make decisions, and solve complex problems. These important skill sets involve taking in new information and mentally processing it in a productive manner. Hence, if you are planning to jump into the AI domain, you must build up your analytical skills to a great extent.
o Ability to understand complex algorithms: Artificial Intelligence is a field that depends completely on various algorithms that tell computers how to learn and act. There are a few important algorithm families that you must know before getting started with AI, listed as follows (a minimal code sketch is given after this list of prerequisites):
o Classification algorithms
o Regression algorithms
o Clustering algorithms
o Basic knowledge of Statistics and Modelling: Statistical modelling is the use of mathematical models and statistical assumptions to describe training data and predict future outcomes. A statistical model is a collection of probability distributions over the set of all possible outcomes of an experiment. Anyone looking to learn AI must therefore strengthen their statistics and modelling knowledge.
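The sketch below shows one minimal instance of each algorithm family named above, assuming scikit-learn is available; the data is tiny and synthetic, chosen only to make the three calls concrete.

```python
# One minimal example per algorithm family (assumed scikit-learn API).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Classification: predict a discrete label.
clf = LogisticRegression().fit(X, [0, 0, 1, 1])
print(clf.predict([[2.5]]))

# Regression: predict a continuous value.
reg = LinearRegression().fit(X, [2.0, 4.0, 6.0, 8.0])
print(reg.predict([[5.0]]))  # ~[10.0]

# Clustering: group unlabeled points.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster index for each point
```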
In this way, you are now aware of a few common prerequisites for learning Artificial Intelligence and are ready to start your career in this domain.
Now, we will discuss machine learning and important prerequisites to learning ML.
So, let's start with a quick introduction to Machine Learning technology.
o Supervised ML
o Unsupervised ML
o Reinforcement ML
All organizations, small as well as large, want to implement machine learning techniques in their business to grow more smartly than their competitors. Image recognition and personal virtual assistants such as Alexa, Siri, and Cortana are the most common examples of ML applications.
Sound mathematical knowledge is one of the most important prerequisites for learning ML. If you have a sound grasp of mathematical concepts, you can easily build your own logic and implement it in developing intelligent software that predicts accurately.
Hence, we can conclude that if you are really planning to enter the ML domain, you should master at least one programming language. This will not only help you in learning ML but also in data modelling and analytics.
Conclusion
Machine Learning and Artificial Intelligence are currently the most popular technologies, and in the coming decades these technologies will be the core of the IT sector. As a prerequisite, both AI and ML require sound knowledge of basic mathematical concepts to implement in software or systems. You must have a good grasp of statistics, linear algebra, matrices, calculus, probability, programming languages and data modelling. If you are confident in these areas, you can go ahead and make your career in these fields. In this topic, we have discussed a few important prerequisites for learning AI and ML. Hopefully, after reading this, you have a clear understanding of the first step to entering this domain.
1. TRIGMA
Trigma has been a leading provider of custom software development and
consultancy services for 12+ years with 200+ IT professionals. It aims to take client
business to a worldwide audience through smart technology, deep expertise, and
intelligence.
Trigma provides various services with a super speciality in the following technologies
and tools:
o CMS
o AI and ML
o Infra and DevOps
o Cloud
o Mobility
o Quality assurance (QA)
o Web development
o IoT
o SEO, digital marketing & advertising
o Brand strategy consulting and brand design
o Custom software and support
o Social media and content creation, etc.
Overall rating: 5.0
Location: India - 140306
Contact: +91 9855885133
Clients:
Trigma deals with individuals as well as organizations with the ambition and imagination to unleash the power of IT for their business and ideas. Clients of Trigma include Samsung, UNDP, Disney, Suzuki, British Council, Whirlpool, Government of India (GOI), Alera, Shell, History, Hero, Walmart, Pernod Ricard, Abbott, IcarAsia, etc.
Company Website: https://trigma.com/
2. Talentica Software
Talentica Software has primarily dealt with startups for two decades, with 170+ technology products and 1000+ IT professionals. Talentica has a good track record of providing custom software development services for data protection platforms.
Talentica helps you choose technology, set up architecture, and leverage emerging tools & trends in the following technologies:
Overall rating: 4.8
Global Location: The company has three offices in Pune (India) and one in the USA.
Contact: +91-2040751111
Clients:
o Emtech
o Rupeek
o Rostify
o Citrus
o Wideorbit
o Mist
o Tailored Mail
o Realization
o Step Solutions, etc.
3. InApp
InApp is one of the leading companies offering world-class mobile and web application services to startups, SMBs and enterprises around the globe, with 300+ graduate & post-graduate engineers and 21 years of experience.
InApp provides the services in various technologies and tools such as:
Global Location:
India Location:
Overall rating: 4.9
Clients:
o Align
o Axa
o Informatica
o Innotas by planview
o Pro unlimited
o MPulse, etc.
4. Prolitus
Since its inception in 2005, Prolitus has constantly delivered cutting-edge technology to its clients, developing the best enterprise solutions and transforming their businesses.
The company is well known for its technology synergies, which have successfully mitigated the challenges faced by clients. It consists of more than 200 techno-functional professionals who aim to build market-leading advanced services and solutions to grow clients' businesses efficiently.
Prolitus Partners:
Prolitus clients
India Location: Stellar IT Park, Tower B, 5th Floor, Sector 62, Noida - 201309, Uttar
Pradesh, India.
5. Webtunix AI
Webtunix is a group of talented people with a common aim: offering ML as a service that uses data to help organizations solve complex business problems.
Webtunix AI works as an ML consulting company that deals with the most advanced problems in data science and machine learning.
The main focus of the company is to automate business using deep learning techniques, which rely on huge amounts of big data and ML libraries.
Global location:
Webtunix AI is currently offering ML as a service in San Francisco, New York, Tampa,
Virginia, Dallas, Texas, Washington DC, USA, Ontario Canada, Denmark, UK, UAE,
Singapore, Germany, Netherlands, Italy, China, Nigeria, Bangalore, Delhi.
6. QBurst
QBurst is a leading software development and consulting organization that has offered cognitive solutions and custom software development services to SMBs for 17 years. QBurst is currently present in 14 cities with 2500+ projects, 150+ active clients and 2000+ employees globally.
o Cloud enablement
o Data and AI (Machine Learning, Data Science, Big Data, Data Visualization, Data Engineering, Artificial Intelligence, RPA, Computer Vision, etc.)
o Digital marketing
o Digitalization
o End-to-end (UX/UI design, API management, Cybersecurity, QA Automation,
DevOps, Performance Monitoring, etc.)
o SaaS (Salesforce, Oracle, ServiceNow, SharePoint, Microsoft Solution, etc.)
Clients: QBurst has worked with many clients over the past decades. Some of its top clients are Dell, Adani, Omron, Mercedes-Benz, United Nations, Genesys, Airtel, Concentrix, Qlik, Bajaj Allianz, Greenpeace, Spectrum Brands, ABB, etc.
Global Location:
QBurst is currently serving in America, Europe, the Middle East, South Asia, East Asia
and Oceania.
India Location:
Company Website: https://www.qburst.com/
7. ValueCoders
Valuecoders is an Indian software and consulting company established in 2004. It is
one of the top-rated and recognized software outsourcing companies with a team of
650+ IT professionals and 2500+ clients globally ranging from startups to Fortune
500 companies.
ValueCoders works with various technologies and platforms to lend flexibility to your software development and outsourcing needs. The industries ValueCoders serves are given below:
o Healthcare
o Banking & Finance
o Retail and Ecommerce
o Media & entertainment
o Education and E-learning
o ISVs & Product firms
Global Location: North America, Asia Pacific region, Europe, Middle East & Africa,
India.
8. PixelCrayons
PixelCrayons is a SaaS-based software IT outsourcing company that provides software product development, digital transformation, and e-commerce development services across the globe. PixelCrayons is a 16+ year old company doing business in 38+ countries, with 450+ employees and 11500+ projects.
Conclusion
Machine learning has become an essential part of today's technologies. Without ML technologies and applications, it is hard for anyone to compete in the industry. Small and large companies alike are hiring ML engineers and data scientists to deliver a seamless consumer experience around the globe. India is also continuously growing in developing IT companies with ML solutions. We have compiled a few of the best-rated ML and data science companies that have a good reputation in India as well as across the world.
Hence, in this topic, "Maths Courses for Machine Learning", we will discuss a few of the best courses available on the internet. By taking these courses, you can build the basic math skills required for entering the machine learning world. Below are the criteria on which we based our suggestions for the mathematics courses listed here.
Criteria
Now, without wasting time, let's start discovering some of the best online mathematics courses for machine learning.
Best Online Mathematics courses for
Machine Learning
1. Mathematics for Machine Learning Specialization
2. Data Science Math Skills
3. Introduction to Calculus
4. Probabilistic Graphical Models Specialization
5. Statistics with R Specialization
6. Probability and Statistics
7. Mathematical Foundation for Machine Learning and AI
o In the first series, we will learn important concepts of linear algebra, vectors, matrices and their relationship with data in ML.
o In the second series, we will focus on Multivariate Calculus, which gives you in-depth knowledge of optimizing fitting functions to get good fits to data.
o The last and third series of this course is Dimensionality Reduction with Principal Component Analysis, which enables you to apply all of this mathematics knowledge in real-world scenarios.
After completing all the series, you will feel confident enough to start a career in machine learning.
Course description:
Important link: Click here to enrol and know more about this course.
o Set theory
o Venn diagrams
o Properties of the real number line
o Sigma notation, interval notation and quadratic equations
o Concepts of a Cartesian plane, slope, and distance formulas
o Functions and graphs
o Instantaneous rate of change and tangent lines to a curve
o Logarithmic functions
o Exponential functions
o Probability
o Bayes Theorem
Pre-requisites:
To enrol on this course, you do not need a prior understanding of the maths
required for ML and Data Science.
Important link: Click here to enrol and know more about this course package.
3. Introduction to Calculus
This is one of the highest-rated maths courses on the internet, taught by David Easdown. It covers the calculus concepts required for machine learning solutions. Further, this course helps you maintain a balance between the theory and the application of calculus.
Pre-requisites:
You must have a basic understanding of calculus and general mathematics concepts to enrol on this course. This course is significant if you specifically want to master calculus.
Important Link: Click here to enrol and know more about this course.
This course is designed to help you learn various important skills such as inference, Bayesian networks, belief propagation, graphical models, Markov random fields, Markov chain Monte Carlo (MCMC) algorithms, and the Expectation-Maximization (EM) algorithm.
Benefits:
Pre-requisites:
Before enrolling on this course, one should have a basic understanding of mathematics and knowledge of at least one programming language.
Important Link: Click here to enrol and know more information related to this
course.
Extra Benefits:
Pre-requisites:
Before enrolling on this course, you must have prior knowledge of basic mathematics concepts; a good interest in data analysis will be an advantage. Further, no previous programming knowledge is mandatory to start this course.
Important Link: Click here to enrol and know more about this course.
6. Probability and Statistics
This course is offered by the University of London under the guidance of Dr James
Abdey. This course is specially designed for probability, descriptive statistics, point
and interval estimation of means and proportions, etc. It helps in building essential
skills for good decision making and predicting future results.
Extra benefits:
You will be provided with a Shareable Certificate after completion of this course.
Further, you will also get the entire course agenda, such as recorded video lectures,
class notes, practice theoretical & programming assignments, Graded Quizzes, etc.
Pre-requisites:
This course is specially designed for beginners; hence no mathematics and
programming knowledge is required to start this course.
Important Link: Click here to enrol and know more about this course
Mathematics is one of the key enablers of programming skill, and this course is designed in exactly that spirit, helping you master the mathematical foundation required for writing programs and algorithms for AI and ML.
Course content
This course is categorised into 3 sections:
1) Linear Algebra:
2) Multivariate Calculus
This section helps in understanding the learning part of ML: it is what is used to learn from examples, update the parameters of different models, and improve their performance. A gradient-descent sketch follows the list below.
o Derivatives
o Integrals
o Gradients
o Differential Operators
o Convex Optimization
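The following tiny gradient-descent sketch, assuming only NumPy, shows how a derivative is used to update a model parameter; the one-parameter line y = w * x and the data are invented for illustration.

```python
# Derivatives in action: update a parameter to reduce mean squared error.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # true relationship: y = 2x

w, lr = 0.0, 0.05              # initial weight and learning rate
for _ in range(200):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)  # d(MSE)/dw
    w -= lr * grad                      # step against the gradient
print(round(w, 3))                      # converges to ~2.0
```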
3) Probability Theory
Probability theory is one of the important concepts that helps us make assumptions about the underlying data in deep learning and AI algorithms. It is important to understand the following key probability concepts:
o Elements of Probability
o Random Variables
o Distributions
o Variance and Expectation
o Special Random Variables
Extra benefits:
Along with a certificate of completion, video lectures and online study materials, this course also includes projects and quizzes that unlock with each section, which helps you solidify your knowledge. Further, this course not only helps you build your own algorithms but also helps you start putting those algorithms to use in your next projects.
Pre-requisites:
This course is designed for beginners as well as experienced learners. Further, basic knowledge of Python is needed, as the concepts are coded in Python and R.
Important link: Click here to enrol and know more about this course.
Conclusion
Mathematics is always a key player when entering the programming domain. Whatever the programming language (Java, Python, R, Apex, C, etc.), good mathematics knowledge is required to build your logical concepts and algorithms. In this topic, we have discussed a few of the most important and best maths courses available online for learning machine learning and AI solutions. Hopefully, after reading this article, you will be able to choose the best maths course to start your journey in ML and build your career in the IT world.
Probability and Statistics Books for
Machine Learning
Probability and statistics both are the most important concepts for Machine
Learning. Probability is about predicting the likelihood of future events, while
statistics involves the analysis of the frequency of past events.
Nowadays, Machine Learning has become one of the first choices for most freshers and IT professionals. But, in order to enter this field, one must have some pre-specified skills, and one of those skills is Mathematics. Yes, Mathematics is very important for learning ML technology and developing efficient applications for business.
When talking about mathematics for Machine Learning, it especially focuses on
Probability and Statistics, which are the essential topics to get started with ML.
Probability and statistics are considered as the base foundation for ML and data
science to develop ML algorithms and build decision-making capabilities. Also,
Probability and statistics are the primary prerequisites to learn ML.
In this topic, we will discuss a few important books on Probability and statistics that
help you in making the ML process easy and implementing algorithms to business
scenarios too. Here, we will discuss some of the best books for Probability and
Statistics from basic to advanced levels.
Probability can be calculated as the number of times the event occurs divided by the total number of possible outcomes. Suppose we toss a coin; then the probability of getting a head as the outcome can be calculated with the formula below:
P(H) = 1/2 = 0.5
Where P(H) denotes the probability of getting a head.
Types of Probability
For a better understanding of probability, it can be further categorized into different types as follows:
P(A|B) = P(A∩B)/P(B)
Similarly, P(B|A) = P(A ∩ B)/P(A). We can write the joint probability of A and B as P(A ∩ B) = P(A) · P(B|A), which means: "the chance of both things happening is the chance that the first one happens, and then the second one given that the first has happened." A small numeric check is given below.
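The following few lines verify the formula on invented counts (out of 100 trials, event B occurs 30 times and both A and B occur together 12 times); the numbers are illustrative only.

```python
# Numeric check of P(A|B) = P(A ∩ B) / P(B) on invented counts.
p_b = 30 / 100        # P(B)
p_a_and_b = 12 / 100  # P(A ∩ B)

p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)          # 0.4

# Joint probability recovered the other way round:
print(p_b * p_a_given_b)    # 0.12 == P(A ∩ B)
```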
Statistics is the branch of applied mathematics that deals with studying and developing methods of gathering, analyzing, interpreting and drawing conclusions from empirical data. It can be used to make better-informed business decisions.
o Descriptive Statistics
o Inferential Statistics
Use of Statistics in ML
Statistical methods are used to understand the training data as well as to interpret the results of testing different machine learning models. Further, statistics can be used to make better-informed business and investing decisions.
Best Probability and Statistics books for
Machine Learning
Probability and statistics are equally important for learning machine learning technology, but the main question concerns the best books or sources for learning them. Although there are many books available on the internet as well as in offline stores, choosing the most appropriate book is the main problem for aspirants. A few of the best books on Probability and Statistics are given as follows:
Price (Amazon): $118.15
Star Ratings: 4.4/5
This book is available with the latest Python version 3.6+, which includes all essential
areas of Probability, Statistics, and ML illustrated using Python. This book gives you
exposure to various machine learning methods and examples using different
analytical methods and Python codes which help you in deploying your theoretical
concepts into real-time scenarios. It also provides detailed descriptions of various
important results using modern Python libraries such as Pandas, Scikit-learn,
TensorFlow, and Keras. Many abstract mathematical ideas, such as convergence in
probability theory, are developed and illustrated with numerical examples.
The authors of this book, Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani, have published it in two editions.
o Deep learning
o Survival analysis
o Multiple testing
o Naive Bayes and generalized linear models
o Bayesian additive regression trees
o Matrix completion
This book is available in both online and offline modes: you can either download a PDF of the book or order it on the Amazon marketplace.
Price: $84.95 (Amazon)
Star Ratings: 4.6/5
Overview: The book illustrates important ideas in different fields, such as medicine, finance and marketing, within a common framework.
As this book takes a statistical approach, it mainly focuses on explaining the concepts rather than the mathematics. It contains different examples of each topic, with colour graphics.
This book is one of the best resources for machine learning professionals and anyone interested in data mining concepts. The various concepts in the book range from supervised to unsupervised learning.
Overview: This book is written and designed by three popular statisticians named
Robert V. Hogg, Elliot Tanis, and Dale Zimmerman. The latest edition of this book is
the tenth edition, which focuses on the existence of variation in each process, and
also helps readers to understand this variation with the help of Probability and
Statistics.
The book includes the applied introduction to Probability and statistics that
reinforces the mathematical concepts with different real-world examples and
applications. These examples also illustrate relevance to the key concepts of statistics.
The book's syllabus is designed for two-semester courses, but it can be completed in
a one-semester course only.
Conclusion
Machine learning is a very broad technology with many concepts from mathematics and computer programming; building on these, ML can be used to create intelligent software and systems for future prediction. If you are confident in basic and advanced mathematics, such as probability and statistics, you can perform better in this industry. Hopefully, this topic will help you select the best books on probability and statistics.
Although machine learning can be used as a risk management tool, it also carries many risks itself. While 49% of companies are exploring or planning to use machine learning, only a small minority recognize the risks it poses: only 41% of organizations in a global McKinsey survey say they can comprehensively identify and prioritize machine learning risks. Hence, it is necessary to be aware of some of the risks of machine learning, and of how they can be adequately evaluated and managed.
1. Poor Data
As we know, a machine learning model only works on the data we provide to it; it depends completely on human-given training data. The output mirrors the input: if we feed the model poor data, it will generate poor output. Poor data, or dirty data, includes errors in training data, outliers, and unstructured data that cannot be adequately interpreted by the model.
2. Overfitting
An overfitted model fits the training data so closely that it captures noise rather than the underlying pattern. As a result, it won't generalize well when it comes to testing on real data, as the small sketch below illustrates.
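This sketch contrasts a simple and an overly flexible fit on noisy synthetic data, assuming only NumPy; the degrees and the noise level are arbitrary choices made for illustration.

```python
# Overfitting demo: a degree-9 polynomial matches 10 noisy training points
# almost exactly but typically predicts worse than a plain line.
import numpy as np

rng = np.random.RandomState(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.2, size=10)  # noisy line

simple = np.poly1d(np.polyfit(x_train, y_train, deg=1))   # honest model
overfit = np.poly1d(np.polyfit(x_train, y_train, deg=9))  # memorizes noise

x_test = np.linspace(0.05, 0.95, 50)
y_test = 2 * x_test  # the true, noise-free relationship
print("simple test MSE:", np.mean((simple(x_test) - y_test) ** 2))
print("overfit test MSE:", np.mean((overfit(x_test) - y_test) ** 2))
```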
3. Biased data
Biased data means that human biases can creep into your datasets and spoil the outcomes. For instance, the popular selfie editor FaceApp was initially, and inadvertently, trained to make faces "hotter" by lightening the skin tone, a result of having been fed a much larger quantity of photos of people with lighter skin tones.
4. Lack of Strategy and Experience
Machine learning is a very new technology in the IT sector; hence, the limited availability of trained and skilled resources is a very big issue for industries. Further, a lack of strategy and experience due to fewer resources leads to wasted time and money and negatively affects an organization's production and revenue. In a survey of over 2000 people, 860 reported a lack of clear strategy and 840 reported a lack of talent with the appropriate skill sets. This survey shows how a lack of strategy and relevant experience creates a barrier to the development of machine learning in organizations.
5. Security Risks
Security of data is one of the major issues for the IT world, and it also affects the production and revenue of organizations. When it comes to machine learning, various types of security risks exist that can compromise machine learning algorithms and systems. Data scientists and machine learning experts have reported three types of attacks that primarily target machine learning models. These are as follows:
6. Data Privacy
Data is one of the main key players in developing machine learning models. We know machine learning requires huge amounts of structured and unstructured data to train models so they can predict accurately in the future. Hence, to achieve good results, we need to secure the data by defining privacy terms and conditions and keeping it confidential. Hackers can launch data-extraction attacks that fly under the radar and can put your entire machine learning system at risk.
7. Third-party risks
These types of security risks are less talked about in industry, as the chances of them occurring are relatively small. Third-party risks generally arise when a business outsources work to third-party service providers who may fail to properly govern a machine learning solution. This can lead to various types of data breaches in the ML industry.
8. Regulatory challenges
Hence, machine learning risks can be identified and minimized through appropriate
talent, strategy and skilled resources throughout the organization.
Conclusion
It is no surprise that machine learning is a continuously growing technology, employed in many industries to make business automated and faster. But, as we have seen, there are also some risks associated with machine learning solutions. However, data scientists and ML experts are continuously researching ML technology and developing new solutions to improve it. In this topic, we have discussed a few important risks associated with ML solutions when implementing them in your business, along with steps to assess these risks. Hopefully, after reading this topic, you have in-depth knowledge of the various risks associated with machine learning.
There are many popular brands in the market that claim their laptops are the best, but you should not stop at their name and reputation alone. Instead, you should do a bit of research before purchasing the best laptop for machine learning applications. Further, apart from configuration and features, budget is also a crucial factor when purchasing a laptop. In this article, "Best Laptops for Machine Learning", we will discuss various laptops with their features and their GPU and RAM configurations.
Before purchasing any laptop for machine learning, we must be aware of a few
important factors such as portability, RAM, CPU, GPU, etc. So, let's start with a
quick overview of these factors.
1. Portability: This is one of the most important factors when purchasing a device, especially for machine learning. However, if portability is not a concern for you, you can go with a personal computer. Nowadays, many companies follow a remote working culture, so portability becomes a significant factor when purchasing a device. Keep in mind that the higher the processing power, the heavier the laptop tends to be.
4. GPU: This is one of the key factors required for solving complex matrix problems. Machine learning and deep learning involve various neural networks that are computationally intensive; hence, a GPU becomes important for enabling parallel processing. A task that would take weeks or months otherwise can complete within a few hours with the help of a GPU.
5. Storage: Although storage matters when purchasing a laptop, if you run short of space you can also opt for cloud storage options. Further, a minimum of 1 TB HDD is advised when purchasing a laptop, especially for machine learning.
6. Operating System (OS): As for operating systems, you can go with Linux, Windows, or macOS.
1. Lambda TensorBook
This is one of the best laptops, with out-of-the-box functionality and pre-installed TensorFlow and PyTorch. It is specially designed for deep learning and ships with Lambda Stack, which makes it easy to install and upgrade frameworks and tools such as Ubuntu, TensorFlow, PyTorch, Jupyter, NVIDIA CUDA, and cuDNN.
2. GIGABYTE G5 GD
Gigabyte has long been a first choice for data scientists, machine learning professionals and gamers alike. This laptop costs under $1000 and is available in various online as well as offline stores. In this price range, it comes with a very nice set of specifications.
If you are really looking for the most affordable laptop for machine learning and
gaming, then this laptop will fulfill all your requirements.
3. Apple MacBook Pro
It comes with both 14-inch and 16-inch displays and is also one of the best options for machine learning professionals. The MacBook is made of aluminum, so it is quite a bit more expensive than other laptops. The price of the Apple MacBook Pro ranges from 2600 USD to 3000 USD.
4. Acer Nitro 5 AN515
Although this laptop is specially designed for gaming professionals, it is also one of the best choices among data scientists and machine learning professionals. The price of the Acer Nitro 5 AN515 ranges between 1300 and 1400 USD, and it is available in various online as well as offline stores.
5. ASUS ROG Strix GL702VS
This laptop looks like a gaming laptop but is one of the best laptops for AI and machine learning; it is powered by some of the finest desktop-class hardware at a low price, with a big screen, a Pascal GPU and a Kaby Lake processor.
The price of the ASUS ROG Strix GL702VS is between 1600 USD and 1700 USD.
It is one of the best laptops under a $2K budget and is ideal for ML professionals who want an Intel processor, an excellent RAM size, and an RTX 30-series GPU.
It comes with great features, including a full-size island-style RGB backlit keyboard with numeric keypad, dual front-facing stereo speakers with dual digital microphones, and a 15.6" Full HD (1920 x 1080) widescreen LED-backlit IPS display with a 16:9 aspect ratio. It is best suited for gaming, business, personal, and ML projects.
Feature Specifications:
7. Razer Blade 15
The next best laptop for machine learning applications is the Razer Blade 15. It is used for multimedia, business, gaming, and building ML applications.
This laptop has a 15-inch screen with a classic black color and beautiful design. It is great for machine learning projects as it comes with an i7 core processor and a dedicated graphics card.
Feature Specifications:
o Processor: Core i7
o Memory: 16 GB DDR4
o Display: Available in 15 inches Screen
o Weight: 3 kg 920 g
o Storage: 1 TB in Hard Disk Drive
o OS: Windows 11 Home
o Link to buy: Click here
When looking for the best laptop for machine learning, the MSI P65 can't be ignored. MSI is a popular brand offering a great range of laptops and is also known for providing the best gaming laptops. The best thing about this laptop is its processor, with high processing power and great performance, along with an impressive screen.
Feature Specifications:
o Processor: Core i9
o Memory: 32GB RAM DDR4
o Display: Available in 15.6 inches Screen with 4K display
o Battery Life: 4 Kilowatt Hours
o Weight: 1 Kg 900g
o Storage: 1 TB in Hard Disk Drive
o OS: Windows 10 Pro
o Link to buy: Click here
Conclusion
In this topic, we have discussed various laptops suitable for machine learning professionals as well as data scientists. However, choosing the best laptop depends on your project as well as your budget; e.g., if you are looking for high performance regardless of look and feel, you may prefer the Apple MacBook Pro 15.
Due to the popularity of machine learning across the world, all organizations are
adopting this technology. Similar to other industries, the finance sector also has seen
exponential growth in the use cases of machine learning applications to get better
outcomes for both consumers and businesses. In this topic, "Machine Learning in
Finance", we will discuss various important concepts related to the finance industry
using machine learning algorithms, benefits of ML in finance, use cases of ML in
finance, etc. Before starting this topic, firstly, we will understand the basic
introduction to machine learning and its relation to the finance sector.
o Supervised Learning
o Unsupervised Learning
o Semi-supervised Learning
o Reinforcement Learning
Initially, machine learning was adopted by very few financial service providers, but in recent years the use of machine learning and its applications has spread to several areas of the finance industry, such as banks, fintech, banking regulators, insurance and trading.
Further, with the rise in big data, machine learning in finance has become more
prominent; hence leading banks and other financial services are deploying ML
technologies to optimize portfolios, streamline their business, and manage financial
assets across the globe.
In the finance sector, machine learning algorithms are used to detect fraud, money
laundering activities, trading activities, and various financial advisory services
to investors. It can analyze millions of data sets within a short time to improve the
outcomes without being explicitly programmed.
Below are a few reasons to use Machine Learning in the finance industry:
Here are a few important use cases where ML algorithms are being used in the
finance industry as follows:
1. Financial Monitoring
2. Process automation
3. Secure transaction
4. Risk management
5. Algorithmic trading
6. Financial regulators and advisory
7. Customer data management
8. Decision making and investment prediction
9. Customer service improvement
10. Customer retention program
11. Marketing
1. Financial monitoring
Financial monitoring is a process by which financial analysts prevent money laundering, enhance network security, detect red flags, etc. Machine learning helps analysts provide improved financial monitoring services to clients.
2. Process automation
Machine learning has replaced most of the manual work in the finance sector by automating repetitive tasks through intelligent process automation, leading to enhanced business productivity. Further, with automation, organizations have achieved an improved customer service experience at a reduced cost.
Chatbots, auto-fill forms, employee training gamification, etc., are a few popular
examples of process automation in the finance sector.
3. Secure transaction
Since most banking and finance activities now happen through digital payment systems, the chances of transactional fraud have increased in recent years. Machine learning has reduced the risk of transactional fraud as well as the number of false rejections, typically by treating fraud detection as a classification problem, as in the sketch below.
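A hedged sketch of that idea, assuming scikit-learn: the two features (amount, hour of day), the labels, and the example transaction are all synthetic stand-ins for real transaction data.

```python
# Minimal fraud-scoring sketch: classify transactions as fraud/legitimate.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features: [amount in $, hour of day]; label 1 = fraudulent, 0 = legitimate.
X = np.array([[20, 10], [35, 14], [40, 12], [5000, 3], [7000, 2], [6500, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression(max_iter=1000).fit(X, y)

new_txn = np.array([[5500, 3]])
print(clf.predict(new_txn))        # likely [1]: flag for review
print(clf.predict_proba(new_txn))  # estimated fraud probability
```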
4. Risk Management
The financial sector is one of the most sensitive industries and can involve many risky situations if not managed properly. It handles a large volume of cash and credit transactions between institutions or banks and their customers, so there are many ways in which funds can be mishandled.
5. Algorithmic Trading
Algorithmic trading is one of the best use cases of Machine Learning in the Finance
sector. In fact, Algorithmic Trading (AT) has become a dominant force in the global
financial markets.
Machine learning allows trading companies to make decisions after analyzing trade results and closely monitoring funds and news in real time. With real-time monitoring, it can detect patterns of the stock market going up or down, the kind of signal the toy sketch below computes.
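As a toy illustration of detecting up/down market patterns, the sketch below computes a classic moving-average crossover signal with pandas; real ML trading systems replace this hand-written rule with learned models, and the prices here are invented.

```python
# Toy trend-detection: moving-average crossover on invented prices.
import pandas as pd

prices = pd.Series([100, 101, 103, 102, 105, 107, 106, 109, 111, 110.0])

fast = prices.rolling(window=2).mean()  # short-term trend
slow = prices.rolling(window=5).mean()  # long-term trend

# +1 when the short-term average crosses above the long-term one (uptrend),
# -1 when it crosses below, 0 otherwise.
signal = (fast > slow).astype(int).diff().fillna(0)
print(signal.tolist())
```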
ML algorithms used in these apps enable the customers to keep an eye on their daily
spending on the app and also allow them to analyze this data in finding their
spending patterns and areas where they can save their money.
One of the great examples of such ML apps is Robo-advisor, one of the rapidly
growing apps in this sector. These advisors work as regular advisors, and they
specifically focus on the target investors with limited resources who want to
efficiently manage their funds. These ML-based Robo-advisors use traditional data
processing techniques for building financial portfolios and solutions, for example,
trading, investments, retirement plans, etc., for their users.
But nowadays, financial data has become very vast due to its different sources, such
as social media activities, transactional details, mobile transactions, and market data.
Hence it has become very difficult for financial specialists to manage such a huge
amount of data manually.
To solve this issue, different machine learning techniques can be integrated with
finance systems which can manage such large volumes of data and can offer the
benefit of extracting real intelligence from data. Different AI and ML tools, such as
NLP (Natural language processing), data mining, etc., can help to get insights from
data that make the business more profitable.
A binary classification model is used to determine the customers at risk, and it is then followed by a recommender system.
11. Marketing
AI and machine learning models make better predictions on the basis of past/historical data, which makes them the best tools for marketing. These ML tools use different algorithms that can help finance companies create a robust marketing strategy by analyzing mobile app usage, web activity, responses to previous ad campaigns, etc.
Conclusion
In this topic, we have seen how machine learning is currently being used in, and benefiting, the finance industry. The value of ML applications in finance is increasing day by day, and the real long-term value will probably appear in the coming years. Because of the many applications of ML tools in the finance sector, various banks and financial institutions are investing billions in this technology. With these investments, companies are getting various benefits, including reduced operational costs, increased revenue, enhanced customer experiences, and many more.
o Storing new leads: Machine learning helps train machines to store data in a database based on past data. Whenever a new lead appears, it is automatically stored in the database on the basis of previous training data and classification metrics.
o Lead analysis: Machine learning algorithms help determine whether a lead is valuable or not; lead analysis is done by ML algorithms based on demographic scores.
o Lead classification: Based on the demographic score, leads are automatically classified by the system. Whenever the lead score is below the classification score, the lead is neglected by the system; if the lead score is above the classification score, the ML algorithms wait for the lead's next possible action (a minimal sketch of this step follows this list).
o Behaviour analysis: Whenever a lead is successfully classified and takes the next action, machine learning algorithms help compute the sales threshold. Based on this calculation, the system analyzes various details such as lead revert time, link clicks, insights, acquisition, events, web visits, etc.
o Forwarding for the next targeted action: Whenever the system qualifies a lead by crossing the benchmark sales threshold, it is forwarded to the next level for further manual/targeted actions, such as arranging a call or meeting with the lead.
o Enhance calculator function: At this stage, the final output is again used to train the sales-threshold counting function and the demographic counting function. This process ensures the continual refinement of the machine learning algorithms.
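The classification step above reduces to a simple threshold rule; the sketch below makes it concrete, with the scores and the threshold value chosen purely as hypothetical examples.

```python
# Toy lead classification: neglect leads below the classification score,
# keep the rest and await their next action (threshold is hypothetical).
leads = {"lead_a": 82, "lead_b": 35, "lead_c": 67}  # demographic scores
CLASSIFICATION_SCORE = 50                            # assumed threshold

for lead, score in leads.items():
    if score < CLASSIFICATION_SCORE:
        print(lead, "-> neglected by the system")
    else:
        print(lead, "-> kept; await the lead's next action")
```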
o Link clicks
o Open rate
o Replied
Personal virtual assistants, or chatbots, are among the best applications of machine learning technologies today. ML engineers are continuously focusing on developing advanced chatbots for conversing with customers in business. ML tools track the entire chat history along with geographical location, region, occurrence frequency, text strings, etc. Further, if a customer returns again and again or shows strong interest, machine learning algorithms try to ask for contact details and save them for you to follow up on later.
There are millions of websites running on the internet, and many of them may use the same technology your company is targeting. If so, analyzing them can solve many issues in your business, such as identifying ranking keywords, the most queried keywords, etc. In this analysis, machine learning algorithms and tools are also helpful for finding competitors and similar websites.
Machine learning allows you to identify such visitors and differentiate them from
your target audience as those visitors may come for other reasons instead of buying
services.
o Remove unwanted form filling: Most leads are generated through smartphones, and forms are one of the best ways to generate leads there. But even with auto-fill, no one wants to waste time just to access a post. Hence, machine learning helps customers access blogs without filling in multi-field forms; they only need to browse at their own pace. Sometimes customers are ready to provide their contact details but don't want to fill out forms; in these cases, machine learning algorithms take care of this automatically.
o Develop a hyper-personalized experience: Machine learning helps to create
a truly personalized experience. However, you can create content and target
your audience, but without ML, it is impossible to deliver a hyper-personalized
experience to customers.
o Allow leads to self-nurture: Machine Learning allows the customer to self-
nurture before interrupting the sales and marketing team. It allows the
customer to access the content at their own pace and inform them about
products and services through personalized content recommendations.
However, you can retarget them by social advertisements, but on your site,
they can be unrestricted by forms or pushy sales teams.
Conclusion
Machine learning is one of the most popular technologies, used in industries such as marketing, healthcare, finance, banking, infrastructure, digital marketing, SEO, and product recommendation. Based on some research, it has been found that adding an AI engine with ML to a lead generation strategy can deliver 51% more lead conversions instantly. Machine learning also helps automate the lead generation process through various tools across your website, such as adaptive content hubs, self-nurturing landing pages, Personalized Exit-Intent Popups, Human Lead Verification, etc. Hence, we can say lead generation is a complex process when you have a large customer base, but machine learning has simplified it by narrowing down your target list, reducing the effort needed to convert customers, and increasing business revenue.
Further, data science is the field of study that helps us extract useful data from structured and unstructured data formats. Later, this extracted data is used to train machine learning models. Hence, we can say data science is the study of cleaning, preparing, and analyzing data, whereas machine learning is a subfield of data science. When we talk about a career in data science and machine learning, both these technologies have great future scope, with tremendous job opportunities in the IT & software domain. Although various institutions and organizations offer many certification courses, we have listed a few reputed certification courses for ML and data science that will surely help you boost your career.
1. Introduction to ML
2. Exploring and Using data sets
3. Review of Machine Learning Algorithms
4. Machine Learning with Scikit
5. Deep Learning with Keras and TensorFlow
6. Building a Machine Learning Pipeline
o This is one of the best courses for anyone who wants to become a data scientist or machine learning engineer.
o This course gives you analytics skills so you can lead a team of analysts.
o Business analysts (BAs) who want to learn data science and ML techniques.
o Information architects who need expertise in machine learning algorithms.
o Analytics professionals who work in machine learning or artificial intelligence.
o Graduates looking to build a career in data science and machine learning.
o Data scientists, data analysts, and professionals who wish to turn large volumes of data into actionable insights.
o Early-career professionals and senior managers, including technical managers, business intelligence analysts, IT practitioners, management consultants, and business managers.
o Those with some academic/professional training in applied mathematics/statistics. Participants without this experience will have to put in extra work and will be provided support by Great Learning.
o This course lets you learn from the best MIT faculty with recorded video
lectures to build industry-valued skills.
o This course also provides the facility of weekend support from other mentors
or experts in data science and ML.
o After completing this course, you will be entitled to a certificate of
completion from the Massachusetts Institute of Technology (MIT) IDSS.
o This course gives you hands-on exposure to 3 projects and 15+ case studies.
Upon completing this course and hands-on projects, you will be provided a
certificate that you can share with prospective employers and your professional
network.
Extra Benefits:
Prerequisites: Before starting this course, you must have a computer science &
engineering background.
When to register: You can register for this certification anytime through
the edx website.
Along with a globally valid certificate, you will also learn various core areas of
machine learning. This program enables you to implement algorithms on live data
and practice debugging and improving models with the help of SVMs (support vector
machines) and ensemble methods. Moreover, this course also covers the internal
working of neural networks and how to construct and adapt neural networks for
different data types. This program uses Python and the NumPy library for code
exercises and projects. Projects can be submitted and performed in Jupyter
Notebooks.
Prerequisites: Python
What is Big Data and Machine
Learning
Big Data and Machine Learning have become the reason behind the success of
various industries. Both these technologies are becoming popular day by day among
all data scientists and professionals. Big data is a term that is used to describe
large, hard-to-manage, structured, and unstructured voluminous data.
Whereas, Machine learning is a subfield of Artificial Intelligence that enables
machines to automatically learn and improve from experience/past data.
Both machine learning and big data technologies are used together by most
companies because it is difficult for companies to manage, store, and
process the collected data efficiently; machine learning helps them in such cases.
Before going deep into these two popular technologies, i.e., Big Data and
Machine Learning, let's start with a quick introduction to each. Further, we will
discuss the relationship between big data and machine learning.
Big data is a very vast field for anyone looking to make a career in the IT
industry. Working with big data typically involves the following activities:
o Capturing
o Curating
o Storing
o Searching
o Sharing
o Transferring
o Analyzing
o Visualization
Data can be structured as well as unstructured and comes from various sources. It
can be audio, video, text, emails, transactions, and much more. Due to the various
formats of data, storing, managing, and organizing it becomes a big challenge
for organizations. Although storing raw data is not difficult, converting
unstructured data into a structured format and making it accessible for business
use is practically complex for IT experts.
Rendering and sorting data is necessary to control data flows. Further, processing
data with high accuracy and speed is also necessary for storing, managing, and
organizing data efficiently. Smart sensors, smart metering, and RFID tags make it
necessary to deal with a huge data influx in almost real-time. Sorting, assessing,
and storing such deluges of data in a timely fashion becomes necessary for most
organizations.
o Veracity (Accuracy)
In general, veracity refers to the accuracy of data sets. But when it comes to big
data, it is not only limited to accuracy; it also tells us how trustworthy the data
source is. Further, it determines the reliability of the data and how meaningful it
is for analysis. In one line, we can say veracity is the quality and consistency of
data.
o Value
Value in big data refers to the meaningfulness or usefulness of stored data for your
business. In big data, data is stored in structured as well as unstructured formats,
but regardless of its volume, it is usually not meaningful as-is. Hence, we need to
convert it into a useful format for the business requirements of organizations. For
example, data having missing or corrupt values, missing key structured elements,
etc., is not useful for companies to provide better customer service, create
marketing campaigns, and so on. Hence, it leads to reduced revenue and profit.
Sources of data in Big Data
Big data can come in various formats, either structured or unstructured, and from
many different sources. The main sources of big data are of the following types:
o Social Media
Data is collected from various social media platforms such as Facebook, Twitter,
Instagram, Whatsapp, etc. Although data collected from these platforms can be
anything like text, audio, video, etc., the biggest challenge is to store, manage,
and organize this data in an efficient way.
o Cloud platforms
There are various online cloud platforms, such as Amazon AWS, Google Cloud, IBM
Cloud, etc., that are also used as sources of big data for machine learning.
o Internet of things:
The Internet of Things (IoT) is a platform that offers cloud facilities, including data
storage and processing through IoT. Recently, cloud-based ML models are getting
popular. It starts with invoking input data from the client end and processing
machine learning algorithms using an artificial neural network (ANN) over cloud
servers and then returning with output to the client again.
o Web pages
Nowadays, every second, thousands of web pages are created and uploaded to
the internet. These web pages can be in the form of text, images, videos, etc.
Hence, web pages are also a source of big data.
Machine learning combined with big data is applied across many areas, including:
o Image recognition
o Speech Recognition
o Healthcare
o Finance and Banking industry
o Computational Biology
o Energy production
o Automation
o Self-driven vehicle
o Natural Language Processing (NLP)
o Personal virtual assistance
o Marketing and Trading
o The education sector, etc.
With the rise of big data, the use of machine learning has also increased across all
industries. The main differences between machine learning and big data are
summarized below:
o Categories: Machine learning can be categorized mainly as supervised learning,
unsupervised learning, semi-supervised learning, and reinforcement learning,
whereas big data can be categorized as structured, unstructured, and
semi-structured data.
o Purpose: Machine learning helps to analyze input datasets with the use of various
algorithms, whereas big data helps in analyzing, storing, managing, and organizing
huge volumes of unstructured data sets.
o Tools: Machine learning uses tools such as NumPy, Pandas, Scikit-learn,
TensorFlow, and Keras, whereas big data uses tools such as Apache Hadoop and
MongoDB.
o Approach: In machine learning, machines or systems learn from training data and
are used to predict future results using various algorithms, whereas big data mainly
deals in extracting raw data and looking for patterns that help to build strong
decision-making ability.
o Dimensionality: Machine learning works with limited dimensional data, so it is
relatively easier to recognize features, whereas big data works with
high-dimensional data, which shows complexity in recognizing features.
o Human intervention: An ideal machine learning model does not require human
intervention, whereas big data requires it because it mainly deals with huge
amounts of high-dimensional data.
o Applications: Machine learning is useful for providing better customer service,
product recommendations, personal virtual assistance, email spam filtering,
automation, speech/text recognition, etc., whereas big data is helpful in areas as
diverse as stock marketing analysis, medicine & healthcare, agriculture, gambling,
environmental protection, etc.
o Scope: The scope of machine learning is to make automated learning machines
with improved quality of predictive analysis, faster decision making, and more
robust cognitive analysis, whereas the scope of big data is very vast: it is not just
limited to handling voluminous data; instead, it will be used for optimizing the data
stored in a structured format for enabling easy analysis.
It is no secret that almost all organizations, such as Google, Amazon, IBM,
Netflix, etc., have already discovered the power of big data analytics enhanced by
machine learning.
Machine Learning is a very crucial technology, and with big data, it has become more
powerful for data collection, data analysis, and data integration. All big organizations
use machine learning algorithms for running their business properly.
Machine Learning enables machines or systems to learn from past experience, use
the data received from big data, and predict accurate results. Hence, this leads to
improved-quality business operations and better customer relationship
management. Big data helps machine learning by providing a variety of data, so
machines can learn from more samples or training data.
In this way, businesses can accomplish their goals and get the benefit of big data
using ML algorithms. However, to use the combination of ML and big data,
companies need skilled data scientists.
Machine learning algorithms can be applied to every element of Big Data operation,
including:
o Data Segmentation
o Data Analytics
o Simulation
All these stages are integrated to create the big picture out of big data, with
insights and patterns which later get categorized and packaged into an
understandable format.
Conclusion
In this article, we have discussed big data and machine learning separately and the
basic differences between the two technologies. We have also seen how machine
learning and big data can be used together to train machine learning models on
high-quality data drawn from huge amounts of structured and unstructured data.
Further, we have seen some applications that use big data and machine learning
together and provide amazing results.
Example: Let's apply K-Nearest Neighbors to the iris dataset, then save the model.
Code:
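The original listing is not preserved here; the following is a minimal sketch of what
it could look like, assuming scikit-learn's bundled iris dataset and an illustrative
train/test split:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load iris and hold out a test split (the split ratio is illustrative)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit a K-Nearest Neighbors classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))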
Output:
Now, we will save the above model to string using pickle -
Code:
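The original listing is omitted; here is a minimal sketch, reusing the knn model
from the previous snippet:

import pickle

# pickle.dumps() serializes the model to an in-memory bytes string
saved_model = pickle.dumps(knn)

# pickle.loads() restores the model from that string
knn_from_pickle = pickle.loads(saved_model)
print(knn_from_pickle.predict(X_test[:5]))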
Output:
2. Pickled Model as File using joblib: Joblib is often preferred over pickle because
it is faster on objects that carry large NumPy arrays. Its dump() and load()
functions accept a filename or a file-like object.
The pickled model as file using joblib offers the following functions:
Example:
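The article's own example is not shown; a minimal sketch, assuming a trained knn
model as above and a hypothetical file name:

import joblib

# joblib.dump() persists the model to disk; joblib.load() restores it
joblib.dump(knn, "knn_model.pkl")  # "knn_model.pkl" is a hypothetical file name
knn_from_joblib = joblib.load("knn_model.pkl")
print(knn_from_joblib.predict(X_test[:5]))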
Output:
This tutorial will show us how to create a machine-learning model without writing
code.
We will create a model to classify food items. We will use a Kaggle food dataset that
includes different food items such as Salads, Potatoes, and Cakes. You can download
the dataset from https://www.kaggle.com/cristeaioan/ffml-dataset.
Teachable Machine
Yes, it is possible with the help of Teachable Machine. Teachable Machine is a web-
based tool that quickly and easily creates models. It can be used for image, sound,
and pose recognition, and it is flexible: it can be taught to identify images and
poses through uploaded images or a live webcam. It is free and well suited for
students. Teachable Machine creates a TensorFlow model, which can be integrated
with any website, Android application, or other platform. There is no need to
create an account, and it is very easy to use.
Step 2: Choose an image project. We will see two options: Standard or Embedded.
We aren't making this for micro-controllers, so we recommend choosing Standard.
If you are interested, you can select Embedded Image Model instead; even then,
the process remains the same, and only the resulting model will differ.
Clicking on Standard Image Project will take us to the screen below. Here we can
add classes to the model. We have two choices: upload images from the
databank or use the live camera to capture images.
Step 3: Now create classes and upload the images. We will only create three
classes: Salad, Potato, and Cake. We have renamed class1 to Salad, class2 to
Potato, and class3 to Cake. You can create as many classes as you like.
Click on Train Model after we have uploaded the images. There are three options
available: Batch Size, Epochs, and Learning Rate. Don't be alarmed if these options
are new to you. It's important to play with the model and determine which values
give the best accuracy to make it more efficient. A model is useless if it's not
accurate. We can adjust these values to find the best model; here, we will use the
default values.
Step 4: After the model has been trained, it is time to export it.
We will see several options when we click Export Model. The code snippets can help
integrate the model into our application. Tensorflow.js models are compatible with
all JavaScript libraries and frameworks. Some frameworks only support a specific type
of model. We will check to see if our library or framework supports this model.
The download of the model can take some time. This is how we create a machine-
learning model.
We can also create models for audio and pose, similar to the image project. Let's see
what we can do.
Poses Model
To create the pose model, select the pose project in Teachable Machine. We will
create two classes, one for sitting and one for standing, upload the images, and
then start training the model.
After the training is complete, we can preview the model's Output by uploading any
image. This allows us to check the model's efficiency and Output before exporting it.
The below image shows that the Output from an image we uploaded to preview is
correct, i.e., sitting. This means that the model is doing well.
Audio Model
An audio project will create a model capable of detecting sound. We created three
classes: Background Noise, Clapping Rain, and Thunderstorm. In the preview section,
after training the model, we tested the model's efficiency using noise. In the Output
of the preview, we can see more background noise. We need to increase the number
of samples to improve the model's learning.
Data Structure for Machine Learning
Machine Learning is one of the hottest technologies used by data scientists and ML
experts to deploy real-time projects. However, machine learning skills alone are
not sufficient for solving real-world problems and designing a better product; you
also need good exposure to data structures.
The data structure used for machine learning is quite similar to other software
development fields where it is often used. Machine Learning is a subset of
artificial intelligence that includes various complex algorithms to solve
mathematical problems to a great extent. Data structure helps to build and
understand these complex problems. Understanding the data structure also helps
you to build ML models and algorithms in a much more efficient way than other ML
professionals. In this topic, "Data Structure for Machine Learning", we will discuss
various concepts of data structure used in Machine Learning, along with the
relationship between data structure and ML. So, let's start with a quick overview of
Data structure and Machine Learning.
In other words, the data structure is the collection of data type 'values' which are
stored and organized in such a way that it allows for efficient access and
modification.
There are two different types of data structures: Linear and Non-linear data
structures.
Now let's discuss popular data structures used for Machine Learning:
Array:
An array is one of the most basic and common data structures used in Machine
Learning. It is also used in linear algebra to solve complex mathematical problems.
You will use arrays constantly in machine learning.
An array contains index numbers to represent an element starting from 0. The lowest
index is arr[0] and corresponds to the first element.
Let's take an example of a Python array used in machine learning. Although the
Python array is quite different from arrays in other programming languages, the
Python list is more popular as it offers flexibility in data types and length. If you
are using Python for ML algorithms, it's better to start your journey with arrays.
Method Description
count() Returns the number of elements with the specified value.
extend() Adds the elements of a list to the end of the current list.
index() Returns the index of the first element with the specified value.
pop() Removes the element at the specified position (index).
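For illustration (this snippet is ours, not the article's), here is how these list
methods behave on a small feature list:

features = [0.5, 1.2, 0.5, 3.4]

print(features.count(0.5))    # 2: number of occurrences of 0.5
features.extend([2.2, 0.9])   # append the elements of another list
print(features.index(3.4))    # 3: index of the first occurrence of 3.4
features.pop(0)               # remove the element at index 0
print(features)               # [1.2, 0.5, 3.4, 2.2, 0.9]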
Stacks:
Stacks are based on the concept of LIFO (Last In, First Out) or FILO (First In, Last
Out). They are used for binary classification in deep learning. Although stacks are
easy to learn and implement in ML models, a good grasp of them also helps in
many computer science aspects, such as parsing grammar.
Stacks enable the undo and redo buttons on your computer, as they function like
a stack of blog posts: there is no sense in adding a post at the bottom of the stack,
and we can only check the most recent one that has been added. Addition and
removal occur at the top of the stack.
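A quick illustrative sketch (not from the article) of LIFO behaviour, using a plain
Python list as a stack:

stack = []
stack.append("post 1")   # push onto the top
stack.append("post 2")
stack.append("post 3")

print(stack.pop())       # "post 3": the most recently added item leaves first
print(stack[-1])         # "post 2": peek at the new top of the stack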
Linked List:
A linked list is a collection of several separately allocated nodes. In other words,
it is a collection of data elements, each consisting of a value and a pointer that
points to the next node in the list.
In a linked list, insertion and deletion are constant-time operations and very
efficient, but accessing a value is slow and often requires scanning. So, a linked
list is very significant where a dynamic array would require shifting of elements.
Although an element can be inserted at the head, middle, or tail position, reaching
a position other than the head first requires a costly traversal. However, linked
lists are easy to splice together and split apart. Also, a list can be converted to a
fixed-length array for fast access.
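A minimal illustrative sketch of a singly linked list node and a traversal (the names
are our own, not the article's):

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None   # pointer to the next node

# Build the list 1 -> 2 -> 3
head = Node(1)
head.next = Node(2)
head.next.next = Node(3)

# Accessing a value requires scanning from the head (O(n))
node = head
while node is not None:
    print(node.value)
    node = node.next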
Queue:
A queue follows the "FIFO" (First In, First Out) principle. It is useful for modeling
queuing scenarios in real-time programs, such as people waiting in line to
withdraw cash at a bank. Hence, the queue is significant in a program where
multiple lists of tasks need to be processed in order.
The queue data structure can be used to record the split times of a car in F1 racing.
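A small sketch (ours) of FIFO behaviour with Python's collections.deque, using the
F1 split-time example:

from collections import deque

splits = deque()
splits.append("lap 1 split")   # enqueue at the rear
splits.append("lap 2 split")
splits.append("lap 3 split")

print(splits.popleft())        # "lap 1 split": first in, first out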
1) Trees
Binary Tree:
The concept of a binary tree is very similar to a linked list; the only difference lies
in the nodes and their pointers. In a linked list, each node contains a data value
with a pointer that points to the next node in the list, whereas in a binary tree,
each node has two pointers to subsequent nodes instead of just one.
Binary search trees are sorted, so insertion and deletion operations can be done
with O(log N) time complexity. Similar to the linked list, a binary tree can also be
converted to an array on the basis of tree sorting.
In a binary search tree, each parent node can have up to two child nodes, where
the value of the left child node is always less than the value of the parent node,
while the value of the right child node is always more than the parent node.
Hence, in a binary search tree, data sorting is maintained automatically, which
makes insertion and deletion efficient.
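An illustrative sketch of binary-search-tree insertion that maintains the
left < parent < right ordering described above:

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None    # subtree of smaller values
        self.right = None   # subtree of larger values

def insert(root, value):
    # Walk down the tree and place the value in its sorted position
    # (O(log N) on average for a reasonably balanced tree)
    if root is None:
        return TreeNode(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

root = None
for v in [8, 3, 10, 1, 6]:
    root = insert(root, v)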
2) Graphs
A graph data structure is also very useful in machine learning, for example for link
prediction. A graph consists of nodes connected by edges, which may be directed
or undirected (ordered or unordered pairs). Hence, you should get good exposure
to the graph data structure for machine learning and deep learning.
3) Maps
Maps are a popular data structure in the programming world, mostly useful for
minimizing the run-time of algorithms and for fast searching of data. A map stores
data as (key, value) pairs, where the key must be unique while values can be
duplicated. Each key corresponds to, or maps to, a value; hence the name Map.
In different programming languages, core libraries have built-in maps or, rather,
hash maps, with different names for each implementation:
o In Java: Map / HashMap
o In Python: Dictionaries
o In C++: hash_map, unordered_map, etc.
Python dictionaries are very useful in machine learning and data science, as
various functions and algorithms return a dictionary as output. Dictionaries are
also widely used for implementing sparse matrices, which are very common in
machine learning.
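As a small illustration (ours, not the article's), a sparse matrix can be stored as a
dictionary keyed by (row, column), with missing keys treated as zeros:

sparse = {(0, 3): 1.0, (2, 1): 4.5, (5, 0): 2.2}

def get(matrix, row, col):
    # Absent keys are implicit zeros
    return matrix.get((row, col), 0.0)

print(get(sparse, 2, 1))   # 4.5
print(get(sparse, 4, 4))   # 0.0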
4) Heap
A heap is a hierarchically ordered data structure. Ordering in a heap is applied
along the hierarchy but not across it: in a max-heap, the value of the parent node
is always more than that of its child nodes on either the left or right side.
Here, the insertion and deletion operations are performed on the basis of
promotion. It means, firstly, the element is inserted at the highest available
position. After that, it gets compared with its parent and promoted until it
reaches the correct ranking position. Most heap data structures can be stored in
an array, with the relationships between the elements implied by their positions.
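For illustration, Python's heapq module implements a min-heap (the mirror image of
the max-heap described above, with the smallest value at the root):

import heapq

scores = [0.42, 0.17, 0.93, 0.05]
heapq.heapify(scores)           # arrange the array as a heap in O(n)
heapq.heappush(scores, 0.01)    # insert, then "promote" to the correct position
print(heapq.heappop(scores))    # 0.01: the root of the min-heap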
When we use machine learning to solve a problem, we need to evaluate the
model performance, i.e., which model is fastest and requires the smallest amount
of space and resources while staying accurate. Moreover, if a model is built using
algorithms, comparing and contrasting two algorithms to determine the best one
for the job is crucial for the machine learning professional. For such cases, skills in
data structures become important for ML professionals.
With knowledge of data structures and algorithms in ML, such questions can be
answered easily.
Conclusion
In this article, we have discussed how data structures help in building machine
learning algorithms. A data structure is a key player in the programming world for
solving most computing problems, and knowledge of data structures combined
with the best algorithm gives you the optimum solution for an ML problem.
Further, a strong knowledge of data structures will help you build a strong
foundation and use these skills to create better machine learning projects.
What is Hypothesis?
The hypothesis is defined as a supposition or proposed explanation based on
insufficient evidence or assumptions. It is just a guess based on some known facts
but has not yet been proven. A good hypothesis is testable; it results in either
true or false.
Example: Let's understand the hypothesis with a common example. A scientist
claims that ultraviolet (UV) light can damage the eyes, and from this we suppose
it may also cause blindness.
In this example, a scientist just claims that UV rays are harmful to the eyes, and
we assume they may cause blindness. However, this may or may not be true.
Hence, these types of assumptions are called a hypothesis.
There are some common methods given to find out the possible hypothesis from
the hypothesis space, where the hypothesis space is represented by uppercase H
and a hypothesis by lowercase h. These are defined as follows:
Hypothesis space (H):
A hypothesis space is the set of all possible legal hypotheses. It is often
constrained by the choice of the framing of the problem, the choice of model, and
the choice of model configuration.
Hypothesis (h):
It is defined as the approximate function that best describes the target in supervised
machine learning algorithms. It is primarily based on data as well as bias and
restrictions applied to data.
Hence hypothesis (h) can be concluded as a single hypothesis that maps input to
proper output and can be evaluated as well as used to make predictions.
The hypothesis (h) can be formulated in machine learning as follows:
y = mx + b
Where,
y: output (range)
m: slope of the line, i.e., the change in y divided by the change in x
x: input (domain)
b: intercept (a constant)
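As a tiny illustration (the values of m and b below are assumed, not from the
article), a hypothesis h is just a function mapping inputs to predicted outputs:

m, b = 0.5, 2.0   # assumed slope and intercept for illustration

def h(x):
    # Hypothesis: y = m*x + b
    return m * x + b

print(h(4.0))     # predicted y = 4.0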
Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-
dimensional coordinate plane showing the distribution of data as follows:
Now, assume we have some test data by which ML algorithms predict the outputs for
input as follows:
If we divide this coordinate plane in such a way that it can help you to predict
the output or result as follows:
Based on the given test data, the output result will be as follows:
However, based on data, algorithm, and constraints, this coordinate plane can also
be divided in the following ways as follows:
Hypothesis space (H) is the composition of all legal best possible ways to divide the
coordinate plane so that it best maps input to proper output.
Further, each individual best possible way is called a hypothesis (h). Hence, the
hypothesis and hypothesis space would be like this:
Hypothesis in Statistics
Similar to the hypothesis in machine learning, it is also considered an assumption
about the output. However, it is falsifiable, which means it can fail in the presence
of sufficient evidence.
Significance level
The significance level is the primary thing that must be set before starting an
experiment. It defines the tolerance for error and the level at which an effect can
be considered significant. In a typical experiment, results are trusted at the 95%
confidence level, and the remaining 5% chance of error is tolerated. The
significance level also determines the critical or threshold value. For example, if
the confidence level in an experiment is set to 98%, then the significance level,
and hence the critical value, is 0.02.
P-value
The p-value in statistics is defined as the evidence against a null hypothesis. In
other words, the p-value is the probability, under the null hypothesis, that random
chance generated data as extreme as, or more extreme than, what was observed.
The smaller the p-value, the stronger the evidence against the null hypothesis,
which means the null hypothesis can be rejected in testing. It is always
represented in decimal form, such as 0.035.
Whenever a statistical test is carried out on a population or sample to find the
p-value, it is compared against the critical value. If the p-value is less than the
critical value, the effect is significant, and the null hypothesis can be rejected.
Further, if it is higher than the critical value, there is no significant effect, and
hence we fail to reject the null hypothesis.
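As an illustrative sketch (the data values below are made up), SciPy can compute a
p-value for a one-sample t-test, which we then compare with the significance level:

from scipy import stats

# Does this sample's mean differ from 5.0?
sample = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.4, 4.7]
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print("p-value:", p_value)

# At a 0.05 significance level, reject the null hypothesis if p_value < 0.05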
Conclusion
In the series of mapping instances of inputs to outputs in supervised machine
learning, the hypothesis is a very useful concept that helps to approximate a
target function. It appears in all analytics domains and is also considered one of
the important factors when checking whether a change should be introduced or
not. It relates the entire training dataset to the efficiency as well as the
performance of the models.
Hence, in this topic, we have covered various important concepts related to the
hypothesis in machine learning and statistics and some important parameters such
as p-value, significance level, etc., to understand hypothesis concepts in a better way.
Generative Learning Algorithms, on the other hand, take a different approach. They
try to capture each class distribution separately rather than finding a boundary
between classes. A generative learning algorithm, as mentioned, will examine the
distribution of infected and healthy patients separately. It will then attempt to learn
each distribution's features individually. When a new example is presented, it will be
compared to both distributions, and the class it most closely resembles will be
assigned. The algorithm models P(X|y) for each class along with P(y), where P(y) is
known as the class prior.
Generative learning algorithms make predictions using Bayes' theorem:
P(y|X) = P(X|y) · P(y) / P(X)
By estimating only P(X|y) and P(y) for each class, we can determine P(y|X), i.e.,
given the characteristics of a sample, how likely it is that it belongs to class "y".
Let's take a look at a binary classification problem in which all data points are IID
(independently and identically distributed). To model P(X|y), we can use a
multivariate Gaussian distribution to calculate a probability density for each class.
To determine P(y), the class prior for each class, we can use the Bernoulli
distribution, since every label in binary classification is either 0 or 1.
Thus, Gaussian Discriminant Analysis works extremely well with a limited volume of
data (say, several thousand examples) and may be more robust than logistic
regression if our fundamental assumptions about the data distribution are correct.
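A minimal NumPy sketch of the idea (ours, not the article's): estimate the class
prior with a Bernoulli parameter, fit one Gaussian per class with a shared
covariance, and predict via P(x|y) · P(y):

import numpy as np

def fit_gda(X, y):
    phi = y.mean()                          # Bernoulli class prior P(y=1)
    mu0 = X[y == 0].mean(axis=0)            # mean of class 0
    mu1 = X[y == 1].mean(axis=0)            # mean of class 1
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centered.T @ centered / len(y)  # shared covariance matrix
    return phi, mu0, mu1, sigma

def density(x, mu, sigma):
    # Multivariate Gaussian probability density
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

def predict(x, phi, mu0, mu1, sigma):
    # Choose the class with the larger P(x|y) * P(y)
    p0 = density(x, mu0, sigma) * (1 - phi)
    p1 = density(x, mu1, sigma) * phi
    return int(p1 > p0)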
Google
Instead of asking, "Which Google apps use ML?" we should ask, "Do any Google
applications not use ML?" The answer is probably no! Google has invested a lot of
money in machine learning research and plans eventually to integrate it into all its
products. Google's flagship products, Google Search and Google Translate, use ML
today. Google Search uses RankBrain, a deep neural net that helps provide
relevant search results; RankBrain makes intelligent guesses when a search, such
as one for "Tim Cook", contains unique or ambiguous words and phrases. Google
Translate, meanwhile, analyses millions of documents to identify the most common
patterns and vocabulary. Google Photos uses image recognition.
Deep learning is used by Google Photos to sort millions upon millions of images
online in order to classify them better. Google Assistant uses image recognition
and natural language processing, which allows it to be multi-talented and answer
our questions.
Facebook
Facebook is where we should go if we want to see our friends, watch celebrities, or
look at cat photos. Facebook has 2.41 billion Monthly Active Users! Machine Learning
is the only way to achieve this level of popularity. Facebook uses Machine Learning in
all aspects of its News Feed, including Targeted Advertising.
Facebook utilizes facial recognition to recognize our friends and suggest their
names. A machine learning system analyses the pixels of an image to generate a
unique template for every face. This facial fingerprint can be used to identify the
face and suggest tags.
Twitter
Twitter is the best place to find interesting tweets, intelligent debates, and more!
Twitter is the best place to find out about current politics, global warming dangers,
and smart comments from celebrities. Guess how all those tweets are managed?
Machine Learning is the answer!
Twitter uses an ML algorithm to organize our tweets. Tweets based on what we
like and tweets from family and friends are given higher priority and appear
higher in our feed. Tweets that receive a large number of retweets or likes have a
higher chance of getting noticed; these tweets can be found in the "In case you
missed it" category. Tweets were previously arranged in reverse chronological
order, which is what some people want back. Twitter currently uses the natural
language processing capabilities of IBM Watson to find and delete abusive tweets.
Twitter uses deep learning to determine what's happening in the live feed. This is
achieved by training the neural network using tags to recognize images in videos.
Let's say we add the tags "Puppy", "Animal", "Poodle", "Husky" etc. The algorithm will
identify a dog in our video and use that information to identify other dogs in our
videos.
Baidu
Baidu is the Google of China! While this may not be entirely the case, Baidu is the
Chinese search engine most often compared to Google. Like Google, it utilizes
machine learning in many of its applications, such as Baidu Search, as well as
DuerOS, Baidu's voice assistant, and the Xiaoyu Zaijia (Little Fish) home robot,
which is similar to Alexa. Baidu's search engine is the main focus, as 75% of
Chinese users use it. Machine learning algorithms are used for image and voice
recognition, which allows for the best possible (and smarter!) service. Baidu has
also made significant investments in natural language processing, which is evident
in DuerOS.
DuerOS, Baidu's voice assistant, makes use of natural language processing, image
recognition, and voice recognition to build an intelligent system that is able to
hold an entire conversation while sounding human. The voice assistant uses ML
to understand the complexity of human speech and reproduce it flawlessly.
Baidu's NLP expertise is also applied to the Little Fish home robot, which is similar
to Alexa but different: it can turn its head in order to "listen" to a voice coming
from another direction and then respond accordingly.
Pinterest
You might have heard of Pinterest, whether you are a regular pinner or a
beginner. Pinterest allows us to pin images, videos, and GIFs we are interested in.
Since this app relies on images saved from the internet, it makes sense that its
most important feature is identifying images.
Machine Learning is the answer! Pinterest uses image recognition algorithms to
identify patterns in the images we pin, so similar images can be displayed when
we search for them. Imagine we pin a green shirt: we will be able to view images
of similar green shirts thanks to image recognition. Pinterest can't guarantee that
these green shirts will be fashionable, though!
The same approach was used to introduce transfer learning into machine learning:
it involves using knowledge gained from a source task to solve a problem in a
target task. Although most machine learning algorithms are designed for a single
task, there is ongoing interest in developing transfer learning algorithms.
One might wonder how to decide which layers to freeze and which to train. It is
easy to see that layers must be frozen if we wish to inherit features from a
pre-trained model. Suppose a model trained to detect some flower species needs
to detect new species: a new dataset with new species will contain many features
similar to those the model already knows. Therefore, we remove fewer layers and
freeze most of the network in order to make the most of that model's knowledge.
Consider another example: if a model that detects people in images is already
trained and we want to use this knowledge to detect cars in those images, it's not
a good idea to freeze many layers. This is because high-level features such as
noses, eyes, mouths, etc., will be useless for the new dataset (car detection). We
only reuse the low-level features of the base network and train the rest of the
network on the new dataset.
Let's look at all scenarios where the target task's dataset size and similarity differ
from the base network's; a code sketch follows the list.
o The target dataset is small and similar to the base training dataset: Because the
target dataset is so small, fine-tuning the whole pre-trained network on it could
lead to overfitting. There may also be a change in the number of classes for the
target task. In such cases, we remove some fully connected layers from the end
and add a new fully connected layer satisfying the new number of classes. We
then freeze the rest of the model and train only the newly added layers.
o The target dataset is large and similar to the base training dataset: When the
dataset is large enough, there is little chance of overfitting. Here, the last fully
connected layer is removed, and a new fully connected layer with the correct
number of classes is added. The entire model is then trained on the new dataset.
This allows the model to be tuned on a large new dataset while keeping the
architecture unchanged.
o The target dataset is small and different from the base training dataset: The
target dataset is unique, so the pre-trained model's high-level features will not
transfer. We can remove most of the layers from the end of the pre-trained model
and add layers satisfying the number of classes in the new dataset. We can then
use the low-level features of the pre-trained model and train the remaining layers
to adapt to the new dataset. Sometimes it can be beneficial to train the entire
network after adding a new layer at the end.
o The target dataset is large and different from the base training dataset: As the
target dataset is complex and diverse, it is best to remove the final layers from
the pre-trained network, add layers satisfying the number of classes, and then
train the entire network without freezing any layers.
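Here is a minimal Keras sketch of the first scenario (small, similar dataset): freeze
the pre-trained base and train only a new fully connected head. The base model,
input size, and class count are illustrative choices, not the article's:

import tensorflow as tf

# Pre-trained base network without its original classification head
base = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))
base.trainable = False   # freeze all base layers to inherit their features

# New fully connected layer sized for the new task (3 classes assumed)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# For the "large and different" scenario, set base.trainable = True instead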
Conclusion
Transfer learning can be a quick and effective way to solve a problem; it gives us
a direction to start from, and many of the best results are achieved this way.
This can be used to project features from a higher-dimensional space into a
lower-dimensional space in order to reduce resource and dimensionality costs. In
this topic, "Linear Discriminant Analysis (LDA) in machine learning", we will discuss
the LDA algorithm for classification predictive modeling problems, the limitations
of logistic regression, the representation of the linear discriminant analysis model,
how to make predictions using LDA, how to prepare data for LDA, extensions to
LDA, and much more. So, let's start with a quick introduction to Linear
Discriminant Analysis (LDA) in machine learning.
Note: Before starting this topic, it is recommended to learn the basics of the Logistic
Regression algorithm and to have a basic understanding of classification problems in
machine learning as a prerequisite.
To overcome the overlapping issue in the classification process, we must increase the
number of features regularly.
Example:
Let's assume we have two different classes, each with a set of data points, in a
2-dimensional plane, as shown in the image below:
It may be impossible to draw a straight line in the 2-D plane that separates these
data points efficiently, but using linear discriminant analysis, we can reduce the
2-D plane to a 1-D line. Using this technique, we can also maximize the
separability between multiple classes.
Let's consider an example where we have two classes in a 2-D plane with an X-Y
axis, and we need to classify them efficiently. LDA uses both axes to create a new
axis, projects the data onto this new axis, and thereby separates the two classes
of data points.
Hence, we can maximize the separation between these classes and reduce the
2-D plane to 1-D.
To create a new axis, Linear Discriminant Analysis uses the following criteria:
o Maximize the distance between the means of the two classes.
o Minimize the variation within each class.
Using these two conditions, LDA generates a new axis that maximizes the distance
between the means of the two classes while minimizing the within-class variation.
In other words, the new axis increases the separation between the data points of
the two classes once they are projected onto it.
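For a concrete illustration (assuming scikit-learn and its bundled iris data, not the
article's own example), LDA can both reduce dimensionality and classify:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Project the 4-D features onto a single discriminant axis and classify
lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)
print(X_1d.shape)       # (150, 1): data reduced to one dimension
print(lda.score(X, y))  # training accuracy of the LDA classifier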
Why LDA?
o Logistic Regression is one of the most popular classification algorithms that
perform well for binary classification but falls short in the case of multiple
classification problems with well-separated classes. At the same time, LDA
handles these quite efficiently.
o LDA can also be used in data pre-processing to reduce the number of
features, just as PCA, which reduces the computing cost significantly.
o LDA is also used in face detection algorithms. In Fisherfaces, LDA is used to
extract useful data from different faces. Coupled with eigenfaces, it produces
effective results.
o Face Recognition
Face recognition is a popular application of computer vision, where each face is
represented as a combination of a number of pixel values. Here, LDA is used to
reduce the number of features to a more manageable count before the
classification process. It generates a new template in which each dimension
consists of a linear combination of pixel values. If the linear combination is
generated using Fisher's linear discriminant, then it is called a Fisher's face.
o Medical
In the medical field, LDA has great applications in classifying patient disease on
the basis of various parameters of patient health and the ongoing medical
treatment. Based on such parameters, it classifies the disease as mild, moderate,
or severe. This classification helps the doctors either increase or decrease the
pace of the treatment.
o Customer Identification
LDA is currently being applied in customer identification. With the help of LDA,
we can easily identify and select the features that characterize the group of
customers most likely to purchase a specific product in a shopping mall.
o For Predictions
LDA can also be used for making predictions and hence in decision making. For
example, "will you buy this product?" yields a predicted result in one of two
possible classes: buying or not buying.
o In Learning
Nowadays, robots are being trained to learn and talk in order to simulate human
work, and this can also be treated as a classification problem. In this case, LDA
builds similar groups on the basis of different parameters, including pitches,
frequencies, sound, tunes, etc.
o PCA is an unsupervised algorithm that does not care about classes and labels
and only aims to find the principal components to maximize the variance in
the given dataset. At the same time, LDA is a supervised algorithm that aims
to find the linear discriminants to represent the axes that maximize separation
between different classes of data.
o LDA is much more suitable for multi-class classification tasks than PCA.
However, PCA is considered to perform well for comparatively small sample
sizes.
o Both LDA and PCA are used as dimensionality reduction techniques, where
PCA is first followed by LDA.
There are three common ensemble learning methods in machine learning. These
are as follows:
o Bagging
o Boosting
o Stacking
1. Bagging
Bagging is a method of ensemble modeling which is primarily used to solve
supervised machine learning problems. It is generally completed in two steps, as
follows:
o Bootstrapping: random subsets (samples drawn with replacement) of the original
training data are created.
o Aggregation: base models are trained on these subsets, and their predictions are
combined, e.g., by averaging or voting.
Example: In the Random Forest method, predictions from multiple decision trees
are ensembled in parallel. In regression problems, we use the average of these
predictions as the final output, whereas in classification problems, the
majority-voted class is selected as the prediction.
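An illustrative scikit-learn sketch (the dataset and parameters are our own choices):
Random Forest as a ready-made bagging ensemble, and BaggingClassifier for
bagging an arbitrary base estimator:

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Random Forest: bagged decision trees with feature sub-sampling
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Generic bagging: bootstrap samples + aggregated predictions
bagged = BaggingClassifier(DecisionTreeClassifier(),
                           n_estimators=50, random_state=0).fit(X, y)
print(forest.predict(X[:3]), bagged.predict(X[:3]))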
2. Boosting
Boosting is an ensemble method that enables each member to learn from the
preceding member's mistakes and make better predictions for the future. Unlike the
bagging method, in boosting, all base learners (weak) are arranged in a sequential
format so that they can learn from the mistakes of their preceding learner. Hence, in
this way, all weak learners get turned into strong learners and make a better
predictive model with significantly improved performance.
3. Stacking
Stacking is one of the popular ensemble modeling techniques in machine
learning. Various weak learners are ensembled in parallel in such a way that, by
combining them with a meta-learner, we can make better predictions for the
future.
Architecture of Stacking
The architecture of the stacking model is designed in such a way that it consists of
two or more base/learner models and a meta-model that combines the predictions
of the base models. These base models are called level-0 models, and the
meta-model is known as the level-1 model. So, the stacking ensemble method
includes original (training) data, primary-level models, primary-level predictions, a
secondary-level model, and the final prediction. The basic architecture of stacking
can be represented as shown in the image below.
o Original data: This data is divided into n folds and serves as the test data and
training data.
o Base models: These models are also referred to as level-0 models. They use the
training data and provide predictions (level-0) as output.
o Level-0 Predictions: Each base model is fitted on some training data and
provides different predictions, which are known as level-0 predictions.
o Meta Model: The architecture of the stacking model consists of one meta-
model, which helps to best combine the predictions of the base models. The
meta-model is also known as the level-1 model.
o Level-1 Prediction: The meta-model learns how to best combine the
predictions of the base models and is trained on different predictions made
by individual base models, i.e., data not used to train the base models are fed
to the meta-model, predictions are made, and these predictions, along with
the expected outputs, provide the input and output pairs of the training
dataset used to fit the meta-model.
o Split the training data into n folds using RepeatedStratifiedKFold, as this is the
most common approach to preparing training datasets for meta-models.
o Fit a base model on the first n-1 folds, and make predictions for the nth fold.
o The predictions made in the above step are added to the x1_train list.
o Repeat steps 2 & 3 for the remaining folds, which gives an x1_train array of
size n.
o Now, train the model on all n parts, and make predictions for the sample data.
o Add this prediction to the y1_test list.
o In the same way, find x2_train, y2_test, x3_train, and y3_test by using models 2
and 3 for training, respectively, to get the level-2 predictions.
o Now train the meta-model on the level-1 predictions, where these predictions
are used as features for the model.
o Finally, the meta-learner can be used to make predictions on test data in the
stacking model (see the sketch below).
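A minimal scikit-learn sketch of the procedure above (the base models and
parameters are illustrative): StackingClassifier handles the k-fold out-of-fold
predictions and the meta-model internally:

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Level-0 base models; their out-of-fold predictions train the meta-model
base_models = [("knn", KNeighborsClassifier()),
               ("tree", DecisionTreeClassifier())]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X, y)
print(stack.predict(X[:5]))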
Voting ensembles:
This is one of the simplest stacking-style ensemble methods, in which different
algorithms are used to prepare all members individually. Unlike the stacking
method, the voting ensemble uses simple statistics instead of learning how to best
combine the predictions from base models.
The voting ensemble differs from the stacking ensemble in that it does not weigh
models based on each member's performance; here, all models are considered to
have the same skill level.
Member Assessment: In the voting ensemble, all members are assumed to have the
same skill sets.
Combine with Model: Instead of using combined prediction from each member, it
uses simple statistics to get the final prediction, e.g., mean or median.
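For illustration (our own sketch), a hard-voting ensemble in scikit-learn, where every
member's vote carries the same weight:

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hard voting: the majority class among members is the final prediction
voting = VotingClassifier(estimators=[("lr", LogisticRegression(max_iter=1000)),
                                      ("knn", KNeighborsClassifier()),
                                      ("tree", DecisionTreeClassifier())],
                          voting="hard")
voting.fit(X, y)
print(voting.predict(X[:5]))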
Weighted Average Ensemble
The weighted average ensemble is considered the next level of the voting ensemble,
which uses a diverse collection of model types as contributing members. This
method uses some training datasets to find the average weight of each ensemble
member based on their performance. An improvement over this naive approach is to
weigh each member based on its performance on a hold-out dataset, such as a
validation set or out-of-fold predictions during k-fold cross-validation. Furthermore,
it may also involve tuning the coefficient weightings for each model using an
optimization algorithm and performance on a holdout dataset.
Combine With Model: It takes a weighted average of the predictions from each
member.
Blending Ensemble:
Blending is a similar approach to stacking, with a specific configuration. It can be
considered a stacking method that uses a single holdout validation set, rather than
k-fold cross-validation, to prepare out-of-sample predictions for the meta-model.
In this method, the training dataset is first split into a training set and a validation
set, and we train the learner models on the training set. Predictions are then made
on the validation set and the test set, where the validation predictions are used as
features to build a new model, which is later used to make the final predictions on
the test set using the prediction values as features.
Combine With Model: Linear model (e.g., linear regression or logistic regression).
o For each class, calculate the likelihood that the instance does not belong to it.
o After calculating this for all classes, we review all the calculated values and pick
the smallest one.
o The smallest value (lowest chance) is chosen because it represents the lowest
chance that the instance does not belong to that class. This means it is most
likely to belong to that class, so this class is chosen.
Let's consider an example: there are two classes, Apples and Bananas, and we
need to determine whether a sentence is related to apples or bananas, given the
frequencies of particular words (Word1-Word5 below). Here is a table-based
representation of the basic dataset:
Sentence  Word1  Word2  Word3  Word4  Word5  Class
1         2      1      1      0      0      Apples
2         1      1      3      9      6      Bananas
3         3      4      0      0      1      Apples
4         2      3      1      1      0      Apples
Bayes' theorem can be utilized to calculate the likelihood of an event occurring
given that another event has taken place. The formula is:
P(A|B) = P(B|A) · P(A) / P(B)
Here, A and B are two events: P(A) is the probability of A occurring, and P(A|B) is
the probability of A occurring given that B has already occurred. P(B) cannot be
zero, since event B has already happened.
Now let's look at how Naive Bayes works and how Complement Naive Bayes
operates. The standard Naive Bayes algorithm scores each class using the
frequencies "fi" of the attributes, for instance, the number of times specific words
appear in a sentence.
When we examine the formulae closely, we notice that Complement Naive Bayes
is essentially the opposite of normal Naive Bayes. In Naive Bayes, the class with
the largest value derived from the formula is the one that is predicted. As
Complement Naive Bayes is the reverse, the class with the lowest value calculated
by the CNB formula is the predicted class.
Now, let's take a new example sentence and attempt to classify it using CNB and
our data:
Word1  Word2  Word3  Word4  Word5  Class
2      2      0      1      1      ?
We need to evaluate this value for both classes and choose the class with the
lower value. We do this for both Apples and Bananas: if the value for (y = Apples)
is lower, the predicted class is Apples; if the value for (y = Bananas) is lower, the
class is predicted as Bananas.
Applying the Complement Naive Bayes formula to both classes, we get 5.601 for
Apples and 75.213 for Bananas. Since 5.601 < 75.213, the predicted class is
Apples.
We don't choose the class with the highest value, since a higher value means
there is a higher probability that a sentence containing these words does not
belong to that class. This is why this algorithm is called "Complement" Naive
Bayes.
To assess our model, we'll check the accuracy on the test set as well as the
classifier's classification report. We will use the scikit-learn library to implement
the Complement Naive Bayes algorithm.
Code:
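The article's listing is not reproduced here; below is a minimal sketch with
synthetic word-count data (the real dataset and its numbers are not preserved),
showing the scikit-learn calls involved:

import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB

# Synthetic, imbalanced word-count data for illustration only
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(150, 5))
y = rng.choice(["Apples", "Bananas"], size=150, p=[0.8, 0.2])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
cnb = ComplementNB().fit(X_train, y_train)

print("Train accuracy:", cnb.score(X_train, y_train))
print("Test accuracy:", cnb.score(X_test, y_test))
print(classification_report(y_test, cnb.predict(X_test)))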
Classifier Report:

              precision    recall  f1-score   support

    accuracy                           0.60        45
   macro avg       0.41      0.60      0.49        45
weighted avg       0.40      0.60      0.48        45
We get an accuracy of 65.41 percent on the training set and 60.00 percent on the
test set. These are close to each other and actually quite good given the quality of
the dataset; this dataset is known to be difficult to classify with simple classifiers
like the one we have applied here. So, the accuracy is acceptable.
Conclusion
We now know the basics of Complement Naive Bayes classifiers and how they
work. The next time you find yourself working with an imbalanced dataset, try
using Complement Naive Bayes.
In this tutorial, we will train an Iris species classifier and then deploy the model
with Streamlit, an open-source app framework that allows us to deploy ML
models easily.
Streamlit Library:
Streamlit allows us to create apps for our machine-learning project with simple
Python scripts. Hot reloading is also supported, so our app can be updated live while
we edit and save our file. Streamlit API allows us to create an app in a few lines of
code (as we'll see below). Declaring a variable is the same thing as adding a widget.
We don't need to create a backend, handle HTTP requests or define different routes.
It's easy to set up and maintain.
First, we will train the model. Since training is not the primary focus of this
tutorial, we will not be doing much pre-processing.
Dataset:
Output:
We will now drop the Id column, as it is not necessary for Iris species
classification. Next, we will divide the data into training and testing sets and use a
Random Forest Classifier. Any other classifier could also be used, such as logistic
regression or a support vector machine.
Code:
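The original listing is omitted; here is a minimal sketch, assuming the classic
Kaggle Iris CSV with an 'Id' column and a 'Species' label (the file name is
hypothetical):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("iris.csv")      # hypothetical file name
df = df.drop(columns=["Id"])      # Id carries no predictive signal

X = df.drop(columns=["Species"])
y = df["Species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier()
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))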
Output:
To use this model for predicting unknown data, we must save it. Pickle is a tool
that serializes and deserializes a Python object structure.
Code:
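A minimal sketch of the save step, writing to the file name the tutorial mentions:

import pickle

# Serialize the trained classifier for later use by the Streamlit app
with open("classifier1.pkl", "wb") as f:
    pickle.dump(clf, f)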
A new file called "classifier1.pkl" will be created in the same directory. We can now
use Streamlit to deploy our model -
Code:
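The original app code is not preserved; the following is a plausible minimal sketch
of app1.py (the widget labels and layout are our own choices):

# app1.py
import pickle
import streamlit as st

with open("classifier1.pkl", "rb") as f:
    clf = pickle.load(f)

st.title("Iris Species Classifier")

# Each widget call both renders an input control and returns its current value
sepal_length = st.number_input("Sepal length")
sepal_width = st.number_input("Sepal width")
petal_length = st.number_input("Petal length")
petal_width = st.number_input("Petal width")

if st.button("Predict"):
    features = [[sepal_length, sepal_width, petal_length, petal_width]]
    st.write("Predicted species:", clf.predict(features)[0])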
The app can be launched by entering the following command into the terminal:
streamlit run app1.py
Output:
app1.py is where the Streamlit code was written.
After the website opens in our browser, we can then test it. We can also use this
method to deploy deep learning and machine-learning models.
Centroid Based Methods:
This is the most basic of the iterative clustering algorithms, in which clusters are
formed based on the proximity of data points to the cluster's centre. The cluster's
centre, i.e., the centroid, is chosen so that the distance from the data points to the
centre is minimal. This is an NP-hard problem, and therefore solutions are usually
constructed over several trials.
For example, K-means is one of the most popular instances of this type of
algorithm.
The main issue with this algorithm is that K must be defined before the process
starts. It also struggles with clusters whose shapes are defined by density rather
than by distance to a centre.
Connectivity Based Methods:
The choice of distance function is a matter of personal preference. This approach
does not produce a single partitioning of the data set; rather, it offers an extensive
hierarchy of clusters that merge at specific distances. These models are easy to
understand, but they lack scalability.
Density Models:
This model of clustering searches the data space for areas with varying densities
of data points. It isolates regions of different density and groups the data points
within those regions into clusters.
There are two types of subspace clustering based on their search strategies.
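An illustrative scikit-learn sketch (our own parameters) contrasting a centroid-based
method, where K must be fixed in advance, with a density-based one, where it
need not be:

from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Centroid-based: the number of clusters K is chosen up front
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Density-based: clusters grow out of dense regions; no K required
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)
print(kmeans.labels_[:10], dbscan.labels_[:10])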
Conclusion
In this tutorial, we have discussed the different types of clustering methods and
how they group data points based on their attribute values.
What is an EM algorithm?
The Expectation-Maximization (EM) algorithm is defined as the combination of
various unsupervised machine learning algorithms, which is used to determine
the local maximum likelihood estimates (MLE) or maximum a posteriori
estimates (MAP) for unobservable variables in statistical models. Further, it is a
technique to find maximum likelihood estimation when the latent variables are
present. It is also referred to as the latent variable model.
Key Points:
EM Algorithm
The EM algorithm is the combination of various unsupervised ML algorithms, such as
the k-means clustering algorithm. Being an iterative approach, it consists of two
modes. In the first mode, we estimate the missing or latent variables. Hence it is
referred to as the Expectation/estimation step (E-step). Further, the other mode is
used to optimize the parameters of the models so that it can explain the data more
clearly. The second mode is known as the maximization-step or M-step.
o Expectation step (E - step): It involves the estimation (guess) of all missing
values in the dataset so that after completing this step, there should not be
any missing value.
o Maximization step (M - step): This step involves the use of estimated data in
the E-step and updating the parameters.
o Repeat E-step and M-step until the convergence of the values occurs.
The primary goal of the EM algorithm is to use the available observed data of the
dataset to estimate the missing data of the latent variables and then use that data to
update the values of the parameters in the M-step.
Steps in EM Algorithm
The EM algorithm is completed mainly in 4 steps: the Initialization Step, the
Expectation Step, the Maximization Step, and the Convergence Step. These steps
are explained as follows:
o 1st Step (Initialization): The very first step is to initialize the parameter values.
Further, the system is provided with incomplete observed data, with the
assumption that the data is obtained from a specific model.
o 2nd Step (Expectation, E-step): Use the observed data to estimate or guess the
values of the missing or incomplete data.
o 3rd Step (Maximization, M-step): Use the complete data generated in the E-step
to update the parameter values.
o 4th Step (Convergence): Check whether the values are converging; if yes, stop,
otherwise repeat the E-step and M-step until convergence.
Let's consider a case where we have a dataset with multiple data points generated by two different processes. Both processes produce data with a similar Gaussian probability distribution, and the data is combined, so it is very difficult to discriminate which distribution a given point belongs to.
The process that generated a data point represents a latent variable, i.e., unobservable data. In such cases, the Expectation-Maximization algorithm is one of the best techniques for estimating the parameters of the Gaussian distributions. In the EM algorithm, the E-step estimates the expected value of each latent variable, whereas the M-step optimizes the parameters using Maximum Likelihood Estimation (MLE). This process is repeated until a good set of latent values and a maximum likelihood that fits the data are achieved.
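As a hedged illustration of this two-process case (not the article's code; the synthetic data is an assumption), scikit-learn's GaussianMixture fits exactly such a model by running EM internally:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Combined data from two hidden Gaussian processes; the latent variable
# is which process generated each point
data = np.concatenate([rng.normal(0.0, 1.0, 300),
                       rng.normal(5.0, 1.5, 300)]).reshape(-1, 1)

# GaussianMixture runs EM: the E-step assigns responsibilities to each
# component, the M-step re-estimates means, variances, and weights
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

print(gmm.means_.ravel())     # estimated means of the two processes
print(gmm.weights_)           # estimated mixing proportions
print(gmm.predict(data[:5]))  # most likely process for the first points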
Applications of EM algorithm
The primary aim of the EM algorithm is to estimate the missing data of the latent variables through the observed data in datasets. The EM algorithm, or latent variable model, has a broad range of real-life applications in machine learning, including:
o Data clustering
o Natural language processing (NLP)
o Computer vision
o Image reconstruction
o Structural engineering
Advantages of EM algorithm
o It is very easy to implement the first two basic steps of the EM algorithm, the E-step and the M-step, in various machine learning problems.
o The likelihood is mostly guaranteed to increase after each iteration.
o It often yields a closed-form solution for the M-step.
Disadvantages of EM algorithm
o The convergence of the EM algorithm is very slow.
o It converges only to a local optimum.
o It takes both forward and backward probabilities into consideration, in contrast to numerical optimization, which considers only forward probabilities.
Conclusion
In real-world applications of machine learning, the expectation-maximization (EM)
algorithm plays a significant role in determining the local maximum likelihood
estimates (MLE) or maximum a posteriori estimates (MAP) for unobservable variables
in statistical models, i.e., it is used to estimate the latent variables through the observed data in datasets. It is generally completed in two important steps, the expectation step (E-step) and the maximization step (M-step), where the E-step estimates the missing data in the dataset, and the M-step updates the parameters after the complete data is generated in the E-step.
Further, the importance of the EM algorithm can be seen in various applications such
as data clustering, natural language processing (NLP), computer vision, image
reconstruction, structural engineering, etc.
The ML pipeline is a high-level API for MLlib within the "spark.ml" package. A typical
pipeline contains various stages. However, there are two main pipeline stages:
1. Transformer: It takes a dataset as input and produces an augmented dataset as output. For example, a tokenizer works as a Transformer: it takes a text dataset and transforms it into tokenized words.
2. Estimator: An Estimator is an algorithm that fits on the input dataset to generate a model, which is itself a Transformer. For example, logistic regression is an Estimator that trains on a dataset with labels and features and produces a logistic regression model, as sketched below.
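A minimal PySpark sketch of these two stage types (following the common pattern from the Spark documentation; the tiny training dataset is an invented placeholder):

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("pipeline-demo").getOrCreate()
training = spark.createDataFrame(
    [(0, "spark ml pipeline example", 1.0), (1, "plain text line", 0.0)],
    ["id", "text", "label"])

tokenizer = Tokenizer(inputCol="text", outputCol="words")       # Transformer
hashing_tf = HashingTF(inputCol="words", outputCol="features")  # Transformer
lr = LogisticRegression(maxIter=10)                             # Estimator

# Fitting the pipeline applies the transformers and trains the estimator;
# the result is a PipelineModel, which is itself a Transformer
model = Pipeline(stages=[tokenizer, hashing_tf, lr]).fit(training)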
In a traditional ML workflow, all these steps run together in the same script. It means the same script is used to extract the data, clean it, train the model, and deploy it. However, this may generate issues when trying to scale an ML model. These issues include:
o If we need to deploy multiple versions of the same model, we need to run the complete workflow cycle multiple times, even when the very first steps, i.e., ingestion and preparation, are exactly the same for each model.
o If we want to expand our model, we need to copy and paste code from the beginning of the process, which is inefficient and poor software-development practice.
o If we want to change the configuration of any part of the workflow, we need to do it manually, which is a much more time-consuming process.
We can solve all the above problems with a machine learning pipeline. With an ML pipeline, each part of the workflow acts as an independent module, so whenever we need to change any part, we can pick that specific module and adapt it as per our requirement. A pipeline can also remember and automate the complete set of pre-processing steps in the same order.
1. Data Ingestion
Each ML pipeline starts with the data ingestion step. In this step, the data is processed into a well-organized format that is suitable for the subsequent steps. This step does not perform any feature engineering; rather, it may perform versioning of the input data.
2. Data Validation
The next step is data validation, which is required to perform before training a new
model. Data validation focuses on statistics of the new data, e.g., range, number of
categories, distribution of categories, etc. In this step, data scientists can detect whether any anomaly is present in the data. There are various data validation tools that enable us to compare different datasets to detect anomalies.
3. Data Pre-processing
Data pre-processing is one of the most crucial steps in each ML lifecycle as well as in the pipeline. We cannot directly use the collected data to train the model without pre-processing it, as that may produce poor results.
The pre-processing step involves preparing the raw data and making it suitable for
the ML model. The process includes different sub-steps, such as Data cleaning,
feature scaling, etc. The product or output of the data pre-processing step becomes
the final dataset that can be used for model training and testing. There are different
tools in ML for data pre-processing that can range from simple Python scripts to
graph models.
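As a small sketch of this idea (assuming scikit-learn and a generic numeric dataset; not the article's code), a pre-processing step such as feature scaling can be chained with a model so the same transformations are always applied in the same order:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fitted on the training data only, and the same
# transformation is re-applied to the test data
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", LogisticRegression(max_iter=1000))])
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))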
4. Model Training
In this step, the model is trained on the prepared dataset. However, there can be difficulties with larger models or with large training datasets, so efficient distribution of the model training or model tuning is required. This issue of the model training stage can be solved with pipelines, as they are scalable and a large number of models can be processed concurrently.
5. Model Analysis
After model training, we need to determine the optimal set of parameters using loss and accuracy metrics. Apart from this, an in-depth analysis of the model's
performance is crucial for the final version of the model. The in-depth analysis
includes calculating other metrics such as precision, recall, AUC, etc. This will also
help us in determining the dependency of the model on features used in training and
explore how the model's predictions would change if we altered the features of a
single training example.
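For instance (a hedged sketch using scikit-learn; y_true and y_score below are placeholders for real labels and model outputs), such metrics can be computed as follows:

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]                    # placeholder labels
y_score = [0.1, 0.8, 0.6, 0.3, 0.9, 0.4, 0.2, 0.7]   # placeholder scores
y_pred = [1 if s >= 0.5 else 0 for s in y_score]     # threshold at 0.5

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))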
6. Model Versioning
The model versioning step keeps track of which model, set of hyperparameters, and
datasets have been selected as the next version to be deployed. In many situations, model performance can differ significantly simply because of more or better training data, without any change to the model parameters.
Hence, it is important to document all inputs into a new model version and track
them.
7. Model Deployment
After training and analyzing the model, it's time to deploy it. An ML model can be deployed in three ways: with a model server, inside a browser, or on an edge device. The most common of these is using a model server. Model servers allow hosting multiple versions simultaneously, which helps run A/B tests on models and can provide valuable feedback for model improvement.
8. Feedback Loop
Each pipeline forms a closed loop to provide feedback. With this closed loop, data
scientists can determine the effectiveness and performance of the deployed models.
This step could be automated or manual depending on the requirement. Except for
the two manual review steps (the model analysis and the feedback step), we
can automate the entire pipeline.
Following are some key benefits of using a machine learning pipeline:
o Unattended runs
The pipeline allows scheduling different steps to run in parallel in a reliable and unattended way. This means you can focus on other tasks while data preparation and modeling are in progress.
o Easy Debugging
Using a pipeline, there is a separate function for each task (such as different functions for data cleaning and data modeling). Therefore, it becomes easy to debug the complete code and find the issues in a particular step.
o Easy tracking and versioning
We can use a pipeline to explicitly name and version the data sources, inputs,
and output rather than manually tracking data and outputs for each iteration.
o Fast execution
As we discussed above, in the ML pipeline, each part of the workflow acts as
an independent element, which allows the software to run faster and generate
an efficient and high-quality output.
o Collaboration
Using pipelines, data scientists can collaborate over each phase of the ML
design process and can also work on different pipeline steps simultaneously.
o Reusability
We can create pipeline templates for particular scenarios and can reuse them
as per requirement. For example, creating a template for retraining and batch
scoring.
o Heterogeneous Compute
We can use multiple pipelines that are reliably coordinated across heterogeneous compute resources as well as different storage locations. This allows efficient use of resources by running separate pipeline steps on different computing resources, e.g., GPUs, Data Science VMs, etc.
ML Pipeline Tools
There are different tools in machine learning for building a pipeline; typically, each of the stages described above (ingestion, validation, pre-processing, training, analysis, versioning, and deployment) has its own dedicated tooling.
o Reinforcement learning does not require any labeled data for the learning
process. It learns through the feedback of action performed by the agent.
Moreover, in reinforcement learning, agents also learn from past experiences.
o Reinforcement learning methods are used to solve tasks where decision-
making is sequential and the goal is long-term, e.g., robotics, online chess, etc.
o Reinforcement learning aims to get maximum positive feedback so that the agent can improve its performance.
o Reinforcement learning involves a repeated cycle of taking an action, the state changing (or remaining unchanged), and receiving feedback, and based on these interactions, agents learn and explore the environment.
Coal mining:
Let's suppose persons A and B are digging in a coal mine in the hope of finding a diamond. Person B succeeds in finding a diamond before person A and walks off happily. Seeing this, person A gets a bit greedy and thinks he too might find a diamond at the same place where person B was digging. Taking this action is called a greedy action, and the policy behind it is known as a greedy policy. However, person A did not know that a bigger diamond was buried in the place where he was initially digging, so the greedy policy fails in this situation.
In this example, person A only had knowledge of the place where person B was digging, but no knowledge of what lies beyond that depth. In the actual scenario, the diamond could be buried in the same place where he was initially digging, or in a completely different place. Hence, with only this partial knowledge about getting more rewards, a reinforcement learning agent faces a dilemma: whether to exploit the partial knowledge to receive some rewards, or to explore unknown actions that could yield many more rewards.
Both techniques are not feasible simultaneously, but this issue can be resolved by using the Epsilon Greedy Policy (explained below).
There are a few other examples of Exploitation and Exploration in Machine Learning
as follows:
Example 1: Let's say we have a scenario of selecting an online restaurant for a food order, where you have two options. The first option is to choose your favorite restaurant, from which you have ordered food in the past; this is called exploitation, because you rely only on information you already have about a specific restaurant. The other option is to try a new restaurant to explore new varieties and tastes of food; this is called exploration. The food quality might be better with the first option, but it is also possible that the food is more delicious at the new restaurant.
Example 2: Suppose there is a game-playing platform where you can play chess against robots. To win, you have two choices: either play the move that you believe is best, or play an experimental move. The first choice is called exploitation, where you rely on your known game strategy; the second choice is called exploration, where you expand your knowledge by playing a new move. The known move may be the best you are aware of, but the new move might turn out to be more strategic for winning the game.
As the agent learns more and more about the environment, epsilon decreases at a defined rate, so exploration becomes less and less probable. In such a case, the agent becomes greedy and exploits the environment.
To decide whether the agent will explore or exploit at each step, we generate a random number between 0 and 1 and compare it to epsilon. If this random number is greater than ε, the next action is decided by exploitation; otherwise, it is exploration. In the case of exploitation, the agent will take
action with the highest Q-value for the current state.
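A minimal sketch of this selection rule (the Q-values and the epsilon value below are illustrative assumptions):

import random

def epsilon_greedy(q_values, epsilon):
    """Choose an action: exploit if the random draw exceeds epsilon,
    otherwise explore with a uniformly random action."""
    if random.random() > epsilon:
        # Exploit: action with the highest Q-value for the current state
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Explore: a random action
    return random.randrange(len(q_values))

q = [0.2, 0.5, 0.1]                    # placeholder Q-values for 3 actions
print(epsilon_greedy(q, epsilon=0.1))  # usually returns action 1 (exploit)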
Examples-
We can understand the above concept with the roll of a die. Let's say the agent explores if the die lands on 1; otherwise, it exploits. This method is called an epsilon-greedy action with the value of epsilon ε = 1/6, which is the probability of rolling a 1. It can be expressed as follows:
A_t = argmax_a Q_t(a) with probability 1 − ε, or a random action with probability ε
That is, the action selected at attempt 't' is the greedy action (exploit) with probability 1 − ε, or a random action (explore) with probability ε.
Notion of Regret
Whenever we do something and do not get the desired outcome, we regret our decision, as in the example of exploitation and exploration for choosing a restaurant discussed earlier. In that example, if we choose a new restaurant instead of our favorite one, but the food quality and overall experience are poor, then we regret our decision and consider what we paid a complete loss. Moreover, if we order food from the same restaurant again, the regret level increases along with the number of losses. However, reinforcement learning methods can reduce both the amount of loss and the level of regret.
Regret in Reinforcement Learning
Hence, the regret in reinforcement learning can be defined as the difference between
the reward generated by the optimal action a* multiplied by T and the sum from 1 to
T of each reward of arbitrary action. It can be expressed as follows:
Regret: L_T = T·E[r | a*] − Σ_{t=1}^{T} E[r | a_t]
Conclusion
Exploitation and exploration techniques in reinforcement learning improve various parameters, such as performance, learning rate, and decision making, all of which are significant for training agents with reinforcement learning methods. A disadvantage of exploitation and exploration techniques is that both require tuning against these parameters as well as the specific environment, which may demand more supervision of reinforcement learning agents. This topic covered some of the most used exploration techniques in reinforcement learning. From the above examples, we can conclude that exploration methods should be preferred to reduce regret and make the learning process faster and more significant.