ML Unit V

SYLLABUS

AD8552 MACHINE LEARNING                                            L T P C
                                                                   1 0 0 1

UNIT I MACHINE LEARNING BASICS 8


Introduction to Machine Learning (ML) – Essential concepts of ML – Types of learning –
Machine learning methods based on Time – Dimensionality – Linearity and Non-linearity –
Early trends in Machine learning – Data Understanding, Representation and Visualization.

UNIT II MACHINE LEARNING METHODS 11


Linear methods – Regression – Classification – Perceptron and Neural networks – Decision
trees – Support vector machines – Probabilistic models – Unsupervised learning –
Featurization

UNIT III MACHINE LEARNING IN PRACTICE 9


Ranking – Recommendation System – Designing and Tuning model pipelines – Performance
measurement – Azure Machine Learning – Open-source Machine Learning libraries –
Amazon’s Machine Learning Tool Kit: Sagemaker

UNIT IV MACHINE LEARNING AND DATA ANALYTICS 9


Machine Learning for Predictive Data Analytics – Data to Insights to Decisions – Data
Exploration –Information based Learning – Similarity based learning – Probability based
learning – Error based learning – Evaluation – The art of Machine learning to Predictive
Data Analytics.

UNIT V APPLICATIONS OF MACHINE LEARNING 8


Image Recognition – Speech Recognition – Email spam and Malware Filtering – Online
fraud detection – Medical Diagnosis.

Lecture Notes – Unit 5
UNIT V APPLICATIONS OF MACHINE LEARNING 8

Image Recognition – Speech Recognition – Email spam and Malware Filtering – Online fraud
detection – Medical Diagnosis.

5.1 IMAGE RECOGNITION

Image recognition is the ability of AI to detect an object in an image, classify it, and
recognize it. The last step is close to the human level of image processing. The best
example of an image recognition solution is face recognition: to unlock your smartphone,
you let it scan your face. The system first has to detect the face, then classify it as a
human face, and only then decide whether it belongs to the owner of the smartphone.

The major steps in the image recognition process are: gather and organize data, build a
predictive model, and use it to recognize images.

Using Random Forests for Face Recognition

A popular dataset that we haven't talked much about yet is the Olivetti face
dataset. The Olivetti face dataset was collected between 1992 and 1994 by AT&T Laboratories Cambridge.
The dataset comprises facial images of 40 distinct subjects, taken at different times and
under different lighting conditions. In addition, subjects varied their facial expression
(open/closed eyes, smiling/not smiling) and their facial details (glasses/no glasses).

Images were then quantized to 256 grayscale levels and stored as unsigned
8-bit integers. Because there are 40 distinct subjects, the dataset comes with 40 distinct
target labels. Recognizing faces thus constitutes an example of a multiclass classification
task.

Loading the dataset:


Like many other classic datasets, the Olivetti face dataset can be loaded using
scikit-learn:
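A minimal sketch of the loading step, assuming scikit-learn's built-in loader (the variable names X and y are our own):

    from sklearn.datasets import fetch_olivetti_faces

    # Downloads the dataset on first use and caches it locally
    dataset = fetch_olivetti_faces()
    X = dataset.data    # shape (400, 4096): each row is a flattened 64 x 64 image
    y = dataset.target  # shape (400,): subject labels 0..39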
Although the original images are 92 x 112 pixels, the version
available through scikit-learn contains images downscaled to 64 x 64 pixels.

To get a sense of the dataset, plot some example images. Let's pick eight indices
from the dataset in a random order:
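One way to do this, reusing the data matrix X from the loading step (the seed value is arbitrary):

    import numpy as np

    np.random.seed(21)                            # make the example reproducible
    idx_rand = np.random.randint(len(X), size=8)  # eight random image indices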

We can plot these example images using Matplotlib, but we need to make sure we
reshape the column vectors to 64 x 64 pixel images before plotting:
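A sketch of the plotting step, reusing idx_rand from above:

    import matplotlib.pyplot as plt

    plt.figure(figsize=(14, 4))
    for p, i in enumerate(idx_rand):
        plt.subplot(2, 4, p + 1)
        plt.imshow(X[i, :].reshape((64, 64)), cmap='gray')  # column vector -> image
        plt.axis('off')
    plt.show()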
You can see how all the faces are taken against a dark background and are upright. The
facial expression varies drastically from image to image, making this an interesting
classification problem.

Preprocessing the dataset

Before we can pass the dataset to the classifier, we need to preprocess it. Specifically, we
want to make sure that all example images have the same mean grayscale level. We repeat this
procedure for every image so that the feature values of every data point (that is, a row in
X) are centered around zero:
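A minimal sketch of the centering step, operating on the data matrix X in place:

    # Subtract each image's own mean grayscale level so that every row of X
    # is centered around zero
    X -= X.mean(axis=1).reshape(-1, 1)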

The preprocessed data can then be visualized with the same plotting code as before.


Training and testing the random forest

We continue to follow our best practice to split the data into training and test sets:
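For example, with scikit-learn (the split ratio and seed are our own choices):

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=21)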

Then we are ready to apply a random forest to the data. Here we want to create an ensemble
with 50 decision trees. Because we have a large number of categories (that is, 40), we want
to make sure the random forest is set up to handle them accordingly:
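A sketch assuming scikit-learn's RandomForestClassifier, which handles multiclass problems natively:

    from sklearn.ensemble import RandomForestClassifier

    # An ensemble of 50 decision trees; all 40 target labels are supported
    # out of the box
    forest = RandomForestClassifier(n_estimators=50, random_state=21)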

We can play with other optional arguments, such as the number of data points required in
a node before it can be split:
However, we might not want to limit the depth of each tree. This is, again, a parameter we
will have to experiment with eventually. But for now, let's set it to a large integer value,
making the depth effectively unconstrained:
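For example (the parameter values here are illustrative, not tuned):

    forest = RandomForestClassifier(
        n_estimators=50,
        min_samples_split=2,   # data points required in a node before splitting
        max_depth=1000,        # large value: depth effectively unconstrained
        random_state=21)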

Then we can fit the classifier to the training data:
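Fitting is a single call:

    forest.fit(X_train, y_train)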

We can check the resulting depth of the tree using the following function:
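scikit-learn does not expose a single depth for the whole forest, so one sketch is to check the deepest individual tree:

    # Depth of the deepest tree actually grown in the forest
    max(tree.get_depth() for tree in forest.estimators_)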

This means that although we allowed each tree to go up to depth 1000, in the end only 25
layers were needed.
The evaluation of the classifier is done once again by predicting the labels first (y_hat) and then
passing them to the accuracy_score function:
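For example:

    from sklearn.metrics import accuracy_score

    y_hat = forest.predict(X_test)
    accuracy_score(y_test, y_hat)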

We find 87% accuracy, which turns out to be much better than with a single decision tree.
We can play with the optional parameters to see if we get better results. The most important
one seems to be the number of trees in the forest. We can repeat the experiment with a
forest made from 100 trees:
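A sketch of the repeated experiment:

    forest = RandomForestClassifier(n_estimators=100, random_state=21)
    forest.fit(X_train, y_train)
    accuracy_score(y_test, forest.predict(X_test))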

With this configuration, we get 91% accuracy!


5.2 SPEECH RECOGNITION
Speech recognition incorporates computer science and linguistics to identify spoken words
and convert them into text. It allows computers to understand human language.

Speech Recognition

Speech recognition is a machine's ability to listen to spoken words and identify them. You
can then use speech recognition in Python to convert the spoken words into text, make a
query, or give a reply. You can even program some devices to respond to these spoken
words. You can do speech recognition in Python with the help of computer programs that
take input from the microphone, process it, and convert it into a suitable form.

Speech recognition seems highly futuristic, but it is present all around you.
Automated phone calls allow you to speak out your query or the query you wish to be
assisted on; your virtual assistants like Siri or Alexa also use speech recognition to talk to
you seamlessly.

How Does Speech Recognition work?

Speech recognition in Python works with algorithms that perform linguistic and acoustic
modeling. Acoustic modeling is used to recognize the phonemes (phonetic units) in our
speech, from which the more significant parts of speech, such as words and sentences, are
assembled.
Working of Speech Recognition

Speech recognition starts by taking the sound energy produced by the person
speaking and converting it into electrical energy with the help of a microphone. It then
converts this electrical energy from analog to digital, and finally to text.

It breaks the audio data down into sounds, and it analyzes the sounds using
algorithms to find the most probable word that fits that audio. All of this is done using Natural
Language Processing and Neural Networks. Hidden Markov models can be used to find
temporal patterns in speech and improve accuracy.

Picking and Installing a Speech Recognition Package

To perform speech recognition in Python, you need to install a speech recognition package to
use with Python. Multiple packages are available online (for example, SpeechRecognition,
pocketsphinx, and the various cloud vendors' speech SDKs), each with its own specialty.

For this implementation, you will use the SpeechRecognition package. It:
• allows easy speech recognition from the microphone,
• makes it easy to transcribe an audio file,
• lets us save audio data into an audio file, and
• shows us recognition results in an easy-to-understand format.

Speech Recognition in Python: Converting Speech to Text

Now, create a program that takes in the audio as input and converts it to text.

Importing necessary modules

Let’s create a function that takes in the audio as input and converts it to text:
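A minimal sketch using the SpeechRecognition package and its free Google Web Speech API (the function name speech_to_text is our own):

    import speech_recognition as sr

    recognizer = sr.Recognizer()

    def speech_to_text(audio):
        """Convert recorded audio to text using the Google Web Speech API."""
        try:
            return recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            return "Sorry, I could not understand the audio."
        except sr.RequestError:
            return "Could not reach the recognition service."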



Now, use the microphone to get audio input from the user in real time, recognize it, and
print it as text:
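A sketch of the microphone loop (assuming PyAudio is installed, which the Microphone class requires):

    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        print("Speak now...")
        audio = recognizer.listen(source)            # record one phrase

    print("You said:", speech_to_text(audio))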



5.3 SPAM FILTERING
Spam Detector is used to detect unwanted, malicious and virus infected texts and
helps to separate them from the non spam texts. It uses a binary type of classification
containing the labels such as ‘ham’ (non spam) and spam. Application of this can be seen in
Google Mail (GMAIL) where it segregates the spam emails in order to prevent them from getting
into the user’s inbox.
In this machine learning spam filtering application, we will develop a spam detector using
the Support Vector Machine (SVM) technique for classification together with Natural
Language Processing. We will detect whether a piece of input text is "ham" (non-spam) or
"spam". We will split our dataset into training and testing sets and then train our
classifier with the SVM technique.

What is a Support Vector Machine?


Support Vector Machine (SVM) is a supervised learning algorithm used for classification
and regression problems. The main objective of SVM is to find a hyperplane in an
N-dimensional space (where N is the total number of features) that differentiates the data
points. So we need to find a plane that creates the maximum margin between the two classes
of data points.

Hyperplane and Support Vectors


Hyperplanes are nothing but a boundary that helps to separate and group the data
into particular classes. A Hyperplane in 2-Dimension is basically just a line. So the dimension of
the hyperplane is decided on the basis of the number of features in the dataset minus 1. So a
hyperplane in R2 is a line and in R3 is a plane.

Support Vectors are data points that are close to the hyperplane, and they help to maximize
the margin of the classifier. With the help of a support vector from each class, we can form
a negative hyperplane and a positive hyperplane. So basically we want to maximize the
distance between the decision boundary, i.e. the maximum-margin hyperplane, and the support
vectors from both sides, which will minimize the error.
The Kernel trick
SVM works very well on linearly separable data, i.e. data points which can be classified
using a straight line. But do we need to manually decide which dimensional space we are
supposed to use for our dataset? The answer is no: SVM has a technique called the kernel
trick, which takes a low-dimensional input space and converts it into a high-dimensional
space, i.e. it converts non-separable problems into separable problems.
What is Natural Language Processing (NLP)?
Human languages include unstructured forms of data i.e. texts and voices which cannot be
understood by computers. Natural Language Processing (NLP) is an Artificial Intelligence (AI)
field that enables computer programs to recognize, interpret, and manipulate human languages.

Prerequisites:
This project requires you to have a good knowledge of Python and Natural Language
Processing (NLP). The modules required for this project are pandas, pickle, sklearn, numpy,
and nltk. You can install them using the following command:
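For example (pickle ships with Python's standard library, so only the remaining packages need installation; scikit-learn is the installable package name for sklearn):

    pip install numpy pandas scikit-learn nltk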

The versions which are used in this application for python and its corresponding modules are
as follows:
1) python : 3.8.5
2) sklearn : 0.24.2
3) pickle : 4.0
4) numpy : 1.19.5
5) pandas : 1.1.5
6) nltk : 3.2.5

Application Structure:

spam.csv: Dataset for our project. It contains Labels as “ham” or “spam” and Email Text.
spamdetector.py: This file is used to load the dataset and train our classifier.
training_data.pkl: This file contains a trained classifier in binary format which will be used
to predict the output.
SpamGui.py: GUI file for our project, where we load the trained classifier and predict the
output for a given message.

Steps for developing a Spam Detector:

1) Import Libraries and initialize variables.


Firstly, create a file called “spamdetector.py” and import all the libraries which were
listed in the prerequisites section:
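A sketch of the imports this project needs (the nltk 'punkt' download is required by the tokenizer used below):

    import re
    import pickle
    import numpy as np
    import pandas as pd
    import nltk
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.svm import SVC

    nltk.download('punkt')  # tokenizer models used by nltk.word_tokenize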
2) Preprocessing the data

Here we will use Python's pandas module to read the dataset file which we are using for
training and testing purposes. Then we will use the "message_X" variable to store the
features (the EmailText column) and the "labels_Y" variable to store the target (the Label
column) from our dataset:
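For example, assuming the column names Label and EmailText described above:

    data = pd.read_csv('spam.csv')
    message_X = data['EmailText']  # features: the raw email text
    labels_Y = data['Label']       # targets: 'ham' or 'spam'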

After we get the features and targets from our dataset, we will clean the data. Firstly, we
will filter out all the non-alphabetic characters like digits or symbols, and then, using
the natural language processing module "nltk", we will tokenize our messages. We will also
stem all the words to their root words.
Stemming: Stemming is the process of reducing words into their root words.
For example, if the message contains some error word like “frei” which might be misspelled for
“free”. Stemmer will stem or reduce that error word to its root word i.e. “fre”. As a result, “fre”
is the root word for both “free” and “frei”.
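A sketch of the cleaning pipeline (filter, tokenize, stem):

    stemmer = PorterStemmer()

    def clean_text(text):
        text = re.sub('[^a-zA-Z]', ' ', text)     # keep alphabetic characters only
        words = nltk.word_tokenize(text.lower())  # tokenize the message
        return ' '.join(stemmer.stem(w) for w in words)  # stem to root words

    messages = message_X.apply(clean_text)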
3) Bag of Words (vectorization)
Bag of words or vectorization is the process of converting sentence words into binary vector
format. It is useful as models require data to be in numeric format. So if the word is present in
that particular sentence then we will put 1 otherwise 0. This can be achieved by
“TFidfVectorizer”.
Also, we will remove words that do not add much meaning to our sentence which in technical
terms are called “stopwords”. For example, these words might be vowels, articles, or some
common words. So for this, we will add a parameter called “stop_words” in “TFidfVectorizer”.

For labels, we will replace “ham” with 0 and “spam” with 1 as we can have only two outputs.
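A sketch of the vectorization and label-encoding step:

    vectorizer = TfidfVectorizer(stop_words='english')  # drop English stopwords
    features_x = vectorizer.fit_transform(messages)

    y = labels_Y.map({'ham': 0, 'spam': 1})  # encode the two possible outputs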

4) Training the model


We will split our dataset in an 80:20 ratio, where 80% is for training and 20% for testing.
For classifying our data, we will use the Support Vector Machine classifier technique which
we have discussed above.
After training the classifier, we will get the following information about the best-fit
parameters of SVM for our dataset:
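One way to train the classifier and report the best-fit parameters is a small grid search (the parameter grid here is an illustrative, assumed choice):

    x_train, x_test, y_train, y_test = train_test_split(
        features_x, y, test_size=0.2, random_state=7)  # 80:20 split

    grid = GridSearchCV(SVC(), {'kernel': ['linear', 'rbf'], 'C': [1, 10, 100]})
    grid.fit(x_train, y_train)

    print(grid.best_params_)                            # best-fit kernel and C
    print('Test accuracy:', grid.score(x_test, y_test))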

Kernel: The kernel is decided on the basis of data transformation. By default, the kernel is
the Radial Basis Function kernel (RBF). We can change it to linear or Polynomial depending on
our dataset.
C parameter: The c parameter is a regularization parameter that tells the classifier how
much misclassification to avoid. If the value of C is high then the classifier will fit training data
very well which might cause overfitting. A low C value might allow more
misclassification(errors) which can lead to lower accuracy for our classifier.
Gamma: Gamma is a nonlinear hyperplane parameter. High values indicate that data points
that are very close to each other can be grouped. A low value indicates that data points can
be grouped together even if they are separated by large distances.

Now, we will save our classifier and other variables as binary (byte-stream) files using
Python's "pickle" module, which we will use in the GUI file for prediction:
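For example:

    with open('training_data.pkl', 'wb') as f:
        pickle.dump({'classifier': grid.best_estimator_,
                     'vectorizer': vectorizer}, f)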

5) Prediction using Graphical User Interface


We will load our trained model to predict whether a message is "ham" or "spam". Now we'll
make a graphical user interface with Python's Tkinter module and name the file
"SpamGui.py". The Tkinter library provides one of the quickest and easiest ways to
construct GUI applications, as it ships with several helpful widgets. Also load all the
other required modules.
In this file, we will create a "SpamHam" class with a constructor in which we initialize
all variables and load "training_data.pkl", which contains the trained model and other
variables, using Python's "pickle" module.
Load the main Tkinter window, which contains all the required widgets.
Now we apply the same preprocessing and vectorization process to the input message that we
discussed earlier.

And finally, after preprocessing and vectorizing the user input, we will predict whether
the message is "ham" or "spam":
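A compact sketch of the GUI file (a plain function stands in for the SpamHam class, and the widget layout is simplified):

    import re
    import pickle
    import tkinter as tk
    import nltk
    from nltk.stem import PorterStemmer

    with open('training_data.pkl', 'rb') as f:      # trained model and vectorizer
        saved = pickle.load(f)
    classifier, vectorizer = saved['classifier'], saved['vectorizer']

    stemmer = PorterStemmer()

    def check_message():
        # Same preprocessing and vectorization as at training time
        text = re.sub('[^a-zA-Z]', ' ', entry.get())
        text = ' '.join(stemmer.stem(w) for w in nltk.word_tokenize(text.lower()))
        label = classifier.predict(vectorizer.transform([text]))[0]
        result.config(text='Your message is: ' + ('spam' if label == 1 else 'ham'))

    root = tk.Tk()
    root.title('Spam Detector')
    entry = tk.Entry(root, width=60)
    entry.pack(pady=10)
    tk.Button(root, text='Check', command=check_message).pack()
    result = tk.Label(root, text='')
    result.pack(pady=10)
    root.mainloop()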
Machine Learning Spam Filtering Output

[Screenshot of the "Spam Detector" window ("Welcome to ProjectGurukul – Spam Or Ham?
Message Detector"): for the input "PRIVATE! Your 2003 Account Statement for 0797378824
shows 800 un-redeemed S.I.M. points. Call 08715203649 Identifier Code: 40533 Expires
31/10/04", clicking Check displays "Your message is: spam".]
5.4 CREDIT CARD FRAUD DETECTION
Nowadays most people prefer to pay by card and don't like to carry cash with them. That
leads to an increase in the use of cards and thereby also in card fraud. Credit card fraud
is easy to commit: e-commerce and many other online sites have increased the number of
online payment modes, which increases the risk of online fraud. Credit card fraud is
becoming the most common type of fraud.

About Credit Card Fraud Detection Application:


We need to find anomalies in the system for companies that have a lot of card transactions.
The application aims to build a credit card fraud detection model, which tells us whether a
transaction made with the card is fraudulent or not. So basically we will use past
transactions and their labels (fraud or non-fraud) to detect whether a new transaction made
by the customer is fraudulent. To prevent customers from being charged for items they did
not purchase, it is important for credit card companies to recognize fraudulent credit card
transactions.

Dataset
In the credit card fraud detection project, we will use a dataset stored in a CSV file. The
dataset consists of transactions that occurred over two days, with 492 frauds out of
284,807 transactions. The dataset is highly imbalanced: the vast majority of the
transactions are genuine, not fraudulent.

Required Libraries for ML Credit Card Fraud Detection Project:


To create a “Credit card fraud detection project” you need to install some libraries in your
system. The libraries are:
Numpy
Pandas
Matplotlib
Seaborn
Sklearn
You can install them using pip; open your command prompt and type:
pip install numpy pandas matplotlib seaborn scikit-learn

1.) Import the libraries: first, import the libraries listed above. At last, we will train
and evaluate our Logistic Regression model.
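A sketch of the imports:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score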


2.) Load the dataset:

Load the dataset we have downloaded above, which is the creditcard.csv file:
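For example:

    credit_card_data = pd.read_csv('creditcard.csv')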

3.) Analysing and Visualizing the dataset:


In this step we will analyze the dataset and perform certain operations to clean the data
and make it ready to train our model. We will also visualize the dataset:
    credit_card_data.head(5)

Let's see the first 5 rows of the dataset (columns Time, V1 ... V28, Amount, and Class; the
table output itself is not reproduced here).

    credit_card_data.tail()

    credit_card_data.info()

Let's see the dataset info:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 284807 entries, 0 to 284806
    Data columns (total 31 columns):
     #   Column  Non-Null Count   Dtype
     0   Time    284807 non-null  float64
     1   V1      284807 non-null  float64
     ...
     28  V28     284807 non-null  float64
     29  Amount  284807 non-null  float64
     30  Class   284807 non-null  int64
    dtypes: float64(30), int64(1)
    memory usage: 67.4 MB
Now we will separate the normal and fraud transactions, and analyze and visualize the fraud
and normal data. We will also perform some statistical analysis of the data:
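A sketch of the separation and a statistical summary of the transaction amounts (the Class column is 0 for normal and 1 for fraud):

    legit = credit_card_data[credit_card_data.Class == 0]  # normal transactions
    fraud = credit_card_data[credit_card_data.Class == 1]  # fraudulent transactions
    print(legit.shape, fraud.shape)

    print(legit.Amount.describe())  # summary statistics of normal amounts
    print(fraud.Amount.describe())  # summary statistics of fraud amounts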

We are just performing exploratory data analysis; follow along to understand the dataset
better and to prepare it so that our model can detect fraud and normal transactions
accurately and efficiently.
4) Splitting the data:
After analyzing and visualizing our data, we will now split our dataset into X and Y, that
is, into features and labels:
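For example (stratifying the split so both parts keep the same fraud ratio is our own choice, given the imbalance):

    X = credit_card_data.drop(columns='Class')  # features
    Y = credit_card_data['Class']               # labels

    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, stratify=Y, random_state=2)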

5.) Creating Logistic Regression Model:


Now we will create the machine learning model:
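A minimal sketch:

    # max_iter raised because the unscaled features can make convergence slow
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, Y_train)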

6.) Model Evaluation:


After fitting the data to the model, we have to perform model evaluation to check the
accuracy of the model:
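For example, on the training data:

    train_pred = model.predict(X_train)
    print('Training accuracy:', accuracy_score(Y_train, train_pred))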

As you can see, the model we have created gives 95% accuracy on the training data. The
accuracy is good considering that we are training our model on relatively little data.
Now evaluating our model on test data:
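For example:

    test_pred = model.predict(X_test)
    print('Test accuracy:', accuracy_score(Y_test, test_pred))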
5.5 Medical Diagnosis using Machine Learning:

Detecting Parkinson’s Disease:

What is Parkinson’s Disease?


Parkinson's disease is a central nervous system disorder. Its symptoms occur because of low
dopamine levels in the brain. The four primary symptoms are tremor, rigidity, slow movement,
and balance problems. No cure for Parkinson's disease is known so far; treatment aims to
reduce the effects of the symptoms.

About Detecting Parkinson’s Disease Application:


In this Python machine learning application, we will build a model to detect Parkinson's
disease using a classification technique, the RandomForestClassifier, since our output
contains only 1's and 0's. We'll load the dataset, get the features and targets, split them
into training and testing sets, and finally pass them to the RandomForestClassifier for
prediction.

Project Prerequisites:
Install the following libraries using pip:
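For example (scikit-learn is the installable package name for sklearn):

    pip install numpy pandas matplotlib scikit-learn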

The versions used in this Parkinson's disease detection project for Python and its
corresponding modules are as follows:
1) python: 3.8.5
2) numpy: 1.19.5
3) pandas: 1.1.5
4) matplotlib: 3.2.2
5) sklearn: 0.24.2

Application file Structure:


parkinsons.csv: dataset file for our project.
parkinsons.py: python file where we will train our model and predict the output.
Steps for detecting Parkinson’s disease:
1) Import Libraries
Firstly, we will import all the required libraries which were listed in the prerequisites
section:
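A sketch of the imports used in the steps below:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn import metrics, tree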

2) Preprocessing
Read the 'parkinsons.csv' file into a dataframe using the 'pandas' library.
Fetch the features and targets from the dataframe. The features will be all columns except
'name' and 'status', so we drop these two columns. Our target will be the 'status' column,
which contains 0's (no Parkinson's disease) and 1's (has Parkinson's disease):
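For example:

    df = pd.read_csv('parkinsons.csv')

    features = df.drop(columns=['name', 'status'])  # all columns except these two
    target = df['status']   # 0 = no Parkinson's disease, 1 = has Parkinson's disease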

3) Normalization
We will scale our feature data into the range of -1 (minimum value) to 1 (maximum value).
Scaling is important because variables on different scales do not contribute equally to
model fitting, which may end up creating bias. For this we will use 'MinMaxScaler()' to fit
and then transform the feature data:
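For example:

    scaler = MinMaxScaler((-1, 1))      # scale every feature into the range [-1, 1]
    x = scaler.fit_transform(features)
    y = target.values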

4) Training and Testing


Split the dataset in an 80:20 ratio, where 80% of the rows are for training and 20% for
testing purposes. For this we will pass the scaled features and target data to
'train_test_split()':
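For example (the seed is arbitrary):

    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=7)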

5) Building the classifier model


We will use the Random Forest Classifier for the classification of our data points. So
let's see what a random forest is.

What is a Random Forest Algorithm?


Random forest is a supervised learning algorithm. It can be used for both classification
and regression problems. It uses an ensemble learning method known as 'bagging' (Bootstrap
Aggregation), which combines multiple classifiers to solve a complex problem. Random forest
creates various random subsets of the given dataset, passes them to a number of decision
trees, and takes a prediction from each tree. It then combines the trees' predictions, by
majority vote for classification and by averaging for regression, which increases the
accuracy of the overall result. Because there is a large number of trees in a random
forest, it also mitigates the problem of overfitting.
Random forest searches for the most important feature while splitting a node which helps in
building a better model.
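A minimal sketch of building and fitting the classifier (100 trees is an assumed, untuned choice):

    model = RandomForestClassifier(n_estimators=100, random_state=7)
    model.fit(x_train, y_train)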

Random forest example:


After we train our model, we can see how the RandomForestClassifier fits the features and
targets of the dataset. We will plot the first 5 trees of our classifier using the
'matplotlib' library:
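A sketch of the plotting step (the displayed depth is limited here so the trees stay readable):

    fig, axes = plt.subplots(1, 5, figsize=(25, 5))
    for i in range(5):
        tree.plot_tree(model.estimators_[i], max_depth=2,  # show top levels only
                       filled=True, ax=axes[i])
    plt.show()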
6) Prediction and Accuracy
Now we will predict the output (y_pred) for the testing data (x_test), which is 20% of the
dataset, using the model we have trained. We will also calculate the accuracy, mean
absolute error, and root mean square error of our model:
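For example:

    y_pred = model.predict(x_test)
    print('Accuracy:', metrics.accuracy_score(y_test, y_pred))
    print('Mean absolute error:', metrics.mean_absolute_error(y_test, y_pred))
    print('Root mean square error:',
          np.sqrt(metrics.mean_squared_error(y_test, y_pred)))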
7) Building the Prediction System.
Finally, we will take the user's input and check whether the data indicates Parkinson's
disease or not:
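A sketch of the prediction system; here an existing row of the feature table stands in for the user's input, which must be scaled with the same scaler before prediction:

    # One record of voice measurements, in the same column order as `features`
    sample = np.asarray(features.iloc[0]).reshape(1, -1)
    sample = scaler.transform(sample)  # apply the scaling fitted on training data

    print("Parkinson's detected" if model.predict(sample)[0] == 1
          else "No Parkinson's detected")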

In this machine learning application, we developed a model using the RandomForestClassifier
of Python's sklearn module to detect whether an individual has Parkinson's disease. We
obtained a machine learning model with 97.43% accuracy, which is good given that our
dataset contains relatively few records.
Part A – Q & A
Unit - V
PART -A

S.No Question and Answer CO,K

1. What is Image Recognition? CO5,K1
Image recognition is the ability of AI to detect an object, classify it,
and recognize it. The last step is close to the human level of image
processing. The best example of image recognition solutions is
face recognition.
2. What is a Random Forest? CO5,K1
A ‘random forest’ is a supervised machine learning algorithm that is
generally used for classification problems. It operates by
constructing multiple decision trees during the training phase. The
random forest chooses the decision of the majority of the trees as
the final decision.
3. How does the Random Forest Algorithm work? CO5,K1
Step 1: Select random samples from a given data or training set.
Step 2: This algorithm will construct a decision tree for every training
subset.
Step 3: Voting will take place across the decision trees (with averaging
used for regression).
Step 4: Finally, select the most voted prediction result as the final
prediction result.
4. What is Speech Recognition? CO5,K1
Speech recognition is a machine's ability to listen to spoken words
and identify them. It recognizes the phonemes/phonetics in our speech
to get the more significant parts of speech, such as words and sentences.
5. How does Speech Recognition work? CO5,K1
Speech recognition starts by taking the sound energy produced by
the person speaking and converting it into electrical energy with the
help of a microphone. It then converts this electrical energy from
analog to digital, and finally to text.


6. How Do You Design an Email Spam Filter? CO5,K1


Building a spam filter involves the following process:
 The email spam filter will be fed with thousands of emails
 Each of these emails already has a label: ‘spam’ or ‘not spam.’
 The supervised machine learning algorithm will then determine which
types of emails are being marked as spam based on spam words like
'lottery', 'free offer', 'no money', 'full refund', etc.
 The next time an email is about to hit your inbox, the spam filter will
use statistical analysis and algorithms like Decision Trees and SVM to
determine how likely it is that the email is spam
 If the likelihood is high, it will label it as spam, and the email won’t
hit your inbox
 Based on the accuracy of each model, we will use the algorithm with
the highest accuracy after testing all the models.
7. What is a Support Vector Machine? CO5,K1
Support Vector Machine (SVM) is a supervised learning
algorithm used for classification and regression problems. The main
objective of SVM is to find a hyperplane in an N-dimensional space
(where N is the total number of features) that differentiates the data
points. So we need to find a plane that creates the maximum margin
between the two classes of data points.
8. What are Support Vectors in SVM? CO5,K1
Support Vectors are data points that are nearest to the hyperplane. They
influence the position and orientation of the hyperplane. Removing the
support vectors would alter the position of the hyperplane. The support
vectors help us build our support vector machine model.

9. What are Hyperplanes in SVM? CO5,K1


Hyperplanes are nothing but a boundary that helps to separate and
group the data into particular classes. A Hyperplane in 2-Dimension is
basically just a line. So the dimension of the hyperplane is decided on
the basis of the number of features in the dataset minus 1. So a
hyperplane in R2 is a line and in R3 is a plane.

10 Briefly Explain Logistic Regression. CO5,K1
Logistic regression is a classification algorithm used to predict a binary
outcome for a given set of independent variables. The output of logistic
regression is either a 0 or 1 with a threshold value of generally 0.5. Any
value above 0.5 is considered as 1, and any value below 0.5 is
considered as 0.
11 What are false positives and false negatives? CO5,K1
False positives are those cases in which negatives are wrongly
predicted as positives, for example, predicting that a credit card
transaction is fraudulent when, in fact, it is not.
False negatives are those cases in which positives are wrongly
predicted as negatives, for example, predicting that a credit card
transaction is not fraudulent when, in fact, it is.
12 What is accuracy? CO5,K1
It is the number of correct predictions out of all predictions made.
Accuracy = (TP+TN)/(The total number of Predictions)
13 How does logistic regression handle categorical variables? CO5,K1
The inputs to a logistic regression model need to be numeric. The
algorithm cannot handle categorical variables directly. So, they need to
be converted into a format that is suitable for the algorithm to process.
The various levels of a categorical variable will be assigned a unique
numeric value known as the dummy variable. These dummy variables
are handled by the logistic regression model as any other numeric
value.
14 Define Bagging. CO5,K1
Creating a different training subset from sample training data with
replacement is called Bagging. The final output is based on majority
voting.

15 Define Boosting. CO5,K1


Combining weak learners into strong learners by creating sequential
models such that the final model has the highest accuracy is called
Boosting. Examples: AdaBoost, XGBoost.

REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND TO INDUSTRY

Patient's Sickness Prediction System


Machine learning has proven effective in the field of healthcare as well. It became
increasingly challenging for traditional healthcare systems to cater to the needs of
millions of patients, but with the advent of ML, the paradigm shifted towards value-based
treatment. Every modern piece of healthcare equipment comes with internal apps that can
store a patient's data. You can leverage these data to create a system that can predict a
patient's ailment and forecast admissions. KenSci is an AI-based solution that can analyze
clinical data and predict sickness, along with enabling more intelligent resource
allocation.

Pinterest: It employs computer vision to automatically recognize objects in images, or
"pins", and then recommend similar pins. Other applications cover spam prevention, search
and discovery, email marketing, ad performance, etc., with the help of machine learning.

Smart Replies: You must have observed how Gmail suggests simple phrases to respond to
emails, like "Thank You", "Alright", "Yes, I'm interested". These responses are customized
per email as ML and AI understand, estimate, and reflect how one responds over time.

Predict Potential Heart Failure


An algorithm designed to scan a doctor’s free-form e-notes and identify patterns in a
patient’s cardiovascular history is making waves in medicine. Instead of a physician
digging through multiple health records to arrive at a sound diagnosis, redundancy is now
reduced with computers making an analysis based on available information.

Contents beyond the Syllabus

Machine Learning is Transforming Indian Agriculture


Agriculture plays a vital role in India's economy. Over 58 per cent of rural households
depend on agriculture as their primary means of livelihood, according to an IBEF report.
Agricultural exports constitute 10 per cent of the country's exports and are the
fourth-largest exported principal commodity class in India.
As stated by the Department of Industrial Policy and Promotion (DIPP), the
Indian agricultural services and agricultural machinery industries have cumulatively attracted
Foreign Direct Investment (FDI) equity inflow of about $2.45 billion and the food processing
sector has attracted approximately $7.81 billion during April 2000 to June 2017.
On the back of increased FDI and conducive government initiatives, the agriculture
industry is increasingly looking at ways to leverage technology to get better crop yield.
Many tech companies and startups have emerged in the last couple of years with
concentrated agri-based solutions which benefit farmers.
Advantage of implementing AI in Agriculture
Using artificial intelligence in agriculture helps farmers comprehend data insights such
as temperature, precipitation, wind speed, and solar radiation. Data analysis of the
historical values offers a better comparison of the desirable outcomes. The best aspect of
implementing AI in agriculture is that it won't eliminate the jobs of human farmers;
instead, it will improve their processes.
AI provides more efficient methods to produce, harvest and market essential plants.
•AI implementation emphasizes checking for defective crops and enhancing the prospects of
healthy crop production.
•The development of Artificial Intelligence technology has helped agro-based companies to
operate more efficiently.
•AI is being used in applications such as automated machine adjustments for weather
forecasting and pest or disease identification.
•Artificial intelligence can enhance crop management practices, consequently helping many
tech companies invest in algorithms which are becoming useful in agriculture.
•AI solutions can address the challenges farmers face, such as climate variation, an infestation
of pests and weeds that reduces yields.
Effect of Artificial Intelligence in Agriculture
AI technology is rapidly rectifying problems while recommending the specific actions
required to overcome an issue. AI is useful for tracking information to find answers
quickly. Let us see how AI is used in agriculture to enhance outcomes at a minimal
environmental price. AI-based systems can identify a disease with 98 per cent accuracy,
and AI helps farmers monitor fruit and vegetables, for example by correcting the lighting
to hasten production.
CropIn — Applying AI to Boost per-Acre Value
CropIn is a Bengaluru-based startup which claims to be an intuitive, intelligent, and
self-evolving system that provides future-ready farming solutions to the agricultural
industry.
With CropIn's 'smart farm' solution, all the plots were geo-tagged to discover the actual
plot area. The solution helped in remote sensing and weather advisories, monitoring and
tracking farm activities for complete traceability, educating farmers on the adoption of
the right package of inputs and practices, monitoring crop health and crop estimation, and
alerts on pests, diseases, etc.

Primarily, CropIn uses technologies like AI to help customers analyze and interpret data
to derive real-time technical insights about standing crops and projects spanning
geographies. Its agri-business intelligence solution named "SmartRisk" leverages
agri-alternate data and offers risk mitigation and forecasting for practical credit risk
assessment and loan recovery help.

"A proprietary machine learning algorithm built on satellite and weather information is
used to provide insights at plot and region level," Krishna Kumar, Founder & CEO, CropIn,
stated.
