AD8552 Machine Learning – Unit IV
Please read this disclaimer before proceeding:
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
DIGITAL NOTES ON
AD8552 Machine Learning
Batch/Year : 2020-2024/III
Date : 06-10-2022
Signature :
Table of Contents

- Contents
- Course Objectives
- Pre-Requisites (Course Names with Code)
- Course Outcomes
- Lecture Plan
- Assignments
- Part A (Q & A)
- Part B Questions
- Assessment Schedule
Course Objectives

Pre-Requisites

Course Outcomes
Table: Course outcomes, listing for each course code the course outcome statement, the cognitive/affective level of the course outcome, and the expected level of attainment. Course outcome statements are in the cognitive domain.
CO – PO/PSO Mapping

Table: mapping of course outcomes CO1, CO2, CO4, and CO5 to programme outcomes (mapping strengths of 1–2 on the first three POs; the remaining PO/PSO columns are unmapped).
Lecture Plan – Unit IV
Table: lecture plan listing, for each topic, the number of periods, the proposed and actual lecture dates, the pertaining CO, the taxonomy level, and the mode of delivery.
Activity Based Learning – Unit IV (Model Building/Prototype)

Topics: Work Sheet
Lecture Notes – Unit 4
UNIT IV MACHINE LEARNING AND DATA ANALYTICS
Machine Learning for Predictive Data Analytics – Data to Insights to Decisions – Data Exploration – Information-based Learning – Similarity-based Learning – Probability-based Learning – Error-based Learning – Evaluation – The Art of Machine Learning for Predictive Data Analytics.
Price Prediction: Businesses such as hotel chains, airlines, and online retailers need to
constantly adjust their prices in order to maximize returns based on factors such as seasonal
changes, shifting customer demand, and the occurrence of special events. Predictive
analytics models can be trained to predict optimal prices based on historical sales records.
Businesses can then use these predictions as an input into their pricing strategy decisions.
Dosage Prediction: Doctors and scientists frequently decide how much of a medicine or
other chemical to include in a treatment. Predictive analytics models can be used to assist
this decision-making by predicting optimal dosages based on data about past dosages and
associated outcomes.
Risk Assessment: Risk is one of the key influencers in almost every decision an
organization makes. Predictive analytics models can be used to predict the risk associated
with decisions such as issuing a loan or underwriting an insurance policy. These models are
trained using historical data from which they extract the key indicators of risk.
Diagnosis: Doctors, engineers, and scientists regularly make diagnoses as part of their work.
Typically, these diagnoses are based on their extensive training, expertise, and experience.
Predictive analytics models can help professionals make better diagnoses by leveraging large
collections of historical examples at a scale beyond anything one individual would see over
his or her career. The diagnoses made by predictive analytics models usually become an
input into the professional’s existing diagnosis process.
Document Classification: Predictive data analytics can be used to automatically classify
documents into different categories. Examples include email spam filtering, news sentiment
analysis, customer complaint redirection, and medical decision making. In fact, the definition
of a document can be expanded to include images, sounds, and videos, all of which can be
classified using predictive data analytics models.
What is Machine Learning?
Machine learning is defined as an automated process that extracts patterns from data. To
build the models used in predictive data analytics applications, we use supervised machine
learning. Supervised machine learning techniques automatically learn a model of the
relationship between a set of descriptive features and a target feature based on a set of
historical examples, or instances. We can then use this model to make predictions for new
instances. These two separate steps are shown in Figure 1.2. The dataset includes descriptive features that describe the mortgage and a target feature that indicates whether the mortgage applicant ultimately defaulted on the loan or paid it back in full.
• This model is consistent with the dataset as there are no instances in the dataset for
which the model does not make a correct prediction.
• Machine learning algorithms automate the process of learning a model that captures
the relationship between the descriptive features and the target feature in a dataset.
• Notice that this model does not use all the features and the feature that it uses is a
derived feature (in this case a ratio): feature design and feature selection are two
important topics that we will return to again and again.
What is the relationship between the descriptive features and the target feature
(OUTCOME) in the following dataset?
• The real value of machine learning becomes apparent in situations like this when we
want to build prediction models from large datasets with multiple features
How Does Machine Learning Work?
Underfitting occurs when the prediction model selected by the algorithm is too simplistic
to represent the underlying relationship in the dataset between the descriptive features
and the target feature
Overfitting occurs when the prediction model selected by the algorithm is so complex that the model fits the dataset too closely and becomes sensitive to noise in the data.
Striking a balance between overfitting and underfitting when trying to predict age from
income.
It is a Goldilocks model: it is just right, striking a good balance between underfitting and
overfitting. We find these Goldilocks models by using machine learning algorithms with
appropriate inductive biases.
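The two failure modes can be sketched with invented data: an underfitting model ignores its input entirely, while an overfitting model memorizes the training instances and fails on anything unseen.

```python
# Illustrative sketch (invented age -> income data): an underfitting model
# predicts the training mean everywhere; an overfitting model memorizes the
# training set exactly and cannot generalize to a new instance.

train = {20: 18_000, 30: 35_000, 40: 52_000, 50: 69_000}  # age -> income

mean_income = sum(train.values()) / len(train)

def underfit(age):
    return mean_income        # too simple: the prediction ignores the input

def overfit(age):
    return train[age]         # fits training data perfectly, fails on new ages

print(underfit(35))           # same answer for every age
try:
    overfit(35)               # age 35 was never seen in training
except KeyError:
    print("overfit model cannot handle an unseen instance")
```

A Goldilocks model sits between these extremes: flexible enough to capture the real relationship, constrained enough to ignore the noise.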
The Predictive Data Analytics Project Lifecycle: Crisp-DM
One of the most commonly used processes for predictive data analytics
projects is the Cross Industry Standard Process for Data Mining (CRISP-DM). Key
features of the CRISP-DM process that make it attractive to data analytics practitioners are
that it is non-proprietary; it is application, industry, and tool neutral; and it explicitly views
the data analytics process from both an application-focused and a technical perspective.
A diagram of the CRISP-DM process that shows the six key phases and indicates the
important relationships between them. This figure is based on Figure 2 of Wirth and
Hipp (2000).
Business Understanding: Predictive data analytics projects never start out with the
goal of building a prediction model. Instead, they are focused on things like gaining
new customers, selling more products, or adding efficiencies to a process. So, during
the first phase in any analytics project, the primary goal of the data analyst is to fully
understand the business (or organizational) problem that is being addressed, and then to design a data analytics solution for it.
Data Understanding: Once the manner in which predictive data analytics will be used to address a business problem has been decided, it is important that the data analyst fully understand the different data sources available within an organization and the different kinds of data that are contained in these sources.
Data Preparation: Building predictive data analytics models requires specific kinds of
data, organized in a specific kind of structure known as an analytics base table (ABT).
This phase of CRISP-DM includes all the activities required to convert the disparate data
sources that are available in an organization into a well-formed ABT from which machine learning models can be induced.
Modeling: The modeling phase of the CRISP-DM process is when the machine
learning work occurs. Different machine learning algorithms are used to build a range of
prediction models from which the best model will be selected for deployment.
Evaluation: Before models can be deployed for use within an organization, it is important that they are fully evaluated and proved to be fit for purpose. This phase of CRISP-DM
covers all the evaluation tasks required to show that a prediction model will be able to make
accurate predictions after being deployed and that it does not suffer from overfitting or
underfitting.
ABT design addresses specific business problems and the data structures that are required to build predictive analytics models, in particular the analytics base table (ABT). Designing ABTs that properly represent the characteristics of a prediction subject is a key skill for analytics practitioners. A common approach is to first develop a set of domain concepts that describe the prediction subject, and then expand these into concrete descriptive features.
Q) What predictive analytics solutions could be proposed to help address this business
problem?
Ans) Potential analytics solutions include:
Claim prediction
Member prediction
Application prediction
Payment prediction
Multiple data sources are typically combined to create an analytics base table. Three key data considerations are particularly important when we are designing features:
Data availability: we must have data available to implement any feature we would like to use. For example, in an online payments service scenario, we might define a feature that calculates the average of a customer’s account balance over the past six months.
Timing: the timing with which data becomes available for inclusion in a feature. With the exception of the definition of the target feature, data that will be used to define a feature must be available before the event around which we are trying to make predictions occurs. For example, if we were building a model to predict the outcomes of soccer matches, we might consider including the attendance at the match as a descriptive feature.
Longevity: there is potential for features to go stale if something about the environment from which they are generated changes. For example, to make predictions of the outcome of loans granted by a bank, we might use the borrower’s salary as a descriptive feature.
Figure (a: Actual; b: Aligned): Observation and outcome periods defined by an event rather than by a fixed point in time (each line represents a prediction subject, and stars signify events).
• In some cases only the descriptive features have a time component to them, and
the target feature is time independent.
Figure (a: Actual; b: Aligned): Modeling points in time for a scenario with no real outcome period (each line represents a customer, and stars signify events).
• Conversely, the target feature may have a time component and the descriptive features
may not.
Figure (a: Actual; b: Aligned): Modeling points in time for a scenario with no real observation period (each line represents a customer, and stars signify events).
• Data analytics practitioners can often be frustrated by legislation that stops them
from including features that appear to be particularly well suited to an analytics
solution in an ABT
• There are significant differences in legislation in different jurisdictions, but a couple
of key relevant principles almost always apply.
Anti-discrimination legislation
Data protection legislation
Although, data protection legislation changes significantly across different jurisdictions,
there are some common tenets on which there is broad agreement which affect the
design of ABTs
The use limitation principle
The purpose specification principle
The collection limitation principle
• Implementing a derived feature, however, requires data from multiple sources to be
combined into a set of single feature values
A few key data manipulation operations are frequently used to calculate derived feature
values:
aggregating data sources
deriving new features by combining or transforming existing
features
filtering fields in a data source
filtering rows in a data source
joining data sources
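A minimal sketch of these manipulation operations on invented transaction data (all names and values are hypothetical) might look like:

```python
# Hypothetical sketch (invented names and values): deriving one ABT feature
# per customer from raw transaction data using the operations listed above.

transactions = [
    {"cust": "A", "amount": 100, "type": "card"},
    {"cust": "A", "amount": 300, "type": "cash"},
    {"cust": "B", "amount": 50,  "type": "card"},
]
customers = {"A": {"region": "north"}, "B": {"region": "south"}}

# filtering rows: keep only card transactions
card_txns = [t for t in transactions if t["type"] == "card"]

# aggregating: total card spend per customer (a derived feature)
totals = {}
for t in card_txns:
    totals[t["cust"]] = totals.get(t["cust"], 0) + t["amount"]

# joining: combine the two sources into one well-formed row per customer
abt = [{"cust": c, "region": info["region"], "card_total": totals.get(c, 0)}
       for c, info in customers.items()]
print(abt)
```

In practice these operations run over database tables or data frames rather than Python lists, but the logical steps (filter, aggregate, join, derive) are the same.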
What are the observation period and outcome period for the motor insurance
claim prediction scenario?
• The observation period and outcome period are measured over different dates
for each insurance claim, defined relative to the specific date of that claim.
• The observation period is the time prior to the claim event, over which the
descriptive features capturing the claimant’s behavior are calculated.
• The outcome period is the time immediately after the claim event, during
which it will emerge whether the claim is fraudulent or genuine.
• What features could you use to capture the Claim Frequency domain concept?
Figure: Example domain concepts for a motor insurance fraud prediction analytics solution
• What features could you use to capture the Claim Types domain concept?
Figure: A subset of the domain concepts and related features for a motor insurance fraud
prediction analytics solution.
• The following table illustrates the structure of the final ABT that was designed for the
motor insurance claims fraud detection solution.
• The table contains more descriptive features than the ones we have discussed
• The table also shows the first four instances.
Table: The ABT for the motor insurance claims fraud detection
solution.
3. Data Exploration
Table: The structures of the tables included in a data quality report to describe
(a) continuous features and (b) categorical features
Table: Portions of the ABT for the motor insurance claims fraud
detection problem
Table: A data quality report for the motor insurance claims fraud
detection ABT
Figure: Histograms for different sets of data each of which exhibit well-known, common
characteristics.
The probability density function for the normal distribution (or Gaussian distribution) is

N(x, µ, σ) = (1 / (σ √(2π))) × e^(−(x − µ)² / (2σ²))

where x is any value, and µ and σ are parameters that define the shape of the distribution: the population mean and population standard deviation.

Figure: Three normal distributions with different means but identical standard deviations.
Figure: Three normal distributions with identical means but different standard deviations.
Figure: An illustration of the 68-95-99.7 rule that a normal distribution defines as the expected distribution of observations. The grey region defines the area where 95% of observations are expected.
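A sketch of the Gaussian density defined above, plus a rough numerical check of the "about 95% within two standard deviations" figure (the function name is ours):

```python
import math

# Sketch of the normal (Gaussian) probability density function:
# N(x, mu, sigma) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))
def normal_pdf(x, mu, sigma):
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

print(normal_pdf(0, 0, 1))   # peak of the standard normal, about 0.3989

# Rough check of the 68-95-99.7 rule: numerically integrate the density
# over [-2, 2] standard deviations with a simple Riemann sum.
step = 0.001
mass = sum(normal_pdf(i * step, 0, 1) * step for i in range(-2000, 2000))
print(round(mass, 3))        # probability within 2 sigma, about 0.954
```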
A data quality issue is loosely defined as anything unusual about the data in an ABT.
The most common data quality issues are:
missing values
irregular cardinality
outliers
The data quality issues we identify from a data quality report will be of two types:
Data quality issues due to invalid data
Data quality issues due to valid data.
Table: The data quality plan for the motor insurance fraud prediction ABT.
where ai,is a specific value of feature a, and lower and upper are the lower and upper
thresholds.
• In this chapter we are going to introduce a machine learning algorithm that tries
to build predictive models using only the most informative features.
• In this context an informative feature is a descriptive feature whose values split the
instances in the dataset into homogeneous sets with respect to the target feature
value.
(a) Brian (b) John (c) Aphra (d) Aoife
Figure: Cards showing character faces and names for the Guess-Who game
Is it a man?
Does the person wear glasses?
(1 + 2 + 3 + 3) / 4 = 2.25
Figure: The different question sequences that can follow in a game of Guess-Who
beginning with the question Is it a man?
So, on average, if you ask Question (1) first, the average number of questions you have to ask per game is:

(2 + 2 + 2 + 2) / 4 = 2
• This is not because of the literal message of the answers: YES or NO.
• It has to do with how the answer to each question splits the domain into different sized sets, based on the value of the descriptive feature the question is asked about and the likelihood of each possible answer to the question.
Big Idea
So the big idea here is to figure out which features are the most informative
ones to ask questions about by considering the effects of the different answers
to the questions, in terms of:
• Each of the non-leaf nodes (root and interior) in the tree specifies a test to be
carried out on one of the query’s descriptive features.
• Each of the leaf nodes specifies a predicted classification for the query.
Figure: (a) and (b) show two decision trees that are consistent with the instances
in the spam dataset. (c) shows the path taken through the tree shown in (a) to
make a prediction for the query instance: SUSPICIOUS WORDS = ’true’,
UNKNOWN SENDER = ’true’, CONTAINS IMAGES = ’true’.
• Both of these trees will return identical predictions for all the examples in the
dataset.
• The tree that tests SUSPICIOUS WORDS at the root is very shallow because the
SUSPICIOUS WORDS feature perfectly splits the data into pure groups of ’spam’
and ’ham’.
• Descriptive features that split the dataset into pure sets with respect to the
target feature provide information about the target feature.
3.4.2 Shannon’s Entropy Model
An easy way to understand the entropy of a set is to think in terms of the uncertainty
associated with guessing the result if you were to make a random selection from the
set.
What is a log?
Remember, the log of a to the base b is the number to which we must raise b to get a.
log2(1) = 0 because 2^0 = 1
log2(8) = 3 because 2^3 = 8
log5(25) = 2 because 5^2 = 25
Shannon’s model of entropy is a weighted sum of the logs of the probabilities of each of the possible outcomes when we make a random selection from a set:

H(t) = − Σ (i = 1 to l) P(t = i) × log2(P(t = i))
Table: The relationship between the entropy of a message and the set it was selected
from.
3.4.3 Information Gain
Information gain is computed as the entropy of the dataset with respect to the target feature minus the entropy remaining after the dataset is partitioned by a descriptive feature: IG(d, D) = H(t, D) − rem(d, D).
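A minimal sketch of entropy and information gain, using an invented four-instance spam/ham set (the function names are ours):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: weighted sum of -log2 of each probability."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_gain(parent_labels, partitions):
    """Entropy of the parent set minus the weighted entropy of its partitions."""
    def h(labels):
        n = len(labels)
        return entropy([labels.count(l) / n for l in set(labels)])
    n = len(parent_labels)
    remaining = sum(len(part) / n * h(part) for part in partitions)
    return h(parent_labels) - remaining

# Toy spam example: compare a feature that splits the set into pure
# partitions against one whose partitions are as mixed as the parent.
labels = ["spam", "spam", "ham", "ham"]
print(info_gain(labels, [["spam", "spam"], ["ham", "ham"]]))  # 1.0 (perfect split)
print(info_gain(labels, [["spam", "ham"], ["spam", "ham"]]))  # 0.0 (uninformative)
```

A decision tree induction algorithm such as ID3 chooses, at each node, the feature whose split yields the highest information gain.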
3.5 Similarity-based Learning
Based on computational measure of similarity (in the form of distance measure) between
instances
Feature space: each descriptive feature has its own dimensional axis. It is an abstract m-dimensional space that is created by making each descriptive feature in a dataset an axis of an m-dimensional coordinate system and mapping each instance in the dataset to a point in this coordinate space based on the values of its descriptive features.
Working of feature space: if the values of the descriptive features of two or
more instances in a dataset are the same, then these instances will be mapped to the
same point in the feature space and vice versa
Distance between two points in the feature space is a useful measure of
the similarity of the descriptive features of two instances
metric(a, b): a real-valued function that returns the distance between two points a and b in the feature space. It has the following properties:
•Non-negativity
•Identity
•Symmetry
•Triangular inequality
Two examples of distance metrics:
•Euclidean distance
•Manhattan distance (taxicab distance): the sum of absolute differences.
Minkowski distance: a family of distance metrics based on differences between features
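The Minkowski family can be sketched in a few lines; with p = 1 it reduces to the Manhattan distance and with p = 2 to the Euclidean distance (the function name is ours):

```python
def minkowski(a, b, p):
    """Minkowski distance between points a and b: p=1 gives Manhattan,
    p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(minkowski((0, 0), (3, 4), p=2))  # Euclidean: 5.0
print(minkowski((0, 0), (3, 4), p=1))  # Manhattan: 7.0
```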
Decision boundary: the boundary between regions of the feature space in which different target levels will be predicted. It is generated by aggregating the neighboring local models (Voronoi regions) that make the same prediction.
Noise effects:
The nearest neighbor algorithm is sensitive to noise because any errors in the description or labeling of training data result in erroneous local models and incorrect predictions. One way to mitigate noise is to modify the algorithm to return the majority target level within the set of k nearest neighbors to the query q.
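A sketch of k-NN majority voting on invented, deliberately noisy data (the names and points are hypothetical):

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Return the majority target level among the k nearest neighbors of query.
    training: list of (feature_vector, label) pairs."""
    nearest = sorted(training, key=lambda inst: math.dist(inst[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data with one mislabeled ('b') point sitting right next to the query;
# majority voting over k=3 neighbors overrules the noisy instance.
training = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
            ((1.5, 1.5), "b"),   # noisy label
            ((9, 9), "b")]
print(knn_predict(training, (1.4, 1.4), k=3))  # 'a' despite the nearby 'b'
```

With k = 1 the same query would return the noisy 'b' label, which is exactly the sensitivity the text describes.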
• We can use estimates of likelihoods to determine the most likely prediction that should be made. More importantly, we revise these predictions based on data we collect and whenever extra evidence becomes available.
• A probability function, P(), returns the probability of a feature taking a specific value.
• A joint probability refers to the probability of an assignment of specific values to
multiple different features
• A conditional probability refers to the probability of one feature taking a specific
value given that we already know the value of a different feature
• A probability distribution is a data structure that describes the probability of each
possible value a feature can take. The sum of a probability distribution must equal 1.0
• A joint probability distribution is a probability distribution
over more than one feature assignment and is written as a
multi-dimensional matrix in which each cell lists the
probability of a particular combination of feature values
being assigned
• The sum of all the cells in a joint probability distribution
must be 1.0.
• Bayes’ Theorem
Bayes’ Theorem defines the conditional probability of an event, X, given some evidence, Y, in terms of the product of the inverse conditional probability, P(Y | X), and the prior probability of the event, P(X):

P(X | Y) = P(Y | X) × P(X) / P(Y)
• Bayesian Prediction
To make Bayesian predictions, we generate the probability of the event that a target feature, t, takes a specific level, l, given the assignment of values to a set of descriptive features, q, from a query instance. We can restate Bayes’ Theorem using this terminology and generalize the definition of Bayes’ Theorem so that it can take into account more than one piece of evidence (each descriptive feature value is a separate piece of evidence). The Generalized Bayes’ Theorem is defined as

P(t = l | q[1], …, q[m]) = P(q[1], …, q[m] | t = l) × P(t = l) / P(q[1], …, q[m])

To calculate a probability using the Generalized Bayes’ Theorem, we need to calculate three probabilities: the prior probability of the target level, P(t = l); the conditional probability of the evidence given that target level, P(q[1], …, q[m] | t = l); and the probability of the evidence, P(q[1], …, q[m]).
The technical term for this splitting of the data into smaller and smaller
sets based on larger and larger sets of conditions is data fragmentation. Data
fragmentation is essentially an instance of the curse of dimensionality. As the
number of descriptive features grows, the number of potential conditioning events
grows. Consequently, an exponential increase is required in the size of the dataset as
each new descriptive feature is added to ensure that for any conditional probability,
there are enough instances in the training dataset matching the conditions so that the
resulting probability is reasonable.
Conditional Independence and Factorization
If knowledge of one event has no effect on the probability
of another event, and vice versa, then the two events are
independent of each other.
If two events X and Y are independent, then:
P(X|Y) = P(X)
P(X, Y) = P(X) x P(Y)
• Full independence between events is quite rare.
• A more common phenomenon is that two, or more, events may be independent if we
know that a third event has happened
• This is known as conditional independence
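Assuming the descriptive features are conditionally independent given the target (the naive Bayes simplification), the Generalized Bayes' Theorem factorizes into a product of per-feature likelihoods. A sketch with invented probabilities:

```python
# Hedged sketch (invented numbers): factorized Bayesian prediction,
# naive-Bayes style. Under conditional independence,
# P(t=l | q) is proportional to P(t=l) * product_i P(q[i] | t=l).

priors = {"spam": 0.4, "ham": 0.6}
# P(feature value | class), assumed conditionally independent given the class
likelihoods = {
    "spam": {"suspicious_words": 0.9, "unknown_sender": 0.7},
    "ham":  {"suspicious_words": 0.1, "unknown_sender": 0.3},
}

def posterior(evidence):
    """Posterior over classes for the observed feature values."""
    scores = {}
    for cls, prior in priors.items():
        p = prior
        for feature in evidence:
            p *= likelihoods[cls][feature]   # factorized likelihood
        scores[cls] = p
    total = sum(scores.values())             # normalize so posteriors sum to 1
    return {cls: p / total for cls, p in scores.items()}

print(posterior(["suspicious_words", "unknown_sender"]))
```

Because each feature contributes one small conditional probability table instead of one cell per combination of feature values, this factorization sidesteps the data fragmentation problem described above.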
3.7 Error-Based Learning
Error-based learning searches for a parameterized model that minimizes the total error across the predictions made by that model with respect to a set of training instances. This section introduces the key ideas of a parameterized model, measuring error, and an error surface.
Simple Linear Regression
• Simple Linear Regression is a type of regression algorithm that models the relationship between a dependent variable and a single independent variable. The relationship shown by a Simple Linear Regression model is linear (a sloped straight line); hence it is called Simple Linear Regression.
• The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on continuous
or categorical values.
(a) A scatter plot of the SIZE and RENTAL PRICE features from the office rentals dataset; (b) the scatter plot from (a) with a linear model relating RENTAL PRICE to SIZE overlaid.
Measuring Error
In order to formally measure the fit of a linear regression model with a set
of training data, we require an error function. An error function captures the
error between the predictions made by a model and the actual values in a
training dataset.
• There are many different kinds of error functions, but for measuring the fit of simple
linear regression models, the most commonly used is the sum of squared errors error
function, or L2.
• To calculate L2 we use our candidate model to make a prediction for each member of the training dataset, and then calculate the error (or residual) between these predictions and the actual target feature values in the training set.
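A sketch of the L2 computation for a candidate model; the weights and the toy SIZE/RENTAL PRICE data below are invented for illustration:

```python
# Sketch of the L2 (sum of squared errors) calculation; candidate weights
# and SIZE -> RENTAL PRICE data are invented for illustration.

def predict(size, w0, w1):
    return w0 + w1 * size                      # candidate simple linear model

def sse(data, w0, w1):
    """Sum of squared residuals between predictions and actual targets."""
    return sum((price - predict(size, w0, w1)) ** 2 for size, price in data)

data = [(500, 320), (550, 380), (620, 400), (630, 390)]
print(sse(data, w0=6.47, w1=0.62))             # error of one candidate model
print(sse(data, w0=0.0, w1=0.0))               # a far worse candidate
```

Comparing the two printed values shows how L2 ranks candidate weight settings: the better-fitting model has the smaller sum of squared errors.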
Error Surface:
For every possible combination of weights, w[0] and w[1], there is a corresponding sum of squared errors value. We can think of all these error values joined to make a surface defined by the weight combinations, as shown in the figure below. Here, each pair of weights w[0] and w[1] defines a point on the x-y plane, and the sum of squared errors for the model using these weights determines the height of the error surface above the x-y plane for that pair of weights. The x-y plane is known as a weight space, and the surface is known as an error surface. The model that best fits the training data is the model corresponding to the lowest point on the error surface.
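For simple linear regression, the lowest point of the error surface can be found in closed form (ordinary least squares); here is a sketch on synthetic, perfectly linear data:

```python
# Sketch: the weights at the lowest point of the error surface for simple
# linear regression have a closed form (ordinary least squares).

def least_squares(data):
    """Return (w0, w1) minimizing the sum of squared errors for y = w0 + w1*x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in data)
          / sum((x - mean_x) ** 2 for x, _ in data))
    w0 = mean_y - w1 * mean_x
    return w0, w1

# On perfectly linear synthetic data the fit recovers the generating weights.
data = [(x, 10 + 2 * x) for x in range(5)]
w0, w1 = least_squares(data)
print(w0, w1)   # 10.0 2.0
```

On noisy real data no weight pair drives the error to zero, and iterative methods such as gradient descent walk down the error surface toward this same minimum.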
3.8 Evaluation
When evaluating machine learning (ML) models, the questions that arise are whether the model is the best model available from the model’s hypothesis space in terms of generalization error on unseen / future data, and whether the model was trained and tested using the most appropriate method. Out of the available models, which should we select? These questions are addressed using what is called the hold-out method.
Hold-out method for Model Evaluation
The hold-out method for model evaluation represents the mechanism of splitting the
dataset into training and test datasets. The model is trained on the training set and then
tested on the testing set to get the most optimal model. This approach is often used when
the data set is small and there is not enough data to split into three sets (training,
validation, and testing). This approach has the advantage of being simple to implement,
but it can be sensitive to how the data is divided into two sets. If the split is not random,
then the results may be biased. Overall, the hold out method for model evaluation is a
good starting point for training machine learning models, but it should be used with
caution. The following represents the hold-out method for model evaluation.
In the above diagram, you may note that the data set is split into two parts. One split is
set aside or held out for training the model. Another set is set aside or held out for
testing or evaluating the model. The split percentage is decided based on the volume of
the data available for training purposes. Generally, a 70-30 split is used, where 70% of the dataset is used for training and 30% is used for testing the model.
This technique is well suited if the goal is to compare the models based on the
model accuracy on the test dataset and select the best model. However, there is always a
possibility that trying to use this technique can result in the model fitting well to the test
dataset. In other words, the models are trained to improve model accuracy on the test
dataset assuming that the test dataset represents the population. The test error, thus,
becomes an optimistically biased estimation of generalization error. However, that is not
desired. The final model fails to generalize well to the unseen or future dataset as it is trained
to fit well (or overfit) concerning the test data.
The following is the process of using the hold-out method for model evaluation:
•Split the dataset into two parts (preferably a 70-30 split, though the percentage will vary).
•Train the model on the training dataset; while training the model, some fixed set of hyperparameters is selected.
•Test or evaluate the model on the held-out test dataset.
•Train the final model on the entire dataset to get a model that can generalize better on the unseen or future dataset.
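The split step above can be sketched as follows (the 70-30 proportion follows the text; the fixed seed is just for reproducibility):

```python
import random

# Sketch of the hold-out split described above: shuffle the dataset,
# then carve off 70% for training and 30% for testing.

def holdout_split(dataset, train_fraction=0.7, seed=42):
    data = list(dataset)
    random.Random(seed).shuffle(data)      # random split to avoid ordering bias
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

dataset = list(range(100))
train, test = holdout_split(dataset)
print(len(train), len(test))               # 70 30
```

Shuffling before splitting matters: if the data is ordered (e.g. by date or class), a non-random split produces the biased results the text warns about.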
Predictive data analytics projects use machine learning to build models that
capture the relationships in large datasets between descriptive features and a target
feature. A specific type of learning, called inductive learning, is used, where learning
entails inducing a general rule from a set of specific instances. This observation is
important because it highlights that machine learning has the same properties as inductive
learning. Predictive analytics project can use CRISP-DM process to manage a project
through its lifecycle.
The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model that
serves as the base for a data science process. It has six sequential phases:
1.Business understanding – What does the business need?
2.Data understanding – What data do we have / need? Is it clean?
3.Data preparation – How do we organize the data for modeling?
4.Modeling – What modeling techniques should we apply?
5.Evaluation – Which model best meets the business objectives?
6.Deployment – How do stakeholders access the results?
Assignments

Assignment 1
Part A – Q & A (Unit IV)
1. What are the machine learning methods that can be used for predictive analysis? (CO4, K1)
Methods used in predictive analytics include machine learning algorithms, advanced mathematics, statistical modeling, descriptive analytics, and data mining. The term predictive analytics designates an approach rather than a particular technology. Example applications:
• Price Prediction
• Dosage Prediction
• Risk Assessment
• Document Classification
10. Draw the structure of a data quality plan. (CO4, K1)
The most common data quality issues are missing values, irregular cardinality problems, and outliers.
Part B – Questions
1. Explain the fundamental concepts of the different kinds of learning for predictive machine learning. (CO4, K4)
3. Explain information-based learning with decision tree concepts, with a suitable example. (CO4, K4)
6. Explain the art of machine learning for predictive data analytics, with a suitable example. (CO4, K4)
Supportive Online Certification Courses (NPTEL, Swayam, Coursera, Udemy, etc.)
1. Similarity-Based Recommender for Rating Prediction (Coursera): https://www.coursera.org/lecture/deploying-machine-learning-models/similarity-based-recommender-for-rating-prediction-N098n
2. Predictive Modeling in Analytics (Coursera): https://www.coursera.org/learn/predictive-modeling-analytics
Real Time Applications in Day-to-Day Life and to Industry
Content Beyond the Syllabus
In short, neural networks are adaptive and modify themselves as they learn from
subsequent inputs. For example, below is a representation of a neural network that
performs image recognition for ‘humans’. The system has been trained with a lot of
samples of human and non-human images. The resulting network works as a function
that takes an image as input and outputs label human or non-human.
Building predictive capabilities using Machine Learning and Artificial Intelligence
Let’s implement what we have learned about neural networks in an everyday predictive
example. For example, we want to model a neural network for the banking system that
predicts debtor risk. For such a problem, we have to build a recurrent neural network that can
model patterns over time. RNN will require colossal memory and a large quantity of input data.
The neural system will take data sets of previous debtors.
Input variables can be age, income, current debt, etc. and provide the risk factor for the
debtor. Each time we ask our neural network for an answer, we also save a set of our
intermediate calculations and use them the next time as part of our input. That way, our
model will adjust its predictions based on the data that it has seen recently.
Assessment Schedule (Proposed Date & Actual Date)
Prescribed Text Books & Reference Books
TEXT BOOKS
1. Ameet V Joshi, Machine Learning and Artificial Intelligence, Springer Publications, 2020
2. John D. Kelleher, Brian Mac Namee, Aoife D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies, MIT Press, 2015
REFERENCES
1. Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer Publications,
2011
2. Stuart Jonathan Russell, Peter Norvig, John Canny, Artificial Intelligence: A Modern
Approach, Prentice Hall, 2020
1. Machine Learning For Dummies, John Paul Mueller, Luca Massaron, Wiley Publications, 2021
Mini Project Suggestions
Thank you