Please read this disclaimer before proceeding:
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only for the respective group /
learning community. If you are not the addressee, you should not
disseminate, distribute, or copy it through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient, you are
notified that disclosing, copying, distributing, or taking any action in reliance on
the contents of this information is strictly prohibited.

DIGITAL NOTES ON
AD8552 Machine Learning

Department : Artificial Intelligence and Data Science

Batch/Year : 2020-2024/III

Created by : Ms. A. AKILA

Date : 06-10-2022

Signature :

Table of Contents

S NO | CONTENTS | SLIDE NO
1 | Contents | 5
2 | Course Objectives | 7
3 | Pre Requisites (Course Names with Code) | 9
4 | Syllabus (With Subject Code, Name, LTPC details) | 11
5 | Course Outcomes | 12
6 | CO-PO/PSO Mapping | 15
7 | Lecture Plan | 17
8 | Activity Based Learning | 19
9 | Lecture Notes | 21-62
10 | Assignments | 63
11 | Part A (Q & A) | 65
12 | Part B Qs | 70
13 | Supportive Online Certification Courses | 72
14 | Real time Applications in day to day life and to Industry | 74
15 | Contents Beyond the Syllabus | 76
16 | Assessment Schedule | 80
17 | Prescribed Text Books & Reference Books | 82
18 | Mini Project Suggestion | 84
Course Objectives
COURSE OBJECTIVES

 To understand the basics of Machine Learning (ML)

 To understand the methods of Machine Learning

 To know about the implementation aspects of machine learning

 To understand the concepts of Data Analytics and Machine Learning

 To understand and implement use cases of ML



PRE REQUISITES
PREREQUISITE

AD8102 – Fundamentals of Data Science


MA8101 – Artificial Intelligence
GE8152 -- Problem Solving and Python Programming
Syllabus
SYLLABUS
AD8552 MACHINE LEARNING (L T P C: 3 0 0 3)

UNIT I MACHINE LEARNING BASICS 8


Introduction to Machine Learning (ML) - Essential concepts of ML – Types of learning –
Machine learning methods based on Time – Dimensionality – Linearity and Non linearity –
Early trends in Machine learning – Data Understanding Representation and visualization.

UNIT II MACHINE LEARNING METHODS 11


Linear methods – Regression – Classification – Perceptron and Neural networks – Decision
trees – Support vector machines – Probabilistic models – Unsupervised learning –
Featurization

UNIT III MACHINE LEARNING IN PRACTICE 9


Ranking – Recommendation System - Designing and Tuning model pipelines- Performance
measurement – Azure Machine Learning – Open-source Machine Learning libraries –
Amazon’s Machine Learning Tool Kit: Sagemaker

UNIT IV MACHINE LEARNING AND DATA ANALYTICS 9


Machine Learning for Predictive Data Analytics – Data to Insights to Decisions – Data
Exploration –Information based Learning – Similarity based learning – Probability based
learning – Error based learning – Evaluation – The art of Machine learning to Predictive
Data Analytics.

UNIT V APPLICATIONS OF MACHINE LEARNING 8


Image Recognition – Speech Recognition – Email spam and Malware Filtering – Online
fraud detection – Medical Diagnosis.

Course Outcomes
Course Outcomes

Course Code | Course Outcome Statement | Cognitive/Affective Level of the Course Outcome

Course Outcome Statements in Cognitive Domain
CO1 | Understand the basics of ML | K2
CO2 | Explain various Machine Learning methods | K2
CO3 | Demonstrate various ML techniques using standard packages | K1
CO4 | Explore knowledge on Machine learning and Data Analytics | K1
CO5 | Apply ML to various real time examples | K1
CO – PO/PSO Mapping
CO-PO/PSO Mapping

Overall Correlation Matrix of the Course as per Anna University Curriculum
(Course Code mapped against PO1, PO2, PO3, PO4, PO5, PO6, PO7, PO8, PO9, PO10, PO11, PO12)

Correlation Matrix of the Course Outcomes to Programme Outcomes and Programme
Specific Outcomes Including Course Enrichment Activities

Course Outcomes (COs) | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 | PSO1 | PSO2 | PSO3
CO1 | 2 | 1 | 1 | - | - | - | - | - | - | - | - | - | 2 | 1 | -
CO2 | 2 | 2 | 1 | - | - | - | - | - | - | - | - | - | 2 | 2 | 1
CO3 | 2 | 1 | 1 | - | - | - | - | - | - | - | - | - | 2 | 1 | -
CO4 | 2 | 2 | 1 | - | - | - | - | - | - | - | - | - | 2 | 1 | 1
CO5 | 2 | 2 | 1 | - | - | - | - | - | - | - | - | - | 2 | 1 | 1
Lecture Plan
Unit IV
LECTURE PLAN

UNIT – IV

S No | Topics | No of periods | Proposed date | Actual Lecture Date | Taxonomy level pertaining CO | Mode of delivery
1 | Machine Learning for Predictive Data Analytics | 1 | 12.10.2022 | | K3 | Chalk and Talk, PPT
2 | Data to Insights to Decisions | 1 | 13.10.2022 | | K3 | Chalk and Talk, PPT
3 | Information based Learning | 1 | 14.10.2022 | | K3 | Chalk and Talk, PPT
4 | Similarity based learning | 1 | 18.10.2022 | | K3 | Chalk and Talk, PPT
5 | Probability based learning | 1 | 19.10.2022 | | K3 | Chalk and Talk, PPT
6 | Error based learning | 1 | 20.10.2022 | | K3 | Chalk and Talk, PPT
7 | Evaluation | 1 | 21.10.2022 | | K3 | Chalk and Talk, PPT
8 | The art of Machine learning to Predictive Data Analytics | 1 | 26.10.2022 | | K3 | Chalk and Talk, PPT
9 | The art of Machine learning to Predictive Data Analytics | 1 | 27.10.2022 | | K3 | Chalk and Talk, PPT
Activity Based Learning
Unit IV
ACTIVITY BASED LEARNING

(MODEL BUILDING/PROTOTYPE)

S NO TOPICS
Work Sheet
Lecture Notes – Unit 4
UNIT IV MACHINE LEARNING AND DATA ANALYTICS
Machine Learning for Predictive Data Analytics – Data to Insights to Decisions – Data
Exploration –Information based Learning – Similarity based learning – Probability based
learning – Error based learning – Evaluation – The art of Machine learning to Predictive Data
Analytics.

1. Machine Learning for Predictive Data Analytics

Modern organizations collect massive amounts of data. For data to be of value to an
organization, it must be analyzed to extract insights that can be used to make better
decisions. The progression from data to insights to decisions is illustrated in Figure 1.1.
Extracting insights from data is the job of data analytics.

Figure 1.1 Predictive data analytics: moving from data to insight to decision.

What Is Predictive Data Analytics?


Predictive data analytics is the art of building and using models that
make predictions based on patterns extracted from historical data.
Applications of predictive data analytics

Price Prediction: Businesses such as hotel chains, airlines, and online retailers need to
constantly adjust their prices in order to maximize returns based on factors such as seasonal
changes, shifting customer demand, and the occurrence of special events. Predictive
analytics models can be trained to predict optimal prices based on historical sales records.
Businesses can then use these predictions as an input into their pricing strategy decisions.
Dosage Prediction: Doctors and scientists frequently decide how much of a medicine or
other chemical to include in a treatment. Predictive analytics models can be used to assist
this decision-making by predicting optimal dosages based on data about past dosages and
associated outcomes.

Risk Assessment: Risk is one of the key influencers in almost every decision an
organization makes. Predictive analytics models can be used to predict the risk associated
with decisions such as issuing a loan or underwriting an insurance policy. These models are
trained using historical data from which they extract the key indicators of risk.
Diagnosis: Doctors, engineers, and scientists regularly make diagnoses as part of their work.
Typically, these diagnoses are based on their extensive training, expertise, and experience.
Predictive analytics models can help professionals make better diagnoses by leveraging large
collections of historical examples at a scale beyond anything one individual would see over
his or her career. The diagnoses made by predictive analytics models usually become an
input into the professional’s existing diagnosis process.
Document Classification: Predictive data analytics can be used to automatically classify
documents into different categories. Examples include email spam filtering, news sentiment
analysis, customer complaint redirection, and medical decision making. In fact, the definition
of a document can be expanded to include images, sounds, and videos, all of which can be
classified using predictive data analytics models.
What is Machine Learning?
Machine learning is defined as an automated process that extracts patterns from data. To
build the models used in predictive data analytics applications, we use supervised machine
learning. Supervised machine learning techniques automatically learn a model of the
relationship between a set of descriptive features and a target feature based on a set of
historical examples, or instances. We can then use this model to make predictions for new
instances. These two separate steps are shown in Figure 1.2.

Figure 1.2 The two steps in supervised machine learning.
Table 1.1 shows a dataset of mortgages that a bank has granted in the past. This
dataset includes descriptive features that describe the mortgage and a target feature that
indicates whether the mortgage applicant ultimately defaulted on the loan or paid it back in
full.

Table 1.1 A credit scoring dataset.

What is the relationship between the descriptive features


(OCCUPATION, AGE, LOAN-SALARY RATIO) and the target feature (OUTCOME)?

An example of a very simple prediction model for this domain would be

if LOAN-SALARY RATIO > 3 then
    OUTCOME = 'default'
else
    OUTCOME = 'repay'
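As an illustrative sketch (not part of the original notes), this rule can be written as a small Python function; the function name is an assumption, and the feature name follows Table 1.1.

def predict_outcome(loan_salary_ratio):
    """A very simple, consistent prediction model for the credit scoring dataset.

    The prediction is based only on the derived LOAN-SALARY RATIO feature,
    mirroring the if/else rule above.
    """
    if loan_salary_ratio > 3:
        return "default"
    return "repay"

# Example usage: an applicant borrowing 4.5 times their salary.
print(predict_outcome(4.5))  # -> default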

• This model is consistent with the dataset as there are no instances in the dataset for
which the model does not make a correct prediction.
• Machine learning algorithms automate the process of learning a model that captures
the relationship between the descriptive features and the target feature in a dataset.
• Notice that this model does not use all the features and the feature that it uses is a
derived feature (in this case a ratio): feature design and feature selection are two
important topics that we will return to again and again.
What is the relationship between the descriptive features and the target feature
(OUTCOME) in the following dataset?

if LOAN-SALARY RATIO < 1.5 then
    OUTCOME = 'repay'
else if LOAN-SALARY RATIO > 4 then
    OUTCOME = 'default'
else if AGE < 40 and OCCUPATION = 'industrial' then
    OUTCOME = 'default'
else
    OUTCOME = 'repay'

• The real value of machine learning becomes apparent in situations like this when we
want to build prediction models from large datasets with multiple features
How Does Machine Learning Work?

• Machine learning algorithms work by searching through a set of possible


prediction models for the model that best captures the relationship between the
descriptive features and the target feature.
• An obvious criterion for driving this search is to look for models that are consistent
with the data.
• There are at least two reasons why just searching for consistent models is not sufficient
in order to learn useful prediction models.
• First, when we are dealing with large datasets, it is likely that there will be noise in
the data, and prediction models that are consistent with noisy data will make
incorrect predictions. Second, in the vast majority of machine learning projects, the
training set represents only a small sample of the possible set of instances in the
domain. As a result, machine learning is an ill-posed problem.

Notice that there is more than one candidate model left! This is because a single
consistent model cannot be found based on a sample training dataset alone.
 Consistency ≈ memorizing the data set
 Consistency with noise in the data isn’t desirable
 Goal: a model that generalises beyond the dataset and that isn’t influenced by the
noise in the dataset
 Inductive bias the set of assumptions that define the model selection criteria of an ML
algorithm
 There are two types of bias that we can use:
 Restriction bias constrains the set of models that the algorithm will consider during
the learning process
 preference bias guides the learning algorithm to prefer certain models over others
 Inductive bias is necessary for learning (beyond the dataset)

ML algorithms work by searching through sets of potential models.

There are two sources of information that guide this search:
 the training data,
 the inductive bias of the algorithm.
If we choose the wrong inductive bias, it can lead to underfitting.

Underfitting occurs when the prediction model selected by the algorithm is too simplistic

to represent the underlying relationship in the dataset between the descriptive features
and the target feature
Overfitting
Occurs when the prediction model selected by the algorithm is so complex that the model
fits to the data set too closely and becomes sensitive to noise in the data.
Figure: Striking a balance between overfitting and underfitting when trying to predict age
from income.

It is a Goldilocks model: it is just right, striking a good balance between underfitting and
overfitting. We find these Goldilocks models by using machine learning algorithms with
appropriate inductive biases.
The Predictive Data Analytics Project Lifecycle: Crisp-DM

One of the most commonly used processes for predictive data analytics
projects is the Cross Industry Standard Process for Data Mining (CRISP-DM). Key
features of the CRISP-DM process that make it attractive to data analytics practitioners are
that it is non-proprietary; it is application, industry, and tool neutral; and it explicitly views
the data analytics process from both an application-focused and a Technical perspective.

A diagram of the CRISP-DM process that shows the six key phases and indicates the
important relationships between them. This figure is based on Figure 2 of Wirth and
Hipp (2000).

Business Understanding: Predictive data analytics projects never start out with the
goal of building a prediction model. Instead, they are focused on things like gaining
new customers, selling more products, or adding efficiencies to a process. So, during
the first phase in any analytics project, the primary goal of the data analyst is to fully
understand the business (or organizational) problem that is being addressed, and then
to design a data analytics solution for it.
Data Understanding: Once the manner in which predictive data analytics will be used
to address a business problem has been decided, it is important that the data analyst
fully understand the different data sources available within an organization and the
different kinds of data that are contained in these sources.
Data Preparation: Building predictive data analytics models requires specific kinds of
data, organized in a specific kind of structure known as an analytics base table (ABT).
This phase of CRISP-DM includes all the activities required to convert the disparate data
sources that are available in an organization into a well-formed ABT from which machine
learning models can be induced.
Modeling: The modeling phase of the CRISP-DM process is when the machine
learning work occurs. Different machine learning algorithms are used to build a range of
prediction models from which the best model will be selected for deployment.
Evaluation: Before models can be deployed for use within an organization, it is important
that they are fully evaluated and proved to be fit for purpose. This phase of CRISP-DM
covers all the evaluation tasks required to show that a prediction model will be able to make
accurate predictions after being deployed and that it does not suffer from overfitting or
underfitting.

Deployment: Machine learning models are built to serve a purpose within an


organization, and the last phase of CRISP-DM covers all the work that must be done to
successfully integrate a machine learning model into the processes within an organization.

Predictive Analytics Tools


Predictive Analytics Software Tools have advanced analytical capabilities like Text
Analysis, Real-Time Analysis, Statistical Analysis, Data Mining, Machine Learning
modeling and Optimization.
Libraries for Statistical Modeling and Analysis
•Scikit-learn
•Pandas
•Statsmodels
•NLTK (Natural Language Toolkit)
•GraphLab
•Neural Designer
Commercial and Open-Source Analytical Tools
•SAP BusinessObjects
•IBM SPSS
•Halo Business Intelligence
•Dataiku DSS
•Weka
2. Data to Insights to Decisions

In this chapter we present an approach to developing analytics solutions that
address specific business problems, and the data structures that are required to build
predictive analytics models, in particular the analytics base table (ABT). Designing
ABTs that properly represent the characteristics of a prediction subject is a key skill for
analytics practitioners. The approach is to first develop a set of domain concepts that
describe the prediction subject, and then expand these into concrete descriptive
features.

2.1 Converting a business problem into an analytics solution

involves answering the following key questions:

1 What is the business problem?


2 What are the goals that the business wants to achieve?
3 How does the business currently work?
4 In what ways could a predictive analytics model help to address the business problem?

Case Study: Motor Insurance Fraud


In spite of having a fraud investigation team that investigates up to 30% of all
claims made, a motor insurance company is still losing too much money due to fraudulent
claims

Q) What predictive analytics solutions could be proposed to help address this business
problem?
Ans) Potential analytics solutions include:

 Claim prediction

 Member prediction

 Application prediction

 Payment prediction

2.2 Assessing Feasibility


Evaluating the feasibility of a proposed analytics solution involves considering the following
questions:

1 Is the data required by the solution available, or could it be made available?


2 What is the capacity of the business to utilize the insights that the analytics solution will
provide?
What are the data and capacity requirements for the proposed Claim Prediction analytics
solution for the motor insurance fraud scenario?
Case Study: Motor Insurance Fraud
[Claim prediction]
Data Requirements: A large collection of historical claims marked as ’fraudulent’ and
’non-fraudulent’. Also, the details of each claim, the related policy, and the related
claimant would need to be available.
Capacity Requirements: The main requirement is that a mechanism could be put in
place to inform claims investigators that some claims were prioritized above others. This
would also require that information about claims become available in a suitably timely
manner so that the claims investigation process would not be delayed by the model.

2.3 Designing the Analytics Base Table (ABT):

The basic structure in which we capture historical datasets is the analytics base
table (ABT). Different data sources are typically combined to create an analytics base
table.

Figure: The general structure of an analytics base table—descriptive features and a


target feature.

Figure: The different data sources typically combined to create an


analytics base table.
The general structure of an analytics base table:
 Descriptive features
 Target feature
• Prediction subject defines the basic level at which predictions are made, and each
row in the ABT will represent one instance of the prediction subject
• One-row-per-subject is often used to describe this structure
Each row in an ABT is composed of a set of descriptive features and a target feature
A good way to define features is to identify the key domain
concepts and then to base the features on these concepts.

Figure: The hierarchical relationship between an analytics solution,


domain concepts, and descriptive features.
Features in an ABT:
 Raw features
 Derived features: requires data from multiple sources to be combined into a set
of single feature values
Common derived feature types:
 Aggregates
 Flags
 Ratios
 Mappings
There are a number of general domain concepts that are often useful:
• Prediction Subject Details
• Demographics
• Usage
• Changes in Usage
• Special Usage
• Lifecycle Phase
• Network Links
Figure: Example domain concepts for a motor insurance fraud claim prediction
analytics solution.

2.4 Designing & Implementing Features

Three key data considerations are particularly important when we are designing features.
Data availability: we must have data available to implement any feature we would like to
use. For example, in an online payments service scenario, we might define a feature that
calculates the average of a customer’s account balance over the past six months.
Timing: the timing with which data becomes available for inclusion in a feature matters.
With the exception of the definition of the target feature, data that will be used to define a
feature must be available before the event around which we are trying to make predictions
occurs. For example, if we were building a model to predict the outcomes of soccer
matches, we might consider including the attendance at the match as a descriptive feature.
Longevity: there is potential for features to go stale if something about the environment
from which they are generated changes. For example, to make predictions of the outcome
of loans granted by a bank, we might use the borrower’s salary as a descriptive feature.

Figure: Sample descriptive feature data illustrating numeric, binary, ordinal, interval,
categorical, and textual types.
Propensity models: Many of the predictive models that we build are propensity models,
which inherently have a temporal element
Two key periods of propensity modeling:
 Observation period
• Outcome period
In some cases the observation and outcome period are measured over the same time for
all predictive subjects.

Figure: Modeling points in time.


Often the observation period and outcome period will be measured over different dates for
each prediction subject.

a) Actual b) Aligned

Figure: Observation and outcome periods defined by an event rather than by a fixed
point in time (each line represents a prediction subject and stars signify events).

• In some cases only the descriptive features have a time component to them, and
the target feature is time independent.

a) Actual b) Aligned

Figure: Modeling points in time for a scenario with no real outcome period (each line
represents a customer, and stars signify events).
• Conversely, the target feature may have a time component and the descriptive features
may not.

a) Actual b) Aligned

Figure: Modeling points in time for a scenario with no real observation period (each
line represents a customer, and stars signify events).

• Data analytics practitioners can often be frustrated by legislation that stops them
from including features that appear to be particularly well suited to an analytics
solution in an ABT
• There are significant differences in legislation in different jurisdictions, but a couple
of key relevant principles almost always apply.
 Anti-discrimination legislation
 Data protection legislation
Although, data protection legislation changes significantly across different jurisdictions,
there are some common tenets on which there is broad agreement which affect the
design of ABTs
 The use limitation principle
 The purpose specification principle
 The collection limitation principle
• Implementing a derived feature, however, requires data from multiple sources to be
combined into a set of single feature values
A few key data manipulation operations are frequently used to calculate derived feature
values:
 aggregating data sources
 deriving new features by combining or transforming existing features
 filtering fields in a data source
 filtering rows in a data source
 joining data sources
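As a hedged sketch of these operations (the table and column names below are assumptions, not part of the original case study), pandas can be used to aggregate, join, and derive feature values for an ABT:

import pandas as pd

# Hypothetical raw data sources.
claims = pd.DataFrame({
    "policy_id": [1, 1, 2, 3],
    "claim_amount": [2000, 500, 1500, 3000],
})
policies = pd.DataFrame({
    "policy_id": [1, 2, 3],
    "annual_premium": [400, 350, 600],
})

# Aggregating a data source: total and count of claims per prediction subject.
agg = claims.groupby("policy_id").agg(
    total_claimed=("claim_amount", "sum"),
    num_claims=("claim_amount", "count"),
).reset_index()

# Joining data sources: combine the aggregates with the policy details.
abt = policies.merge(agg, on="policy_id", how="left")

# Deriving new features: a ratio and a flag.
abt["claims_to_premium_ratio"] = abt["total_claimed"] / abt["annual_premium"]
abt["multiple_claims_flag"] = abt["num_claims"] > 1

print(abt)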

Case Study: Motor Insurance Fraud

What are the observation period and outcome period for the motor insurance
claim prediction scenario?
• The observation period and outcome period are measured over different dates
for each insurance claim, defined relative to the specific date of that claim.
• The observation period is the time prior to the claim event, over which the
descriptive features capturing the claimant’s behavior are calculated.
• The outcome period is the time immediately after the claim event, during
which it will emerge whether the claim is fraudulent or genuine.

• What features could you use to capture the Claim Frequency domain concept?

Case Study: Motor Insurance Fraud

Figure: Example domain concepts for a motor insurance fraud prediction analytics solution
• What features could you use to capture the Claim Frequency domain concept?

Figure: A subset of the domain concepts and related features for a


motor insurance fraud prediction analytics solution.

• What features could you use to capture the Claim Types domain concept?

Figure: Example domain concepts for a motor insurance fraud


prediction analytics solution.
• What features could you use to capture the Claim Details domain concept?

Figure: A subset of the domain concepts and related features for a motor insurance fraud
prediction analytics solution.

• The following table illustrates the structure of the final ABT that was designed for the
motor insurance claims fraud detection solution.
• The table contains more descriptive features than the ones we have discussed
• The table also shows the first four instances.

Table: The ABT for the motor insurance claims fraud detection
solution.
3. Data Exploration

3.1 The Data Quality Report


A data quality report includes tabular reports that describe the characteristics of each feature
in an ABT using standard statistical measures of central tendency and variation.
The tabular reports are accompanied by data visualizations:
 A histogram for each continuous feature in an ABT
 A bar plot for each categorical feature in an ABT.
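As a rough sketch of how such a report could be generated (the function and column names below are assumptions), pandas can compute the standard measures for the continuous features of an ABT; a similar function built around value_counts() would produce the mode-based measures for the categorical features.

import pandas as pd

def continuous_feature_report(abt, features):
    """Central tendency and variation measures for continuous features in an ABT."""
    rows = []
    for f in features:
        col = abt[f]
        rows.append({
            "Feature": f,
            "Count": col.count(),
            "% Miss.": 100 * col.isna().mean(),
            "Card.": col.nunique(),
            "Min": col.min(),
            "1st Qrt.": col.quantile(0.25),
            "Mean": col.mean(),
            "Median": col.median(),
            "3rd Qrt.": col.quantile(0.75),
            "Max": col.max(),
            "Std. Dev.": col.std(),
        })
    return pd.DataFrame(rows)

# Example usage with a tiny illustrative ABT.
abt = pd.DataFrame({"claim_amount": [2000.0, 500.0, None, 3000.0, 1500.0]})
print(continuous_feature_report(abt, ["claim_amount"]))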

Table: The structures of the tables included in a data quality report to describe
(a) continuous features and (b) categorical features

Case Study: Motor Insurance Fraud

Table: Portions of the ABT for the motor insurance claims fraud
detection problem
Table: A data quality report for the motor insurance claims fraud detection ABT.

Figure: Visualizations of the continuous and categorical features in the motor insurance
claims fraud detection ABT in Table 2.
3.2 Getting To Know The Data

For categorical features, we should:


• Examine the mode, 2nd mode, mode %, and 2nd mode %, as these tell us the most
common levels within these features and will identify whether any levels dominate the
dataset.
For continuous features we should:
• Examine the mean and standard deviation of each feature to get a sense of the central
tendency and variation of the values within the dataset for the feature.
• Examine the minimum and maximum values to understand the range that is possible
for each feature.

When we generate histograms of features, there are a number of common,
well-understood shapes that we should look out for.

Figure: Histograms for different sets of data each of which exhibit well-known, common
characteristics.

Figure: Histograms for different sets of data each of which exhibit


well-known, common characteristics
A uniform distribution indicates that a feature is
equally likely to take a value in any of the
ranges present.

Features following a normal distribution are


characterized by a strong tendency towards a
central value and symmetrical variation to
either side of this.

Skew is simply a tendency toward very high (right skew) or very low (left skew) values.
In a feature following an exponential
distribution the likelihood of occurrence of a
small number of low values is very high, but
sharply diminishes as values increase.

A feature characterized by a multimodal


distribution has two or more very commonly
occurring ranges of values that are clearly
separated

The probability density function for the normal distribution (or Gaussian distribution) is

$N(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

where x is any value, and µ and σ are parameters that define the shape of the
distribution: the population mean and population standard deviation.

Figure: Three normal distributions with different means but identical standard deviations.
Figure: Three normal distributions with identical means but different standard deviations.
Figure: An illustration of the 68-95-99.7 percentage rule that a normal distribution
defines as the expected distribution of observations. The grey region defines the area where
95% of observations are expected.
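As a quick numerical check of this rule (a sketch, assuming SciPy is available):

from scipy.stats import norm

# Probability mass of a normal distribution within 1, 2, and 3 standard
# deviations of the mean; the result is the same for any mu and sigma.
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} standard deviation(s): {p:.4f}")
# Prints approximately 0.6827, 0.9545, and 0.9973 -- the 68-95-99.7 rule.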

3.3 Identifying Data Quality Issues

A data quality issue is loosely defined as anything unusual about the data in an ABT.
The most common data quality issues are:
 missing values
 irregular cardinality
 Outliers
The data quality issues we identify from a data quality report will be of two types:
 Data quality issues due to invalid data
 Data quality issues due to valid data.

Table: The data quality plan for the motor insurance fraud prediction ABT.

3.4 Handling Data Quality Issues

• Approach 1: Drop any features that have missing value.


• Approach 2: Apply complete case analysis.
• Approach 3: Derive a missing indicator feature from features with missing
value.
• Imputation replaces missing feature values with a plausible estimated value
based on the feature values that are present.
• The most common approach to imputation is to replace missing values for a feature
with a measure of the central tendency of that feature
• We would be reluctant to use imputation on features missing in excess of 30% of
their values and would strongly recommend against the use of imputation on features
missing in excess of 50% of their values.
• The easiest way to handle outliers is to use the clamp transformation, which clamps all
values above an upper threshold and below a lower threshold to those threshold values,
thus removing the offending outliers:

$clamp(a_i, lower, upper) = \begin{cases} lower & \text{if } a_i < lower \\ upper & \text{if } a_i > upper \\ a_i & \text{otherwise} \end{cases}$

where $a_i$ is a specific value of feature a, and lower and upper are the lower and upper
thresholds.
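A minimal sketch of imputation and the clamp transformation with pandas (the feature values below are illustrative assumptions):

import numpy as np
import pandas as pd

# Hypothetical continuous feature with missing values and an outlier.
income = pd.Series([30_000, 32_000, np.nan, 29_000, 1_200_000, np.nan, 41_000])

# Imputation: replace missing values with a measure of central tendency (the median).
income_imputed = income.fillna(income.median())

# Clamp transformation: values outside [lower, upper] are set to the thresholds.
lower, upper = 20_000, 100_000
income_clamped = income_imputed.clip(lower=lower, upper=upper)

print(income_clamped.tolist())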

3.4 Information based Learning

• In this chapter we are going to introduce a machine learning algorithm that tries
to build predictive models using only the most informative features.

• In this context an informative feature is a descriptive feature whose values split the
instances in the dataset into homogeneous sets with respect to the target feature
value.
Figure: Cards showing character faces and names for the Guess-Who game: (a) Brian,
(b) John, (c) Aphra, (d) Aoife. Each card records whether the character is a man, has long
hair, and wears glasses.

The two candidate questions are:
(1) Is it a man?
(2) Does the person wear glasses?

In both of the diagrams:


• one path is 1 question long,
• one path is 2 questions long,
• and two paths are 3 questions long.

Consequently, if you ask Question (2) first the average number of


questions you have to ask per game is:

(1 + 2 + 3 + 3) / 4 = 2.25
Figure: The different question sequences that can follow in a game of Guess-Who
beginning with the question Is it a man?

All the paths in this diagram are two questions long.

So, on average if you ask Question (1) first the average number of questions you
have to ask per game is:
(2 + 2 + 2 + 2) / 4 = 2

• On average getting an answer to Question (1) seems to give you more


information than an answer to Question (2): less follow up questions.

• This is not because of the literal message of the answers: YES or NO.

• It is to do with how the answer to each question splits the domain into
different-sized sets based on the value of the descriptive feature the question is
asked about, and the likelihood of each possible answer to the question.

Big Idea
So the big idea here is to figure out which features are the most informative
ones to ask questions about by considering the effects of the different answers
to the questions, in terms of:

• how the domain is split up after the answer is received,


• likelihood of each of the answers.
3.4.1 Fundamentals

A decision tree consists of:


• a root node (or starting node),
• interior nodes
• and leaf nodes (or terminating nodes).

• Each of the non-leaf nodes (root and interior) in the tree specifies a test to be
carried out on one of the query’s descriptive features.

• Each of the leaf nodes specifies a predicted classification for the query.

Table: An email spam prediction dataset

Figure: (a) and (b) show two decision trees that are consistent with the instances
in the spam dataset. (c) shows the path taken through the tree shown in (a) to
make a prediction for the query instance: SUSPICIOUS WORDS = ’true’,
UNKNOWN SENDER = ’true’, CONTAINS IMAGES = ’true’.

• Both of these trees will return identical predictions for all the examples in the
dataset.

• So, which tree should we use?


• Apply the same approach as we used in the Guess-Who game: prefer decision trees
that use less tests (shallower trees).

• This is an example of Occam’s Razor.

How do we create shallow trees?

• The tree that tests SUSPICIOUS WORDS at the root is very shallow because the
SUSPICIOUS WORDS feature perfectly splits the data into pure groups of ’spam’
and ’ham’.

• Descriptive features that split the dataset into pure sets with respect to the
target feature provide information about the target feature.

• So we can make shallow trees by testing the informative features early on in


the tree.

• All we need to do that is a computational metric of the purity of a set:


entropy

3.4.2 Shannon’s Entropy Model

 Claude Shannon’s entropy model defines a computational measure of the impurity of


the elements of a set.

 An easy way to understand the entropy of a set is to think in terms of the uncertainty
associated with guessing the result if you were to make a random selection from the
set.

 Entropy is related to the probability of an outcome.


 High probability → Low entropy
 Low probability → High entropy

 If we take the log of a probability and multiply it by -1 we get this mapping!

What is a log?

Remember, the log of a to the base b is the number to which we must raise b to get a.

log2(0.5) = −1 because 2^−1 = 0.5

log2(1) = 0 because 2^0 = 1

log2(8) = 3 because 2^3 = 8

log5(25) = 2 because 5^2 = 25

log5(32) = 2.153 because 5^2.153 = 32


Figure: (a) A graph illustrating how the value of a binary log (the log to the base 2) of a
probability changes across the range of probability values. (b) the impact of multiplying
these values by − 1.

Shannon’s model of entropy is a weighted sum of the logs of the probabilities of each of the
possible outcomes when we make a random selection from a set:

$H(t) = -\sum_{i=1}^{l} P(t = i) \times \log_2\big(P(t = i)\big)$

where the sum runs over the l possible outcomes.

What is the entropy of a set of 52 different playing cards?


What is the entropy of a set of 52 playing cards if we only distinguish between the cards
based on their suit { ♥ , ♣ , ♦ , ♠ } ?

Figure: The entropy of different sets of playing cards measured in bits.

Table: The relationship between the entropy of a message and the set it was selected
from.
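A short sketch answering the playing-card questions above (standard Shannon entropy; the helper function is not from the original notes):

import math

def entropy(probabilities):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# 52 distinct cards, each equally likely to be selected.
print(entropy([1 / 52] * 52))  # ~5.70 bits

# Only the suit matters: four equally likely outcomes.
print(entropy([1 / 4] * 4))    # 2.0 bits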
3.4.3 Information Gain

The measure of informativeness that we will use is known as information gain


and is a measure of the reduction in the overall entropy of a set of
instances that is achieved by testing on a descriptive feature.
Computing information gain is a three-step process:
1. Compute the entropy of the original dataset with respect to the target feature.
This gives us a measure of how much information is required in order to organize
the dataset into pure sets.
2. For each descriptive feature, create the sets that result by partitioning the instances in
the dataset using their feature values, and then sum the entropy scores of each of
these sets. This gives a measure of the information that remains required to
organize the instances into pure sets after we have split them using the descriptive
feature.
3. Subtract the remaining entropy value (computed in step 2) from the original
entropy value (computed in step 1) to give the information gain.

The entropy of a dataset D with respect to a target feature t is calculated as

$H(t, \mathcal{D}) = -\sum_{l \in levels(t)} P(t = l) \times \log_2\big(P(t = l)\big)$
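The three-step computation can be sketched in Python as follows (the toy feature and target values are assumptions used only for illustration):

import math

def entropy(labels):
    """Entropy (bits) of the target feature values in `labels`."""
    total = len(labels)
    counts = {level: labels.count(level) for level in set(labels)}
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_values, labels):
    """Entropy of the full set minus the weighted entropy of the partitions."""
    total = len(labels)
    remainder = 0.0
    for level in set(feature_values):
        subset = [t for f, t in zip(feature_values, labels) if f == level]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Toy illustration (not the textbook spam dataset):
suspicious_words = [True, True, True, False, False, False]
target = ["spam", "spam", "spam", "ham", "ham", "ham"]
print(information_gain(suspicious_words, target))  # 1.0 bit: a perfect split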
3.5 Similarity-based Learning

Similarity-based learning is based on a computational measure of similarity (in the form of a
distance measure) between instances.
Feature space: each descriptive feature has its own dimensional axis. It is an
abstract m-dimensional space that is created by making each descriptive feature in a
dataset an axis of an m-dimensional coordinate system and mapping each instance in
the dataset to a point in this coordinate space based on the values of its descriptive
features.
Working of the feature space: if the values of the descriptive features of two or
more instances in a dataset are the same, then these instances will be mapped to the
same point in the feature space, and vice versa.
The distance between two points in the feature space is a useful measure of
the similarity of the descriptive features of two instances.
metric(a, b): a real-valued function that returns the distance between
two points a and b in the feature space. It has the following properties:
•Non-negativity
•Identity
•Symmetry
•Triangular inequality
Two examples of distance metrics:
•Euclidean distance
•Manhattan distance (taxi-cab distance) = sum of absolute differences
Minkowski distance: a family of distance metrics based on differences between features, of
which the Manhattan and Euclidean distances are special cases.

Nearest Neighbor algorithm:


When this model is used to make predictions for new instances, the distance
in the feature space between the query instance and each instance in the dataset is
computed, and the prediction returned by the model is the target feature level of the
dataset instance that is nearest to the query in the feature space. The algorithm stores the
entire training dataset in memory, resulting in a negative effect on the time complexity of
the algorithm.
Algorithm : Nearest neighbor algorithm.
Require: a set of training instances
Require: a query instance
1: Iterate across the instances in memory to find the nearest neighbor—this is the instance
with the shortest distance across the feature space to the query instance.
2: Make a prediction for the query instance that is equal to the value of the target feature of
the nearest neighbor.
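A minimal sketch of this algorithm using the Euclidean distance (the Minkowski metric with p = 2); the training values below are invented for illustration:

import math

def euclidean(a, b):
    """Euclidean distance (Minkowski distance with p = 2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_predict(training_set, query):
    """Return the target level of the training instance closest to `query`.

    `training_set` is a list of (descriptive_feature_vector, target_level) pairs.
    """
    nearest = min(training_set, key=lambda inst: euclidean(inst[0], query))
    return nearest[1]

# Hypothetical two-feature training data.
train = [((5.0, 45), "draft"), ((5.9, 90), "no draft"), ((6.2, 95), "no draft")]
print(nearest_neighbor_predict(train, (5.8, 80)))  # -> no draft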

 Decision boundary: the boundary between regions of the feature space in which
different target levels will be predicted. It is generated by aggregating the
neighboring local models (Voronoi regions) that make the same prediction.

 Noise effects:
 Using the Kronecker delta approach:

 The nearest neighbor algorithm is sensitive to noise because any errors in the
description or labeling of training data result in erroneous local models and incorrect
predictions. One way to mitigate against noise is to modify the algorithm to return the
majority target level within the set of k nearest neighbors to the query q.

 Using weighted k nearest neighbor approach:

 Efficient Memory search:


 Assuming that the training dataset will remain relatively stable, the time issue can be
offset by investing in a one-off computation to create an index of the instances that
enables efficient retrieval of the nearest neighbors without doing an exhaustive
search of the entire training dataset.
 k-d tree: stands for k-dimensional tree, which is a balanced binary tree in which each
of the nodes in the tree indexes one of the instances in a training dataset.
 This tree is constructed so that nodes that are nearby in the tree index training
instances that are nearby in the feature space.
3.6 Probability-based Learning

• We can use estimates of likelihoods to determine the most likely prediction that
should be made. More importantly, we revise these predictions based on data
we collect and whenever extra evidence becomes available.

• A probability function, P(), returns the probability of a feature taking a specific value.
• A joint probability refers to the probability of an assignment of specific values to
multiple different features
• A conditional probability refers to the probability of one feature taking a specific
value given that we already know the value of a different feature
• A probability distribution is a data structure that describes the probability of each
possible value a feature can take. The sum of a probability distribution must equal 1.0
• A joint probability distribution is a probability distribution
over more than one feature assignment and is written as a
multi-dimensional matrix in which each cell lists the
probability of a particular combination of feature values
being assigned
• The sum of all the cells in a joint probability distribution
must be 1.0.
• Bayes’ Theorem
Bayes’ Theorem defines the conditional probability of an event, X, given some evidence, Y,
in terms of the product of the inverse conditional probability, P(Y | X), and the prior
probability of the event, P(X):

$P(X \mid Y) = \frac{P(Y \mid X) \times P(X)}{P(Y)}$

• Bayesian Prediction
To make Bayesian predictions, we generate the probability of the event that a target
feature, t, takes a specific level, l, given the assignment of values to a set of descriptive
features, q, from a query instance. We can restate Bayes’ Theorem using this terminology
and generalize the definition of Bayes’ Theorem so that it can take into account more than
one piece of evidence (each descriptive feature value is a separate piece of evidence).
The Generalized Bayes’ Theorem is defined as

$P(t = l \mid q[1], \ldots, q[m]) = \frac{P(q[1], \ldots, q[m] \mid t = l) \times P(t = l)}{P(q[1], \ldots, q[m])}$

To calculate a probability using the Generalized Bayes’ Theorem, we need to calculate three
probabilities:

1. P(t = l), the prior probability of the target feature t taking the level l

2. P(q[1], …, q[m]), the joint probability of the descriptive features of a query instance
taking a specific set of values

3. P(q[1], …, q[m] | t = l), the conditional probability of the descriptive features of a
query instance taking a specific set of values given that the target feature takes the
level l
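A hedged sketch of this calculation from raw counts (the tiny dataset and feature meanings below are assumptions): the scores P(q | t = l) × P(t = l) are computed for each target level and then normalized by their sum, which plays the role of P(q[1], …, q[m]). Estimating the joint likelihood directly from counts like this is exactly what leads to the data fragmentation problem discussed next.

# Hypothetical training data: (HEADACHE, FEVER) descriptive features and
# MENINGITIS target (values are assumptions for illustration only).
data = [
    (True,  True,  True),
    (True,  False, False),
    (False, True,  False),
    (True,  True,  True),
    (False, False, False),
    (True,  True,  False),
]

def bayes_score(query, target_level):
    """P(q | t = l) * P(t = l), estimated directly from counts."""
    matching_target = [d for d in data if d[2] == target_level]
    prior = len(matching_target) / len(data)
    matching_both = [d for d in matching_target if (d[0], d[1]) == query]
    likelihood = len(matching_both) / len(matching_target)
    return likelihood * prior

query = (True, True)  # headache and fever
scores = {level: bayes_score(query, level) for level in (True, False)}
total = sum(scores.values())  # P(q), the normalisation constant
posterior = {level: s / total for level, s in scores.items()}
print(posterior)  # e.g. {True: 0.667, False: 0.333}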

The technical term for this splitting of the data into smaller and smaller
sets based on larger and larger sets of conditions is data fragmentation. Data
fragmentation is essentially an instance of the curse of dimensionality. As the
number of descriptive features grows, the number of potential conditioning events
grows. Consequently, an exponential increase is required in the size of the dataset as
each new descriptive feature is added to ensure that for any conditional probability,
there are enough instances in the training dataset matching the conditions so that the
resulting probability is reasonable.
Conditional Independence and Factorization
If knowledge of one event has no effect on the probability
of another event, and vice versa, then the two events are
independent of each other.
If two events X and Y are independent, then:
P(X|Y) = P(X)
P(X, Y) = P(X) x P(Y)
• Full independence between events is quite rare.
• A more common phenomenon is that two, or more, events may be independent if we
know that a third event has happened
• This is known as conditional independence
3.7 Error-Based Learning

In error-based machine learning, we perform a search for a set of parameters
for a parameterized model that minimizes the total error across the predictions made by that
model with respect to a set of training instances. This section introduces the key ideas of a
parameterized model, measuring error, and an error surface.
Simple Linear Regression
• Simple Linear Regression is a type of Regression algorithms that models the relationship
between a dependent variable and a single independent variable. The relationship shown
by a Simple Linear Regression model is linear or a sloped straight line, hence it is called
Simple Linear Regression.
• The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on continuous
or categorical values.

(a) A scatter plot of the SIZE and RENTAL PRICE features from the office rentals
dataset; (b) the scatter plot from (a) with a linear model relating RENTAL PRICE to
SIZE overlaid.

Measuring Error

In order to formally measure the fit of a linear regression model with a set
of training data, we require an error function. An error function captures the
error between the predictions made by a model and the actual values in a
training dataset.
• There are many different kinds of error functions, but for measuring the fit of simple
linear regression models, the most commonly used is the sum of squared errors error
function, or L2.
• To calculate L2 we use our candidate model to make a prediction for each member of the
training dataset, and then calculate the error (or residual) between these predictions and
the actual target feature values in the training set.
Error Surface:
For every possible combination of weights, w[0] and w[1], there is a
corresponding sum of squared errors value. We can think about all these error values
joined to make a surface defined by the weight combinations, as shown in the figure given
below. Here, each pair of weights w[0] and w[1] defines a point on the x-y plane, and the
sum of squared errors for the model using these weights determines the height of the error
surface above the x-y plane for that pair of weights. The x-y plane is known as a weight
space, and the surface is known as an error surface. The model that best fits the training
data is the model corresponding to the lowest point on the error surface.
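As a sketch of these ideas (the SIZE and RENTAL PRICE values are illustrative assumptions, not the textbook dataset), the sum of squared errors can be computed for any candidate weights, and the lowest point on the error surface of a simple linear regression model can be found in closed form:

import numpy as np

# Hypothetical office-rentals style data: SIZE vs RENTAL PRICE.
size = np.array([500.0, 550.0, 620.0, 630.0, 665.0])
price = np.array([320.0, 380.0, 400.0, 390.0, 385.0])

def sum_of_squared_errors(w0, w1):
    """L2-style error: how well the candidate weights fit the training data."""
    predictions = w0 + w1 * size
    return np.sum((price - predictions) ** 2)

# For simple linear regression the lowest point on the error surface has a
# closed-form solution (ordinary least squares).
w1_best = np.cov(size, price, bias=True)[0, 1] / np.var(size)
w0_best = price.mean() - w1_best * size.mean()

print(f"w[0]={w0_best:.2f}, w[1]={w1_best:.3f}, "
      f"SSE={sum_of_squared_errors(w0_best, w1_best):.1f}")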
3.8 Evaluation

When evaluating machine learning (ML) models, the questions that arise are
whether the model is the best model available from the model’s hypothesis space in terms
of generalization error on the unseen / future data set, whether the model is trained and
tested using the most appropriate method, and, out of the available models, which model
to select. These questions are addressed using what is called the hold-out method.
Hold-out method for Model Evaluation
The hold-out method for model evaluation represents the mechanism of splitting the
dataset into training and test datasets. The model is trained on the training set and then
tested on the testing set to get the most optimal model. This approach is often used when
the data set is small and there is not enough data to split into three sets (training,
validation, and testing). This approach has the advantage of being simple to implement,
but it can be sensitive to how the data is divided into two sets. If the split is not random,
then the results may be biased. Overall, the hold out method for model evaluation is a
good starting point for training machine learning models, but it should be used with
caution. The following represents the hold-out method for model evaluation.

In the above diagram, you may note that the data set is split into two parts. One split is
set aside or held out for training the model. Another set is set aside or held out for
testing or evaluating the model. The split percentage is decided based on the volume of
the data available for training purposes. Generally, a 70-30% split is used, where 70% of
the dataset is used for training and 30% of the dataset is used for testing the model.
This technique is well suited if the goal is to compare the models based on the
model accuracy on the test dataset and select the best model. However, there is always a
possibility that trying to use this technique can result in the model fitting well to the test
dataset. In other words, the models are trained to improve model accuracy on the test
dataset assuming that the test dataset represents the population. The test error, thus,
becomes an optimistically biased estimation of generalization error. However, that is not
desired. The final model fails to generalize well to the unseen or future dataset as it is trained
to fit well (or overfit) concerning the test data.
The following is the process of using the hold-out method for model evaluation:
•Split the dataset into two parts (preferably based on a 70-30% split; However, the
percentage split will vary)
•Train the model on the training dataset; While training the model, some fixed set of
hyperparameters is selected.
•Test or evaluate the model on the held-out test dataset
•Train the final model on the entire dataset to get a model which can generalize better on the
unseen or future dataset.
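A minimal sketch of the hold-out method using scikit-learn's train_test_split (the bundled breast cancer dataset and logistic regression model are stand-ins chosen only for illustration):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold-out method: 70% of instances for training, 30% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Evaluate on the held-out test set to estimate generalisation performance.
print(accuracy_score(y_test, model.predict(X_test)))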

3.9 The art of Machine learning to Predictive Data Analytics.

Predictive data analytics projects use machine learning to build models that
capture the relationships in large datasets between descriptive features and a target
feature. A specific type of learning, called inductive learning, is used, where learning
entails inducing a general rule from a set of specific instances. This observation is
important because it highlights that machine learning has the same properties as inductive
learning. A predictive analytics project can use the CRISP-DM process to manage the project
through its lifecycle.
The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model that
serves as the base for a data science process. It has six sequential phases:
1.Business understanding – What does the business need?
2.Data understanding – What data do we have / need? Is it clean?
3.Data preparation – How do we organize the data for modeling?
4.Modeling – What modeling techniques should we apply?
5.Evaluation – Which model best meets the business objectives?
6.Deployment – How do stakeholders access the results?
Assignments
ASSIGNMENT - 1

Predicting Gas Prices Using Azure Machine Learning Studio


Gas prices are probably one of the items already in most people’s budget. Constant
increase or decrease can influence prices of other groceries and services as well. There are
a lot of factors that can influence gas prices, from weather conditions to political decisions
and administrative fees, to totally unpredictable factors such as natural disasters or
wars. The plan for this Azure machine learning tutorial is to investigate some accessible data
and find correlations that can be exploited to create a prediction model.

Part A – Q & A
Unit - IV
PART -A

S.No Question and Answer CO,K

1. What are the machine learning methods that can be used for predictive analysis? CO4,K1

Methods used in predictive analytics include machine learning algorithms, advanced
mathematics, statistical modeling, descriptive analytics, and data mining. The term
predictive analytics designates an approach rather than a particular technology.

2. Which algorithm is used for predictive analysis? CO4,K1

Numerous types of predictive analytics models are designed


depending on these algorithms to perform desired functions. For
instance, these algorithms include regression algorithm,
clustering algorithm, decision tree algorithm, outliers
algorithm, and neural networks algorithm

3. What Is Predictive Data Analytics? CO4,K1

Predictive data analytics is the art of building and using


models that make predictions based on patterns extracted
from historical data.

4. List The Applications of Predictive data analytics. CO4,K1

Price Prediction
Dosage Prediction
Risk Assessment
Document Classification:

5. What is an ill-posed problem? CO4,K1

An ill-posed problem is a problem for which a unique solution cannot be determined
using only the information that is available.

PART -A

S.No Question and Answer CO,K

6. Draw A diagram of the CRISP-DM process. CO4,K1

7. The image below shows a set of eight Scrabble pieces CO4,K1

What is the entropy in bits of the letters in this set?


We can calculate the probability of randomly selecting a letter of each
type from this set:

8. Define propensity models. CO4,K1


Many of the predictive models that we build are propensity models,
which predict the likelihood (or propensity) of a future outcome based
on a set of descriptive features describing the past.

9. What is data exploration? CO4,K1


Data exploration is a key part of both the Data Understanding and,Data
Preparation phases of CRISP-DM.

PART -A
S.No Question and Answer CO,K

10 Draw the structure of a data quality plan. CO4,K1

11 What are Data Quality issues. CO4,K1

The most common data quality issues, however, are missing values,
irregular cardinality problems, and outliers.

12 List the two goals in data exploration. CO4,K1

• The first goal is to fully understand the characteristics of the data in


the ABT.
• Second , whether the data suffer from any data quality issue

13 Define Normalization. CO4,K1


Normalization techniques can be used to change a
continuous feature to fall within a specified range while maintaining the
relative differences between the values for the feature.

14 Write about the approaches of Normalisation. CO4,K1


The simplest approach to normalization is range normalization, which
performs a linear scaling of the original values of the continuous feature
into a given range. We use range normalization to convert a feature
value into the range [low, high]:

$a_i' = \frac{a_i - \min(a)}{\max(a) - \min(a)} \times (high - low) + low$

15 Define binning and its types. CO4,K1


Binning involves converting a continuous feature into a categorical feature.
Types: equal-width binning and equal-frequency binning.

PART -A

S.No Question and Answer CO,K

16 Write about decision trees. CO4,K1
Decision tree models make predictions based on sequences of tests on the descriptive
feature values of a query. Consequently, decision trees naturally lend themselves to
being trained using information-based metrics.

17 How do you avoid overfitting in decision trees? CO4,K1

Two approaches to avoiding overfitting are distinguished: pre-pruning


(generating a tree with fewer branches than would otherwise be the
case) and post-pruning (generating a tree in full and then removing
parts of it).

18 Define Similarity-based Learning. CO4,K1


Similarity-based approaches to machine learning come from the idea
that the best way to make a predictions is to simply look at what has
worked well in the past and predict the same thing again. The
fundamental concepts required to build a system based on this idea
Are feature spaces and measures of similarity

19 What is the standard approach to building similarity-based models? CO4,K1


The nearest neighbor algorithm. This algorithm is built on two fundamental
concepts: (1) a feature space, and (2) measures of similarity between
instances within the feature space.

20 What is joint Probability Distribution? CO4,K1

A joint probability distribution is a probability distribution over more than


one feature assignment and is written as a multi-dimensional matrix in
which each cell lists the probability of a particular combination of feature
values being assigned.

Part B – Questions
S.No Question and Answer CO,K
PART -B
1. Explain the fundamental concepts of different learning for predictive CO4,K4

data analytics in machine learning.

2. Explain how data exploration is done in predictive data analytics in CO4,K4

machine learning.

3. Explain Information based learning with decision tree concepts with CO4,K4

suitable example.

4. Explain probability-based learning concepts in predictive data analytics CO4,K4

in machine learning.

5. Explain Error-based learning concepts in predictive data analytics in CO4,K4

machine learning.

6. Explain The art of Machine learning to Predictive Data Analytics with CO4,K4

suitable example.

7. Explain Similarity-based learning concepts in predictive data analytics in CO4,K4

machine learning.

Supportive online Certification courses
(NPTEL, Swayam, Coursera, Udemy, etc.)
SUPPORTIVE ONLINE COURSES

S No | Course title | Course provider | Link
1 | Similarity-Based Recommender for Rating Prediction | Coursera | https://www.coursera.org/lecture/deploying-machine-learning-models/similarity-based-recommender-for-rating-prediction-N098n
2 | Predictive modeling in analytics | Coursera | https://www.coursera.org/learn/predictive-modeling-analytics
Real Time Applications in Day to Day Life and to Industry
REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND TO INDUSTRY

1. Customer Lifetime Value


It is challenging to identify the customers in a market who are most likely to
spend large amounts of money consistently over a long period.
This predictive analytics use case gives the business the data it needs to optimize
its marketing strategies and acquire the customers with the greatest lifetime value
for the company and its products.
Key Industries: Insurance, Telecommunications, Banking, Retail
2. Product Propensity
Product propensity combines purchasing activity and behavior data with online
behavior metrics from social media and e-commerce. It enables you to identify a
customer's interest in buying your products and services, and the best medium for
reaching that customer.
It correlates data from different campaigns and social media channels to provide
insights for your business's services and products, and predictive analytics
applications help concentrate effort on the channels that have the best chance of
producing significant revenue.
Key Industries: Banking, Insurance, Retail
3. Risk Modeling
Prevention and prediction are two sides of the same coin. Risk comes in various forms
and originates from a variety of sources. Predictive analytics can surface potential risk
areas from the large volumes of data that most organizations collect.
It sorts and analyzes these potential risks and flags developing situations that could
affect the business. By combining the results of predictive analytics applications with a
risk management approach, companies can evaluate risk issues and decide how to
mitigate those risk factors.
For instance, health organizations generate risk scores to identify the patients who
might benefit from enhanced services, preventative care, and wellness consultations.
Key Industries: Banking, Manufacturing, Automotive, Logistics and Transportation,
Utilities, Oil and Gas Utilities, Pharmaceuticals

Content Beyond Syllabus
Contents beyond the Syllabus

Neural networks – Building blocks of Data Analysis


The neural network is a system of hardware and software modeled after the central
nervous system of humans, used to estimate functions that depend on vast amounts of
unknown inputs. Neural networks are specified by three things: architecture, activity
rule, and learning rule.
According to Kaz Sato, Staff Developer Advocate at Google Cloud Platform, “A neural
network is a function that learns the expected output for a given input from training
datasets”. A neural network is an interconnected group of nodes. Each processing node
has its own small sphere of knowledge, including what it has seen and any rules it was
initially programmed with or developed for itself.

In short, neural networks are adaptive and modify themselves as they learn from
subsequent inputs. For example, consider a neural network that performs image
recognition for ‘humans’. The system has been trained with a large number of samples
of human and non-human images. The resulting network works as a function that takes
an image as input and outputs the label human or non-human.
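A hedged Keras sketch of this idea of a network as a learned function from inputs to an output label; the random data, input dimension, and layer sizes are placeholders, not a recipe for the image example above:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy training set: 100 input vectors of 64 features each, with binary labels.
X = np.random.rand(100, 64)
y = np.random.randint(0, 2, size=100)

model = keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(64,)),   # hidden layer
    layers.Dense(1, activation='sigmoid'),                    # output: e.g. human / non-human
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, verbose=0)   # the network adapts its weights to the training data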

Building predictive capabilities using Machine Learning and Artificial Intelligence
Let’s implement what we have learned about neural networks in an everyday predictive
example. Suppose we want to model a neural network for a banking system that predicts
debtor risk. For such a problem, we have to build a recurrent neural network (RNN) that can
model patterns over time. An RNN will require colossal memory and a large quantity of input
data, and the system will take datasets of previous debtors as its training data.

Input variables can be age, income, current debt, etc., and the network outputs a risk factor
for the debtor. Each time we ask our neural network for an answer, we also save a set of our
intermediate calculations and use them the next time as part of our input. That way, our
model will adjust its predictions based on the data that it has seen recently.
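A hedged Keras sketch of the recurrent setup described above; the number of debtors, the 12 monthly time steps, the three input features, and the layer sizes are assumptions for illustration only:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data: 200 debtors, 12 monthly snapshots each, 3 features (age, income, current debt).
X = np.random.rand(200, 12, 3)
y = np.random.randint(0, 2, size=200)   # 1 = high-risk debtor, 0 = low-risk

model = keras.Sequential([
    layers.SimpleRNN(16, input_shape=(12, 3)),   # carries state across the 12 time steps
    layers.Dense(1, activation='sigmoid'),       # risk score between 0 and 1
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=5, verbose=0)
print(model.predict(X[:1]))   # predicted risk for one debtor's history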
Assessment Schedule (Proposed Date & Actual Date)
Assessment Schedule

S.No  Assessment Test               Date
1.    First Internal Assessment     21.09.2022
2.    Second Internal Assessment    07.11.2022
3.    Model Examination             08.12.2022

Prescribed Text Books & Reference
Prescribed Text Books & Reference Books

TEXT BOOKS
1. Ameet V Joshi, Machine Learning and Artificial Intelligence, Springer Publications, 2020
2. John D. Kelleher, Brian Mac Namee, Aoife D’Arcy, Fundamentals of Machine Learning for
Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies, MIT Press, 2015
REFERENCES
1. Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer Publications,
2011
2. Stuart Jonathan Russell, Peter Norvig, John Canny, Artificial Intelligence: A Modern
Approach, Prentice Hall, 2020
3. Machine Learning For Dummies, John Paul Mueller, Luca Massaron, Wiley Publications, 2021

Mini Project Suggestions
Mini Project Suggestion

Forecasting HVAC needs


Combine the weather forecast with what your building automation system tells you about
how your facilities are used by staff, and with the data you can get from your HVAC system,
and you can reduce costs for heating, ventilation and air conditioning.
It takes time to get a building to the temperature you want when people are at work
(especially if you’re saving energy by not heating or cooling them out of hours), and that
varies for each building and depends on the weather. Plus, not every building is fully
occupied all year round. Instead of starting the systems at the same time every day for
every building, you can save money and keep employees more comfortable at work by
predicting the right time to ramp up the HVAC system. When Microsoft’s real estate team
applied this to just three buildings, they saw savings of $15,000 annually; that will turn into
more than $500,000 once the system is in 43 buildings — and 60 fewer hours when
employees are sweating or shivering.
Customer service and support
Predictive analytics is common in sales tools like Salesforce, but you can also use it to
handle the customers you already have, whether that’s field service or call centers. Adobe
Analytics uses predictive analytics to forecast future customer behavior down to when you’ll
run into special shipping requirements.
MTD makes outdoor equipment like lawn mowers and snow ploughs and credits the
predictive analytics and real-time information it’s added to call center systems with reducing
call abandonment by 65 percent and cutting the average time to handle a call by 40
percent, thanks to better agent scheduling — because managers know in advance when
they’ll need more agents at work.

Thank you

