Ch01 ICS422 01
Class 01
Presented by Dr. Selvi C, Assistant Professor, IIIT Kottayam
Predictive Analytics and Descriptive Analytics
Applied Predictive Analytics
• Bayesian analysis.
• Gradient boosting.
• Incremental response (also called net lift or
uplift models).
• K-nearest neighbor (k-NN).
• Memory-based reasoning.
• Partial least squares.
• Principal component analysis.
• Support vector machine.
• Time series data mining.
What do you need to get started
using predictive analytics?
Step 1:
• The first thing you need to get started using predictive
analytics is a problem to solve.
• What do you want to know about the future based on
the past?
• What do you want to understand and predict?
• You’ll also want to consider what will be done with the
predictions.
• What decisions will be driven by the insights? What
actions will be taken?
Step 2:
• Second, you’ll need data. In today’s world, that means
data from a lot of places.
• Transactional systems, data collected by sensors, third-
party information, call center notes, web logs, etc.
• You’ll need a data wrangler, or someone with data
management experience, to help you cleanse and prep
the data for analysis.
• Preparing the data for a predictive modeling exercise
also requires someone who understands both the data
and the business problem.
• How you define your target is essential to how you can
interpret the outcome. (Data preparation is considered
one of the most time-consuming aspects of the analysis
process. So be prepared for that.)
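As a rough illustration of the cleansing and target-definition steps above, here is a minimal Python sketch. The records, the field names, and the 90-day churn rule are all invented for this example; a real project would take the target definition from the business problem.

```python
# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"customer_id": "C1", "monthly_spend": "42.50", "days_since_last_order": "12"},
    {"customer_id": "C2", "monthly_spend": "", "days_since_last_order": "140"},
    {"customer_id": "C3", "monthly_spend": "18.00", "days_since_last_order": "95"},
    {"customer_id": "C4", "monthly_spend": "n/a", "days_since_last_order": "3"},
]

CHURN_THRESHOLD_DAYS = 90  # assumed business definition of "churned"


def to_float(text):
    """Return the value as a float, or None when it is missing or malformed."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def prepare(records):
    """Drop rows with unusable values and attach the target column."""
    prepared = []
    for row in records:
        spend = to_float(row["monthly_spend"])
        days = to_float(row["days_since_last_order"])
        if spend is None or days is None:
            continue  # cleansing step: skip incomplete rows
        prepared.append({
            "customer_id": row["customer_id"],
            "monthly_spend": spend,
            "days_since_last_order": days,
            "churned": days > CHURN_THRESHOLD_DAYS,  # the target definition
        })
    return prepared


clean_rows = prepare(raw_records)
```

Changing CHURN_THRESHOLD_DAYS changes which customers count as churned, which is why the target definition shapes how the model's output can be interpreted.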
Step 3:
• After that, the predictive model building begins.
• Increasingly easy-to-use software means more
people can build analytical models.
• But you’ll still likely need some sort of data analyst
who can help you refine your models and come up
with the best performer.
• And then you might need someone in IT who can
help deploy your models. That means putting the
models to work on your chosen data – and that’s
where you get your results.
Step 4:
• Predictive modeling requires a team approach.
You need people who understand the business
problem to be solved.
• You need someone who knows how to prepare data for
analysis, and someone who can build and refine the
models.
• You need someone in IT to ensure that you have the right
analytics infrastructure for model building and
deployment.
• And an executive sponsor can help make your
analytic hopes a reality.
Applications
ICS422 Applied Predictive Analytics [3-0-0-3]
Class 02
Machine learning
• Machine learning is a sub-field of artificial intelligence (AI)
that provides systems the ability to automatically learn and
improve from experience without being explicitly
programmed.
• For the process of learning (model fitting) we need to have
available some observations or data (also known as
samples or examples) in order to explore potential
underlying patterns hidden in our data. These learned
patterns are nothing more than functions or
decision boundaries.
• These patterns are learned by the systems
(computer systems) automatically without human
intervention or input.
• Machine learning algorithms are usually
categorized as supervised or unsupervised.
Supervised machine
learning
algorithms/methods
• Given a set of data points {x^(1), ..., x^(m)} associated with
a set of outcomes {y^(1), ..., y^(m)}, we want to build a
model that learns how to predict y from x.
• Supervised models can be further grouped into
regression and classification cases:
• Classification: A classification problem is when
the output variable is a category e.g. “disease” /
“no disease”.
• Regression: A regression problem is when the
output variable is a real continuous value e.g. stock
price prediction
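To make the classification/regression distinction concrete, here is a minimal sketch using k-nearest neighbors in plain Python. The toy data and the function names are assumptions made for this example. The same neighbors are looked up in both cases; only the way their targets are combined differs.

```python
from collections import Counter


def k_nearest(train_x, train_y, query, k):
    """Return the targets of the k training points closest to `query`."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    return [y for _, y in ranked[:k]]


def knn_classify(train_x, train_y, query, k=3):
    """Classification: the prediction is the majority category."""
    return Counter(k_nearest(train_x, train_y, query, k)).most_common(1)[0][0]


def knn_regress(train_x, train_y, query, k=3):
    """Regression: the prediction is the mean of the neighbors' values."""
    neighbors = k_nearest(train_x, train_y, query, k)
    return sum(neighbors) / len(neighbors)


# Invented toy data: x is a single numeric feature.
x = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = ["no disease", "no disease", "no disease", "disease", "disease", "disease"]
prices = [10.0, 12.0, 14.0, 30.0, 32.0, 34.0]

predicted_label = knn_classify(x, labels, query=8.5)  # a category
predicted_price = knn_regress(x, prices, query=8.5)   # a continuous value
```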
Classification: Assign each input to
one of a set of predefined and exhaustive
classes
Some of the algorithms
used for classification
• Logistic regression
• Random forest
• Decision tree
• Support vector classifier
• k-nearest neighbors
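As one example from the list above, logistic regression can be sketched in a few lines of plain Python. This is a minimal, single-feature version fitted by batch gradient descent. The data, the learning rate, and the number of epochs are assumptions chosen for illustration, not tuned values.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict_class(w, b, x, threshold=0.5):
    """Turn the predicted probability into a class label (0 or 1)."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Invented toy data: hours studied versus pass (1) / fail (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
```

Calling predict_class(w, b, 1.0) and predict_class(w, b, 4.0) then classifies a short and a long study time respectively.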
Regression
Regression does not give a class as output
but a specific numeric value, also called a
forecast or prediction.
Algorithms that can be used for
regression
• Linear regression
• Random forest
• Decision tree
• Support vector regressor
• k-nearest neighbors
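As one example from the list above, simple linear regression with a single feature has a closed-form least-squares solution. The sketch below uses invented numbers (advertising spend versus sales) purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented toy data.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 6.0 + intercept  # predicted sales for a new spend of 6.0
```

The output is a number on a continuous scale (here roughly 12.91), not a class label.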
Unsupervised machine
learning
algorithms/methods
• Unsupervised models can be further grouped
into clustering and association cases:
• Clustering: A clustering problem is where
you want to unveil the inherent groupings in
the data, such as grouping animals based on
some characteristics/features e.g. number of
legs.
• Association: Association rule learning is
where you want to discover rules
such as "people who buy X also tend to buy Y".
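A rule of the form "people who buy X also tend to buy Y" is usually scored with support and confidence. The sketch below computes both for a handful of invented shopping baskets. It only evaluates one given rule; it is not a full rule-mining algorithm such as Apriori.

```python
# Invented shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) over the baskets."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under test: bread -> butter.
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
```

Here 3 of 5 baskets contain both items (support 0.6), and 3 of the 4 baskets with bread also contain butter (confidence 0.75).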
Clustering Algorithms
• k-means clustering
• Hierarchical clustering
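A minimal sketch of k-means (Lloyd's algorithm) on one-dimensional points is shown below. The data and the starting centroids are assumptions for illustration; practical implementations usually pick the starting centroids randomly or with a scheme such as k-means++.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Invented toy data with two visible groups.
points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
final_centroids, final_clusters = k_means(points, centroids=[1.0, 10.0])
```

No labels are supplied anywhere; the grouping comes only from the distances between the points.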
Semi-supervised machine
learning
algorithms/methods
• This family is between the supervised and unsupervised
learning families. The semi-supervised models use both
labeled and unlabeled data for training.
• As in supervised and unsupervised learning,
semi-supervised learning consists of working with
a dataset.
• However, datasets in semi-supervised learning are split
into two parts: a labeled part and an unlabeled one.
This technique is often used when labeling the data or
gathering labeled data is too difficult or too expensive.
The labeled part of the data can also be of poor quality.
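One common way to use both parts of such a dataset is self-training (pseudo-labeling): fit a model on the labeled part, let it label the unlabeled part, then refit on everything. The sketch below uses a nearest-centroid classifier on invented one-dimensional data. It is a simplification; practical self-training usually keeps only the pseudo-labels the model is confident about.

```python
def centroids_by_label(xs, ys):
    """Mean feature value per class (a nearest-centroid classifier)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=1):
    """Fit on labeled data, pseudo-label the rest, and refit on both."""
    xs, ys = list(labeled_x), list(labeled_y)
    for _ in range(rounds):
        centroids = centroids_by_label(xs, ys)
        pseudo_labels = [predict(centroids, x) for x in unlabeled_x]
        xs = list(labeled_x) + list(unlabeled_x)
        ys = list(labeled_y) + pseudo_labels
    return centroids_by_label(xs, ys)


# Invented toy data: only two points carry a label.
labeled_x, labeled_y = [1.0, 9.0], ["low", "high"]
unlabeled_x = [2.0, 3.0, 7.0, 8.0]

model = self_train(labeled_x, labeled_y, unlabeled_x)
```

After the refit, the class centroids move from the two labeled points toward the centers of the unlabeled groups, which is the benefit the unlabeled data provides.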
Practical applications of
Semi-Supervised Learning
• Speech Analysis: Since labeling of audio files is a very
intensive task, Semi-Supervised learning is a very
natural approach to solve this problem.
• Internet Content Classification: Labeling each webpage
is impractical and infeasible, so this task relies on
Semi-Supervised learning algorithms.
• Protein Sequence Classification: Since DNA strands are
typically very large in size, Semi-Supervised
learning has been on the rise in this field.
Summary
• Supervised: All the observations in the dataset are
labeled and the algorithms learn to predict the output
from the input data.
• Unsupervised: All the observations in the dataset are
unlabeled and the algorithms learn the inherent
structure from the input data.
• Semi-supervised: Some of the observations of the
dataset are labeled but most of them are usually
unlabeled. So, a mixture of supervised and
unsupervised methods is usually used.
• Using Machine learning (ML) models we are
able to perform analyses of massive quantities
of data.
• Data patterns that would be impossible for a
human being to identify can be accurately
extracted using these ML models within
seconds (in some cases).
• However, most of the time, accurate results
(good models) require a lot of time and
resources for the model training (the
procedure by which the model learns a
function or a decision boundary).
How to Choose an
Appropriate Approach?
• The type of problem – With the problem to solve in mind, we’re
going to choose an algorithm that has proven to provide good
results for similar problems
• The number of samples available – In general, the larger the
dataset the better, but some algorithms also perform well on small
datasets (e.g. Naive Bayes, K-Neighbors Classifier, Linear
SVC, SVR)
• The complexity of the model’s algorithm compared to the
amount of data used to train it – More precisely, if the
algorithm is too complex but has been trained on very little data,
it will be too flexible and may end up overfitting
• The expected accuracy – A machine learning model for which low
accuracy is acceptable can be trained much faster than one aiming for
minimal loss
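The overfitting point above can be illustrated with a deliberately contrived sketch. A 1-nearest-neighbor classifier is flexible enough to memorize a small, noisy training set, while a 3-nearest-neighbor classifier smooths over the noise. The data below is invented, with two mislabeled training points and held-out points placed near them, so the effect is exaggerated on purpose.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, data, k):
    """Share of (x, y) pairs in `data` that the k-NN model gets right."""
    hits = sum(1 for x, y in data if knn_predict(train, x, k) == y)
    return hits / len(data)


# Invented data. The underlying rule is: label 1 when x > 5.
# The training points at x = 2 and x = 9 are deliberately mislabeled (noise).
train_set = [(1, 0), (2, 1), (3, 0), (4, 0), (5, 0),
             (6, 1), (7, 1), (8, 1), (9, 0), (10, 1)]
test_set = [(1.8, 0), (2.3, 0), (3.6, 0), (7.4, 1), (8.8, 1), (9.3, 1)]

train_acc_k1 = accuracy(train_set, train_set, k=1)  # memorizes the noise
test_acc_k1 = accuracy(train_set, test_set, k=1)
train_acc_k3 = accuracy(train_set, train_set, k=3)
test_acc_k3 = accuracy(train_set, test_set, k=3)
```

The more flexible model scores perfectly on the data it was trained on but worse on held-out data, which is the signature of overfitting.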
Thank you