Ch01 ICS422 01

Uploaded by Vipul Khandke

ICS422 Applied Predictive Analytics [3-0-0-3]

Class 01

Presented by
Dr. Selvi C
Assistant Professor
IIIT Kottayam

Predictive Analytics and Descriptive Analytics

Applied Predictive Analytics

• Focuses on how to use predictive analytic techniques to analyze historical data for the purpose of predicting future results.
• Predictive analytics is what translates big data into meaningful, usable business information.
• Predictive analytics is the use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data.
• The goal is to go beyond knowing what has happened to providing a best assessment of what will happen in the future.
Predictive Analytics History & Current Advances

Though predictive analytics has been around for decades, it's a technology whose time has come. More and more organizations are turning to predictive analytics to increase their bottom line and competitive advantage. Why now?

• Growing volumes and types of data, and more interest in using data to produce valuable insights.
• Faster, cheaper computers.
• Easier-to-use software.
• Tougher economic conditions and a need for competitive differentiation.

With interactive and easy-to-use software becoming more prevalent, predictive analytics is no longer just the domain of mathematicians and statisticians. Business analysts and line-of-business experts are using these technologies as well.
Why is predictive analytics important?
Organizations are turning to predictive analytics to help solve difficult problems and
uncover new opportunities. Common uses include:
• Detecting fraud. Combining multiple analytics methods can improve pattern
detection and prevent criminal behavior. As cybersecurity becomes a growing
concern, high-performance behavioral analytics examines all actions on a network
in real time to spot abnormalities that may indicate fraud, zero-day vulnerabilities
and advanced persistent threats.
• Optimizing marketing campaigns. Predictive analytics is used to determine
customer responses or purchases, as well as to promote cross-sell opportunities.
Predictive models help businesses attract, retain and grow their most profitable
customers.
• Improving operations. Many companies use predictive models to forecast
inventory and manage resources. Airlines use predictive analytics to set ticket
prices. Hotels try to predict the number of guests for any given night to maximize
occupancy and increase revenue. Predictive analytics enables organizations to
function more efficiently.
• Reducing risk. Credit scores are used to assess a buyer’s likelihood of default
for purchases and are a well-known example of predictive analytics. A credit
score is a number generated by a predictive model that incorporates all data
relevant to a person’s creditworthiness. Other risk-related uses include insurance
claims and collections.
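The fraud-detection bullet mentions spotting abnormalities in behavior. As a minimal sketch of one very simple way to do that (not the multi-method behavioral analytics described above), a z-score rule flags a new value when it lies more than a chosen number of standard deviations from the historical mean. The transaction amounts and the threshold of 3 are hypothetical.

```python
from statistics import mean, pstdev

def flag_anomalies(history, new_values, threshold=3.0):
    """Return the new values whose z-score against the history exceeds threshold.

    Assumption: history is a list of past amounts with non-zero spread.
    """
    mu = mean(history)
    sigma = pstdev(history)
    return [v for v in new_values if abs(v - mu) / sigma > threshold]

# Hypothetical past card transactions for one customer.
past_amounts = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 44.1, 40.6]
incoming = [46.0, 43.5, 980.0]

print(flag_anomalies(past_amounts, incoming))  # [980.0]
```

A single threshold on one variable is only a starting point; real systems combine many signals, which is why the bullet stresses combining multiple methods.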
Who's using it?

• Banking & Financial Services
• Retail
• Oil, Gas & Utilities
• Governments & the Public Sector
• Health Insurance
• Manufacturing
How It Works
• Predictive models use known results to develop (or train) a
model that can be used to predict values for different or new
data. Modeling produces predictions of the target variable (for
example, revenue) or the probability of a target event, based on
the estimated significance of a set of input variables.
• There are two types of predictive
models. Classification models predict class membership. For
instance, you try to classify whether someone is likely to leave,
whether he will respond to a solicitation, whether he’s a good or
bad credit risk, etc. Usually, the model results are in the form of
0 or 1, with 1 being the event you are
targeting. Regression models predict a number – for example,
how much revenue a customer will generate over the next year
or the number of months before a component will fail on a
machine.
• Three of the most widely used predictive modeling techniques
are decision trees, regression and neural networks.
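Of the three techniques named above, a decision tree is the easiest to show in a few lines. The sketch below learns a one-level tree (a "decision stump") on a single numeric feature by trying every candidate threshold and keeping the one with the fewest training errors. The feature (months as a customer), the labels (1 = customer left) and the data are hypothetical; full decision-tree learners split recursively on many features and usually score splits with impurity measures such as Gini or entropy rather than raw error counts.

```python
def fit_stump(xs, ys):
    """Learn 'predict `below` if x <= t else 1 - below', minimizing training errors."""
    best = None  # (errors, threshold, label_if_below)
    for t in sorted(set(xs)):
        for below in (0, 1):
            preds = [below if x <= t else 1 - below for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, t, below)
    return best[1], best[2]

def predict_stump(stump, x):
    threshold, below = stump
    return below if x <= threshold else 1 - below

# Hypothetical data: months as a customer -> churned (1) or stayed (0).
months = [1, 2, 3, 4, 10, 12, 15, 20]
churned = [1, 1, 1, 0, 0, 0, 0, 0]

stump = fit_stump(months, churned)
print(stump)  # (3, 1): churn predicted for customers of 3 months or less
print(predict_stump(stump, 2), predict_stump(stump, 18))  # 1 0
```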
Other Popular Techniques You May Hear About

• Bayesian analysis.
• Gradient boosting.
• Incremental response (also called net lift or
uplift models).
• K-nearest neighbors (k-NN).
• Memory-based reasoning.
• Partial least squares.
• Principal component analysis.
• Support vector machine.
• Time series data mining.
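To illustrate one item from this list, k-nearest neighbors classifies a new point by a majority vote among the k most similar training points. A minimal sketch using Euclidean distance; the two-feature customer data, the labels and k = 3 are hypothetical choices.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote of its k nearest training points (Euclidean distance)."""
    ranked = sorted(zip(train_points, train_labels), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical customers: (age in years, annual spend in thousands) -> responded to an offer?
points = [(25, 2.0), (30, 2.5), (28, 1.8), (50, 9.0), (55, 8.5), (48, 10.0)]
labels = ["no", "no", "no", "yes", "yes", "yes"]

print(knn_predict(points, labels, (27, 2.2)))  # no
print(knn_predict(points, labels, (52, 9.5)))  # yes
```

Because distances mix units here, features are usually rescaled to comparable ranges before applying k-NN in practice.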
What do you need to get started using predictive analytics?

Step 1:
• The first thing you need to get started using predictive
analytics is a problem to solve.
• What do you want to know about the future based on
the past?
• What do you want to understand and predict?
• You’ll also want to consider what will be done with the
predictions.
• What decisions will be driven by the insights? What
actions will be taken?
Step 2:
• Second, you’ll need data. In today’s world, that means
data from a lot of places.
• Transactional systems, data collected by sensors, third-
party information, call center notes, web logs, etc.
• You’ll need a data wrangler, or someone with data
management experience, to help you cleanse and prep
the data for analysis.
• Preparing the data for a predictive modeling exercise
also requires someone who understands both the data
and the business problem.
• How you define your target is essential to how you can
interpret the outcome. (Data preparation is considered
one of the most time-consuming aspects of the analysis
process. So be prepared for that.)
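The cleansing and target-definition steps above can be sketched in plain Python. This is a minimal illustration under assumed conditions: the record fields (customer_id, monthly_spend, months_active, cancelled_on) are hypothetical, incomplete and duplicate records are simply dropped, and the target is defined as 1 if the customer cancelled, else 0. In a real project these choices are agreed with someone who understands the business problem, since the target definition shapes how results can be interpreted.

```python
def prepare(records):
    """Drop incomplete or duplicate rows, normalize types, and add a binary target."""
    seen = set()
    rows = []
    for r in records:
        # Skip rows with a missing spend or tenure value.
        if r.get("monthly_spend") in (None, "") or r.get("months_active") in (None, ""):
            continue
        # Skip duplicate customer IDs (keep the first occurrence).
        if r["customer_id"] in seen:
            continue
        seen.add(r["customer_id"])
        rows.append({
            "customer_id": r["customer_id"],
            "monthly_spend": float(r["monthly_spend"]),
            "months_active": int(r["months_active"]),
            # Target: 1 if a cancellation date is recorded, else 0.
            "churned": 1 if r.get("cancelled_on") else 0,
        })
    return rows

# Hypothetical raw records; field names are illustrative assumptions.
raw = [
    {"customer_id": "A1", "monthly_spend": "30.5", "months_active": "12", "cancelled_on": None},
    {"customer_id": "A2", "monthly_spend": "", "months_active": "3", "cancelled_on": "2023-05-01"},
    {"customer_id": "A3", "monthly_spend": "12.0", "months_active": "2", "cancelled_on": "2023-04-11"},
    {"customer_id": "A1", "monthly_spend": "30.5", "months_active": "12", "cancelled_on": None},
]
for row in prepare(raw):
    print(row)
```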
Step 3:
• After that, the predictive model building begins.
• Increasingly easy-to-use software means more
people can build analytical models.
• But you’ll still likely need some sort of data analyst
who can help you refine your models and come up
with the best performer.
• And then you might need someone in IT who can
help deploy your models. That means putting the
models to work on your chosen data – and that’s
where you get your results.
Step 4:
• Predictive modeling requires a team approach.
You need people who understand the business
problem to be solved.
• Someone who knows how to prepare data for
analysis. Someone who can build and refine the
models.
• Someone in IT to ensure that you have the right
analytics infrastructure for model building and
deployment.
• And an executive sponsor can help make your
analytic hopes a reality.
Applications

ICS422 Applied Predictive Analytics [3-0-0-3]

Class 02

Machine learning
• Machine learning is a sub-field of artificial intelligence (AI)
that provides systems the ability to automatically learn and
improve from experience without being explicitly
programmed.
• For learning (model fitting), we need some available observations
or data (also known as samples or examples) in which to explore
potential underlying patterns hidden in the data. These learned
patterns are nothing more than functions or decision boundaries.
• These patterns are learned by the systems
(computer systems) automatically without human
intervention or input.
• Machine learning algorithms are usually
categorized as supervised or unsupervised.
Supervised machine learning algorithms/methods
• Given a set of data points $\{x^{(1)},\dots,x^{(m)}\}$ associated with
a set of outcomes $\{y^{(1)},\dots,y^{(m)}\}$, we want to build a
model (a classifier or a regressor) that learns how to predict $y$ from $x$.
• Supervised models can be further grouped into
regression and classification cases:
• Classification: A classification problem is when
the output variable is a category e.g. “disease” /
“no disease”.
• Regression: A regression problem is when the
output variable is a real continuous value e.g. stock
price prediction

Classification: Assign each input to one of a set of
predefined, exhaustive classes

Some of the algorithms used for classification
• Logistic regression
• Random forest
• Decision tree
• Support vector machine (classifier)
• k-nearest neighbors
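Logistic regression, the first algorithm listed, models the probability of the positive class as σ(w·x + b), where σ(z) = 1/(1 + e^(−z)), and predicts class 1 when that probability is at least 0.5. A minimal sketch that fits one feature by batch gradient descent on the log-loss; the data, learning rate and number of epochs are hypothetical choices, and libraries normally use more robust solvers plus regularization.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss: averages of (p - y) * x and (p - y).
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_class(w, b, x, threshold=0.5):
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical data: hours of product use per week -> renewed subscription (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
renewed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, renewed)
print(round(sigmoid(w * 1.0 + b), 2), round(sigmoid(w * 4.5 + b), 2))
print(predict_class(w, b, 1.0), predict_class(w, b, 4.5))  # 0 1
```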

Regression
Regression does not give a class as output but a specific
numeric value, also called a forecast or prediction.

Algorithms that can be used for regression
• Linear regression
• Random forest
• Decision tree
• Support vector regressor
• k-nearest neighbors
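For the first algorithm in this list, simple linear regression with one input fits y ≈ a + b·x by ordinary least squares, which has the closed-form solution b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and a = ȳ − b·x̄. A minimal sketch; the advertising-spend and sales figures are hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a + b*x for a single input variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: monthly ad spend (thousands) -> monthly sales (thousands).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.5, 18.0, 20.5]

a, b = fit_linear(ad_spend, sales)
print(round(a, 2), round(b, 2))  # 9.9 2.1
print(round(a + b * 6.0, 2))     # forecast for a spend of 6.0: 22.5
```

Unlike the classification sketches, the output here is a number (a forecast), not a class.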

Unsupervised machine learning algorithms/methods

Contd…
• Unsupervised models can be further grouped
into clustering and association cases:
• Clustering: A clustering problem is one where
you want to uncover the inherent groupings in
the data, such as grouping animals based on
some characteristics/features, e.g., number of
legs.
• Association: An association rule learning problem
is one where you want to discover association rules
such as "people who buy X also tend to buy Y."
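The "people who buy X also tend to buy Y" idea is usually quantified with two measures: the support of a rule X → Y is the fraction of transactions containing both X and Y, and its confidence is support(X and Y) / support(X). A minimal sketch on hypothetical shopping baskets; algorithms such as Apriori use these same measures to search many candidate rules efficiently.

```python
def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets)

def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    return count(baskets, items) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return count(baskets, both) / count(baskets, antecedent)

# Hypothetical transactions from a grocery store.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

print(support(baskets, {"bread", "butter"}))       # 0.6
print(confidence(baskets, {"bread"}, {"butter"}))  # 0.75
```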

Clustering Algorithms
• k-means clustering
• Hierarchical clustering
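k-means clustering alternates two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal sketch for 2-D points; the data, k = 2 and the use of the first k points as starting centroids are hypothetical simplifications (library implementations typically use smarter initialization such as k-means++ and several restarts).

```python
from math import dist

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; initial centroids are the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignment

# Hypothetical customers: (visits per month, average basket size).
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1, 0, 1]
print([(round(x, 2), round(y, 2)) for x, y in centroids])  # [(1.17, 1.17), (8.83, 8.5)]
```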

Semi-supervised machine learning algorithms/methods
• This family is between the supervised and unsupervised
learning families. The semi-supervised models use both
labeled and unlabeled data for training.
• Like supervised and unsupervised learning,
semi-supervised learning consists of working with
a dataset.
• However, datasets in semi-supervised learning are split
into two parts: a labeled part and an unlabeled one.
This technique is often used when labeling the data or
gathering labeled data is too difficult or too expensive.
The labeled part of the data may also be of poor quality.
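One common semi-supervised strategy is self-training: fit a model on the labeled part, use it to pseudo-label the unlabeled point it is most confident about, add that point to the training set and repeat. A minimal sketch under simplifying assumptions: the model is a 1-nearest-neighbor rule on a single feature, "most confident" means closest to an already-labeled point, and the data are hypothetical.

```python
def self_train(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbour rule on a single numeric feature.

    labeled: list of (x, label) pairs; unlabeled: list of x values.
    Each round pseudo-labels the unlabeled point closest to any labeled point.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Pick the unlabeled point with the smallest distance to a labeled point.
        best_x, best_label, best_d = None, None, float("inf")
        for x in pool:
            for lx, label in labeled:
                d = abs(x - lx)
                if d < best_d:
                    best_x, best_label, best_d = x, label, d
        labeled.append((best_x, best_label))  # accept the pseudo-label
        pool.remove(best_x)
    return labeled

# Only two points are labeled; the rest are unlabeled (hypothetical data).
seed = [(1.0, "A"), (10.0, "B")]
unlabeled_x = [2.0, 3.0, 4.0, 9.0, 8.0, 7.0]

result = dict(self_train(seed, unlabeled_x))
print(result[4.0], result[7.0])  # A B
```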

Practical applications of Semi-Supervised Learning
• Speech Analysis: Since labeling audio files is a very
labor-intensive task, semi-supervised learning is a
natural approach to this problem.
• Internet Content Classification: Labeling each webpage
is impractical and infeasible, so semi-supervised
learning algorithms are used instead.
• Protein Sequence Classification: Since DNA strands are
typically very large, semi-supervised learning has
become prominent in this field.

Summary
• Supervised: All the observations in the dataset are
labeled and the algorithms learn to predict the output
from the input data.
• Unsupervised: All the observations in the dataset are
unlabeled and the algorithms learn the inherent
structure from the input data.
• Semi-supervised: Some of the observations in the
dataset are labeled, but most of them are usually
unlabeled. So, a mixture of supervised and
unsupervised methods is usually used.

Summary
• Using Machine learning (ML) models we are
able to perform analyses of massive quantities
of data.
• Data patterns that would be impossible for a
human being to identify can be accurately
extracted using these ML models within
seconds (in some cases).
• However, most of the time, accurate results
(good models) require a lot of time and
resources for model training (the
procedure by which the model learns a
function or a decision boundary).
How to Choose an Appropriate Approach?
• The type of problem – With the problem to solve in mind, we’re
going to choose an algorithm that has proven to provide good
results for similar problems
• The number of samples available – In general, the larger the
dataset, the better, but some algorithms also perform well on small
datasets (e.g., Naive Bayes, K-Neighbors Classifier, Linear
SVC, SVR)
• The complexity of the model’s algorithm compared to the
amount of data used to train it – More precisely, if the
algorithm is too complex but has been trained on very little data,
it will be too flexible and may end up overfitting
• The expected accuracy – A machine learning model that only needs
modest accuracy can usually be trained much faster than one aiming
for minimal loss
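The overfitting point can be illustrated numerically. A minimal sketch with hypothetical data: eight noisy points generated from y = 2x + 1 with alternating noise of +1/-1. A degree-7 polynomial passing exactly through all eight training points (Lagrange interpolation) has zero training error, while a least-squares straight line does not; at the unseen input x = 8, however, the polynomial's prediction lands far from the noise-free value 17 and the line's lands close to it.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs)-1 through all (xs, ys) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

# Hypothetical training data: true relation y = 2x + 1 plus alternating noise of +1/-1.
xs = [float(x) for x in range(8)]
ys = [2 * x + 1 + (1 if int(x) % 2 == 0 else -1) for x in xs]

a, b = fit_line(xs, ys)
train_err_poly = max(abs(lagrange_predict(xs, ys, x) - y) for x, y in zip(xs, ys))
train_err_line = max(abs(a + b * x - y) for x, y in zip(xs, ys))

x_new, y_true = 8.0, 2 * 8.0 + 1  # unseen input and its noise-free target
err_poly = abs(lagrange_predict(xs, ys, x_new) - y_true)
err_line = abs(a + b * x_new - y_true)
print(f"max training error  poly: {train_err_poly:.2f}  line: {train_err_line:.2f}")
print(f"error at x = 8      poly: {err_poly:.1f}  line: {err_line:.1f}")
```

The flexible model memorizes the noise in the few training points, which is exactly what makes it generalize poorly.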
Thank you
