Predictive Modeling

Predictive modeling is a statistical technique used to analyze historical and current data to create models that predict future outcomes. In predictive modeling, data is collected and used to formulate a statistical model, which is then used to make predictions. As additional data becomes available, the model is validated and potentially revised. Predictive modeling is commonly used in fields like marketing, health insurance, and spam filtering to analyze past data and behaviors to predict future probabilities and trends.

Predictive modeling is a commonly used statistical technique for predicting future behavior. Predictive modeling solutions are a form of data-mining technology that works by analyzing historical and current data and generating a model to help predict future outcomes. In predictive modeling, data is collected, a statistical model is formulated, predictions are made, and the model is validated (or revised) as additional data becomes available. For example, risk models can be created that combine member information in complex ways with demographic and lifestyle information from external sources to improve underwriting accuracy. Predictive models analyze past performance to assess how likely a customer is to exhibit a specific behavior in the future. This category also encompasses models that seek out subtle data patterns to answer questions about customer performance, such as fraud detection models. Predictive models often perform calculations during live transactions, for example, to evaluate the risk or opportunity of a given customer or transaction and guide a decision. If health insurers could accurately predict secular trends (for example, utilization), premiums could be set appropriately, profit targets would be met more consistently, and health insurers would be more competitive in the marketplace.

Predictive modeling is a process used in predictive analytics to create a
statistical model of future behavior. Predictive analytics is the area of data
mining concerned with forecasting probabilities and trends.
A predictive model is made up of a number of predictors, which are variable
factors that are likely to influence future behavior or results. In marketing, for
example, a customer's gender, age, and purchase history might predict the
likelihood of a future sale.
In predictive modeling, data is collected for the relevant predictors, a statistical model is formulated, predictions are made, and the model is validated (or revised) as additional data becomes available. The model may employ a simple linear equation or a complex neural network, mapped out by sophisticated software.
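As a concrete illustration of this workflow, the sketch below fits a simple model in Python with scikit-learn and validates it on held-out data. The predictors (age, past purchase count) and the simulated data are hypothetical stand-ins for the marketing example above, not a real dataset.

```python
# A minimal sketch of the predictive-modelling workflow: collect data for
# the predictors, formulate a model, predict, and validate on new data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Step 1: collect data for the relevant predictors (simulated here).
age = rng.uniform(18, 70, 500)
past_purchases = rng.poisson(3, 500)
X = np.column_stack([age, past_purchases])
# Simulated outcome: older, more active customers buy more often.
p = 1 / (1 + np.exp(-(-4 + 0.03 * age + 0.5 * past_purchases)))
y = rng.binomial(1, p)

# Step 2: formulate a statistical model and fit it on training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Step 3: make predictions; step 4: validate on held-out data.
pred = model.predict_proba(X_test)[:, 1]
print("held-out AUC:", roc_auc_score(y_test, pred))
```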
Predictive modeling is used widely in information technology (IT). In spam
filtering systems, for example, predictive modeling is sometimes used to
identify the probability that a given message is spam.
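To make the spam-filtering example concrete, here is a minimal sketch using a naive Bayes text classifier in scikit-learn; the tiny training corpus is invented purely for illustration.

```python
# Minimal spam-probability sketch with a naive Bayes text classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_messages = [
    "win a free prize now", "cheap meds limited offer",
    "meeting rescheduled to friday", "lunch tomorrow?",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_messages, labels)

# Estimated probability that a new message is spam.
print(clf.predict_proba(["free offer, claim your prize"])[0][1])
```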
Other applications of predictive modeling include customer relationship management (CRM), capacity planning, change management, disaster recovery, security management, engineering, meteorology and city planning.

Predictive modelling
Predictive modelling is the process by which a model is created or chosen to try to best predict the probability of an outcome.[1] In many cases the model is chosen on the basis of detection theory to try to estimate the probability of an outcome given a set amount of input data, for example, given an email, determining how likely it is to be spam.
Models can use one or more classifiers in trying to determine the probability of a set of data belonging to another set, say spam or 'ham'.
Models
Nearly any regression model can be used for prediction purposes. Broadly speaking, there are two classes of predictive models: parametric and non-parametric. A third class, semi-parametric models, includes features of both. Parametric models make specific assumptions with regard to one or more of the population parameters that characterize the underlying distribution(s),[2] while non-parametric regressions make fewer assumptions than their parametric counterparts.[3]

Ordinary least squares
Ordinary least squares is a method that minimizes the sum of squared distances between the observed values and the values predicted by the model.
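For instance, a minimal illustration of this minimization with NumPy, on synthetic data:

```python
# Ordinary least squares: choose coefficients that minimize the sum of
# squared differences between observed and predicted values.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.7 * x + rng.normal(0, 1, 100)  # synthetic observations

# Design matrix with an intercept column; lstsq solves min ||y - Xb||^2.
X = np.column_stack([np.ones_like(x), x])
beta, residual_ss, _, _ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, slope:", beta)
print("sum of squared residuals:", residual_ss)
```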
Generalized linear models (GLM)
The generalized linear model is a flexible family of models that are unified under a single method. Logistic regression is a notable special case of GLM. Other types of GLM include Poisson regression, gamma regression, and multinomial regression.
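As an illustration, a Poisson GLM fitted with statsmodels might look like the sketch below; the data and coefficients are synthetic and purely illustrative.

```python
# Poisson regression, one member of the GLM family, via statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 2, 200)
y = rng.poisson(np.exp(0.5 + 0.8 * x))  # counts generated with a log link

X = sm.add_constant(x)  # add an intercept column
model = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(model.summary())
```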
Logistic regression
Logistic regression is a technique in which unknown values of a discrete variable are predicted based on known values of one or more continuous and/or discrete variables. Logistic regression differs from ordinary least squares (OLS) regression in that the dependent variable is binary in nature. This procedure has many applications. In biostatistics, a researcher may be interested in modelling the probability of a patient being diagnosed with a certain type of cancer based on, say, the incidence of that cancer in his or her family. In business, a marketer may be interested in modelling the probability of an individual purchasing a product based on the price of that product. Both of these are examples of a simple, binary logistic regression model: "simple" in that each has only one independent, or predictor, variable, and "binary" in that the dependent variable can take on only one of two values (cancer or no cancer, purchase or no purchase).
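A minimal sketch of such a simple binary model, with a hypothetical family-incidence score as the single predictor and synthetic outcomes:

```python
# Simple binary logistic regression: one continuous predictor (a
# hypothetical family-incidence score), one binary outcome (diagnosis).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
family_incidence = rng.uniform(0, 1, 300)
p = 1 / (1 + np.exp(-(-2 + 3 * family_incidence)))
diagnosed = rng.binomial(1, p)  # synthetic 0/1 outcome

model = LogisticRegression().fit(family_incidence.reshape(-1, 1), diagnosed)
# Predicted probability of diagnosis for a new patient with score 0.8.
print(model.predict_proba([[0.8]])[0][1])
```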
Generalized additive models
The generalized additive model is a smoothing method for multiple predictors that allows for non-parametric predictions.
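Dedicated GAM implementations exist (for example pyGAM in Python, or mgcv in R). As a rough sketch of the additive-smoothing idea only, not a true penalized GAM, per-predictor spline features plus a linear model in scikit-learn give a comparable additive fit:

```python
# GAM-style fit approximated with per-predictor spline basis features.
# (A true GAM adds smoothness penalties; this shows only the additive idea.)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, (400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 400)

# Expand each predictor into spline basis functions, then fit additively.
gam_like = make_pipeline(SplineTransformer(n_knots=8), LinearRegression())
gam_like.fit(X, y)
print(gam_like.predict([[0.5, -1.0]]))
```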
Robust regression
Robust regression includes a number of modelling approaches for handling high-leverage observations or violations of assumptions. Models can be parametric (e.g. regression with Huber, White, or sandwich variance estimators) as well as non-parametric (e.g. quantile regression).[4]
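As an illustration of the robust idea, Huber regression in scikit-learn down-weights gross outliers that would drag an OLS fit; the data and the injected outlier below are synthetic.

```python
# Huber regression vs. OLS on data containing one gross outlier.
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 100).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + rng.normal(0, 0.5, 100)
y[0] = 200.0  # a single gross outlier

# The Huber loss limits the outlier's influence; OLS does not.
print("OLS slope:  ", LinearRegression().fit(x, y).coef_[0])
print("Huber slope:", HuberRegressor().fit(x, y).coef_[0])
```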

Semiparametric regression
Semiparametric regression includes the proportional odds model and the Cox proportional hazards model, where the response is a rank.
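For the Cox model, the lifelines library provides a standard Python implementation; a minimal sketch on its bundled Rossi recidivism dataset:

```python
# Cox proportional hazards model via the lifelines library.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()  # recidivism data bundled with lifelines
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()  # hazard ratios for each covariate
```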



Presenting and Using the Results of a Predictive Model
Predictive models can either be used directly to estimate a response (output) given a defined set of characteristics (input), or indirectly to drive the choice of decision rules.[5]

Depending on the methodology employed for the prediction, it is often possible to derive a formula that may be used in spreadsheet software. This has some advantages for end users or decision makers, the main one being familiarity with the software itself and hence a lower barrier to adoption.
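For example, a fitted logistic regression reduces to a closed-form expression that can be typed into a spreadsheet cell. The sketch below prints such a formula from hypothetical coefficients; in a real spreadsheet the variable names would be replaced by cell references.

```python
# Turn fitted logistic-regression coefficients into a spreadsheet formula.
# The intercept and coefficients here are hypothetical placeholders.
intercept = -4.0
coefs = {"age": 0.03, "past_purchases": 0.5}

# Linear predictor, then the logistic link: p = 1 / (1 + exp(-z)).
terms = " + ".join(f"{b}*{name}" for name, b in coefs.items())
print(f"=1/(1+EXP(-({intercept} + {terms})))")
# -> =1/(1+EXP(-(-4.0 + 0.03*age + 0.5*past_purchases)))
# Replace 'age' and 'past_purchases' with cell references (e.g. A2, B2).
```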
Nomograms are useful graphical representations of a predictive model. As with spreadsheet software, their use depends on the methodology chosen. The advantage of nomograms is the immediacy of computing predictions without the aid of a computer.
Point-estimate tables are one of the simplest forms in which to present a predictive tool. Here, combinations of the characteristics of interest are represented via a table or a graph, and the associated prediction is read off the y-axis or from the table itself.
Tree-based methods (e.g. CART, survival trees) provide one of the most graphically intuitive ways to present predictions. However, their usage is limited to those methods that use this type of modelling approach, which can have several drawbacks.[6] Trees can also be employed to represent decision rules graphically.
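A small classification tree in scikit-learn, printed as text rules, illustrates this immediacy (the bundled Iris data is used purely for convenience):

```python
# A small decision tree whose fitted rules can be read off directly.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Human-readable decision rules, usable without a computer once printed.
print(export_text(tree, feature_names=list(iris.feature_names)))
```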
Score charts are tabular or graphical tools used to represent either predictions or decision rules.
A newer class of tools is represented by web-based applications. For example, Shiny is a web application framework developed by RStudio, the company behind the R IDE of the same name. With a Shiny app, a modeller can present the predictive model in whatever way he or she chooses while allowing the user some control: the user selects a combination of the characteristics of interest via sliders or input boxes, and results are generated, from graphs to confidence intervals to tables and various statistics of interest. However, these tools often require a server-side installation (e.g. Shiny Server).
Applications
Uplift modelling
Uplift modelling is a technique for modelling the change in probability caused by an action. Typically this is a marketing action, such as an offer to buy a product, to use a product more, or to re-sign a contract. For example, in a retention campaign you wish to predict the change in the probability that a customer will remain a customer if they are contacted. A model of the change in probability allows the retention campaign to be targeted at those customers for whom the change in probability will be beneficial, so the programme avoids triggering unnecessary churn or customer attrition and does not waste money contacting people who would act anyway.
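One common way to implement this is the "two-model" approach sketched below: fit separate response models on contacted and non-contacted customers, then subtract the predicted probabilities. All data, features, and effect sizes here are synthetic and purely illustrative.

```python
# Two-model uplift sketch: uplift = P(stay | contacted) - P(stay | not).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 3))           # customer features (synthetic)
treated = rng.binomial(1, 0.5, 1000)     # 1 = contacted in the campaign
base = 1 / (1 + np.exp(-X[:, 0]))        # baseline retention probability
lift = 0.1 * (X[:, 1] > 0)               # contact helps only some customers
y = rng.binomial(1, np.clip(base + treated * lift, 0, 1))

m_t = LogisticRegression().fit(X[treated == 1], y[treated == 1])
m_c = LogisticRegression().fit(X[treated == 0], y[treated == 0])

# Predicted change in retention probability if a customer is contacted.
uplift = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]
print("customers worth contacting:", (uplift > 0).sum())
```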
Archaeology
Predictive modelling in archaeology gets its foundations from Gordon Willey's mid-1950s work in the Virú Valley of Peru.[7] Complete, intensive surveys were performed, and the covariability between cultural remains and natural features such as slope and vegetation was determined. The development of quantitative methods and a greater availability of applicable data led to growth of the discipline in the 1960s, and by the late 1980s substantial progress had been made by major land managers worldwide.
Generally, predictive modelling in archaeology means establishing statistically valid causal or covariable relationships between natural proxies, such as soil types, elevation, slope, vegetation, proximity to water, geology, and geomorphology, and the presence of archaeological features. Through analysis of these quantifiable attributes from land that has undergone archaeological survey, the archaeological sensitivity of unsurveyed areas can sometimes be anticipated based on the natural proxies in those areas. Large land managers in the United States, such as the Bureau of Land Management (BLM), the Department of Defense (DOD),[8][9] and numerous highway and parks agencies, have successfully employed this strategy. By using predictive modelling in their cultural resource management plans, they can make more informed decisions when planning for activities that have the potential to require ground disturbance and subsequently affect archaeological sites.
Customer relationship management
Predictive modelling is used extensively in analytical customer relationship management and data mining to produce customer-level models that describe the likelihood that a customer will take a particular action. The actions are usually related to sales, marketing, and customer retention.[10]

For example, a large consumer organisation such as a mobile
telecommunications operator will have a set of predictive models for
product cross-sell, product deep-sell and churn. It is also now more common
for such an organisation to have a model of savability using an uplift model.
This predicts the likelihood that a customer can be saved at the end of a
contract period (the change in churn probability) as opposed to the standard
churn prediction model.
Auto insurance
Predictive modelling is utilised in vehicle insurance to assign risk of incidents to policy holders based on information obtained from those policy holders. It is extensively employed in usage-based insurance solutions, where predictive models use telemetry-based data to build a model of predictive risk for claim likelihood. Black-box auto insurance predictive models use GPS or accelerometer sensor input only. Some models include a wide range of predictive input beyond basic telemetry, including advanced driving behaviour, independent crash records, road history, and user profiles, to provide improved risk models.

Health care
In 2009 Parkland Health & Hospital System began analyzing electronic medical records in order to use predictive modeling to help identify patients at high risk of readmission. Initially the hospital focused on patients with congestive heart failure, but the program has expanded to include patients with diabetes, acute myocardial infarction, and pneumonia.[11]

Notable failures of predictive modeling
Although not widely discussed by the mainstream predictive modeling community, predictive modeling has been widely used in the financial industry, and some of its spectacular failures contributed to the financial crisis of 2008. These failures exemplify the danger of relying blindly on models that are essentially backward-looking in nature. The following examples are by no means a complete list:
1) Bond rating. S&P, Moody's and Fitch quantify the probability of default of bonds with discrete variables called ratings. A rating can take on discrete values from AAA down to D; it is a predictor of the risk of default based on a variety of variables associated with the borrower and on macroeconomic data drawn from historical records. The rating agencies failed spectacularly with their ratings on the 600 billion USD mortgage-backed CDO market. Almost the entire AAA sector (and the super-AAA sector, a new rating the agencies provided to represent supposedly super-safe investments) of the CDO market defaulted or was severely downgraded during 2008, in many cases on securities that had obtained their ratings less than a year earlier.
2) Statistical models that attempt to predict equity market prices based on historical data. So far, no such model is considered to consistently make correct predictions over the long term. One particularly memorable failure is that of Long Term Capital Management, a fund that hired highly qualified analysts, including Nobel Prize winners in economics, to develop a sophisticated statistical model that predicted the price spreads between different securities. The models produced impressive profits until a spectacular debacle that caused the then Federal Reserve chairman Alan Greenspan to step in and broker a rescue plan by the Wall Street broker-dealers in order to prevent a meltdown of the bond market.
Possible fundamental limitations of predictive models based on data fitting
1) History cannot always predict the future: using relations derived from historical data to predict the future implicitly assumes that there are certain steady-state conditions or constants in the complex system. This is almost always wrong when the system involves people.
2) The issue of unknown unknowns: in all data collection, the collector first defines the set of variables for which data is collected. However, no matter how extensive the collector considers the selection of variables to be, there is always the possibility of new variables that have not been considered or even defined, yet are critical to the outcome.
3) Self-defeat of an algorithm: after an algorithm becomes an accepted standard of measurement, it can be taken advantage of by people who understand the algorithm and have an incentive to fool or manipulate the outcome. This is what happened with CDO ratings: CDO dealers actively engineered the inputs to the rating agencies' models to reach an AAA or super-AAA rating on the CDOs they were issuing, by cleverly manipulating variables that were "unknown" to the agencies' "sophisticated" models.
