
MFA-106-Unit IV Predictive Modelling and Analysis-21may2024

MBA notes business analytics


21-05-2024

Business Analytics
MFA-106
Praveen Kumar, PhD
Assistant Professor,
University School of Management Studies,
Guru Gobind Singh Indraprastha University, New Delhi

Dr. Praveen Kumar MFA - 106 1

Course Outlines
• Introduction to Business Analytics
• Exploring Data and Data Visualization
• Data preparation and Data Warehousing
• Predictive modelling and analysis and Data reduction
techniques


Agenda
• Predictive modelling and analysis:
• Logic driven modelling,
• Strategies for building predictive models,
• Data driven modelling,
• Supervised learning,
• Regression: simple, multiple and logistic regression.
• Data reduction techniques:
• Principal component Analysis,
• Cluster Analysis: K-Nearest neighbours. (10 hours)


Predictive modelling and analysis



• Predictive modeling: It is the process of applying a statistical model or data
mining algorithm to data for the purpose of predicting new or future observations.
• In particular, it covers nonstochastic prediction, where the goal is to predict
the output value (Y) for new observations given their input values (X).
• This definition also includes temporal forecasting, where observations until
time t (the input) are used to forecast future values at time t + k, k > 0 (the
output).
• Predictions include point or interval predictions, prediction regions,
predictive distributions, or rankings of new observations.
• A predictive model is any method that produces predictions, regardless of its
underlying approach: Bayesian or frequentist, parametric or nonparametric,
data mining algorithm or statistical model, etc.

AIC stands for the Akaike Information Criterion, which is one technique used
to select the best model from a class of models using a penalized likelihood. A
smaller AIC implies a better model.
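
As a minimal sketch of how the criterion is used, the snippet below applies the standard formula AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The two candidate models and their log-likelihood values are hypothetical numbers chosen only for illustration.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "simple": (-120.0, 2),
    "complex": (-118.5, 6),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}

# The model with the smallest AIC is preferred.
best = min(scores, key=scores.get)
print(scores, best)
```

In this illustration the more complex model fits slightly better, but the penalty for its four extra parameters outweighs the gain, so the simpler model has the smaller AIC.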



The great game of science is modeling the real world. As in model-based
learning, one's actions are evaluated with regard to the expected outcomes
in accordance with a "world model".



• The development of new theory often goes hand in hand with the development
of new measures.
• Predictive modeling can be used to discover new measures as well as to
compare different operationalizations of constructs and different measurement
instruments.
• By capturing underlying complex patterns and relationships, predictive modeling
can suggest improvements to existing explanatory models.
• Scientific development requires empirically rigorous and relevant research.
Predictive modeling enables assessing the distance between theory and
practice, thereby serving as a "reality check" to the relevance of theories.
• While explanatory power provides information about the strength of an
underlying causal relationship, it does not imply predictive power.


Two types of data are required within the model development sample:
• Predictor data (predictor variables). This type of data is used to make
predictions, i.e. data that could feature in the predictive model – for example
people’s income and age.
• Behavioral (outcome) data. This is the behavior that we want to predict.
• In order to build a predictive model, the development sample needs to contain
both types of data.
• An appropriate mathematical/statistical technique is then applied to determine
what relationships exist between the predictor data and behavior.
• The relationships that are found are captured in the resulting model.



Predictive analysis, though statistically distinct from explanatory analysis,
is a valuable tool for building explanatory models.
Predictive analyses can be used to set benchmarks: to measure how much we
know about an outcome, and to measure the improvement that a new
analysis offers over its predecessors.
Predictive analysis can also lead to insights, such as the length of the memory
process involved.


Predictive modelling: Logic driven modelling


• Linear models: The most widely used type of predictive model. The features of
a linear model are:
• The relationship between each predictor variable and behavior is represented
by a weight.
• The contribution that each predictor variable makes to the model score is
calculated by multiplying the predictor variable by the weight.
• The final model score is calculated by adding up the contribution made by each
predictor variable (the sum of the predictor variables multiplied by the weights).
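
The scoring rule described above can be sketched in a few lines. The predictor names, weights and applicant values below are hypothetical, and the optional intercept is an assumption that the slide does not mention.

```python
def linear_score(predictors, weights, intercept=0.0):
    """Return the model score and the contribution of each predictor."""
    # Contribution of each predictor = predictor value * its weight.
    contributions = {
        name: weights[name] * value for name, value in predictors.items()
    }
    # Final score = sum of all contributions (plus an assumed intercept).
    return intercept + sum(contributions.values()), contributions

# Hypothetical weights and one hypothetical applicant.
weights = {"income_thousands": 2.0, "age": 0.5}
applicant = {"income_thousands": 40, "age": 30}

score, parts = linear_score(applicant, weights)
print(parts, score)
```

Keeping the individual contributions alongside the total makes it easy to see which predictor drives a given score.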


Predictive modelling: Data driven modelling


• Data-Driven Modelling delves into the cutting-edge ideas, methodologies, and
tools for constructing data-driven models.
• Models are the key to understanding reality and making informed decisions.
They transform raw data into information and produce actionable insights.
• From predicting traffic flow to launching satellites, modelling is woven into our
daily lives.
• Modelling real, complex systems is a skill that modern humans excel in, but
our mathematical and statistical tools are still lagging behind, and navigating
complexity remains a challenge.
Case study of COVID Pandemic


Predictive modelling: Supervised learning


• It uses a training set to teach models to yield the desired output.
• This training dataset includes inputs and correct outputs, which allow the
model to learn over time. The algorithm measures its accuracy through the
loss function, adjusting until the error has been sufficiently minimized.
Supervised learning can be of two types, classification and regression:
Classification uses an algorithm to accurately assign test data into
specific categories.
• It recognizes specific entities within the dataset and attempts to draw
conclusions on how those entities should be labeled or defined. Common
classification algorithms include linear classifiers, support vector
machines (SVM), decision trees, k-nearest neighbor, and random forest.
Regression is used to understand the relationship between dependent
and independent variables.
• It is commonly used to make projections, such as sales revenue for a
given business. Linear regression, logistic regression, and polynomial
regression are popular regression algorithms.
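
To illustrate how a model "adjusts until the error has been sufficiently minimized", here is a minimal sketch that trains a one-variable linear model by gradient descent on a mean squared error loss. The choice of loss, the learning rate, the tolerance and the toy data are all assumptions made for illustration; the slide does not name a specific loss or optimizer.

```python
def mse(w, b, xs, ys):
    """Loss function: mean squared error of the predictions w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.05, tol=1e-8, max_steps=10000):
    """Adjust w and b step by step until the loss falls below tol."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_steps):
        if mse(w, b, xs, ys) < tol:
            break
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2 / n * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy training set: inputs with their correct outputs (y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train(xs, ys)
print(round(w, 3), round(b, 3))
```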

Predictive modelling: Regression-simple


• Linear regression: Linear regression is used to identify the relationship between
a dependent variable and one or more independent variables and is typically
leveraged to make predictions about future outcomes.
• When there is only one independent variable and one dependent variable, it is
known as simple linear regression.
• As the number of independent variables increases, it is referred to as multiple
linear regression.
• Each type of linear regression seeks to fit a line of best fit, which is
calculated through the method of least squares.
• However, unlike other regression models, this line is straight when plotted on a
graph.
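
For simple linear regression, the least squares line has a closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below applies this to hypothetical advertising and sales figures; the variable names and data are illustrative only.

```python
def fit_simple_linear(xs, ys):
    """Least squares estimates of slope and intercept for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (x) and sales (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_linear(ad_spend, sales)

# Predict sales for a new, unseen level of spend.
predicted = intercept + slope * 6.0
print(slope, intercept, predicted)
```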


Predictive modelling: Multiple and logistic regression

• Logistic regression, or the logit model, is a probability model developed by the
statistician D. R. Cox in 1958. It measures the relation between a categorical
dependent variable and one or more independent variables, which are
generally, but not necessarily, continuous, by estimating the probabilities of
events associated with the categorical variable.
• Logistic regression is a special case of generalized linear models.
• While linear regression is leveraged when the dependent variable is
continuous, logistic regression is selected when the dependent variable is
categorical, meaning it has binary outputs such as "true" and "false" or
"yes" and "no."
• While both regression models seek to understand relationships between data
inputs, logistic regression is mainly used to solve binary classification problems,
such as spam identification.
Logistic Regression: It is a classification algorithm.
• It is used to predict a binary outcome (1/0, Yes/No, True/False) given a set of
independent variables.
• To represent the binary/categorical outcome, we use dummy variables.
• It can be viewed as a linear model for a categorical outcome, in which the log
of odds of the event is modelled as a linear function of the predictors.
• It predicts the probability of occurrence of an event by fitting data to a
logistic (inverse logit) function.
Logistic Regression sees application in:
• Handwriting Recognition
• Budget Restructuring
• Software Cost Prediction
• Credit Scoring
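
The link between the log of odds and the predicted probability can be sketched as follows. The weights, intercept and feature values are hypothetical (in practice they would be estimated from data), and the 0.5 cut-off for assigning a class is an assumed convention.

```python
import math

def sigmoid(z):
    """Logistic function: maps a log-odds value z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit function: maps a probability back to log-odds."""
    return math.log(p / (1.0 - p))

def predict_proba(features, weights, intercept):
    """Probability of the event, given a linear model for the log-odds."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical fitted coefficients and one hypothetical observation.
weights = [0.8, -0.5]
intercept = -1.0

p = predict_proba([2.0, 1.0], weights, intercept)

# Assumed decision rule: classify as 1 when the probability is at least 0.5.
label = 1 if p >= 0.5 else 0
print(p, label)
```

Applying `log_odds` to the returned probability recovers the linear predictor, which is the sense in which the log of odds acts as the dependent variable.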


Data reduction techniques


• Business analytics bridges the gap between information technology and
business by using analytics to provide data-driven recommendations.
• The business part requires deep business understanding, while the analytics
part requires an understanding of data, statistics and computer science.
• The number of input features, variables, or columns present in a given
dataset is known as dimensionality, and the process to reduce these features
is called dimensionality reduction.
• In many cases a dataset contains a huge number of input features, which
makes the predictive modeling task more complicated.
• Because it is very difficult to visualize or make predictions from a training
dataset with a large number of features, dimensionality reduction techniques
are needed.
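
As one illustration of dimensionality reduction, the sketch below finds the first principal component of a small dataset (principal component analysis is the technique named in the agenda) and projects each row onto it, reducing two features to one. Using power iteration on the covariance matrix is an implementation choice made here to keep the example dependency-free, and the data are hypothetical.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return the leading principal direction and the 1-D projected scores."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the features.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the leading direction.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data with two strongly correlated features.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]

component, scores = first_principal_component(data)
print(component, scores)
```

Each row is now summarized by a single score along the direction of greatest variance, in place of the two original features.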



• K-Means Clustering: It is a type of unsupervised learning, which is used when a
user has unlabeled data (i.e., data without defined categories or groups).
• The goal of this algorithm is to find groups in the data, with the number of groups
represented by the variable K.
• The algorithm works iteratively to assign each data point to one of K groups
based on the features that are provided.
• Data points are clustered based on feature similarity.
This model sees application in:
• Market Segmentation
• Computer Vision
• Geostatistics
• Astronomy
• Agriculture
• Many others
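
The iterative assignment described above can be sketched as follows. Passing the starting centroids in explicitly is an assumption made to keep the example deterministic (implementations commonly pick them at random), and the points are hypothetical.

```python
import math

def kmeans(points, centroids, n_iter=100):
    """Assign points to the nearest centroid, then recompute centroids."""
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled data with K = 2 and assumed starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```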


• K-nearest neighbor: K-nearest neighbor, also known as the KNN algorithm, is a
non-parametric algorithm that classifies data points based on their proximity
and association to other available data.
• This algorithm assumes that similar data points can be found near each other.
• As a result, it calculates the distance between data points, usually
through Euclidean distance, and then assigns a category based on the
most frequent category among the K nearest points (for classification) or
their average (for regression).
• Its ease of use and low calculation time make it a preferred algorithm among
data scientists, but as the dataset grows, the processing time lengthens, making
it less appealing for classification tasks.
• KNN is typically used for recommendation engines and image recognition.
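
A minimal sketch of KNN classification is shown below, using Euclidean distance and a majority vote among the K nearest points. The labeled training points, the labels "low" and "high", and the choice K = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    # Sort the labeled points by Euclidean distance to the query.
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    # Assign the most frequent category among the k nearest points.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: (features, category).
training = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]

prediction = knn_classify((1.1, 1.0), training)
print(prediction)
```

Every prediction computes the distance to all stored points, which is why processing time grows with the size of the dataset.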
