DADS302 Unit-05
DADS302
EXPLORATORY DATA ANALYSIS
Unit 5
Predictive Analysis
Table of Contents

SL No | Topic | SAQ/Activity No | Fig/Table/Graph No | Page No
1 | Introduction | - | - | 3–4
  | 1.1 Objectives | - | - | -
2 | Working | 1 | - | 5–7
3 | What Justifies The Use Of Predictive Analysis | - | - | 8
4 | Model Types | 2 | - | 9–11
5 | Applications | - | - | 12–13
6 | Tools For Predictive Analysis | 3 | - | 14
7 | Regression | 4, 5 | 1 | 15–19
8 | Regression Line Fitting – Python Code | 6–16 | - | 20–26
9 | Activity | - | - | 27
10 | Summary | - | - | 27
11 | Glossary | - | - | 28
12 | Concept Map | 17 | - | 28
13 | Study Notes & Did You Know | - | - | 28–29
14 | Case Study | 18–21 | - | 29–32
15 | Terminal Questions | - | - | 32
16 | Self-Assessment Answers | - | - | 32
17 | Terminal Questions Answers | - | - | 32–36
18 | References | - | - | 36
1. INTRODUCTION
➔ For instance, data mining analyses big data sets to find patterns in them. Text analysis does the same, but for passages of text rather than structured data sets.
➔ With predictive analytics, data patterns from the past and present are examined to see whether they are likely to recur. Predictive analysis can also increase operational savings and reduce risk.
➔ Predictive analytics is a combination of several statistical technologies and approaches, encompassing data mining, predictive modelling, and machine learning. To anticipate the future, this process thoroughly examines both historical and current data. The goal of predictive analytics is not merely to understand the past, but also to make a more accurate assessment of what might occur in the future.
➔ The objective of predictive analytics, then, is to identify patterns and trends in the past and present in order to make predictions about the future. Some companies have created proprietary, allegedly industry-specific, predictive analytics solutions; other firms rely on outside solutions to become proficient in predictive analytics.
In either situation, there is a methodical, seven-step flow that you may use to carry out predictive analytics projects.
1.1 Objectives
2. WORKING
Fig 1: Working
1. Initiative Definition
Understanding the main goal of carrying out a predictive analytics project is a crucial
component of project definition. Clarity is required when answering queries like, "What are
you trying to model?" Organizations will be able to acquire the correct value-driver from this
endeavour by seeking answers to these questions.
2. Collection of Data
Data is the only thing predictive analytics needs in order to function effectively. To execute the appropriate algorithms, however, you require a sizable amount of data for slicing and dicing. There shouldn't be a problem gathering data if a documented method for doing so already exists. Firms without such systems, on the other hand, must first set up a data aggregation tool to help them gather raw data from various sources.
3. Cleaning of Data
It is crucial for the company to clean and sanitise the data before the investigation begins. As part of the cleaning process, data from many sources must be combined into a single, complete database with consistent formatting. This helps guarantee that the analysis using the predictive analytics technology is effective and delivers the correct value-driver.
5. Building a Model
Organizations can begin developing a predictive model for forecasting future events
once the cleaned data has been thoroughly reviewed. The software programme will generate
a number of models; therefore, the organisation must choose the best (in terms of accuracy)
to forecast occurrences.
6. Deployment
The chosen model must then be put into use on a daily basis after the models have
been developed and refined. Daily use is connected to the project definition from step 1 once
more. For instance, if the model is used to forecast data security events by examining
computer event logs, it must be used to monitor ongoing operations and produce reports of
potential security gaps in order to prevent security breaches. Because of the model chosen,
businesses may be able to resolve some challenges proactively.
7. Monitoring
After the models are put into use, it is crucial to continuously monitor them and make any
necessary course modifications. Unchecked over-reliance on this strategy has the potential
to be disastrous.
3. WHAT JUSTIFIES THE USE OF PREDICTIVE ANALYSIS

1. Finding Fraud
Combining different models can aid in the identification of any suspicious tendencies
and the prevention of illegal activity. High-performance predictive analytics solutions may
examine the network in real-time to look for any fraud-causing irregularities as
cybersecurity concerns continue to develop.
3. Optimization of Operations
Inventory forecasting and resource management are other examples that illustrate its importance. Airlines, for instance, use predictive analysis to determine the cost of their tickets, and hotels can forecast their expected number of customers in order to maximise occupancy, which increases revenue.
Key Lessons:
1. Predictive analytics forecasts future performance using statistics and modelling strategies.
2. Predictive approaches are used in fields and industries like marketing and insurance to
make crucial choices.
3. Predictive models support the creation of investment portfolios, video game development,
voice-to-text message translation, and customer service judgments.
4. Despite the fact that machine learning and predictive analytics are two distinct fields,
people frequently mix them up.
5. Decision trees, regression, and neural networks are a few examples of predictive
modelling types.
4. MODEL TYPES
Decision Tree
• Decision trees may be helpful if you want to understand how someone makes
decisions. This kind of model divides the data into various groups according to
various factors, such as price or market capitalization. It resembles a tree, complete
with individual branches and leaves, as its name suggests. Branches represent the
options, while individual leaves stand for a specific choice.
• The simplest models are decision trees because they are the most straightforward to
comprehend and analyse. When you have to make a decision quickly, they are also
incredibly helpful.
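The unit does not prescribe a library for building decision trees, but as a minimal sketch, scikit-learn's DecisionTreeClassifier can fit one in a few lines. The tiny price/market-capitalisation dataset below is invented purely for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: [price, market capitalisation] -> decision (1 = buy, 0 = skip)
X = np.array([[10, 200], [12, 250], [11, 220],
              [90, 900], [95, 1100], [88, 950]])
y = np.array([1, 1, 1, 0, 0, 0])

# A shallow tree: each branch is a yes/no question about one factor,
# and each leaf holds the final decision
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[15, 240]])[0])  # a low-price, low-cap case -> 1
```

Because the classes here are perfectly separated by either factor, a shallow tree suffices, which mirrors the point above: decision trees are simple to fit and to read.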
Regression
Regression is the most often used model in statistical analysis. Use it when there is a linear relationship between the inputs and you want to find patterns in vast data sets. This method determines the formula describing the relationship between all the inputs in the dataset. Regression can be used, for instance, to determine how a security's performance may be influenced by its price and other important variables.
Neural Networks
As a sort of predictive analytics, neural networks were created by modelling the functioning
of the human brain. Using artificial intelligence and pattern recognition, this model is capable
of handling complex data interactions. Use it if you need to overcome a number of obstacles,
such as when you have an excessive amount of data available, when you lack the necessary
formula to help you identify a relationship between the inputs and outputs in your dataset,
or when you need to make predictions rather than provide an explanation.
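As a hedged sketch of the neural-network case described above (the unit names no library, so scikit-learn's MLPRegressor stands in here, and the synthetic sine data is an assumption), a small network can learn an input-output relationship for which no formula is given to the model:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data with a nonlinear input-output relationship and no obvious formula
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel()

# Two hidden layers of 32 neurons each; the sizes and iteration count are arbitrary choices
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
net.fit(X, y)

print(net.score(X, y))  # R² on the training data
```

Note that the network predicts well without ever being given the sine formula, which is exactly the situation the paragraph above describes.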
● The use of predictive analysis has many advantages. Applying this kind of analysis can be helpful to entities when they need to make predictions about outcomes for which no alternative (and obvious) solutions are available.
● Models can be used by investors, financial experts, and company executives to help lower risk. For instance, by taking certain criteria into account, such as age, capital, and aspirations, an investor and their advisor can use certain models to help create an investment portfolio with minimal risk to the investor.
● Using models also has a significant impact on cost savings. Businesses can predict a product's chance of success or failure prior to its release, or set aside funds ahead of the manufacturing process by employing predictive strategies for production enhancements.
Challenges
● Due to perceived disparities in its results, the use of predictive analytics has been questioned and, in some circumstances, legally limited. Most frequently this involves predictive models that result in statistical discrimination against racial or ethnic groups.
5. APPLICATIONS
Customer inflow and outflow in the entertainment and hospitality industries depend on a
number of variables, all of which affect how many employees a venue or hotel needs at any
given time. Understaffing could lead to a poor customer experience, overworked personnel,
and expensive errors while overstaffing costs money.
A team created a multivariate regression model that took several characteristics into account to forecast the number of hotel check-ins on a given day. Thanks to this technique, Caesars was able to staff its hotels and casinos to the best of its capacity and prevent overstaffing.
Predictive analytics is the process of looking at past behavioural data and applying it to make future predictions.
• In marketing, predictive analytics can be used to anticipate seasonal sales trends and
organise promotions accordingly.
• When the conditions for an impending malfunction are satisfied, the algorithm is activated to notify a worker who can stop the device, potentially saving the business thousands, if not millions, of dollars in lost revenue from damaged goods and repair expenses. Instead of forecasting malfunction scenarios months or years in advance, this approach provides real-time predictions.
Some algorithms even suggest solutions and improvements to prevent future issues and boost productivity, saving resources like time, money, and effort. This is an illustration of prescriptive analytics, which is frequently used in conjunction with other forms of analytics to address problems.
7. REGRESSION
• Regression looks for connections between different variables. You may, for instance,
watch multiple workers at one company to see how their pay varies depending on factors
like experience, education, role, location, and so on.
To put it another way, you need to find a function that accurately maps some traits or
variables to others.
Linear Regression
One of the most significant and often employed regression techniques is likely linear
regression. It's one of the easiest regression techniques. The results are simple to interpret,
which is one of its key benefits.
Performance of Regression
• The actual responses yᵢ, i = 1, …, n, vary partly because of their dependence on the predictors xᵢ. However, the output also has an additional intrinsic variance.
• SSR = 0 is equivalent to R² = 1: the predicted and actual responses fit each other perfectly, so the model fits the data perfectly.
Fig 4: Regression
Underfitting
When a model, typically as a result of its inherent simplicity, is unable to effectively represent the dependencies among data, underfitting occurs. Such a model frequently produces a low R² with known data and generalises poorly when used with new data.
Overfitting
When a model picks up on both random fluctuations and data dependencies, overfitting
results. In other words, a model becomes too adept at learning from the available data.
Overfitting frequently occurs in complex models, which have numerous features or terms.
When used with known data, these models typically produce high R2. However, when
applied to fresh data, they frequently don't generalise well and have considerably lower R2.
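The contrast between underfitting and overfitting can be sketched in a few lines. The data below is synthetic (an assumed linear relationship plus noise); a degree-9 polynomial is used only to force overfitting on a small sample:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

def true_response(x):
    return 2 * x.ravel() + 1  # the underlying (linear) relationship

x_train = np.linspace(0, 1, 12).reshape(-1, 1)       # known data
x_test = np.linspace(0.02, 0.98, 50).reshape(-1, 1)  # fresh data
y_train = true_response(x_train) + rng.normal(0, 0.3, 12)
y_test = true_response(x_test) + rng.normal(0, 0.3, 50)

for degree in (1, 9):
    poly = PolynomialFeatures(degree)
    model = LinearRegression().fit(poly.fit_transform(x_train), y_train)
    r2_known = model.score(poly.transform(x_train), y_train)  # R² on known data
    r2_fresh = model.score(poly.transform(x_test), y_test)    # R² on fresh data
    print(f"degree {degree}: known R2 = {r2_known:.2f}, fresh R2 = {r2_fresh:.2f}")
```

The complex degree-9 model scores higher on the known data but drops considerably on the fresh data, which is the overfitting pattern described above.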
• A basic scientific Python library called NumPy enables a wide range of high-
performance operations on both single-dimensional and multidimensional arrays.
Numerous mathematical operations are also provided. Naturally, it is open-source.
• A popular Python machine learning library called scikit-learn was created on top of
NumPy and a few additional libraries. It offers the tools for data preprocessing,
dimensionality reduction, regression implementation, classification, clustering, and
more. Scikit-Learn is also open-source, just like NumPy.
Self-Assessment Questions - 1
8. REGRESSION LINE FITTING – PYTHON CODE

For the vast majority of regression methodologies and implementations, these steps are more or less generic. You'll discover how to carry out these steps for a variety of situations in the remaining sections of this unit.
The class LinearRegression from sklearn.linear_model and the package numpy must first be imported.
The numpy.ndarray array type is the base data type for NumPy. The remainder of this unit refers to instances of the numpy.ndarray class as arrays.
To execute linear and polynomial regression and provide appropriate predictions, use
LinearRegression.
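The figure that showed this step is not reproduced here; the imports it described are presumably:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
```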
Fig 7: Data
The second step is defining data to work with. The inputs (regressors, 𝑥) and output
(response, 𝑦) should be arrays or similar objects. This is the simplest way of providing data
for regression:
The input (x) and the output (y) are now both arrays. The input array must be two-dimensional; more precisely, it must have one column and as many rows as required. As a result, you should call the .reshape() method on x, and that is exactly what the .reshape() argument (-1, 1) specifies.
Fig 8: arrays
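The data figure is not reproduced here. As a sketch, one set of sample arrays consistent with the coefficient values quoted later in this unit (b0 ≈ 5.63, b1 = 0.54) is:

```python
import numpy as np

# Hypothetical sample data (chosen to be consistent with b0 ≈ 5.63 and b1 = 0.54)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # two-dimensional: one column, six rows
y = np.array([5, 20, 14, 32, 22, 38])                   # one-dimensional output

print(x.shape)  # (6, 1)
print(y.shape)  # (6,)
```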
Step 3: Create the model and fit it
The following step is to build a linear regression model and fit the data to the model.
Create a LinearRegression class instance to represent the regression model:
The model described above uses the default settings for all parameters.
Now is the time to use the model. You must first call .fit() on it:
Using the current input and output, x and y, as the arguments, .fit() computes the ideal values of the weights b0 and b1. In other words, .fit() fits the model. It returns self, the model itself, which is why you can replace the previous two statements with the following one:
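The code figure for this step is missing; a sketch of both forms, using the assumed sample data from the earlier step, would be:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # assumed sample inputs
y = np.array([5, 20, 14, 32, 22, 38])                   # assumed sample outputs

model = LinearRegression()  # every parameter left at its default value
model.fit(x, y)             # computes the optimal weights b0 and b1

# .fit() returns the model itself, so the two statements above collapse into one:
model = LinearRegression().fit(x, y)
```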
.score() applied to the model returns the coefficient of determination, R²:
The model's attributes are .intercept_, which represents the coefficient b0, and .coef_, which represents b1:
b0 has a value of around 5.63. This means that when x is zero, your model predicts a response of 5.63. According to the value b1 = 0.54, the predicted response increases by 0.54 when x is increased by one.
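A hedged sketch of this step follows; the six-point sample data is an assumption, chosen so that the fitted coefficients match the b0 ≈ 5.63 and b1 = 0.54 values quoted above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # assumed sample data
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

print(round(model.score(x, y), 4))  # coefficient of determination R² -> 0.7159
print(round(model.intercept_, 2))   # b0 -> 5.63
print(model.coef_)                  # b1 -> [0.54]
```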
You can also provide y as a two-dimensional array. You'll receive a similar outcome in this situation. Here's how it might appear:
This example, as you can see, is quite similar to the previous one, except that in this case .intercept_ is a one-dimensional array with the single element b0, and .coef_ is a two-dimensional array with the single element b1.
In this instance, you would multiply each component of x by model.coef_ and then add
model.intercept_ to the result.
Only the output's dimensions are different from the previous example. The predicted
response has changed from having a single dimension to a two-dimensional array.
Both approaches give the same outcome if you reduce the number of dimensions in x to one. To achieve this, substitute x.reshape(-1), x.flatten(), or x.ravel() for x when multiplying it by model.coef_.
Regression models are frequently used in practice to make forecasts, so you can calculate the outputs for fresh inputs using fitted models:
Here, the new response, y_new, is produced by applying .predict() to the new regressor, x_new. This example uses NumPy's arange() function to create an array with the entries from 0, inclusive, up to but excluding 5, that is, 0, 1, 2, 3, and 4.
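The prediction step can be sketched as follows, again reusing the assumed six-point sample data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # assumed sample data
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

x_new = np.arange(5).reshape((-1, 1))  # the entries 0, 1, 2, 3, 4 as one column
y_new = model.predict(x_new)
print(y_new)  # ≈ [5.63 6.17 6.71 7.25 7.79]
```

Each predicted value is simply model.intercept_ plus model.coef_ times the new input, stepping up by 0.54 per unit of x.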
9. ACTIVITY
ACTIVITY A
In the past ten years, corporations have increasingly used predictive modelling. Businesses use it to find problems and opportunities, anticipate customer behaviour and trends, and improve decision-making. One of the frequent use cases that highlights the significance of predictive modelling in machine learning is fraud detection. Real-time remote analytic capabilities can enhance fraud detection scenarios and boost security effectiveness.
……………………………………………………………………………………………………………………
The objective is to merge various data sets, identify anomalies, and stop illegal activity.
10. SUMMARY
A type of technology called predictive analytics generates forecasts regarding some future
unknowns. It uses a variety of methodologies, including artificial intelligence (AI), data
mining, machine learning, modelling, and statistics to arrive at these conclusions. Data
mining, predictive modelling, and machine learning are all included in predictive analytics,
which is a combination of several statistical technologies and approaches. In order to
anticipate the future, this process thoroughly examines both historical and current data.
Predictive analytics is valued by businesses for a variety of reasons, including its ability to solve challenging problems and identify new opportunities. The simplest type of linear regression is simple, or single-variate, linear regression, since only one independent variable, x = x₁, is involved.
11. GLOSSARY
Data Mining - In data mining, large data sets are sorted through in order to find patterns and relationships that may be used in data analysis to help solve business challenges.
Decision Tree - A decision tree is a graph that uses the branching approach to show each potential result for a certain input.
13. STUDY NOTES & DID YOU KNOW

Big data is frequently mentioned in relation to predictive analytics. For instance, engineering data is gathered through global sensors, instruments, and networked systems. Transactional data, sales figures, client complaints, and marketing data are examples of business system data that can be found at a corporation. On the basis of this rich source of data, organisations make data-driven decisions more frequently.
90% of the data in the global datasphere is replicated, and only 10% is original. According to one of the articles posted on CIO, between 80 and 90% of the data in the global digital realm is unstructured. To download all the data from the internet today, a user would need 181 million years.
14. CASE STUDY

Let's begin by importing the dataset and the required Python libraries:
There are 10 attributes in the dataset, and each row represents a district. Let's now employ the info() method to quickly describe the data, including the total number of rows, the type of each attribute, and the number of non-null values.
The dataset has 20,640 instances. Because there are only 20,433 non-null values for the total_bedrooms attribute, 207 districts are missing that value. We will have to address that later.
With the exception of the ocean_proximity attribute, all attributes are numeric. Since its type is object, it could hold any kind of Python object. Using the value_counts() method, you can determine which categories are present in that column and how many districts fall into each category:
housing.ocean_proximity.value_counts()
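Since the housing CSV itself is not included in this unit, the sketch below uses a tiny hypothetical stand-in DataFrame just to show what info() and value_counts() report (the column names mirror the real dataset; the values are invented):

```python
import pandas as pd

# A tiny hypothetical stand-in for the housing data (the real CSV is not included here)
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 3.8, None, 2.1],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND", "ISLAND"],
})

housing.info()  # row count, dtype of each column, and non-null counts
print(housing["ocean_proximity"].value_counts())
```

On the real dataset, info() is exactly how the 20,433 non-null total_bedrooms count mentioned above would be spotted.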
Plotting a histogram for each numeric attribute is another easy way to determine the type of data you're working with.
Fig 20: Visualization
Most median income values cluster between 1.5 and 6, but some extend well beyond 6, so let's take a deeper look at the histogram of median income.
It is crucial to have enough instances of each stratum in your dataset; otherwise, the
estimation of a stratum's value may be skewed. This means that each layer should be large
enough and that there shouldn't be too many strata.
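Strata are typically built by binning the income values into a small number of categories. As a hedged sketch (the income values below are hypothetical and the bin edges are an assumed, commonly used choice for this dataset), pd.cut does the bucketing:

```python
import numpy as np
import pandas as pd

# Hypothetical median-income values; the bin edges are an assumed choice
income = pd.Series([1.2, 2.4, 3.1, 3.9, 4.8, 5.5, 6.3, 9.8])
income_cat = pd.cut(income,
                    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
                    labels=[1, 2, 3, 4, 5])

# Each label is a stratum; the counts show whether every stratum is well populated
print(income_cat.value_counts().sort_index())
```

Keeping the number of strata small, five here, is what ensures each stratum stays large enough for an unbiased estimate.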
1. What can
2. Numpy and Pandas
• Neural Networks
2. Define Regression.
A statistical method called regression links a dependent variable to one or more
independent (explanatory) variables. A regression model can demonstrate whether changes
in one or more of the explanatory variables are related to changes in the dependent variable.
➔ Detection of Fraud:
As concerns about cybersecurity rise, there are numerous examples of predictive analytics, and fraud detection is the most crucial. These models can notice odd behaviour and identify system anomalies to pinpoint hazards.
For instance, experts can provide the system with previous data regarding cyberthreats and risks. The appropriate staff will receive a notification when the predictive analytics programme spots anything similar, limiting hackers' access and closing gaps that might put the system in danger.
➔ Medical Diagnosis:
The healthcare industry benefits the most from predictive analysis. Health information is crucial to fully understand the medical past and present of any patient, and predictive analytics models contribute to the knowledge of the problem by providing an accurate diagnosis based on historical data.
Using certain health metrics, predictive analytics helps practitioners determine the underlying causes of diseases. As a result, they have rapid analytics, which enables them to start treatment right away. The spread of negative health effects can be halted using predictive analytics algorithms.
➔ Content Suggestion:
One of the most accessible and apparent predictive analytics examples is content suggestion.
Entertainment firms can forecast what viewers will watch based on their past behaviour
through algorithms and models.
Which businesses employ predictive analytics, you ask? The most obvious answer is Netflix. The entertainment company uses predictive algorithms to recommend material to customers based on genre, keywords, ratings, and other factors. The intelligent system predicts user behaviour using highly sophisticated analytics.
18. REFERENCES
• https://www.techfunnel.com/hr-tech/predictive-analytics/
• https://www.ibm.com/analytics/predictive-analytics
• https://www.investopedia.com/terms/p/predictive-analytics.asp
• https://www.geeksforgeeks.org/linear-regression-python-implementation/
• https://thecleverprogrammer.com/2020/12/29/house-price-prediction-with-python/
• https://blogs.sap.com/2021/07/09/7-real-world-use-cases-of-predictive-analytics/