
DADS302 Unit-05

This document discusses predictive analysis, including its objectives, working process, model types, applications and tools. Predictive analysis uses techniques like artificial intelligence, data mining and machine learning to identify patterns in past and present data and make predictions about future outcomes. The key steps involve defining goals, collecting and cleaning data, performing deep data analysis to identify patterns, building and selecting the best predictive model, deploying the model and continuously monitoring its performance. Justifications for using predictive analysis include finding fraud, optimizing marketing campaigns, and improving business operations.

Uploaded by

rubhakar
Copyright
© © All Rights Reserved

DADS302: Exploratory Data Analysis Manipal University Jaipur (MUJ)

MASTER OF BUSINESS ADMINISTRATION


SEMESTER 3

DADS302
EXPLORATORY DATA ANALYSIS

Unit 5: Predictive Analysis 1



Unit 5
Predictive Analysis

Table of Contents

SL No  Topic                                          Fig No/Table/Graph  SAQ/Activity
1      Introduction                                   -                   -
1.1    Objectives                                     -                   -
2      Working                                        1                   -
3      What Justifies The Use Of Predictive Analysis  -                   -
4      Model Types                                    2                   -
5      Applications                                   -                   -
6      Tools For Predictive Analysis                  3                   -
7      Regression                                     4, 5                1
8      Regression Line Fitting – Python Code          6-16                -
9      Activity                                       -                   -
10     Summary                                        -                   -
11     Glossary                                       -                   -
12     Concept Map                                    17                  -
13     Study Notes & Did You Know                     -                   -
14     Case Study                                     18-21               -
15     Terminal Questions                             -                   -
16     Self-Assessment Answers                        -                   -
17     Terminal Questions Answers                     -                   -
18     References                                     -                   -


1. INTRODUCTION

➔ Predictive analytics is a family of techniques that generates forecasts about unknown future events. It draws on a variety of methods, including artificial intelligence (AI), data mining, machine learning, modelling, and statistics.

➔ Data mining, for instance, analyses large data sets to find patterns in them. Text analysis does the same, but for large blocks of text.

➔ Predictive analytics examines patterns in past and present data to judge whether they are likely to recur. It can also help organisations reduce risk and improve operational efficiency.

➔ Predictive analytics combines several statistical techniques and approaches, including data mining, predictive modelling, and machine learning. It examines both historical and current data in order to anticipate the future. Its goal is not merely to understand the past, but to make a more accurate assessment of what might occur in the future.

➔ Some companies have built proprietary predictive analytics solutions that they describe as industry-specific. Other firms rely on third-party solutions.

In either case, there is a methodical flow you can follow to carry out predictive analytics projects. It has seven steps.


1.1 Objectives

After studying this unit, you should be able to:

❖ Define predictive analysis.
❖ List its working, uses, model types, benefits, criticism, challenges, applications and tools.
❖ Describe regression and its performance, including why it is necessary.
❖ Fit a regression line in Python.


2. WORKING

Fig 1: Working

1. Initiative Definition
Understanding the main goal of the predictive analytics project is a crucial part of defining it. You need clear answers to questions such as "What are you trying to model?" Answering them helps the organisation get the right value from the project.

2. Collection of Data

Data is the essential input for predictive analytics. To run the appropriate algorithms, you need a sizeable amount of data that can be sliced and diced. Gathering it should not be a problem if a documented collection process already exists. Firms without such a process must first set up a data aggregation tool to help them gather raw data from various sources.

3. Cleaning of Data

The data must be cleaned and sanitised before the analysis begins. As part of cleaning, data from many sources is combined into a single, complete database with consistent formatting. This helps ensure that the analysis is effective and delivers the value defined in step 1.

4. Deep Data Analysis

This is where the analysis of the data begins. The goal of this step is to identify patterns and trends in the data and use that knowledge to build prediction models that indicate what may happen in the future. There are two main techniques for carrying out such in-depth analysis:

1. Statistical Regression Methods

2. Machine Learning Techniques

5. Building a Model

Once the cleaned data has been thoroughly analysed, the organisation can begin developing a predictive model to forecast future events. The software typically generates several candidate models, so the organisation must choose the best one (in terms of accuracy) for forecasting.

6. Deployment

After the models have been developed and refined, the chosen model must be put into day-to-day use. How it is used ties back to the project definition in step 1. For instance, if the model forecasts data security events by examining computer event logs, it must be used to monitor ongoing operations and report potential security gaps so that breaches can be prevented. With the chosen model in place, businesses may be able to resolve some problems proactively.

7. Monitoring
After the models are put into use, it is crucial to monitor them continuously and make course corrections as needed. Unchecked over-reliance on a deployed model can be disastrous.


3. WHAT JUSTIFIES THE USE OF PREDICTIVE ANALYSIS


Predictive analytics is valued by businesses for a variety of reasons, including its
ability to solve challenging problems and identify new opportunities. Let's examine some of
the main justifications for the importance of predictive analytics.

1. Finding fraud
Combining several models can help identify suspicious patterns and prevent illegal activity. As cybersecurity concerns grow, high-performance predictive analytics solutions can examine network activity in real time to spot irregularities that may indicate fraud.

2. Marketing Campaign Optimization

Businesses use predictive analytics to understand customer purchasing patterns, which reveals opportunities for upselling and cross-selling.

3. Optimization of Operations
Inventory forecasting and resource management are further examples. Airlines use predictive analytics to set ticket prices, and hotels forecast the number of guests they expect in order to maximise occupancy and increase revenue.

Key Lessons:
1. Predictive analytics forecasts future performance using statistics and modelling techniques.
2. Fields and industries such as marketing and insurance use predictive approaches to make crucial decisions.
3. Predictive models support the creation of investment portfolios, video game development, voice-to-text translation, and customer service decisions.
4. Machine learning and predictive analytics are distinct fields, although people frequently confuse them.
5. Decision trees, regression, and neural networks are examples of predictive model types.


4. MODEL TYPES

Fig 2: Model Types

Decision Tree
• Decision trees can help you understand what drives a decision. This kind of model splits the data into groups according to certain variables, such as price or market capitalization. As the name suggests, it resembles a tree with individual branches and leaves. Branches represent the available choices, while individual leaves represent a particular decision.

• Decision trees are the simplest models because they are easy to understand and analyse. They are also very helpful when you have to make a decision quickly.

Regression
Regression is the most widely used model in statistical analysis. Use it when you want to find patterns in large data sets and there is a linear relationship between the inputs. This method works out a formula that describes the relationship between all the inputs in the dataset. For instance, regression can be used to determine how price and other key factors influence the performance of a security.


Neural Networks
Neural networks were developed as a form of predictive analytics by imitating the way the human brain works. Using artificial intelligence and pattern recognition, this model can handle complex data relationships. Use it when you face one of several hurdles: you have too much data on hand, you lack the formula needed to relate the inputs and outputs in your dataset, or you need to make predictions rather than come up with explanations.

Predictive Analysis Benefits:

● Predictive analysis has many advantages. It helps entities make predictions about outcomes when no other (and obvious) answers are available.

● Investors, financial professionals, and business leaders can use models to help reduce risk. For instance, an investor and their advisor can use certain models to build an investment portfolio with minimal risk by taking factors such as age, capital, and goals into account.

● Using models can significantly reduce costs. Businesses can estimate a product's chance of success or failure before it is released. Alternatively, they can set aside capital for production improvements by applying predictive techniques before the manufacturing process begins.

Predictive Analytics Criticism:

● The use of predictive analytics has been criticised and, in some cases, legally restricted because of perceived inequities in its outcomes. Most commonly this involves predictive models that result in statistical discrimination against racial or ethnic groups in areas such as credit scoring, mortgage lending, employment, or likelihood of criminal conduct.

● A well-known example is the (now prohibited) practice of redlining by banks in home lending. Regardless of whether the predictions drawn from such analytics are accurate, their use is generally discouraged, and data that explicitly includes information such as a person's race is now often excluded from predictive analytics.

Challenges

Fig 3: Use cases


5. APPLICATIONS

1. Early Allergic Reaction Detection in Healthcare


Healthcare illustrates how algorithms can support quick, predictive analysis for prevention. In collaboration with the KeepSmilin4Abbie Foundation, the Harvard University Wyss Institute created a wearable device that can detect anaphylactic allergic reactions and automatically administer life-saving epinephrine.

2. Finance : Future Cash Flow Forecasting


Every company must maintain regular financial records, and predictive analytics can be very helpful in forecasting the future health of a company. Using historical data from prior financial statements, together with data from the broader industry, you can project sales, income, and expenses to build a picture of the future and make decisions.

3. Identifying Staffing Needs in the Entertainment & Hospitality Sector


One example covered in Business Analytics is the use of predictive analytics by casino and hotel operator Caesars Entertainment to determine a venue's staffing requirements at various times.

Customer inflow and outflow in the entertainment and hospitality industries depend on a number of variables, all of which affect how many employees a venue or hotel needs at any given time. Overstaffing costs money, while understaffing can lead to a poor customer experience, overworked personnel, and expensive errors.
A team created a multivariate regression model that took several factors into account to forecast the number of hotel check-ins on a given day. Thanks to this technique, Caesars was able to staff its hotels and casinos as well as possible and avoid overstaffing.

4. Behavioral Targeting in Marketing


• Consumer data is widely available in marketing and is used to develop content, adverts, and strategies that are more effective at reaching potential customers where they are. Predictive analytics looks at past behavioural data and applies it to make predictions about the future.

• In marketing, predictive analytics can be used to anticipate seasonal sales trends and organise promotions accordingly.

5. Manufacturing: Preventing Malfunction

• While the examples above use predictive analytics to make decisions based on probable outcomes, you can also use it to stop undesirable or destructive events from happening. In manufacturing, for instance, algorithms can be trained on past data to forecast accurately when a piece of machinery is likely to break down.

• When the conditions for an impending malfunction are met, the algorithm is triggered and notifies a worker who can stop the machine, potentially saving the business thousands, if not millions, of dollars in damaged goods and repair costs. This kind of analysis predicts malfunctions in real time rather than months or years in advance.

Some algorithms even suggest fixes and improvements to prevent future issues and boost productivity, saving time, money, and effort. This is an example of prescriptive analytics, which is frequently used together with other forms of analytics to address problems.


6. TOOLS FOR PREDICTIVE ANALYSIS

Platform for data science

IBM Watson Studio is a data science platform that helps operationalise AI. It provides tools for data preparation and model building, using either open-source code or visual modelling.

Software for statistical analysis

IBM SPSS Statistics addresses business and research problems through ad hoc analysis, hypothesis testing, geographic analysis, and predictive analytics.

Tool for visual modelling

IBM SPSS Modeler offers a complete set of algorithms and models that are ready for immediate use, helping you make the most of your data assets and modern applications.

Solutions for decision optimization

IBM Decision Optimization improves results by adding prescriptive analytics capabilities to the predictive insights produced by machine learning models.


7. REGRESSION
• Regression looks for relationships among variables. You might, for instance, observe several employees of a company to see how their salaries depend on features such as experience, education, role, and the city they work in.

• This is a regression problem in which the data for each employee represents one observation. Experience, education, role, and city are assumed to be the independent features, while salary depends on them.

• Regression analysis usually considers some phenomenon of interest and a number of observations. Each observation has at least two features. You then try to establish a relationship between them, on the assumption that at least one of the features depends on the others.

To put it another way, you need to find a function that maps some features or variables to others sufficiently well.

• The dependent features are called dependent variables, outputs, or responses. The independent features are called independent variables, inputs, regressors, or predictors.

When Is Regression Necessary?


• Regression is typically used to determine whether and how one phenomenon affects another, or how several variables are related. You can use it, for instance, to find out whether and to what extent experience or gender affects salaries.

• Regression is also helpful when you want to forecast a response using a new set of predictors. For example, you could try to predict a household's electricity consumption for the next hour given the outdoor temperature, the time of day, and the number of residents in that household.


Linear Regression
Linear regression is probably one of the most important and widely used regression techniques. It is also among the simplest. One of its key benefits is that the results are easy to interpret.

Performance of Regression
• The actual responses y_i, i = 1, ..., n, vary partly because of their dependence on the predictors x_i. However, the output also has additional inherent variance.

• The coefficient of determination, denoted R², tells you how much of the variation in y can be explained by its dependence on x under a particular regression model. A larger R² indicates a better fit and means the model can better explain how the output varies with different inputs.

• The value R² = 1 corresponds to SSR = 0, where SSR is the sum of squared residuals. That is a perfect fit, since the predicted and actual responses are identical.
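
As a rough illustration of the relationship between SSR and R², the sketch below computes R² by hand as 1 - SSR/SST, where SST is the total sum of squares around the mean of y. The response and prediction values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

# Hypothetical actual responses and model predictions (illustrative values only)
y = np.array([5.0, 20.0, 14.0, 32.0, 22.0, 38.0])
y_pred = np.array([8.33, 13.73, 19.13, 24.53, 29.93, 35.33])

ssr = np.sum((y - y_pred) ** 2)    # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
r_sq = 1 - ssr / sst
print(r_sq)

# If predictions equal the actual responses, SSR is 0 and R² is exactly 1
r_sq_perfect = 1 - np.sum((y - y) ** 2) / sst
print(r_sq_perfect)
```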

Simple Linear Regression


Simple, or single-variate, linear regression is the simplest type of linear regression because it involves only one independent variable, x.


Simple linear regression is shown in the following figure:

Fig 4: Regression
Underfitting
Underfitting occurs when a model cannot capture the dependencies among the data accurately, usually because it is too simple. It often yields a low R² with known data and generalises poorly when applied to new data.

Overfitting
Overfitting occurs when a model learns both the data dependencies and the random fluctuations. In other words, the model learns the existing data too well. Complex models, which have many features or terms, are often prone to overfitting. When applied to known data, such models usually yield a high R². However, they often don't generalise well and have a considerably lower R² when used with new data.


The following figure depicts underfitted, well-fitted, and overfitted models:

Fig 5: Underfitting, overfitting
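
One way to see the effect numerically is to fit polynomials of increasing degree to the same small sample and compare R² on the known data. The sketch below is only an illustration: the six data points are an assumption, and `r_squared` is a helper defined here, not a library function. A degree-0 fit (a constant) underfits, while a degree-5 polynomial passes through all six points, which is the pattern associated with overfitting.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Illustrative sample data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

def r_squared(y_true, y_hat):
    """Coefficient of determination computed as 1 - SSR/SST."""
    ssr = np.sum((y_true - y_hat) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ssr / sst

r2_by_degree = {}
for degree in (0, 1, 5):
    poly = Polynomial.fit(x, y, degree)  # least-squares polynomial fit
    r2_by_degree[degree] = r_squared(y, poly(x))
    print(degree, r2_by_degree[degree])
```

A high R² on the known data says nothing by itself about new data; judging generalisation would need held-out observations, which this sketch does not include.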


PACKAGES FOR PYTHON

• NumPy is a fundamental scientific Python library that enables many high-performance operations on single-dimensional and multidimensional arrays. It also offers many mathematical routines. It is open-source.

• scikit-learn is a widely used Python machine learning library built on top of NumPy and a few other libraries. It provides tools for data preprocessing, dimensionality reduction, regression, classification, clustering, and more. Like NumPy, scikit-learn is open-source.

Self-Assessment Questions - 1

1. __________type of question is answered by predictive analysis.


2. List out any 2 packages for prediction.


8. REGRESSION LINE FITTING – PYTHON CODE


There are five basic steps to follow when implementing linear regression:
1. Import the packages and classes you need.
2. Provide data to work with, and apply any necessary transformations.
3. Create a regression model and fit it to the existing data.
4. Check the results of model fitting to see whether the model is satisfactory.
5. Use the model to make predictions.

These steps are more or less general for most regression approaches and implementations. The remaining sections of this tutorial show how to carry them out.

Step 1: Import packages and classes

First, import the package numpy and the class LinearRegression from sklearn.linear_model.

Fig 6: importing Packages
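
The figure is not reproduced here. A minimal sketch of the import step, assuming NumPy and scikit-learn are installed, might look like this:

```python
# NumPy provides the array type; scikit-learn provides the regression class
import numpy as np
from sklearn.linear_model import LinearRegression
```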

The fundamental data type of NumPy is the array type numpy.ndarray. The rest of this tutorial uses the term array to refer to instances of numpy.ndarray.

You'll use the class sklearn.linear_model.LinearRegression to perform linear and polynomial regression and to make predictions accordingly.


Step 2: Providing Data

Fig 7: Data
The second step is defining data to work with. The inputs (regressors, 𝑥) and output
(response, 𝑦) should be arrays or similar objects. This is the simplest way of providing data
for regression:
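
The figure with the data is not reproduced here, so the values below are an assumption: six sample points chosen for illustration. A sketch of defining the input and output arrays could be:

```python
import numpy as np

# Illustrative sample data (assumed; the original figure's values are not shown)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # inputs as one column
y = np.array([5, 20, 14, 32, 22, 38])                   # one response per row of x
```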

You now have two arrays: the input x and the output y. The input x must be two-dimensional, or more precisely, it must have one column and as many rows as necessary. That is why you call .reshape() on x, and it is exactly what the argument (-1, 1) of .reshape() specifies.

This is how x and y look now:

Fig 8: arrays
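
As a rough sketch of inspecting the arrays (again with assumed sample values, since the figure is not reproduced), printing them along with their shapes shows that x has two dimensions and y has one:

```python
import numpy as np

# Illustrative sample data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

print(x)         # six rows, one column
print(y)         # flat array of six responses
print(x.shape)   # two-dimensional
print(y.shape)   # one-dimensional
```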
Step 3: Create a model and fit it

The next step is to create a linear regression model and fit it using the existing data.
Create a LinearRegression class instance to represent the regression model:
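
A minimal sketch of creating the model with its default settings is shown below. The call to `get_params()` is added here only to display the settings in effect; it is not part of the original walkthrough.

```python
from sklearn.linear_model import LinearRegression

# Create the model with default settings
model = LinearRegression()

# Inspect the parameter values currently in effect
params = model.get_params()
print(params)
```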


This statement creates the variable model as an instance of LinearRegression. LinearRegression accepts several optional parameters, including:

● fit_intercept is a Boolean that decides whether to calculate the intercept b0 (True) or treat it as zero (False). It defaults to True.
● normalize is a Boolean that decides whether to normalise the input variables (True) or not (False). It defaults to False. Note that this parameter was deprecated in scikit-learn 1.0 and removed in 1.2, so it is available only in older releases.
● copy_X is a Boolean that decides whether to copy (True) or overwrite (False) the input variables. It defaults to True.
● n_jobs is either an integer or None. It sets the number of jobs used in parallel computation. The default, None, usually means one job, and -1 means all processors are used.

The model defined above uses the default values of all parameters.
It's time to start using the model. First, you need to call .fit() on model:

Fig 9: Model fitting
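
The figure is not reproduced here. Under the assumption of the illustrative sample data below, the fitting step could be sketched as:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative sample data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression()
model.fit(x, y)  # estimates the weights b0 and b1 from x and y
```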

With .fit(), you calculate the optimal values of the weights b0 and b1, using the existing input and output, x and y, as the arguments. In other words, .fit() fits the model. It returns self, which is the variable model itself. That's why you can replace the previous two statements with this one:


Fig 10: Regression()
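
A sketch of the shorter, chained form follows, again with assumed sample data. The second half is added only to show that .fit() returns the same object it was called on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative sample data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Create and fit in a single statement
model = LinearRegression().fit(x, y)

# .fit() returns the estimator itself, which is why chaining works
other = LinearRegression()
returned = other.fit(x, y)
print(returned is other)
```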

Step 4: Getting results


Once your model is fitted, you can use the results to check whether it works satisfactorily and to interpret it.

You can obtain the coefficient of determination, R², by calling .score() on model:

Fig 11: Results
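
The figure is not reproduced here. With the assumed sample data used in this sketch, obtaining R² might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative sample data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)
r_sq = model.score(x, y)  # coefficient of determination on the known data
print(f"coefficient of determination: {r_sq}")
```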


When you call .score(), the arguments are again the predictor x and the response y, and the return value is R².

The attributes of model are .intercept_, which represents the coefficient b0, and .coef_, which represents b1:

Fig 12: Intercepts
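
A sketch of reading the fitted coefficients is given below. The sample data is an assumption, chosen so that the fit gives an intercept near 5.63 and a slope of 0.54.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative sample data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)
print(f"intercept: {model.intercept_}")  # b0, a scalar
print(f"slope: {model.coef_}")           # b1, an array with one element
```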


The code above shows how to obtain b0 and b1. Notice that .intercept_ is a scalar, while .coef_ is an array.


The value of b0 is approximately 5.63, which means the model predicts a response of 5.63 when x is zero. The value b1 = 0.54 means that the predicted response rises by 0.54 when x is increased by one.

You can also provide y as a two-dimensional array. In that case, you'll get a similar result. Here's how it might look:

Fig 13: Slope
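
The following sketch, using the same assumed sample data, fits a second model with y reshaped into a single column. The name `new_model` is chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative sample data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Fit with y as a two-dimensional array (one column)
new_model = LinearRegression().fit(x, y.reshape((-1, 1)))
print(f"intercept: {new_model.intercept_}")  # one-dimensional array
print(f"slope: {new_model.coef_}")           # two-dimensional array
```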

As you can see, this example is very similar to the previous one, but in this case .intercept_ is a one-dimensional array with the single element b0, and .coef_ is a two-dimensional array with the single element b1.

Step 5: Predict the response


Once you have a satisfactory model, you can use it to make predictions with either existing or new data. Use .predict() to get the predicted response.

Fig 14: Predicted response
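
The figure is not reproduced here. A minimal sketch of the prediction step on the known inputs, assuming the illustrative sample data below, could be:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative sample data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)  # one predicted response per row of x
print(y_pred)
```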


When you apply .predict(), you pass the regressor as the argument and get the corresponding predicted response. Here is a nearly identical way to predict the response:


Fig 15: model intercept
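
As a sketch of the manual calculation (with assumed sample data, since the figure is not reproduced), the prediction can be written out from the fitted attributes and compared with .predict():

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative sample data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)

# b0 + b1 * x, computed element by element; the result keeps x's two dimensions
y_manual = model.intercept_ + model.coef_ * x
print(y_manual)

# Flattening x gives a one-dimensional result, the same shape as .predict()
y_manual_flat = model.intercept_ + model.coef_ * x.ravel()
print(y_manual_flat)
```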

In this case, you multiply each element of x by model.coef_ and add model.intercept_ to the product.

The output differs from the previous example only in its dimensions. The predicted response is now a two-dimensional array, whereas in the previous case it had one dimension.

If you reduce the number of dimensions of x to one, the two approaches yield the same result. You can do this by replacing x with x.reshape(-1), x.flatten(), or x.ravel() when multiplying it by model.coef_.

In practice, regression models are often used for forecasting. This means you can use fitted models to calculate the outputs for new inputs:


Fig 16: Result
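
The figure is not reproduced here. A sketch of predicting for new inputs, assuming the same illustrative training data as before, might be:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative training data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

# New inputs 0, 1, 2, 3, 4 arranged as a single column
x_new = np.arange(5).reshape((-1, 1))
y_new = model.predict(x_new)
print(y_new)
```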

Here, .predict() is applied to the new regressor x_new and yields the response y_new. This example conveniently uses arange() from numpy to generate an array with the elements from 0, inclusive, up to but excluding 5, that is, 0, 1, 2, 3, and 4.


9. ACTIVITY

ACTIVITY A
In the past ten years, corporations have increasingly used predictive programming. Businesses use
predictive programming to find problems and opportunities, anticipate customer behaviour and
trends, and improve decision-making. One of the frequent use cases that highlights the significance
of predictive modelling in machine learning is fraud detection. Real-time remote analytic capabilities
can enhance fraud detection scenarios and boost security effectiveness.

……………………………………………………………………………………………………………………
………………….

The objective is to merge various data sets, identify anomalies, and stop illegal activity.

10. SUMMARY

Predictive analytics is a family of techniques that generates forecasts about unknown future events. It draws on a variety of methods, including artificial intelligence (AI), data mining, machine learning, modelling, and statistics. It examines both historical and current data in order to anticipate the future. Businesses value predictive analytics for many reasons, including its ability to solve challenging problems and identify new opportunities. Regression is one of the main predictive model types; its simplest form is simple (single-variate) linear regression, which involves only one independent variable, x.


11. GLOSSARY

Data Mining - The process of sorting through large data sets to find patterns and relationships that can be used in data analysis to help solve business problems.

Decision Tree - A graph that uses a branching approach to show each potential result for a given input.

12. CONCEPT MAP

Fig 17: Predictive analysis

13. STUDY NOTES & DID YOU KNOW

Big data is frequently mentioned in relation to predictive analytics. For instance, engineering
data is gathered through global sensors, instruments, and networked systems. Transactional
data, sales figures, client complaints, and marketing data are examples of business system
data that can be found at a corporation. On the basis of this rich source of data, organisations
make data-driven decisions more frequently.


90% of the data in the global datasphere is duplicated, and only 10% is original. According to an article posted on CIO, between 80 and 90% of the data in the global digital realm is unstructured. A user would need 181 million years to download all the data on the internet today.

14. CASE STUDY


To understand predictive analysis better, this case study walks through a Python machine learning project that predicts home prices.

Let's begin by importing the dataset and the required Python libraries:

Fig 18 : Loading dataset
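Since the figure is not reproduced here, the loading step might look like the minimal sketch below. The file name housing.csv, the column names, and the three sample rows are assumptions made for illustration; the rows are made up so that the snippet runs without the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the housing file: three made-up rows with ten
# columns. With the real data you would instead call
# pd.read_csv("housing.csv"), where the file name is an assumption.
SAMPLE_CSV = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-122.23,37.88,41,880,129,322,126,8.3252,452600,NEAR BAY
-121.97,37.64,32,1283,,1020,327,3.1250,187500,INLAND
-118.30,34.05,26,2100,450,1500,430,2.5000,210000,NEAR OCEAN
"""

housing = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Show the first rows to confirm the columns were parsed as expected.
print(housing.head())
print(housing.shape)  # (number of rows, number of columns)
```

The empty total_bedrooms field in the second sample row is read as a missing value (NaN), which is the kind of gap the case study deals with later.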

The dataset has 10 attributes, and each row represents a district. Let's now use the info()
method to get a quick description of the data, including the total number of rows, the type
of each attribute, and the number of non-null values.


Fig 19: Getting the information of the data

The dataset has 20,640 instances. Because the total_bedrooms attribute has only 20,433
non-null values, 207 districts are missing this value. We will have to address that later.
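The check described above can be sketched as follows. The small DataFrame is made-up stand-in data, and filling the gaps with the column median is only one possible way to handle them (an assumption, not necessarily what the original project does).

```python
import numpy as np
import pandas as pd

# Made-up miniature of the housing data; one total_bedrooms value is missing.
housing = pd.DataFrame({
    "total_rooms": [880, 1283, 2100, 1500],
    "total_bedrooms": [129.0, np.nan, 450.0, 310.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "NEAR OCEAN"],
})

# Prints the row count plus the dtype and non-null count of every column.
housing.info()

# Number of missing entries per column.
missing = housing.isnull().sum()
print(missing)

# One possible fix (assumption): replace missing values with the column median.
median_bedrooms = housing["total_bedrooms"].median()
filled = housing["total_bedrooms"].fillna(median_bedrooms)
print(filled)
```

Other options include dropping the affected districts or dropping the attribute altogether; which one is appropriate depends on how much data you can afford to lose.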

With the exception of ocean_proximity, all attributes are numeric. Its type is object, so it
could in principle hold any kind of Python object; here it holds text categories. Using the
value_counts() method, you can determine which categories are present in that column and
how many districts fall under each category.

housing.ocean_proximity.value_counts()

Plotting a histogram for each numeric attribute is another easy way to get a feel for the
type of data you're working with.


Fig 20:Visualization
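A plot of this kind can be produced with the hist() method of a pandas DataFrame, which relies on matplotlib. The sketch below uses synthetic stand-in data rather than the real housing values, and the bin count and figure size are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import numpy as np
import pandas as pd

# Synthetic stand-in data (not the real housing values).
rng = np.random.default_rng(42)
housing = pd.DataFrame({
    "median_income": rng.lognormal(mean=1.2, sigma=0.5, size=500),
    "housing_median_age": rng.integers(1, 53, size=500),
    "total_rooms": rng.integers(100, 6000, size=500),
})

# Draw one histogram per numeric column; returns an array of Axes objects.
axes = housing.hist(bins=50, figsize=(12, 8))
```

In an interactive session you would drop the matplotlib.use("Agg") line and call matplotlib.pyplot.show() to display the figure.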

Most median income values cluster between 1.5 and 6, but some extend well beyond 6, so
let's take a closer look at the histogram of median income.

It is crucial to have enough instances of each stratum in your dataset; otherwise, the
estimate of a stratum's importance may be biased. This means that each stratum should be
large enough and that there shouldn't be too many strata.
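One way to put this into practice is to bin median income into a small number of categories and then sample the same fraction from each one. The sketch below uses synthetic income values; the bin edges, the 20% test fraction, and the column name income_cat are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the median_income column (not the real data).
rng = np.random.default_rng(0)
housing = pd.DataFrame({
    "median_income": rng.lognormal(mean=1.2, sigma=0.5, size=1000),
})

# Five income strata; everything above 6 falls into a single top category
# so that no stratum becomes too small. Bin edges are an assumption.
housing["income_cat"] = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=[1, 2, 3, 4, 5],
)

# Draw 20% from every stratum so the test set mirrors the income distribution.
test_set = housing.groupby("income_cat", observed=True).sample(
    frac=0.2, random_state=42
)
train_set = housing.drop(test_set.index)

# Compare the stratum proportions in the full data and in the test set.
print(housing["income_cat"].value_counts(normalize=True).sort_index())
print(test_set["income_cat"].value_counts(normalize=True).sort_index())
```

The two printed distributions should be close to each other, which is the point of stratified sampling: the test set stays representative of every income group.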


Fig 21: Median score

15. TERMINAL QUESTIONS

Short Answer Type Questions:
1. List out the types of Predictive Analysis Model.
2. Define Regression.

Long Answer Type Questions:
1. Explain any four applications of predictive analysis.
2. Compute regression line fitting by Python code.

16. SELF-ASSESSMENT ANSWERS

1. What can
2. Numpy and Pandas

17. TERMINAL QUESTIONS ANSWERS


Short Answer Type:
1. List out the types of Predictive Analysis Model.
• Decision Tree
• Regression
• Neural Networks
2. Define Regression.
Regression is a statistical method that relates a dependent variable to one or more
independent (explanatory) variables. A regression model can show whether changes in one
or more of the explanatory variables are associated with changes in the dependent variable.

Long Answer Type:


1. Explain any four applications of predictive analysis.
Applications :

➔ Forecasting consumer Behaviour :


Forecasting consumer behaviour in the retail sector is one of the main applications of
predictive analytics. Businesses use advanced analytics to determine customer buying
patterns based on past purchases.
A good illustration is Walmart, which used historical data to understand purchasing trends
under particular circumstances. Small e-tailers can use predictive analytics at the point of
sale to anticipate customer purchasing patterns. This gives them a deeper and more thorough
understanding of their customers.

➔ Detection of Fraud :
As concerns about cybersecurity rise, predictive analytics is finding many uses in security,
and fraud detection is among the most important. Predictive models can notice unusual
behaviour and identify system anomalies in order to pinpoint risks.
For instance, experts can provide the system with historical data on cyberthreats and risks.
When the predictive analytics programme spots anything similar, the appropriate staff
receive a notification. This restricts hackers' access and helps close gaps that might put
the system in danger.


➔ Medical Diagnosis :
The healthcare industry benefits greatly from predictive analysis. Health information is
crucial to fully understanding any patient's medical history and current condition.
Predictive analytics models add to the understanding of a problem by supporting a more
accurate diagnosis based on historical data.
Using certain health metrics, predictive analytics helps practitioners determine the
underlying causes of diseases. Because the analysis is fast, they can begin treatment right
away. Predictive analytics algorithms can also help halt the spread of negative health
effects.

➔ Content Suggestion :
One of the most accessible and visible examples of predictive analytics is content suggestion.
Using algorithms and models, entertainment firms can forecast what viewers will watch based
on their past behaviour.
Netflix is a well-known example. The entertainment company uses predictive algorithms to
recommend material to customers based on genre, keywords, ratings, and other factors. The
system predicts user behaviour using sophisticated analytics.


2. Compute regression line fitting by Python code.

Fig 22: Packages & importing

Fig 23: Predicting response vector
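As the figures are not reproduced here, the following is a minimal sketch of fitting a simple linear regression line y = b_0 + b_1 * x by least squares with NumPy. The function name estimate_coef and the sample x and y values are assumptions for illustration, not values taken from the figures.

```python
import numpy as np


def estimate_coef(x, y):
    """Return (b_0, b_1) of the least-squares line y = b_0 + b_1 * x."""
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)

    # Cross-deviation of x and y, and sum of squared deviations of x.
    ss_xy = np.sum(x * y) - n * m_x * m_y
    ss_xx = np.sum(x * x) - n * m_x * m_x

    b_1 = ss_xy / ss_xx      # slope
    b_0 = m_y - b_1 * m_x    # intercept
    return b_0, b_1


# Small illustrative sample (assumed values).
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

b_0, b_1 = estimate_coef(x, y)

# Predicted response vector for the observed x values.
y_pred = b_0 + b_1 * x

print(f"Estimated coefficients: b_0 = {b_0:.4f}, b_1 = {b_1:.4f}")
```

For this sample the slope is 96.5 / 82.5 (about 1.17) and the intercept is about 1.24. To visualise the fit, you could scatter-plot x against y and draw y_pred as a line with matplotlib.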


OUTPUT :

Fig 24: Output

18. REFERENCE
• https://www.techfunnel.com/hr-tech/predictive-analytics/
• https://www.ibm.com/analytics/predictive-analytics
• https://www.investopedia.com/terms/p/predictive-analytics.asp
• https://www.geeksforgeeks.org/linear-regression-python-implementation/
• https://thecleverprogrammer.com/2020/12/29/house-price-prediction-with-python/
• https://blogs.sap.com/2021/07/09/7-real-world-use-cases-of-predictive-analytics/
