Unit-2 Notes

Multiple regression is a statistical technique used to analyze the relationship between a dependent variable and multiple independent variables for prediction, explanation, and theory building. These notes discuss the assumptions of multiple regression, differences between simple and multiple linear regression, and logistic regression, as well as how to conduct multiple regression analysis in Excel. Additionally, they cover artificial neural networks, their structure, learning process, and applications in real-world scenarios.

Multiple Regression and Model Building

Multiple regression is a statistical technique that can be used to analyze the relationship between a single dependent variable and several independent variables. The objective of multiple regression analysis is to use the independent variables, whose values are known, to predict the value of the single dependent variable.

• Purposes:

1. Prediction
2. Explanation
3. Theory building

Assumptions/Conditions
• Independence: the scores of any particular subject are independent of the scores of all other
subjects

• Normality: in the population, the scores on the dependent variable are normally distributed for each of the possible combinations of the levels of the X variables; each of the variables is normally distributed

• Homoscedasticity: in the population, the variances of the dependent variable for each of the
possible combinations of the levels of the X variables are equal.

• Linearity: In the population, the relation between the dependent variable and the independent
variable is linear when all the other independent variables are held constant.
Multiple regression analysis allows researchers to assess the strength of the relationship between an
outcome (the dependent variable) and several predictor variables as well as the importance of each of the
predictors to the relationship, often with the effect of other predictors statistically eliminated.

Multiple regression is a broader class of regressions that encompasses linear and nonlinear regressions with multiple explanatory variables. Whereas simple linear regression has only one independent variable affecting the slope of the relationship, multiple regression incorporates multiple independent variables.

What is the difference between simple linear and multiple linear regression? Simple linear regression has only one x and one y variable; multiple linear regression has one y and two or more x variables. For instance, predicting rent based on square feet alone is simple linear regression, while predicting rent from square feet and number of bedrooms together is multiple linear regression.

What is the difference between logistic regression and multiple regression?


Multiple linear regression models a continuous dependent variable as a linear function of several independent variables, and can uncover possible correlations between variables, such as in cause-and-effect relationships. Logistic regression, by contrast, models a categorical (typically binary) dependent variable: it predicts the probability of class membership rather than a continuous value.
In SPSS, multiple regression is conducted by selecting "Regression" from the "Analyze" menu and then choosing the "Linear" option.

Any disadvantage of using a multiple regression model usually comes down to the data being used. Two examples of this are using incomplete data and falsely concluding that a correlation implies causation.

The best known estimation method of linear regression is the least squares method.

Multiple Regression and Model Building in Excel


Step 1: Load the Analysis ToolPak in Excel

1. Windows:
Click the File tab, click Options, and then click the Add-Ins category.

If you're using Excel 2007, click the Microsoft Office Button, and then click Excel Options.

2. In the Manage box, select Excel Add-ins and then click Go.

If you're using Excel for Mac, in the file menu go to Tools > Excel Add-ins.

3. In the Add-Ins box, check the Analysis ToolPak check box, and then
click OK.

 If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.

 If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.

Building a Multiple Linear Regression Model

Most outcomes in real situations are affected by multiple input variables. To understand such relationships, we use models that use more than one input (independent variables) to linearly model a single output (dependent variable).

A multiple linear regression model is a linear equation that has the general form:
y = b1x1 + b2x2 + … + c
where y is the dependent variable, x1, x2, … are the independent variables, and c is the (estimated) intercept. The intercept of the regression line is just the predicted value for y when all the x variables are 0. Any line has an equation in terms of its slope and intercept: y = slope × x + intercept.
In the example data (a transit-ridership dataset not reproduced here), the 'Number of weekly riders' is a dependent variable that depends on the 'Price per week ($)', 'Population of city', 'Monthly income of riders ($)', and 'Average parking rates per month ($)'.
Let us assign the variables:
 Price per week ($) – x1
 Population of city – x2
 Monthly income of riders ($) – x3
 Average parking rates per month ($)- x4
 Number of weekly riders – y
The linear model would be of the form: y = ax1 + bx2 + cx3 + dx4 + e where a, b,
c, d are the respective coefficients and e is the intercept.
There are two different ways to create the linear model in Microsoft Excel. In these notes, we will take a look at the Regression function included in the Data Analysis ToolPak.
After the Data Analysis ToolPak has been enabled, you will be able to see it on the
Ribbon, under the Data tab:

Click Data Analysis to open the Data Analysis ToolPak, and select Regression
from the Analysis tools that are displayed.

Select the data ranges in the options:


The output looks like this:
Right on top are the Regression Statistics. Here we are interested in the following
measures:
 Multiple R, which is the coefficient of linear correlation
 Adjusted R Square, which is the R Square (coefficient of determination) adjusted
for more than one independent variable
We are also interested in the coefficients at the bottom. We are most interested in
the Coefficients column. The Lower 95% and Upper 95% columns give the lower
and upper limits for the coefficients. We see that the following are the coefficients:
 Price per week ($): -689.5227
 Population of city: 0.0549
 Monthly income of riders ($): -1.3014
 Average parking rates per month ($): 152.4563
 Intercept: 100222.5607
The linear equation is:
y = −689.5227·x1 + 0.0549·x2 − 1.3014·x3 + 152.4563·x4 + 100222.5607
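The same kind of model can be reproduced outside Excel. Below is a minimal sketch using numpy's least-squares solver; the data rows are hypothetical stand-ins (the original dataset is not reproduced here), so the fitted coefficients will differ from those above.

```python
import numpy as np

# Hypothetical stand-in rows: (price per week, population, monthly income,
# parking rate) -> number of weekly riders. Not the original dataset.
X = np.array([
    [15, 1_800_000, 5800, 50],
    [20, 1_790_000, 6200, 60],
    [25, 1_780_000, 6400, 75],
    [30, 1_778_000, 6500, 80],
    [35, 1_750_000, 6550, 85],
], dtype=float)
y = np.array([192_000, 190_400, 191_200, 177_600, 176_800], dtype=float)

# Append a column of ones so the solver also estimates the intercept e.
A = np.column_stack([X, np.ones(len(X))])
a, b, c, d, e = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"y = {a:.4f}*x1 + {b:.4f}*x2 + {c:.4f}*x3 + {d:.4f}*x4 + {e:.4f}")
```

This is exactly the least-squares estimation mentioned earlier: the solver picks the coefficients that minimize the sum of squared differences between predicted and observed y.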
Logistic Regression
Logistic regression is commonly used for prediction and classification problems.
Some of these use cases include: Fraud detection: Logistic regression models can
help teams identify data anomalies, which are predictive of fraud.
Logistic regression is a statistical analysis method to predict a binary outcome,
such as yes or no, based on prior observations of a data set. A logistic
regression model predicts a dependent data variable by analyzing the relationship
between one or more existing independent variables.

Logistic Regression is one of the basic and popular algorithms for solving classification problems. It is named 'Logistic Regression' because its underlying technique is quite similar to Linear Regression. The term "Logistic" is taken from the logit function that is used in this method of classification.
In the real world, you can see logistic regression applied across multiple areas and
fields. In health care, logistic regression can be used to predict if a tumor is
likely to be benign or malignant. In the financial industry, logistic regression can
be used to predict if a transaction is fraudulent or not.
Logistic regression is easier to implement, interpret, and very efficient to train.
It is very fast at classifying unknown records. It performs well when the dataset is
linearly separable. It can interpret model coefficients as indicators of feature
importance.
Logistic regression is also known as binomial logistic regression. It is based on the sigmoid function, where the output is a probability and the input can range from -infinity to +infinity. Let's discuss some advantages and disadvantages of Logistic Regression.

Advantages:

• Logistic regression is easier to implement, interpret, and very efficient to train.
• It makes no assumptions about the distributions of classes in feature space.
• It can easily extend to multiple classes (multinomial regression) and provides a natural probabilistic view of class predictions.
• It not only provides a measure of how appropriate a predictor is (coefficient size), but also its direction of association (positive or negative).
• It is very fast at classifying unknown records.
• It gives good accuracy for many simple data sets and performs well when the dataset is linearly separable.
• It can interpret model coefficients as indicators of feature importance.
• It is less inclined to overfitting, but it can overfit in high-dimensional datasets; one may consider regularization (L1 and L2) techniques to avoid overfitting in these scenarios.

Disadvantages:

• If the number of observations is smaller than the number of features, Logistic Regression should not be used; otherwise, it may lead to overfitting.
• It constructs linear decision boundaries.
• The major limitation of Logistic Regression is the assumption of linearity between the dependent variable and the independent variables.
• It can only be used to predict discrete functions; hence, the dependent variable of Logistic Regression is bound to the discrete number set.
• Non-linear problems can't be solved with logistic regression because it has a linear decision surface, and linearly separable data is rarely found in real-world scenarios.
• Logistic Regression requires little or no multicollinearity between the independent variables.
• It is tough to obtain complex relationships using logistic regression; more powerful and compact algorithms such as Neural Networks can easily outperform it.
• Whereas in Linear Regression the independent and dependent variables are related linearly, Logistic Regression requires the independent variables to be linearly related to the log odds, log(p/(1-p)).
Instead of predicting exactly 0 or 1, logistic regression generates a probability—a
value between 0 and 1, exclusive. For example, consider a logistic regression
model for spam detection. If the model infers a value of 0.932 on a particular email
message, it implies a 93.2% probability that the email message is spam. More
precisely, it means that in the limit of infinite training examples, the set of
examples for which the model predicts 0.932 will actually be spam 93.2% of the
time and the remaining 6.8% will not.
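As a minimal sketch of how this probability is produced, the snippet below applies the sigmoid function to a linear score; the weights, bias, and feature values are invented for illustration, not taken from a real spam model.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Invented weights for two hypothetical spam-indicator features.
w = [1.8, 0.9]
bias = -3.0
features = [2.0, 1.5]   # e.g. counts of two suspicious patterns in an email

z = bias + sum(wi * xi for wi, xi in zip(w, features))
p_spam = sigmoid(z)
print(f"P(spam) = {p_spam:.3f}")   # a probability, never exactly 0 or 1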
Neural Networks
Artificial Neural Networks and their Biological Motivation

Artificial Neural Network (ANN)

There is no universally accepted definition of an NN. But perhaps most people in the field would agree
that an NN is a network of many simple processors (“units”), each possibly having a small amount of
local memory. The units are connected by communication channels (“connections”) which usually carry
numeric (as opposed to symbolic) data, encoded by any of various means. The units operate only on their
local data and on the inputs they receive via the connections. The restriction to local operations is often
relaxed during training.

Some NNs are models of biological neural networks and some are not, but historically, much of the
inspiration for the field of NNs came from the desire to produce artificial systems capable of
sophisticated, perhaps “intelligent”, computations similar to those that the human brain routinely
performs, and thereby possibly to enhance our understanding of the human brain.

Most NNs have some sort of “training” rule whereby the weights of connections are adjusted on the basis
of data. In other words, NNs “learn” from examples (as children learn to recognize dogs from examples
of dogs) and exhibit some capability for generalization beyond the training data.

NNs normally have great potential for parallelism, since the computations of the components are largely
independent of each other. Some people regard massive parallelism and high connectivity to be defining
characteristics of NNs, but such requirements rule out various simple models, such as simple linear
regression (a minimal feed forward net with only two units plus bias), which are usefully regarded as
special cases of NNs.

According to Haykin (Neural Networks: A Comprehensive Foundation): "A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use."

It resembles the brain in two respects:

1. Knowledge is acquired by the network through a learning process.

2. Interneuron connection strengths known as synaptic weights are used to store the knowledge.

We can also say that: neural networks are parameterised computational nonlinear algorithms for (numerical) data/signal/image processing. These algorithms are either implemented on a general-purpose computer or built into dedicated hardware.

Taxonomy of neural networks

From the point of view of their active or decoding phase, artificial neural networks can be classified into
feed forward (static) and feedback (dynamic, recurrent) systems.
From the point of view of their learning or encoding phase, artificial neural networks can be classified
into supervised and unsupervised systems.

Feed forward supervised networks: These networks are typically used for function approximation tasks.
Specific examples include:

• Linear recursive least-mean-square (LMS) networks

• Back propagation networks

• Radial basis function (RBF) networks

Feed forward unsupervised networks: These networks are used to extract important properties of the input
data and to map input data into a “representation” domain. Two basic groups of methods belong to this
category.

Perceptron:

The underlying neuron model, an artificial neuron with a hard-limiting activation function, was introduced by McCulloch and Pitts in 1943; the perceptron itself, which added a learning rule to this model, was introduced by Rosenblatt in 1958. More recently, the term multilayer perceptron has often been used as a synonym for the term multilayer feedforward neural network.
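A minimal sketch of such a hard-limiting neuron is shown below; the weights and threshold are chosen by hand to illustrate the idea (here, computing logical AND) rather than learned.

```python
def perceptron(inputs, weights, threshold):
    """Hard-limiting neuron: output 1 iff the weighted sum reaches the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0

# A two-input neuron that computes logical AND (hand-picked weights).
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron([x1, x2], weights=[1, 1], threshold=2))
```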

Neural Network
A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a
process that mimics the way the human brain operates. In this sense, neural networks refer to systems of neurons, artificial in
nature.

Neural networks can help computers make intelligent decisions with limited human assistance. This is because they can learn
and model the relationships between input and output data that are nonlinear and complex.

Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another. Examples of various types of neural networks are the Hopfield network, the multilayer perceptron, the Boltzmann machine, and the Kohonen network. The most commonly used and successful neural network is the multilayer perceptron, which will be discussed in detail.

ANNs have the ability to learn and model non-linear and complex relationships, which is really important because in real life, many of the relationships between inputs and outputs are non-linear as well as complex.

Where are ANNs used in real life?

Google makes use of artificial neural networks with recurrent connections to power voice search. Microsoft also claims to have developed a speech-recognition system, using neural networks, which can transcribe conversations slightly more accurately than humans.
Neural networks (NN) are parallel information processing systems consisting of a number of simple neurons (also called
nodes or units), which are organized in layers and which are connected by links.

Similarity with biological network

The fundamental processing element of a neural network is the neuron, which:

1. Receives inputs from other sources
2. Combines them in some way
3. Performs a generally nonlinear operation on the result
4. Outputs the final result

 A neural network learns by adjusting the weights so as to be able to correctly classify the training data and hence, after the testing phase, to classify unknown data.

 A neural network needs a long time for training.

 A neural network has a high tolerance to noisy and incomplete data.


Example of Max-Min Normalization

Max-min normalization formula:

v' = ((v − min_A) / (max_A − min_A)) × (new_max_A − new_min_A) + new_min_A
Example: We want to normalize data to the range [0, 1], so we put new_max_A = 1 and new_min_A = 0.

Say max_A was 100 and min_A was 20 (that is, the maximum and minimum values for the attribute).

Now, if v = 40 (that is, for this particular pattern the attribute value is 40), v' will be calculated as:

v' = (40 − 20) × (1 − 0) / (100 − 20) + 0
=> v' = 20 × 1/80
=> v' = 0.25
Neural networks rely on training data to learn and improve their accuracy over time. However, once these learning algorithms are
fine-tuned for accuracy, they are powerful tools in computer science and artificial intelligence, allowing us to classify and cluster
data at a high velocity. Tasks in speech recognition or image recognition can take minutes versus hours when compared to the
manual identification by human experts. One of the most well-known neural networks is Google’s search algorithm.

Specific types of artificial neural networks include:

 Feed-forward neural networks: one of the simplest variants of neural networks. They pass information in one direction, through various input nodes, until it reaches the output node. The network may or may not have hidden node layers, making its functioning more interpretable, and it can process large amounts of noisy data. This type of ANN computational model is used in technologies such as facial recognition and computer vision (a minimal forward-pass sketch follows this list).

 Recurrent neural networks: more complex. They save the output of processing nodes and feed the result back into the model. This is how the model is said to learn to predict the outcome of a layer. Each node in the RNN model acts as a memory cell, continuing the computation and implementation of operations. This neural network starts with the same forward propagation as a feed-forward network, but then goes on to remember all processed information in order to reuse it in the future. If the network's prediction is incorrect, then the system self-learns and continues working towards the correct prediction during backpropagation. This type of ANN is frequently used in text-to-speech conversion.

 Convolutional neural networks: one of the most popular models used today. This neural network computational model uses a variation of multilayer perceptrons and contains one or more convolutional layers that can be either entirely connected or pooled. These convolutional layers create feature maps that record a region of the image, which is ultimately broken into rectangles and sent out for nonlinear processing. The CNN model is particularly popular in the realm of image recognition; it has been used in many of the most advanced applications of AI, including facial recognition, text digitization and natural language processing. Other uses include paraphrase detection, signal processing and image classification.

 Deconvolutional neural networks: utilize a reversed CNN model process. They aim to find lost features or signals that may have originally been considered unimportant to the CNN system's task. This network model can be used in image synthesis and analysis.

 Modular neural networks: contain multiple neural networks working separately from one another. The networks do not communicate or interfere with each other's activities during the computation process. Consequently, complex or big computational processes can be performed more efficiently.
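To make the feed-forward idea concrete, here is a minimal one-hidden-layer forward pass in numpy. The weights are random placeholders rather than a trained model; training would adjust them with a rule such as backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random placeholder weights: 3 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    """Information flows one way: input -> hidden (nonlinear) -> output."""
    hidden = np.tanh(W1 @ x + b1)      # hidden layer with nonlinear activation
    return W2 @ hidden + b2            # linear output layer

print(forward(np.array([0.5, -1.0, 2.0])))
```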

Advantages of artificial neural networks

Advantages of artificial neural networks include:

 Parallel processing abilities mean the network can perform more than one job at a time.

 Information is stored on an entire network, not just a database.

 The ability to learn and model nonlinear, complex relationships helps model the real-life relationships between input and output.

 Fault tolerance means the corruption of one or more cells of the ANN will not stop the generation of output.

 Gradual corruption means the network will slowly degrade over time, instead of a problem destroying the network instantly.

 The ability to produce output with incomplete knowledge, with the loss of performance depending on how important the missing information is.

 No restrictions are placed on the input variables, such as how they should be distributed.

 Machine learning means the ANN can learn from events and make decisions based on the observations.

 The ability to learn hidden relationships in the data without imposing any fixed relationship means an ANN can better model highly volatile data and non-constant variance.

 The ability to generalize and infer unseen relationships means ANNs can predict the output of unseen data.

Disadvantages of artificial neural networks

The disadvantages of ANNs include:

 The lack of rules for determining the proper network structure means the appropriate artificial neural network architecture can only be found through trial and error and experience.

 The requirement of processors with parallel processing abilities makes neural networks hardware-dependent.

 The network works with numerical information; therefore, all problems must be translated into numerical values before they can be presented to the ANN.

 The lack of explanation behind the solutions produced is one of the biggest disadvantages of ANNs. The inability to explain the why or how behind a solution generates a lack of trust in the network.

Applications of artificial neural networks

Image recognition was one of the first areas to which neural networks were successfully applied, but the technology's uses have expanded to many more areas, including:

 Chatbots

 Natural language processing, translation and language generation

 Stock market prediction

 Delivery driver route planning and optimization

 Drug discovery and development

These are just a few specific areas to which neural networks are being applied today. Prime uses involve any process that operates according to strict rules or patterns and has large amounts of data. If the data involved is too large for a human to make sense of in a reasonable amount of time, the process is likely a prime candidate for automation through artificial neural networks.
Naïve Bayes and Bayesian Networks
A naive Bayesian network is a Bayesian network with a single root; all other nodes are children of the root, and there are no edges between the other nodes.

Naive Bayes assumes conditional independence, P(X|Y,Z) = P(X|Z), whereas more general Bayes nets (sometimes called Bayesian Belief Networks) allow the user to specify which attributes are, in fact, conditionally independent.
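A minimal sketch of the naive assumption in action: class-conditional feature probabilities are simply multiplied, as if the features were independent given the class. All the probabilities below are toy numbers invented for illustration.

```python
# Toy class-conditional probabilities (invented for illustration).
# Naive assumption: P(x1, x2 | class) = P(x1 | class) * P(x2 | class).
p_feature_given_class = {
    "spam":     {"has_link": 0.8, "has_typo": 0.6},
    "not_spam": {"has_link": 0.2, "has_typo": 0.3},
}
p_class = {"spam": 0.4, "not_spam": 0.6}

def naive_bayes_score(cls, features):
    score = p_class[cls]                      # prior P(class)
    for f in features:
        score *= p_feature_given_class[cls][f]  # multiply likelihoods
    return score

observed = ["has_link", "has_typo"]
scores = {c: naive_bayes_score(c, observed) for c in p_class}
total = sum(scores.values())
for c, s in scores.items():
    print(c, round(s / total, 3))   # posterior after normalizing
```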

Model Evaluation Techniques


Model evaluation metrics are used to assess the goodness of fit between a model and the data, to compare different models in the context of model selection, and to estimate how accurate the predictions (associated with a specific model and data set) are expected to be. Model evaluation plays a crucial role while developing a predictive machine learning model. Simply building a predictive model without checking it is not enough; the model should be verified against the metrics and improved accordingly until the desired accuracy rate is reached. The following are common model evaluation techniques:

1| Chi-Square

The χ2 test is a method used to test a hypothesis about two or more groups in order to check the independence of two variables. It is basically used to analyse categorical data and to evaluate tests of independence using a bivariate table. Related tests of independence include Fisher's exact test and the binomial test. The formula for calculating the Chi-Square statistic is:

χ2 = Σ (Oi − Ei)² / Ei

where Oi is the observed frequency and Ei is the expected frequency for category i.
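The statistic can be computed directly from the formula above; a minimal sketch with invented observed and expected counts:

```python
# Invented observed and expected counts for four categories.
observed = [30, 10, 20, 40]
expected = [25, 15, 25, 35]

# chi2 = sum over categories of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square statistic = {chi2:.3f}")
```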

2| Confusion Matrix

The confusion matrix, also known as an error matrix, is a table that describes the performance of a classification model on a set of test data in machine learning. In such a table, Class 1 is depicted as the positive class and Class 2 as the negative class. It is a two-dimensional matrix where each row represents the instances in a predicted class and each column represents the instances in an actual class (or the other way around, depending on convention). Here, TP (True Positive) means the observation is positive and is predicted as positive; FP (False Positive) means the observation is negative but is predicted as positive; TN (True Negative) means the observation is negative and is predicted as negative; and FN (False Negative) means the observation is positive but is predicted as negative.
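A minimal sketch computing the four cells from label vectors; the labels below are invented for illustration.

```python
# Invented true and predicted labels (1 = positive, 0 = negative).
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # positive, predicted positive
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # positive, predicted negative
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # negative, predicted positive
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # negative, predicted negative

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=3
```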
3| Concordant-Discordant Ratio

In a pair of cases, when one case is higher on both variables than the other case, it is known as a concordant pair. On the other hand, when one case is higher on one variable than the other case but lower on the other variable, it is known as a discordant pair.

Suppose, there are a pair of observations (Xa, Ya) and (Xb, Yb)

Then, the pair is concordant if Xa>Xb and Ya>Yb or Xa<Xb and Ya<Yb

And the pair is discordant if Xa>Xb and Ya<Yb or Xa<Xb and Ya>Yb.
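The two conditions above reduce to checking the sign of the product of the differences; a minimal sketch over invented (X, Y) observations:

```python
from itertools import combinations

# Invented paired observations (X, Y).
obs = [(1, 2), (3, 5), (4, 4), (6, 7)]

concordant = discordant = 0
for (xa, ya), (xb, yb) in combinations(obs, 2):
    if (xa - xb) * (ya - yb) > 0:      # differences have the same sign
        concordant += 1
    elif (xa - xb) * (ya - yb) < 0:    # differences have opposite signs
        discordant += 1

print(concordant, discordant)   # 5 concordant, 1 discordant
```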

4| Confidence Interval

Confidence Interval (CI) is the range of values required to meet a certain confidence level when estimating features of the total population. In the domain of machine learning, confidence intervals basically consist of a range of potential values of an unknown population parameter, and the factors affecting the width of the confidence interval are the confidence level and the size and variability of the sample.

5| Gini Co-efficient

The Gini coefficient or Gini Index is a popular metric for imbalanced class values. It is a statistical measure of distribution developed by the Italian statistician Corrado Gini in 1912. The coefficient ranges from 0 to 1, where 0 represents perfect equality and 1 represents perfect inequality. Here, a higher index value means the data is more dispersed.

6| Gain and Lift Chart

This method is generally used to evaluate the performance of a classification model in machine learning and is calculated as the ratio between the results obtained with and without the model. Here, gain is defined as the ratio of the cumulative number of targets to the total number of targets in the entire dataset, and lift is defined as how many times better the model is than a random choice of cases.

7| Kolmogorov-Smirnov Chart

This non-parametric statistical test measures the performance of classification models; it is defined as the measure of the degree of separation between the positive and negative distributions. The KS test is generally used to compare the equality of a single sample with another.
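A minimal sketch using scipy's two-sample KS test, assuming scipy is available; the model scores below are synthetic, with positives deliberately scoring higher than negatives.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic model scores: positives tend to score higher than negatives.
scores_pos = rng.normal(loc=0.7, scale=0.15, size=200)
scores_neg = rng.normal(loc=0.4, scale=0.15, size=200)

stat, p_value = ks_2samp(scores_pos, scores_neg)
print(f"KS statistic = {stat:.3f} (degree of separation), p = {p_value:.2e}")
```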

8| Predictive Power

Predictive Power is a synthetic metric that satisfies useful properties: it always lies between 0 and 1, where 0 represents that the feature subset has no predictive power and 1 represents that the feature subset has maximum predictive power. It is used to select a good subset of features in any machine learning project.

9| AUC-ROC Curve

ROC or Receiver Operating Characteristics Curve is one of the most popular evaluation metrics for checking the performance of
a classification model. The curve plots two parameters, True Positive Rate (TPR) and False Positive Rate (FPR). Area Under
ROC curve is basically used as a measure of the quality of a classification model. Hence, the AUC-ROC curve is the performance
measurement for the classification problem at various threshold settings.

The True Positive Rate or Recall is defined as TPR = TP / (TP + FN).

The False Positive Rate is defined as FPR = FP / (FP + TN).
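Using the confusion-matrix counts from the earlier example, the two rates are direct ratios; the AUC is then computed over labels and scores (both invented here), shown with scikit-learn as one common way to obtain it, assuming that library is available.

```python
from sklearn.metrics import roc_auc_score

# TPR and FPR from confusion-matrix counts (illustrative values).
tp, fn, fp, tn = 3, 1, 1, 3
tpr = tp / (tp + fn)   # recall: fraction of actual positives caught
fpr = fp / (fp + tn)   # fraction of actual negatives falsely flagged
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")

# AUC from invented labels and model scores.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.2, 0.8, 0.6, 0.1, 0.7, 0.3]
print("AUC =", roc_auc_score(y_true, y_score))  # 0.9375
```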

10| Root Mean Square Error

Root Mean Squared Error or RMSE is defined as the measure of the differences between the values predicted by a model and the
values actually observed. It is basically the square root of MSE, Mean Squared Error which is the average of the squared error
used as the loss function for least squares regression.

Specifically, the RMSE is defined as:

RMSE = sqrt( (1/n) Σ (yi − ŷi)² )

where yi are the observed values, ŷi the predicted values, and n the number of observations.
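As a quick sketch of the definition (the function name is mine, and the values are invented):

```python
import math

def rmse(actual, predicted):
    """Square root of the mean of squared prediction errors."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

print(rmse([3.0, 5.0, 2.5], [2.5, 5.0, 3.0]))  # ~0.408
```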

Predictive models are proving to be quite helpful in predicting the future growth of businesses, as they predict outcomes using data mining and probability, where each model consists of a number of predictors or variables. A statistical model can therefore be created by collecting data for the relevant variables.
