Unit-2 Notes
Multiple regression is a statistical technique that can be used to analyze the relationship between a
single dependent variable and several independent variables. The objective of multiple regression
analysis is to use the independent variables, whose values are known, to predict the value of the single
dependent variable.
• Purposes:
1. Prediction
2. Explanation
3. Theory building
Assumptions/Conditions
• Independence: the scores of any particular subject are independent of the scores of all other
subjects
• Normality: in the population, the scores on the dependent variable are normally distributed for
each of the possible combinations of the levels of the X variables; each of the variables is normally
distributed
• Homoscedasticity: in the population, the variances of the dependent variable for each of the
possible combinations of the levels of the X variables are equal.
• Linearity: In the population, the relation between the dependent variable and the independent
variable is linear when all the other independent variables are held constant.
Multiple regression analysis allows researchers to assess the strength of the relationship between an
outcome (the dependent variable) and several predictor variables as well as the importance of each of the
predictors to the relationship, often with the effect of other predictors statistically eliminated.
Multiple regression is a broader class of regressions that encompasses linear and nonlinear regressions
with multiple explanatory variables. Whereas simple linear regression has only one independent variable
impacting the slope of the relationship, multiple regression incorporates multiple independent
variables.
What is the difference between simple linear and multiple linear regression? Simple linear regression has
only one x variable and one y variable; multiple linear regression has one y variable and two or more x
variables. For instance, predicting rent based on square feet alone is simple linear regression, while
predicting rent based on square feet and, say, the number of bedrooms would be multiple linear regression.
Any disadvantage of using a multiple regression model usually comes down to the data being used. Two
examples of this are using incomplete data and falsely concluding that correlation implies causation.
The best known estimation method of linear regression is the least squares method.
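As a quick illustration of least squares outside Excel, here is a minimal sketch in Python with NumPy; the data values are made up for the example.

```python
import numpy as np

# Made-up observations of one predictor x and a response y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares chooses slope and intercept to minimise the sum of squared
# residuals. A column of ones lets the intercept be estimated as well.
X = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"y = {slope:.3f} * x + {intercept:.3f}")
```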
To enable the Analysis ToolPak add-in:
1. Windows: Click the File tab, click Options, and then click the Add-Ins category.
(If you're using Excel 2007, click the Microsoft Office Button, and then click Excel Options.)
2. In the Manage box, select Excel Add-ins and then click Go.
(If you're using Excel for Mac, go to Tools > Excel Add-ins.)
3. In the Add-Ins box, check the Analysis ToolPak check box, and then click OK.
A multiple linear regression model is a linear equation that has the general form: y
= b1x1 + b2x2 + … + c, where y is the dependent variable, x1, x2, … are the
independent variables, and c is the (estimated) intercept. The intercept of the
regression line is just the predicted value for y when all the x variables are 0. Any line has an
equation in terms of its slope and intercept: y = slope × x + intercept.
In the above data, the ‘Number of weekly riders’ is a dependent variable that
depends on the ‘Price per week ($)’, ‘Population of city’, ‘Monthly income of
riders ($)’, ‘Average parking rates per month ($)’.
Let us assign the variables:
Price per week ($) – x1
Population of city – x2
Monthly income of riders ($) – x3
Average parking rates per month ($)- x4
Number of weekly riders – y
The linear model would be of the form: y = ax1 + bx2 + cx3 + dx4 + e where a, b,
c, d are the respective coefficients and e is the intercept.
There are two different ways to create the linear model in Microsoft Excel. In
this article, we will take a look at the Regression function included in the Data
Analysis ToolPak.
After the Data Analysis ToolPak has been enabled, you will be able to see it on the
Ribbon, under the Data tab:
Click Data Analysis to open the Data Analysis ToolPak, and select Regression
from the Analysis tools that are displayed.
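Equivalently, outside Excel, the same regression can be sketched in Python with scikit-learn. The figures below are made-up stand-ins for the ridership spreadsheet, so only the shape of the computation matters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up rows: x1 price/week ($), x2 city population, x3 monthly income ($),
# x4 average parking rate/month ($); y is the number of weekly riders.
X = np.array([
    [15, 1_800_000, 5800, 50],
    [15, 1_790_000, 6200, 50],
    [20, 1_780_000, 6400, 60],
    [25, 1_778_000, 6500, 60],
    [30, 1_750_000, 6550, 75],
])
y = np.array([192_000, 190_400, 191_200, 177_600, 176_800])

model = LinearRegression().fit(X, y)
a, b, c, d = model.coef_      # coefficients for x1..x4
e = model.intercept_          # the intercept, "e" in the model above
print(a, b, c, d, e)
```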
Advantages and Disadvantages of Logistic Regression
Advantages:
• It not only provides a measure of how appropriate a predictor (coefficient size) is, but also its direction of association (positive or negative).
• Good accuracy for many simple data sets, and it performs well when the dataset is linearly separable.
Disadvantages:
• It can only be used to predict discrete functions. Hence, the dependent variable of Logistic Regression is bound to the discrete number set.
• Logistic Regression requires average or no multicollinearity between independent variables.
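A minimal scikit-learn sketch of logistic regression on made-up data, illustrating two points from the table: the fitted coefficient has a sign (direction of association), and predictions are discrete class labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary-classification data (made up): one feature, two classes.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# The sign of the coefficient gives the direction of association,
# and its magnitude a rough measure of predictor strength.
print(clf.coef_, clf.intercept_)
print(clf.predict([[2.5], [4.5]]))   # discrete class labels, as noted above
```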
There is no universally accepted definition of an NN. But perhaps most people in the field would agree
that an NN is a network of many simple processors (“units”), each possibly having a small amount of
local memory. The units are connected by communication channels (“connections”) which usually carry
numeric (as opposed to symbolic) data, encoded by any of various means. The units operate only on their
local data and on the inputs they receive via the connections. The restriction to local operations is often
relaxed during training.
Some NNs are models of biological neural networks and some are not, but historically, much of the
inspiration for the field of NNs came from the desire to produce artificial systems capable of
sophisticated, perhaps “intelligent”, computations similar to those that the human brain routinely
performs, and thereby possibly to enhance our understanding of the human brain.
Most NNs have some sort of “training” rule whereby the weights of connections are adjusted on the basis
of data. In other words, NNs “learn” from examples (as children learn to recognize dogs from examples
of dogs) and exhibit some capability for generalization beyond the training data.
NNs normally have great potential for parallelism, since the computations of the components are largely
independent of each other. Some people regard massive parallelism and high connectivity as defining
characteristics of NNs, but such requirements rule out various simple models, such as simple linear
regression (a minimal feed forward net with only two units plus bias), which are usefully regarded as
special cases of NNs.
A neural network resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment through a learning process.
2. Interneuron connection strengths known as synaptic weights are used to store the acquired knowledge.
We can also say that: Neural networks are parameterised computational nonlinear algorithms for
(numerical) data/signal/image processing. These algorithms are either implemented on a general-purpose
computer or built into dedicated hardware.
From the point of view of their active or decoding phase, artificial neural networks can be classified into
feed forward (static) and feedback (dynamic, recurrent) systems.
From the point of view of their learning or encoding phase, artificial neural networks can be classified
into supervised and unsupervised systems.
Feed forward supervised networks: This network is typically used for function approximation tasks.
Specific examples include the multilayer perceptron and radial basis function (RBF) networks.
Feed forward unsupervised networks: These networks are used to extract important properties of the input
data and to map input data into a “representation” domain. Two basic groups of methods belong to this
category.
Perceptron:
The artificial neuron with a hard-limiting activation function was introduced by McCulloch and Pitts
in 1943; the perceptron built on this model was proposed by Rosenblatt in 1958. Recently the term
multilayer perceptron has often been used as a synonym for the term multilayer feedforward neural network.
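A minimal perceptron sketch in Python, using a hard-limiting (step) activation and trained on the AND function; the learning rate and number of passes are arbitrary choices for illustration.

```python
import numpy as np

# Inputs and targets for the AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for _ in range(20):                                # a few passes over the data
    for xi, target in zip(X, y):
        out = 1 if np.dot(w, xi) + b > 0 else 0    # hard-limiting activation
        w += lr * (target - out) * xi              # perceptron learning rule
        b += lr * (target - out)

print(w, b)   # a separating line for the AND function
```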
Neural Network
A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a
process that mimics the way the human brain operates. In this sense, neural networks refer to systems of neurons, artificial in
nature.
Neural networks can help computers make intelligent decisions with limited human assistance. This is because they can learn
and model the relationships between input and output data that are nonlinear and complex.
Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine
learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking
the way that biological neurons signal to one another. Examples of various types of neural networks are Hopfield network, the
multilayer perceptron, the Boltzmann machine, and the Kohonen network. The most commonly used and successful neural
network is the multilayer perceptron and will be discussed in detail.
ANNs have the ability to learn and model non-linear and complex relationships, which is really important because,
in real life, many of the relationships between inputs and outputs are non-linear as well as complex.
A neural network learns by adjusting the weights so as to be able to correctly classify the training data and hence,
after the testing phase, to classify unknown data.
Inputs are often normalized before training. With min-max normalization, a value v of an attribute A is scaled to
v' = (v − min A) / (max A − min A). Say, max A was 100 and min A was 20 (that means the maximum and minimum
values for the attribute); then a value of 60 would be scaled to (60 − 20) / (100 − 20) = 0.5.
Feed-forward neural networks: one of the simplest variants of neural networks. They pass information in one direction,
through various input nodes, until it reaches the output node. The network may or may not have hidden node layers,
making their functioning more interpretable. They are equipped to deal with large amounts of noise. This type of ANN
computational model is used in technologies such as facial recognition and computer vision.
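A sketch of the one-way (input → hidden → output) pass in NumPy; the weights here are random placeholders rather than trained values, and the layer sizes are arbitrary.

```python
import numpy as np

# Tiny feed-forward network: 3 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 0.3])        # one input vector
h = sigmoid(W1 @ x + b1)              # information flows one way:
out = sigmoid(W2 @ h + b2)            # input -> hidden -> output
print(out)
```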
Recurrent neural networks: more complex. They save the output of processing nodes and feed the result back into the model.
This is how the model is said to learn to predict the outcome of a layer. Each node in the RNN model acts as a memory cell,
continuing the computation and implementation of operations. This neural network starts with the same front propagation as a
feed-forward network, but then goes on to remember all processed information in order to reuse it in the future. If the
network's prediction is incorrect, then the system self-learns and continues working towards the correct prediction during
backpropagation.
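A single recurrent step in NumPy, assuming a simple tanh cell; the sizes and random weights are illustrative only.

```python
import numpy as np

# The hidden state h acts as the "memory cell" carrying information forward.
rng = np.random.default_rng(1)
Wx = rng.normal(size=(4, 3))   # input -> hidden
Wh = rng.normal(size=(4, 4))   # previous hidden -> hidden (the feedback loop)

h = np.zeros(4)                      # initial memory
sequence = rng.normal(size=(5, 3))   # five time steps of 3-dim input

for x_t in sequence:
    h = np.tanh(Wx @ x_t + Wh @ h)   # new state depends on input AND old state

print(h)   # final state summarises the whole sequence
```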
Convolutional neural networks: one of the most popular models used today. This neural network computational model uses a
variation of multilayer perceptrons and contains one or more convolutional layers that can be either entirely connected or
pooled. These convolutional layers create feature maps that record a region of the image, which is ultimately broken into
rectangles and sent out for nonlinear processing. The CNN model is particularly popular in the realm of image recognition; it has been
used in many of the most advanced applications of AI, including facial recognition, text digitization and natural language
processing. Other uses include paraphrase detection, signal processing and image classification.
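A miniature convolution in NumPy, showing how a filter slides over an image to build one feature map; the image and kernel values are made up.

```python
import numpy as np

# Slide a 3x3 filter over a 6x6 grayscale "image" to build a feature map.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])       # a simple vertical-edge detector

h = image.shape[0] - 2
w = image.shape[1] - 2
feature_map = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        region = image[i:i+3, j:j+3]                  # a region of the image ...
        feature_map[i, j] = np.sum(region * kernel)   # ... recorded in the map

print(feature_map)
```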
Deconvolutional neural networks: utilize a reversed CNN model process. They aim to find lost features or signals that may
have originally been considered unimportant to the CNN system's task. This network model can be used in image synthesis
and analysis.
Modular neural networks: contain multiple neural networks working separately from one another. The networks do not
communicate or interfere with each other's activities during the computation process. Consequently, complex or big
computational processes can be performed more efficiently.
Advantages of artificial neural networks:
Parallel processing abilities mean the network can perform more than one job at a time.
The ability to learn and model nonlinear, complex relationships helps model the real-life relationships between input and
output.
Fault tolerance means the corruption of one or more cells of the ANN will not stop the generation of output.
Gradual corruption means the network will slowly degrade over time, instead of a problem destroying the network instantly.
The ability to produce output with incomplete knowledge, with the loss of performance depending on how important the missing information is.
No restrictions are placed on the input variables, such as how they should be distributed.
Machine learning means the ANN can learn from events and make decisions based on the observations.
The ability to learn hidden relationships in the data, without commanding any fixed relationship, means an ANN can better model highly volatile data and non-constant variance.
The ability to generalize and infer unseen relationships on unseen data means ANNs can predict the output of unseen data.
Disadvantages of artificial neural networks:
The lack of rules for determining the proper network structure means the appropriate artificial neural network architecture can only be found through trial and error and experience.
The requirement of processors with parallel processing abilities makes neural networks hardware-dependent.
The network works with numerical information, therefore all problems must be translated into numerical values before they can be presented to the ANN.
The lack of explanation behind probing solutions is one of the biggest disadvantages in ANNs. The inability to explain the
why or how behind the solution generates a lack of trust in the network.
Image recognition was one of the first areas to which neural networks were successfully applied, but the technology's uses have expanded to many more areas, including:
Chatbots
These are just a few specific areas to which neural networks are being applied today. Prime uses involve any process that
operates according to strict rules or patterns and has large amounts of data. If the data involved is too large for a human to make
sense of in a reasonable amount of time, the process is likely a prime candidate for automation through artificial neural networks.
Naïve Bayes and Bayesian Networks
A naive Bayesian network is a Bayesian network with a single root, all other nodes are children of the root,
and there are no edges between the other nodes.
Naive Bayes assumes conditional independence, P(X|Y,Z) = P(X|Z), whereas more general Bayes Nets (sometimes
called Bayesian Belief Networks) allow the user to specify which attributes are, in fact, conditionally independent.
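As a sketch, Gaussian naive Bayes in scikit-learn on made-up data; the model treats each attribute as conditionally independent given the class, exactly the assumption described above.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (made up): two numeric attributes, two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

clf = GaussianNB().fit(X, y)
print(clf.predict([[1.1, 2.0], [4.0, 4.0]]))
print(clf.predict_proba([[2.5, 3.0]]))   # posterior P(class | attributes)
```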
1| Chi-Square
The χ² test is a method which is used to test the hypothesis between two or more groups in order to check the independence
between the two variables. It is basically used to analyse categorical data and evaluate tests of independence when using a
bivariate table. Related tests include Fisher's exact test, the Binomial test, etc. The formula for calculating a Chi-Square
statistic is given as
χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ
where Oᵢ is an observed frequency and Eᵢ is the corresponding expected frequency.
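A sketch of the test on a hypothetical 2×2 contingency table using SciPy.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table of two categorical variables.
table = [[30, 10],
         [20, 40]]

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)   # a small p-value argues against independence
```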
2| Confusion Matrix
The confusion matrix is also known as an error matrix and is represented by a table which describes the performance of a
classification model on a set of test data in machine learning. For a binary problem, with Class 1 as the positive class and
Class 2 as the negative class, the table looks like this:

                     Actual positive   Actual negative
Predicted positive         TP                FP
Predicted negative         FN                TN

It is a two-dimensional matrix where each row represents the instances in the predicted class while each column represents
the instances in the actual class (the values can also be laid out the other way round). Here, TP (True Positive) means the
observation is positive and is predicted as positive, FP (False Positive) means the observation is negative but is predicted
as positive, TN (True Negative) means the observation is negative and is predicted as negative, and FN (False Negative)
means the observation is positive but is predicted as negative.
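The four counts can be computed directly; a sketch with scikit-learn on made-up labels (note that scikit-learn puts actual classes on the rows).

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions for a binary problem.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# With labels=[1, 0], rows are actual [positive, negative] and columns are
# predicted [positive, negative].
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
(tp, fn), (fp, tn) = cm
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```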
3| Concordant-Discordant Ratio
In a pair of cases, when one case is higher on both variables than the other case, it is known as a concordant pair. On the
other hand, in a pair of cases where one case is higher on one variable than the other case but lower on the other variable, it is
known as a discordant pair.
Suppose there is a pair of observations (Xa, Ya) and (Xb, Yb).
Then, the pair is concordant if Xa > Xb and Ya > Yb, or Xa < Xb and Ya < Yb,
and the pair is discordant if Xa > Xb and Ya < Yb, or Xa < Xb and Ya > Yb.
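A brute-force count of concordant and discordant pairs over made-up data, applying the definitions above.

```python
from itertools import combinations

# Made-up paired observations of two variables X and Y.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

concordant = discordant = 0
for (xa, ya), (xb, yb) in combinations(zip(xs, ys), 2):
    if (xa - xb) * (ya - yb) > 0:      # both higher or both lower
        concordant += 1
    elif (xa - xb) * (ya - yb) < 0:    # higher on one, lower on the other
        discordant += 1

print(concordant, discordant)
```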
4| Confidence Interval
Confidence Interval or CI is the range of values required, at a certain confidence level, to estimate a feature of the
total population. In the domain of machine learning, confidence intervals basically consist of a range of potential
values of an unknown population parameter, and the factors affecting the width of the confidence interval are the
confidence level, and the size as well as the variability of the sample.
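A sketch of a 95% confidence interval for a population mean, computed from a made-up sample with SciPy's t distribution.

```python
import numpy as np
from scipy import stats

# Made-up small sample; the t distribution accounts for the small size.
sample = np.array([4.8, 5.1, 5.4, 4.9, 5.2, 5.0, 5.3])

mean = sample.mean()
sem = stats.sem(sample)                      # standard error of the mean
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```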
5| Gini Co-efficient
The Gini coefficient or Gini Index is a popular metric for imbalanced class values. It is a statistical measure of distribution
developed by the Italian statistician Corrado Gini in 1912. The coefficient ranges from 0 to 1, where 0 represents perfect equality
and 1 represents perfect inequality. Here, if the value of the index is higher, then the data will be more dispersed.
6| Gain and Lift Chart
This method is generally used to evaluate the performance of the classification model in machine learning and is calculated as the
ratio between the results obtained with and without the model. Here, gain is defined as the ratio of the cumulative number of
targets to the total number of targets in the entire dataset, and lift is defined as how many times the model is better than the
random choice of cases.
7| Kolmogorov-Smirnov Chart
This non-parametric statistical test measures the performance of classification models, where it is defined as the measure of the
degree of separation between the positive and negative distributions. The KS test is generally used to compare the distribution of a
single sample with that of another.
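A two-sample KS sketch on made-up classifier scores using SciPy; a large statistic indicates well-separated positive and negative score distributions.

```python
import numpy as np
from scipy.stats import ks_2samp

# Made-up model scores for actual positives and actual negatives.
scores_pos = np.array([0.9, 0.8, 0.75, 0.85, 0.7])
scores_neg = np.array([0.3, 0.4, 0.2, 0.35, 0.5])

stat, p = ks_2samp(scores_pos, scores_neg)
print(stat, p)   # stat is the maximum gap between the two empirical CDFs
```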
8| Predictive Power
Predictive Power is a synthetic metric which satisfies interesting properties: it always lies between 0 and 1, where 0 represents that
the feature subset has no predictive power and 1 represents that the feature subset has maximum predictive power. It is used to
select a good subset of features in any machine learning project.
9| AUC-ROC Curve
ROC or Receiver Operating Characteristics Curve is one of the most popular evaluation metrics for checking the performance of
a classification model. The curve plots two parameters, True Positive Rate (TPR) and False Positive Rate (FPR). Area Under
ROC curve is basically used as a measure of the quality of a classification model. Hence, the AUC-ROC curve is the performance
measurement for the classification problem at various threshold settings.
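A short sketch computing the ROC curve and its AUC with scikit-learn, on made-up labels and scores.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up true labels and predicted scores for a binary problem.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # TPR vs FPR per threshold
print(roc_auc_score(y_true, y_score))               # area under that curve
```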
10| Root Mean Squared Error (RMSE)
Root Mean Squared Error or RMSE is defined as a measure of the differences between the values predicted by a model and the
values actually observed. It is basically the square root of MSE, the Mean Squared Error, which is the average of the squared errors
and is used as the loss function for least squares regression.
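A minimal RMSE computation in NumPy on made-up values.

```python
import numpy as np

# RMSE: square root of the mean squared difference between truth and prediction.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
print(rmse)
```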
Predictive models are proving to be quite helpful in predicting the future growth of businesses, as they predict outcomes using
data mining and probability, where each model consists of a number of predictors or variables. A statistical model can,
therefore, be created by collecting the data for relevant variables.