Predicting Continuous Target Variables with Regression Analysis
Throughout the previous chapters, you learned a lot about the main concepts
behind supervised learning and trained many different models for classification tasks
to predict group memberships or categorical variables. In this chapter, we will take
a dive into another subcategory of supervised learning: regression analysis.
In this chapter, we will discuss the main concepts of regression models, covering topics such as exploring and visualizing a dataset, fitting linear regression models, dealing with outliers, evaluating regression models, and modeling nonlinear relationships.
Simple (univariate) linear regression models the relationship between a single explanatory variable x and a continuous target variable y via a linear equation:

$$y = w_0 + w_1 x$$

Here, the weight $w_0$ represents the y axis intercept and $w_1$ is the coefficient of
the explanatory variable. Our goal is to learn the weights of the linear equation to
describe the relationship between the explanatory variable and the target variable,
which can then be used to predict the responses of new explanatory variables that
were not part of the training dataset.
Based on the linear equation that we defined previously, linear regression can be
understood as finding the best-fitting straight line through the sample points, as
shown in the following figure:
This best-fitting line is also called the regression line, and the vertical lines from the
regression line to the sample points are the so-called offsets or residuals—the errors
of our prediction.
The special case of one explanatory variable is also called simple linear regression,
but of course we can also generalize the linear regression model to multiple
explanatory variables. Hence, this process is called multiple linear regression:
$$y = w_0 x_0 + w_1 x_1 + \ldots + w_m x_m = \sum_{i=0}^{m} w_i x_i = \mathbf{w}^T \mathbf{x}$$
The examples in this chapter are based on the Housing Dataset, which contains information about houses in the suburbs of Boston. Its 506 samples are described by 13 explanatory variables and the median house price (MEDV), as summarized in the dataset description hosted in the UCI Machine Learning Repository.
For the rest of this chapter, we will regard the housing prices (MEDV) as our
target variable—the variable that we want to predict using one or more of the 13
explanatory variables. Before we explore this dataset further, let's fetch it from the
UCI repository into a pandas DataFrame:
>>> import pandas as pd
>>> df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-'
...                  'databases/housing/housing.data',
...                  header=None, sep='\s+')
>>> df.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS',
... 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
... 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
>>> df.head()
To confirm that the dataset was loaded successfully, we displayed the first five lines
of the dataset, as shown in the following screenshot:
First, we will create a scatterplot matrix that allows us to visualize the pair-wise
correlations between the different features in this dataset in one place. To plot the
scatterplot matrix, we will use the pairplot function from the seaborn library
(http://stanford.edu/~mwaskom/software/seaborn/), which is a Python library
for drawing statistical plots based on matplotlib:
>>> import matplotlib.pyplot as plt
>>> import seaborn as sns
>>> sns.set(style='whitegrid', context='notebook')
>>> cols = ['LSTAT', 'INDUS', 'NOX', 'RM', 'MEDV']
>>> sns.pairplot(df[cols])
>>> plt.show()
As we can see in the following figure, the scatterplot matrix provides us with a
useful graphical summary of the relationships in a dataset:
Due to space constraints and for purposes of readability, we only plotted five
columns from the dataset: LSTAT, INDUS, NOX, RM, and MEDV. However,
you are encouraged to create a scatterplot matrix of the whole DataFrame to
further explore the data.
Using this scatterplot matrix, we can now quickly eyeball how the data is distributed
and whether it contains outliers. For example, we can see that there is a linear
relationship between RM and the housing prices MEDV (the fifth column of the
fourth row). Furthermore, we can see in the histogram (the lower right subplot in
the scatter plot matrix) that the MEDV variable seems to be normally distributed
but contains several outliers.
To quantify the linear relationship between the features, we will now create a
correlation matrix. A correlation matrix is closely related to the covariance matrix
that we have seen in the section about principal component analysis (PCA) in
Chapter 4, Building Good Training Sets – Data Preprocessing. Intuitively, we can
interpret the correlation matrix as a rescaled version of the covariance matrix.
In fact, the correlation matrix is identical to a covariance matrix computed from
standardized data.
The correlation matrix is a square matrix that contains the Pearson product-moment
correlation coefficients (often abbreviated as Pearson's r), which measure the linear
dependence between pairs of features. The correlation coefficients are bounded to the range [-1, 1]. Two features have a perfect positive correlation if $r = 1$, no correlation if $r = 0$, and a perfect negative correlation if $r = -1$. As
mentioned previously, Pearson's correlation coefficient can simply be calculated as
the covariance between two features x and y (numerator) divided by the product
of their standard deviations (denominator):
$$r = \frac{\sum_{i=1}^{n}\left(x^{(i)} - \mu_x\right)\left(y^{(i)} - \mu_y\right)}{\sqrt{\sum_{i=1}^{n}\left(x^{(i)} - \mu_x\right)^2}\sqrt{\sum_{i=1}^{n}\left(y^{(i)} - \mu_y\right)^2}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
The covariance between two standardized features is, in fact, equal to their linear correlation coefficient:

$$\sigma'_{xy} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x^{(i)} - \mu_x}{\sigma_x}\right)\left(\frac{y^{(i)} - \mu_y}{\sigma_y}\right) = \frac{1}{n \cdot \sigma_x \sigma_y}\sum_{i=1}^{n}\left(x^{(i)} - \mu_x\right)\left(y^{(i)} - \mu_y\right) = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = r$$
In the following code example, we will use NumPy's corrcoef function on the five
feature columns that we previously visualized in the scatterplot matrix, and we will
use seaborn's heatmap function to plot the correlation matrix array as a heat map:
>>> import numpy as np
>>> cm = np.corrcoef(df[cols].values.T)
>>> sns.set(font_scale=1.5)
>>> hm = sns.heatmap(cm,
... cbar=True,
... annot=True,
... square=True,
... fmt='.2f',
... annot_kws={'size': 15},
... yticklabels=cols,
... xticklabels=cols)
>>> plt.show()
As we can see in the resulting figure, the correlation matrix provides us with another
useful summary graphic that can help us to select features based on their respective
linear correlations:
To fit a linear regression model, we are interested in those features that have a high
correlation with our target variable MEDV. Looking at the preceding correlation
matrix, we see that our target variable MEDV shows the largest correlation with
the LSTAT variable (-0.74). However, as you might remember from the scatterplot
matrix, there is a clear nonlinear relationship between LSTAT and MEDV. On the
other hand, the correlation between RM and MEDV is also relatively high (0.70) and
given the linear relationship between those two variables that we observed in the
scatterplot, RM seems to be a good choice for an explanatory variable to introduce
the concepts of a simple linear regression model in the following section.
To fit a linear regression line, we will use the ordinary least squares (OLS) method to estimate the weights that minimize the sum of squared vertical distances (residuals) between the regression line and the sample points. This sum of squared errors (SSE) is our cost function:

$$J(w) = \frac{1}{2}\sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^2$$
Here, $\hat{y}$ is the predicted value, $\hat{y} = \mathbf{w}^T \mathbf{x}$ (note that the term 1/2 is just used for
convenience to derive the update rule of GD). Essentially, OLS linear regression
can be understood as Adaline without the unit step function so that we obtain
continuous target values instead of the class labels -1 and 1. To demonstrate the
similarity, let's take the GD implementation of Adaline from Chapter 2, Training
Machine Learning Algorithms for Classification, and remove the unit step function to
implement our first linear regression model:
class LinearRegressionGD(object):
    def __init__(self, eta=0.001, n_iter=20):
        self.eta = eta          # learning rate
        self.n_iter = n_iter    # number of passes over the training set
    def fit(self, X, y):
        self.w_ = np.zeros(1 + X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            output = self.net_input(X)
            errors = (y - output)
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost_.append(cost)
        return self
    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]
    def predict(self, X):
        return self.net_input(X)
If you need a refresher about how the weights are being updated—taking a step in
the opposite direction of the gradient—please revisit the Adaline section in Chapter 2,
Training Machine Learning Algorithms for Classification.
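The code that standardizes the RM and MEDV columns and trains the model is not reproduced on this page; the variables X_std, y_std, and lr, as well as the scalers sc_x and sc_y used further below, are assumed to be prepared along the following lines (a minimal sketch using scikit-learn's StandardScaler):

>>> from sklearn.preprocessing import StandardScaler
>>> X = df[['RM']].values
>>> y = df['MEDV'].values
>>> sc_x = StandardScaler()
>>> sc_y = StandardScaler()
>>> X_std = sc_x.fit_transform(X)
>>> y_std = sc_y.fit_transform(y[:, np.newaxis]).flatten()  # StandardScaler expects a 2D array
>>> lr = LinearRegressionGD()
>>> lr.fit(X_std, y_std)
>>> plt.plot(range(1, lr.n_iter + 1), lr.cost_)  # cost per epoch
>>> plt.ylabel('SSE')
>>> plt.xlabel('Epoch')
>>> plt.show()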
As we can see in the following plot, the GD algorithm converged after the fifth epoch:
Next, let's visualize how well the linear regression line fits the training data. To do
so, we will define a simple helper function that will plot a scatterplot of the training
samples and add the regression line:
>>> def lin_regplot(X, y, model):
... plt.scatter(X, y, c='blue')
... plt.plot(X, model.predict(X), color='red')
... return None
Now, we will use this lin_regplot function to plot the number of rooms against
house prices:
>>> lin_regplot(X_std, y_std, lr)
>>> plt.xlabel('Average number of rooms [RM] (standardized)')
>>> plt.ylabel('Price in $1000\'s [MEDV] (standardized)')
>>> plt.show()
As we can see in the following plot, the linear regression line reflects the general
trend that house prices tend to increase with the number of rooms:
Although this observation makes intuitive sense, the data also tells us that the
number of rooms does not explain the house prices very well in many cases. Later
in this chapter, we will discuss how to quantify the performance of a regression
model. Interestingly, we also observe a curious line y = 3 , which suggests that the
prices may have been clipped. In certain applications, it may also be important to
report the predicted outcome variables on their original scale. To scale the predicted
price outcome back on the Price in $1000's axes, we can simply apply the
inverse_transform method of the StandardScaler:
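The snippet being referred to is not shown on this page; a minimal sketch of the step, assuming the sc_x and sc_y scalers and the lr model from the earlier training sketch, could look like this:

>>> num_rooms_std = sc_x.transform(np.array([[5.0]]))          # standardize the input (5 rooms)
>>> price_std = lr.predict(num_rooms_std)
>>> price = sc_y.inverse_transform(price_std.reshape(-1, 1))   # back to the original $1000's scale
>>> print("Price in $1000's: %.3f" % price[0, 0])
Price in $1000's: 10.840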
In the preceding code example, we used the previously trained linear regression
model to predict the price of a house with five rooms. According to our model,
such a house is worth $10,840.
On a side note, it is also worth mentioning that we technically don't have to update
the weights of the intercept if we are working with standardized variables since the
y axis intercept is always 0 in those cases. We can quickly confirm this by printing
the weights:
>>> print('Slope: %.3f' % lr.w_[1])
Slope: 0.695
>>> print('Intercept: %.3f' % lr.w_[0])
Intercept: -0.000
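The scikit-learn based fit that the next paragraph refers to is not reproduced here; a minimal sketch, assuming the unstandardized RM feature matrix X and MEDV target y from above, might look like this:

>>> from sklearn.linear_model import LinearRegression
>>> slr = LinearRegression()
>>> slr.fit(X, y)
>>> print('Slope: %.3f' % slr.coef_[0])
>>> print('Intercept: %.3f' % slr.intercept_)
>>> lin_regplot(X, y, slr)
>>> plt.xlabel('Average number of rooms [RM]')
>>> plt.ylabel('Price in $1000\'s [MEDV]')
>>> plt.show()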
Now, when we plot the training data and our fitted model by executing the code
above, we can see that the overall result looks identical to our GD implementation:
As a side note, instead of using an iterative optimization algorithm such as GD, there is also a closed-form solution for the OLS problem, obtained by solving the so-called normal equation:

$$\mathbf{w} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}$$
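As a sketch of how one might compute this closed-form solution directly with NumPy (adding a column of ones for the intercept term; the variable names here are illustrative):

>>> Xb = np.hstack((np.ones((X.shape[0], 1)), X))    # prepend a bias (intercept) column
>>> w = np.linalg.solve(Xb.T.dot(Xb), Xb.T.dot(y))   # solve the normal equation
>>> print('Slope: %.3f' % w[1])
>>> print('Intercept: %.3f' % w[0])

Using np.linalg.solve rather than explicitly inverting $\mathbf{X}^T\mathbf{X}$ is numerically more stable.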
Let's now wrap our linear model in the RANSAC algorithm using scikit-learn's
RANSACRegressor object:
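The instantiation itself is not reproduced on this page; given the parameter values described in the following paragraph, a sketch of the call could look like this (note that the residual_metric parameter has since been deprecated in newer scikit-learn releases, which use an absolute-residual criterion by default):

>>> from sklearn.linear_model import RANSACRegressor, LinearRegression
>>> ransac = RANSACRegressor(LinearRegression(),
...                          max_trials=100,          # maximum number of iterations
...                          min_samples=50,          # minimum number of randomly chosen samples
...                          residual_metric=lambda x: np.sum(np.abs(x), axis=1),
...                          residual_threshold=5.0,  # maximum vertical distance for inliers
...                          random_state=0)
>>> ransac.fit(X, y)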
We set the maximum number of iterations of the RANSACRegressor to 100, and using
min_samples=50, we set the minimum number of the randomly chosen samples to
be at least 50. Using the residual_metric parameter, we provided a callable lambda
function that simply calculates the absolute vertical distances between the fitted line
and the sample points. By setting the residual_threshold parameter to 5.0, we
only allowed samples to be included in the inlier set if their vertical distance to the
fitted line is within 5 distance units, which works well on this particular dataset. By
default, scikit-learn uses the MAD estimate to select the inlier threshold, where MAD
stands for the Median Absolute Deviation of the target values y. However, the choice
of an appropriate value for the inlier threshold is problem-specific, which is one
disadvantage of RANSAC. Many different approaches have been developed in recent years to select a good inlier threshold automatically. You can find a detailed discussion in R. Toldo and A. Fusiello's Automatic Estimation of the Inlier Threshold in Robust Multiple Structures Fitting (in Image Analysis and Processing – ICIAP 2009, pages 123–131, Springer, 2009).
After we have fitted the RANSAC model, let's obtain the inliers and outliers from the
fitted RANSAC linear regression model and plot them together with the linear fit:
>>> inlier_mask = ransac.inlier_mask_
>>> outlier_mask = np.logical_not(inlier_mask)
>>> line_X = np.arange(3, 10, 1)
>>> line_y_ransac = ransac.predict(line_X[:, np.newaxis])
>>> plt.scatter(X[inlier_mask], y[inlier_mask],
... c='blue', marker='o', label='Inliers')
>>> plt.scatter(X[outlier_mask], y[outlier_mask],
... c='lightgreen', marker='s', label='Outliers')
>>> plt.plot(line_X, line_y_ransac, color='red')
>>> plt.xlabel('Average number of rooms [RM]')
>>> plt.ylabel('Price in $1000\'s [MEDV]')
>>> plt.legend(loc='upper left')
>>> plt.show()
As we can see in the following scatterplot, the linear regression model was fitted on
the detected set of inliers shown as circles:
When we print the slope and intercept of the model executing the following code,
we can see that the linear regression line is slightly different from the fit that we
obtained in the previous section without RANSAC:
>>> print('Slope: %.3f' % ransac.estimator_.coef_[0])
Slope: 9.621
>>> print('Intercept: %.3f' % ransac.estimator_.intercept_)
Intercept: -37.137
Using RANSAC, we reduced the potential effect of the outliers in this dataset, but we don't know whether this approach will have a positive effect on the predictive performance on unseen data. Thus, in the next section, we will look at different approaches for evaluating the performance of regression models, which is a crucial part of building systems for predictive modeling.
As we remember from Chapter 6, Learning Best Practices for Model Evaluation and
Hyperparameter Tuning, we want to split our dataset into separate training and
test datasets where we use the former to fit the model and the latter to evaluate its
performance to generalize to unseen data. Instead of proceeding with the simple
regression model, we will now use all variables in the dataset and train a multiple
regression model:
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LinearRegression
>>> X = df.iloc[:, :-1].values
>>> y = df['MEDV'].values
>>> X_train, X_test, y_train, y_test = train_test_split(
... X, y, test_size=0.3, random_state=0)
>>> slr = LinearRegression()
>>> slr.fit(X_train, y_train)
>>> y_train_pred = slr.predict(X_train)
>>> y_test_pred = slr.predict(X_test)
Since our model uses multiple explanatory variables, we can't visualize the linear
regression line (or hyperplane to be precise) in a two-dimensional plot, but we
can plot the residuals (the differences or vertical distances between the actual and
predicted values) versus the predicted values to diagnose our regression model.
Those residual plots are a commonly used graphical analysis for diagnosing
regression models to detect nonlinearity and outliers, and to check if the errors
are randomly distributed.
Using the following code, we will now plot a residual plot where we simply subtract
the true target variables from our predicted responses:
>>> plt.scatter(y_train_pred, y_train_pred - y_train,
... c='blue', marker='o', label='Training data')
>>> plt.scatter(y_test_pred, y_test_pred - y_test,
... c='lightgreen', marker='s', label='Test data')
>>> plt.xlabel('Predicted values')
>>> plt.ylabel('Residuals')
>>> plt.legend(loc='upper left')
>>> plt.hlines(y=0, xmin=-10, xmax=50, lw=2, color='red')
>>> plt.xlim([-10, 50])
>>> plt.show()
After executing the code, we should see a residual plot with a line passing through
the x axis origin as shown here:
In the case of a perfect prediction, the residuals would be exactly zero, which we will
probably never encounter in realistic and practical applications. However, for a good
regression model, we would expect that the errors are randomly distributed and
the residuals should be randomly scattered around the centerline. If we see patterns
in a residual plot, it means that our model is unable to capture some explanatory
information, which is leaked into the residuals as we can slightly see in our preceding
residual plot. Furthermore, we can also use residual plots to detect outliers, which are
represented by the points with a large deviation from the centerline.
Another useful quantitative measure of a model's performance is the mean squared error (MSE), which is simply the averaged value of the SSE cost function that we minimized to fit the linear regression model:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^2$$
We will see that the MSE on the training set is 19.96, and the MSE of the test set is
much larger with a value of 27.20, which is an indicator that our model is overfitting
the training data.
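The computation itself is not shown on this page; it can be reproduced with scikit-learn's mean_squared_error function, for example:

>>> from sklearn.metrics import mean_squared_error
>>> print('MSE train: %.3f, test: %.3f' % (
...       mean_squared_error(y_train, y_train_pred),
...       mean_squared_error(y_test, y_test_pred)))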
Sometimes it may be more useful to report the coefficient of determination ($R^2$), which can be understood as a standardized version of the MSE:

$$R^2 = 1 - \frac{SSE}{SST}$$

Here, SSE is the sum of squared errors and SST is the total sum of squares, $SST = \sum_{i=1}^{n}\left(y^{(i)} - \mu_y\right)^2$; in other words, SST is simply the variance of the response.
Let's quickly show that $R^2$ is indeed just a rescaled version of the MSE:

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^2}{\frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - \mu_y\right)^2} = 1 - \frac{MSE}{Var(y)}$$
For the training dataset, $R^2$ is bounded between 0 and 1, but it can become negative for the test set. If $R^2 = 1$, the model fits the data perfectly with a corresponding $MSE = 0$.

Evaluated on the training data, the $R^2$ of our model is 0.765, which doesn't sound too bad. However, the $R^2$ on the test dataset is only 0.673, which we can compute by executing the following code:
>>> from sklearn.metrics import r2_score
>>> print('R^2 train: %.3f, test: %.3f' %
... (r2_score(y_train, y_train_pred),
... r2_score(y_test, y_test_pred)))
Ridge regression is an L2 penalized model where we simply add the squared sum of
the weights to our least-squares cost function:
$$J(w)_{Ridge} = \sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^2 + \lambda \left\lVert w \right\rVert_2^2$$

Here:

$$L2: \quad \lambda \left\lVert w \right\rVert_2^2 = \lambda \sum_{j=1}^{m} w_j^2$$
An alternative approach that can lead to sparse models is the LASSO. Depending
on the regularization strength, certain weights can become zero, which makes the
LASSO also useful as a supervised feature selection technique:
$$J(w)_{LASSO} = \sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^2 + \lambda \left\lVert w \right\rVert_1$$

Here:

$$L1: \quad \lambda \left\lVert w \right\rVert_1 = \lambda \sum_{j=1}^{m} \left| w_j \right|$$
A compromise between Ridge regression and the LASSO is the Elastic Net, which combines an L1 penalty to generate sparsity with an L2 penalty:

$$J(w)_{ElasticNet} = \sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^2 + \lambda_1 \sum_{j=1}^{m} w_j^2 + \lambda_2 \sum_{j=1}^{m} \left| w_j \right|$$
Those regularized regression models are all available via scikit-learn, and their usage is similar to the regular regression model except that we have to specify the regularization strength via the parameter $\lambda$, for example, optimized via k-fold cross-validation.

Note that the regularization strength is regulated by the parameter alpha, which is similar to the parameter $\lambda$. Likewise, we can initialize a LASSO regressor from the linear_model submodule:
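The instantiation code is not reproduced on this page; a minimal sketch of how these regularized models can be initialized via scikit-learn's linear_model submodule (the alpha and l1_ratio values below are just illustrative choices) could look like this:

>>> from sklearn.linear_model import Ridge, Lasso, ElasticNet
>>> ridge = Ridge(alpha=1.0)     # alpha plays the role of the regularization parameter lambda
>>> lasso = Lasso(alpha=1.0)
>>> elanet = ElasticNet(alpha=1.0, l1_ratio=0.5)  # l1_ratio balances the L1 and L2 penalties
>>> ridge.fit(X_train, y_train)
>>> lasso.fit(X_train, y_train)
>>> elanet.fit(X_train, y_train)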
A linear relationship between explanatory and response variables will not always hold; one way to account for such a violation of the linearity assumption is to use a polynomial regression model that adds polynomial terms:

$$y = w_0 + w_1 x + w_2 x^2 + \ldots + w_d x^d$$
Here, $d$ denotes the degree of the polynomial. Although we can use polynomial regression to model a nonlinear relationship, it is still considered a multiple linear regression model because of the linear regression coefficients $w$.

We will now discuss how to use the PolynomialFeatures transformer class from scikit-learn to add a quadratic term ($d = 2$) to a simple regression problem with one explanatory variable, and compare the polynomial to the linear fit. The steps are as follows:
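The numbered steps and the toy data of the original example are not reproduced on this page. As an illustration of the mechanics only (using synthetic placeholder data, so the MSE and $R^2$ figures quoted further below refer to the book's own example, not to this sketch), the quadratic expansion could look like this:

>>> from sklearn.preprocessing import PolynomialFeatures
>>> from sklearn.linear_model import LinearRegression
>>> import numpy as np
>>> X = np.arange(1.0, 11.0)[:, np.newaxis]        # placeholder explanatory variable
>>> y = 0.5 * X.flatten()**2 + X.flatten() + 2.0   # placeholder response with a quadratic trend
>>> lr = LinearRegression().fit(X, y)              # plain linear fit
>>> quadratic = PolynomialFeatures(degree=2)       # adds a bias column and an x^2 column
>>> X_quad = quadratic.fit_transform(X)
>>> pr = LinearRegression().fit(X_quad, y)         # fit on the expanded (quadratic) features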
In the resulting plot, we can see that the polynomial fit captures the relationship
between the response and explanatory variable much better than the linear fit:
As we can see after executing the preceding code, the MSE decreased from 570 (linear fit) to 61 (quadratic fit), and the coefficient of determination reflects the closer fit of the quadratic model ($R^2 = 0.982$) compared to the linear fit ($R^2 = 0.832$) in this particular toy problem.
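The next code block models the relationship between house prices (MEDV) and LSTAT with polynomial fits of degree 1, 2, and 3. Its preparation step is not reproduced on this page; a minimal sketch, with the variable names (regr, quadratic, cubic, X_quad, X_cubic) inferred from the code that follows, could be:

>>> from sklearn.preprocessing import PolynomialFeatures
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import r2_score
>>> X = df[['LSTAT']].values
>>> y = df['MEDV'].values
>>> regr = LinearRegression()
# create quadratic and cubic feature transformers
>>> quadratic = PolynomialFeatures(degree=2)
>>> cubic = PolynomialFeatures(degree=3)
>>> X_quad = quadratic.fit_transform(X)
>>> X_cubic = cubic.fit_transform(X)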
# linear fit
>>> X_fit = np.arange(X.min(), X.max(), 1)[:, np.newaxis]
>>> regr = regr.fit(X, y)
>>> y_lin_fit = regr.predict(X_fit)
>>> linear_r2 = r2_score(y, regr.predict(X))
# quadratic fit
>>> regr = regr.fit(X_quad, y)
>>> y_quad_fit = regr.predict(quadratic.fit_transform(X_fit))
>>> quadratic_r2 = r2_score(y, regr.predict(X_quad))
# cubic fit (completing the step elided here; cubic and X_cubic are assumed from the setup sketch above)
>>> regr = regr.fit(X_cubic, y)
>>> y_cubic_fit = regr.predict(cubic.fit_transform(X_fit))
>>> cubic_r2 = r2_score(y, regr.predict(X_cubic))
# plot results
>>> plt.scatter(X, y,
... label='training points',
... color='lightgray')
>>> plt.plot(X_fit, y_lin_fit,
... label='linear (d=1), $R^2=%.2f$'
... % linear_r2,
... color='blue',
... lw=2,
... linestyle=':')
>>> plt.plot(X_fit, y_quad_fit,
... label='quadratic (d=2), $R^2=%.2f$'
... % quadratic_r2,
... color='red',
... lw=2,
... linestyle='-')
>>> plt.plot(X_fit, y_cubic_fit,
... label='cubic (d=3), $R^2=%.2f$'
... % cubic_r2,
... color='green',
... lw=2,
... linestyle='--')
>>> plt.xlabel('% lower status of the population [LSTAT]')
>>> plt.ylabel('Price in $1000\'s [MEDV]')
>>> plt.legend(loc='upper right')
>>> plt.show()
As we can see in the resulting plot, the cubic fit captures the relationship between
the house prices and LSTAT better than the linear and quadratic fit. However, we
should be aware that adding more and more polynomial features increases the
complexity of a model and therefore increases the chance of overfitting. Thus, in
practice, it is always recommended that you evaluate the performance of the model
on a separate test dataset to estimate the generalization performance.
In addition, polynomial features are not always the best choice for modeling nonlinear
relationships. For example, just by looking at the MEDV-LSTAT scatterplot, we could
propose that a log transformation of the LSTAT feature variable and the square root of
MEDV may project the data onto a linear feature space suitable for a linear regression
fit. Let's test this hypothesis by executing the following code:
# transform features
>>> X_log = np.log(X)
>>> y_sqrt = np.sqrt(y)
# fit features
>>> X_fit = np.arange(X_log.min()-1,
... X_log.max()+1, 1)[:, np.newaxis]
>>> regr = regr.fit(X_log, y_sqrt)
>>> y_lin_fit = regr.predict(X_fit)
>>> linear_r2 = r2_score(y_sqrt, regr.predict(X_log))
# plot results
>>> plt.scatter(X_log, y_sqrt,
... label='training points',
... color='lightgray')
>>> plt.plot(X_fit, y_lin_fit,
... label='linear (d=1), $R^2=%.2f$' % linear_r2,
... color='blue',
... lw=2)
>>> plt.xlabel('log(% lower status of the population [LSTAT])')
>>> plt.ylabel('sqrt(Price in $1000\'s [MEDV])')  # assumed label for the square-root-transformed target
>>> plt.legend(loc='lower left')
>>> plt.show()
After transforming the explanatory variable onto the log space and taking the square root of the target variable, we were able to capture the relationship between the two variables with a linear regression line that seems to fit the data better ($R^2 = 0.69$) than any of the previous polynomial feature transformations:
A decision tree is grown by iteratively splitting its nodes so as to maximize the information gain (IG), which is defined as follows for a binary split:

$$IG(D_p, x_i) = I(D_p) - \frac{N_{left}}{N_p} I(D_{left}) - \frac{N_{right}}{N_p} I(D_{right})$$
Here, $x_i$ is the feature on which the split is performed, $N_p$ is the number of samples in the parent node, $I$ is the impurity function, $D_p$ is the subset of training samples in the parent node, and $D_{left}$ and $D_{right}$ are the subsets of training samples in the left and right child nodes after the split. Remember that our goal is to find the feature split that maximizes the information gain; in other words, we want to find the feature split that reduces the impurities in the child nodes the most. In Chapter 3, A Tour of Machine Learning Classifiers Using Scikit-learn, we used entropy as a measure of impurity, which is a useful criterion for classification. To use a decision tree for regression, we replace entropy as the impurity measure of a node $t$ with the MSE:
$$I(t) = MSE(t) = \frac{1}{N_t}\sum_{i \in D_t}\left(y^{(i)} - \hat{y}_t\right)^2$$

Here, $N_t$ is the number of training samples at node $t$, $D_t$ is the training subset at node $t$, $y^{(i)}$ is the true target value, and $\hat{y}_t$ is the predicted target value (the sample mean):

$$\hat{y}_t = \frac{1}{N_t}\sum_{i \in D_t} y^{(i)}$$
In the context of decision tree regression, the MSE is often also referred to as
within-node variance, which is why the splitting criterion is also better known
as variance reduction. To see what the line fit of a decision tree looks like, let's use
the DecisionTreeRegressor implemented in scikit-learn to model the nonlinear
relationship between the MEDV and LSTAT variables:
>>> from sklearn.tree import DecisionTreeRegressor
>>> X = df[['LSTAT']].values
>>> y = df['MEDV'].values
>>> tree = DecisionTreeRegressor(max_depth=3)
>>> tree.fit(X, y)
>>> sort_idx = X.flatten().argsort()
>>> lin_regplot(X[sort_idx], y[sort_idx], tree)
>>> plt.xlabel('% lower status of the population [LSTAT]')
>>> plt.ylabel('Price in $1000\'s [MEDV]')
>>> plt.show()
As we can see from the resulting plot, the decision tree captures the general
trend in the data. However, a limitation of this model is that it does not capture
the continuity and differentiability of the desired prediction. In addition, we
need to be careful about choosing an appropriate value for the depth of the tree
to not overfit or underfit the data; here, a depth of 3 seems to be a good choice:
In the next section, we will take a look at a more robust way for fitting regression
trees: random forests.
Now, let's use all the features in the Housing Dataset to fit a random forest
regression model on 60 percent of the samples and evaluate its performance
on the remaining 40 percent. The code is as follows:
>>> X = df.iloc[:, :-1].values
>>> y = df['MEDV'].values
>>> X_train, X_test, y_train, y_test =\
... train_test_split(X, y,
... test_size=0.4,
... random_state=1)
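The fitting and evaluation code that produces the numbers discussed next is not reproduced on this page; a minimal sketch (the ensemble size of 1,000 trees is an assumption) could look like this:

>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.metrics import mean_squared_error, r2_score
>>> forest = RandomForestRegressor(n_estimators=1000,   # assumed number of trees
...                                random_state=1,
...                                n_jobs=-1)
>>> forest.fit(X_train, y_train)
>>> y_train_pred = forest.predict(X_train)
>>> y_test_pred = forest.predict(X_test)
>>> print('MSE train: %.3f, test: %.3f' % (
...       mean_squared_error(y_train, y_train_pred),
...       mean_squared_error(y_test, y_test_pred)))
>>> print('R^2 train: %.3f, test: %.3f' % (
...       r2_score(y_train, y_train_pred),
...       r2_score(y_test, y_test_pred)))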
Unfortunately, we see that the random forest tends to overfit the training data. However, it's still able to explain the relationship between the target and explanatory variables relatively well ($R^2 = 0.871$ on the test dataset).
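The residual plot discussed next is produced with code that is not reproduced here; it follows the same pattern as the residual plot for the linear model earlier in this chapter, for example:

>>> plt.scatter(y_train_pred, y_train_pred - y_train,
...             c='blue', marker='o', label='Training data')
>>> plt.scatter(y_test_pred, y_test_pred - y_test,
...             c='lightgreen', marker='s', label='Test data')
>>> plt.xlabel('Predicted values')
>>> plt.ylabel('Residuals')
>>> plt.legend(loc='upper left')
>>> plt.hlines(y=0, xmin=-10, xmax=50, lw=2, color='red')
>>> plt.xlim([-10, 50])
>>> plt.show()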
As the $R^2$ coefficient already summarized, we can see that the model fits the training data better than the test data, as indicated by the outliers in the y axis direction. Also, the distribution of the residuals does not seem to be completely random around the zero center point, indicating that the model is not able to capture all of the explanatory information. However, the residual plot indicates a large improvement over the residual plot of the linear model that we plotted earlier in this chapter: