Department of Computer Science Engineering: Solution

The document discusses polynomial regression for predicting a continuous target variable from an input variable. It aims to identify the underlying polynomial relationship as well as the polynomial coefficients and noise variance. Experimentally, coefficients are obtained for polynomials up to degree 4. The models are evaluated using RMSE and R-squared on training and test data, with degree 3 polynomial showing the best fit to the data. Goodness of fit is also evaluated on test data examples.


Department of Computer Science Engineering

Objective:
The objective here is to implement the concepts of regression learnt in class via polynomial curve fitting. To recap, polynomial curve fitting is an example of regression. In regression, the objective is to learn a function that maps an input variable x to a continuous target variable t. A personalized input file is provided that contains data of the form {(x1, t1), (x2, t2), ..., (x100, t100)}. The relation between x and t is of the form

t = w0 + w1 x + ... + wM x^M + ε

where ε, the noise, is drawn from a normal distribution with mean 0 and unknown (but fixed, for a given file) variance. M is also unknown. The end goal is to identify the underlying polynomial (both the degree and the coefficients), as well as to obtain an estimate of the noise variance.
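For intuition, the data-generating process above can be sketched in a few lines. The true coefficients, degree, and noise level below are illustrative assumptions only; the actual values behind the personalized input file are unknown.

```python
import numpy as np

# Hypothetical example of the generating model t = w0 + w1*x + ... + wM*x^M + noise.
# The coefficients, degree (M = 2), and noise level here are assumed for illustration.
rng = np.random.default_rng(0)
true_w = [1.0, -2.0, 0.5]            # assumed w0, w1, w2
sigma = 0.3                          # assumed noise standard deviation

x = rng.uniform(-1, 1, size=100)
noise = rng.normal(0.0, sigma, size=100)
t = sum(w * x**k for k, w in enumerate(true_w)) + noise

# Data of the form {(x1, t1), ..., (x100, t100)}
pairs = list(zip(x, t))
print(len(pairs))
```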

Solution:
Polynomial regression can be represented as

y = b0 + b1 x + b2 x^2 + ... + bn x^n

where
y = dependent variable (target)
b0, b1, ..., bn = coefficients
x, x^2, ..., x^n = powers of the independent variable
n = degree of the polynomial
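A minimal sketch of fitting such a polynomial by ordinary least squares with NumPy; the toy polynomial and noise level below are assumptions for illustration, not the assignment's actual data.

```python
import numpy as np

# Assumed toy data from t = 2 + 3x - x^2 plus small Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=100)
t = 2 + 3 * x - x**2 + rng.normal(0, 0.1, size=100)

# Design matrix with columns [1, x, x^2]; lstsq solves min ||Xw - t||^2.
X = np.vander(x, N=3, increasing=True)
w, *_ = np.linalg.lstsq(X, t, rcond=None)
print(np.round(w, 1))  # recovered coefficients, close to [2, 3, -1]
```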
Important facts about polynomial regression:
• Polynomial regression is still called linear regression because the model is linear in the coefficients: although the features x, x^2, ..., x^n are nonlinear in x, the prediction is a linear combination of the coefficients.
• Polynomial regression is used, among other applications, to describe how diseases spread across territories and populations during epidemics and pandemics.
• Polynomial curve fitting is used to compare the best-fit, under-fit, and over-fit curves.
• Regularization is a technique for reducing the error caused by overfitting.

Findings
As stated in the question, coefficients are obtained experimentally for degrees up to M = 4.
This solution uses least-squares error (LSE) estimation. LSE is the method used to fit the model, while MSE is a metric used to evaluate the model's performance.
The model is further evaluated using RMSE.
Fitting the polynomials to the training data yields the coefficients in Table 1.
Table 1: EXPERIMENTALLY OBTAINED COEFFICIENTS FOR DIFFERENT VALUES OF M

DEGREE   COEFFICIENTS
M = 1    [-4.81383993e-21 -7.5755367 ... -1.82483381e-03  5.31393124e-05  3.11003571e-04]
M = 2    [-7.11524978e-18 -1.45126818e-05 ...  4.85781810e-07  1.58807832e-06  2.31198562e-07]
M = 3    [ 1.07691389e-13 -2.52892631e-03 -7.87032954e-04 ... -3.54634909e-09  4.62198448e-09  1.24360283e-09]
M = 4    [ 1.88515543e-15 -1.42420827e-15 -1.01562280e-15 ...  2.36028066e-12 -2.46217932e-12 -8.51801310e-13]

Now consider the training data points. The model is evaluated using RMSE and R-squared.
R SQUARED: R-squared (R²) is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variable(s) in a regression model.
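The two metrics used in the tables below can be computed as follows; the target and prediction values here are a small illustrative example, not the assignment's data.

```python
import numpy as np

# Assumed toy targets and predictions for illustration.
t_true = np.array([1.0, 2.0, 3.0, 4.0])
t_pred = np.array([1.1, 1.9, 3.2, 3.8])

# RMSE: root of the mean squared prediction error.
rmse = np.sqrt(np.mean((t_true - t_pred) ** 2))

# R-squared: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum((t_true - t_pred) ** 2)
ss_tot = np.sum((t_true - t_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(round(rmse, 3), round(r2, 3))  # 0.158 0.98
```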

Table 2.1: ERROR ESTIMATION FOR DIFFERENT VALUES OF M AND MODEL EVALUATION USING R SQUARED FOR THE TRAINING SET

Train        M = 1                  M = 2                  M = 3                  M = 4
RMSE         2.0344311374177386     1.2472459312648563     0.610076340457083      7.972459358201153e-14
R SQUARED    0.5121824790639389     0.8166522443465158     0.9561328144071333     1.0

Table 2.2: ERROR ESTIMATION FOR DIFFERENT VALUES OF M AND MODEL EVALUATION USING R SQUARED FOR THE TEST SET

Test         M = 1                  M = 2                  M = 3                  M = 4
RMSE         2.0549154759012174     1.3048669940243518     1.305689558622443      2.581719904421525
R SQUARED    0.4958808671198853     0.7967279126893959     0.7964715538039715     0.204272087511442

Goodness of Fit
The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between the observed values and the values expected under the model in question.

TABLE 3: GOODNESS OF FIT ERROR FOR M = 3

Input Data:
lin_reg_2.predict(poly_reg.fit_transform([[516.5, -329.3, 87.1, -864.9, -11.0, 508.8, -231.9, -312.2, -351.8, -421.9, 505.0, 16.6, 46.6, 217.2, -104.9, 7.6, -44.6, 139.38, -198.21, 89.50, -166.7, -191.5, 150.17, -7.65, 245.06]]))
Error: 0.11710891999999973

Input Data:
lin_reg_2.predict(poly_reg.fit_transform([[-117.556, 596.327, 42.323, 387.934, 1021.922, 408.283, 8.105, -159.560, 47.261, -8.599, 190.262, -361.792, -46.713, -420.427, 26.256, 30.433, 243.693, 256.705, -168.689, 125.353, 351.223, 88.779, -176.928, -195.086, -93.879]]))
Error: 0.49721042000000004

Input Data:
lin_reg_2.predict(poly_reg.fit_transform([[-173.9, -576.5, 371.3, 221.3, -258.6, -684.0, -382.6, 198.7, -267.2, -345.8, 209.3, 633.2, 52.8, 262.0, -61.7, -91.7, -236.2, 82.3, 135.3, -115.0, -128.8, -18.3, 225.3, 4.0, -68.2]]))
Error: 0.26411222000000034

Input Data:
lin_reg_2.predict(poly_reg.fit_transform([[157.8, 287.2, 388.7, -269.9, 250.5, 756.9, -656.7, 666.0, 60.74, 40.57, 259.92, 378.2, 204.9, 173.0, 287.2, 13.6, 162.3, -261.2, 22.3, 43.3, -81.8, 21.4, -259.17, 205.5, 282.4]]))
Error: 0.29160071

Input Data:
lin_reg_2.predict(poly_reg.fit_transform([[25.8, -29.1, 593.7, 763.6, 211.19, 12.17, 350.58, -87.98, 47.8, 182.6, -180.6, -262.6, 309.78, -295.3, -204.09, 115.79, -181.38, 176.3, -226.1, 148.6, -439.7, -76.7, -441.67, -57.07, 352.2]]))
Error: 0.09018786000000034

Conclusion:
• Greater model complexity does not necessarily mean lower estimation error.
• The greater the model complexity, the higher the variance.
