
Lab 1: Model Selection

This lab aims to illustrate model selection by estimating the impulse response of two systems using finite impulse response (FIR) models of varying orders. The document: 1) Describes generating simulated input/output data from a discrete-time system and collecting experimental measurement data from an analog bandpass filter. 2) Explains estimating the impulse response coefficients using least squares to minimize the cost function, and how the model quality depends on factors like the model order, data length, and noise level. 3) Tasks involve generating data, estimating models of increasing orders, and evaluating cost functions like estimation error and AIC to determine the optimal model order that balances fit and complexity. Reducing the noise level is found to

Uploaded by

thomasverbeke
Copyright
© Attribution Non-Commercial (BY-NC)

Lab 1: Model selection

Lab objectives

The goal of this lab is to illustrate:

- the selection of a good model (number of model parameters),
- the effect of the number of model parameters on the value of the cost functions,
- that the optimal model order can depend on the signal-to-noise ratio of the data and the number of data points.

Introduction

In this lab we will estimate a finite impulse response for two systems, both in the presence of output noise. First, we will perform a simulation on a discrete-time system to generate the input ($u_0(t)$) and output ($y(t)$) time samples (Section 3, Simulation), and then we will perform experimental measurements on an analog bandpass filter (Section 4, Experimental measurement). The general simulation and experimental layout is shown in Figure 1.

Figure 1: Measurement setup.

The input $u_0(t)$ is computer generated and filtered through the system to obtain a noise-free output $y_0(t)$, to which noise $v(t)$ is added, giving the final noisy output $y(t)$. Through a convolution sum, the input and the noisy output can be related via the system's impulse response $g_0(t)$ as given below:

$$y_0(t) = \sum_{\tau=0}^{\infty} u_0(t-\tau)\, g_0(\tau), \qquad t = 0, 1, 2, \ldots \tag{1}$$

$$y(t) = y_0(t) + v(t)$$

The systems we will be working with have, in theory, an impulse response that is infinitely long (IIR system), which is why the summation in equation (1) goes to infinity. We will approximate this with a finite impulse response (FIR system) of length $I$ (the model order) and investigate how an optimum value for $I$ can be obtained. Our FIR model is therefore:

$$\hat{y}(t) = \sum_{\tau=0}^{I} u_0(t-\tau)\, g(\tau), \qquad t = 0, 1, 2, \ldots \tag{2}$$

Equation (2) is now the model that we will fit to the input and output data. Notice that the summation goes up to $I$, which acts as the model order, and that the impulse response coefficients $g(\tau)$ are the model parameters to be estimated. Equation (2) can be written in a compact matrix form (work this out yourself) as:

$$\hat{Y} = H\theta \tag{3}$$

With:

- $\hat{Y}$: the vector of model output values, size $N \times 1$
- $H$: the regressor or observation matrix, size $N \times (I+1)$
- $\theta$: the parameter vector, $\theta = [g(0), g(1), \ldots, g(I)]^T$, size $(I+1) \times 1$

Models that can be written as in equation (3) are said to be linear-in-the-parameters. Starting from the input-output measurements, an estimate of the impulse response coefficients can be obtained by minimizing the following least squares cost function:

$$V(\theta, N) = \frac{1}{N} \sum_{t=0}^{N-1} |y(t) - \hat{y}(t)|^2 \tag{4}$$
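One possible way to set up the regressor matrix of equation (3) and evaluate the cost function (4) is sketched below in Matlab. It is only an illustration under stated assumptions: the variable names (`u`, `y`, `Nt`, `N`, `I`, `H`, `theta`), the toy FIR coefficients and the signal lengths are all made up for the example, and it assumes that at least `I` samples were recorded before the first row of `H`.

```matlab
% Illustrative sketch only: names and toy data are assumptions.
rng(0);                      % reproducible random numbers
Nt = 5; N = 20; I = 3;       % transient points, data points, model order
u  = randn(Nt + N, 1);       % input, including the transient part
g  = [1; 0.5; 0.25; 0.125];  % hypothetical FIR coefficients g(0), ..., g(I)
y  = filter(g, 1, u);        % noise-free output of the toy FIR system

% The row for time t holds [u(t), u(t-1), ..., u(t-I)].
% toeplitz(c, r) builds a matrix with first column c and first row r.
idx = (Nt + 1 : Nt + N).';                        % the last N samples
H   = toeplitz(u(idx), u(Nt + 1 : -1 : Nt + 1 - I));
Y   = y(idx);

theta = g;                      % evaluate the cost at the true parameters
Yhat  = H * theta;              % model output, equation (3)
V     = mean(abs(Y - Yhat).^2); % cost function, equation (4)
```

Because the toy system is itself an FIR filter of order `I` and no noise is added, `V` should be zero up to rounding errors, which is a quick check that the rows and columns of `H` are lined up correctly.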

However, since this model is linear-in-the-parameters, the optimum parameter vector $\hat{\theta}$ that minimises the cost function can be calculated explicitly as follows:

$$\hat{\theta} = (H^T H)^{-1} H^T Y \tag{5}$$

In equation (5), $Y$ is the vector (size $N \times 1$) of measured noisy output samples. In practice, a numerically stable method is used to evaluate equation (5) (see Appendix 2.C of the lecture notes).

The quality of the estimates $\hat{\theta}$ will strongly depend on the model order $I$, the length of the data $N$, and the signal-to-noise ratio of the data. Increasing the order $I$ will improve the quality of the approximation at the cost of using more parameters to describe the system. This will increase the uncertainty on the estimated model. In order to find the best model, a balance should be made between the model errors and the noise errors. To achieve this, we will use the robustified AIC cost function given as:

$$V_{AIC} = V(\hat{\theta}, N) \left( 1 + 2\,\frac{I}{N} \right) \tag{6}$$

Remarks:

1. In practice, if the measurements start at $t = 0$, the inputs before $t = 0$ are unknown (while they also appear in $H$). To circumvent this problem, $N_t$ additional samples (called transient points) are measured before $t = 0$. In total, $N_t + N$ points are measured, of which only the last $N$ output points are used to construct the matrix equation (3); the first $N_t$ output samples are discarded.
2. To assess the model quality and the effectiveness of the AIC method, in this lab a set of validation data is measured for comparison purposes. These validation data are not used for estimation of the parameters, but for validating the model. The data used to estimate the model are called estimation data. The lengths of these data sets are denoted as $N_t + N_v$ and $N_t + N_e$, respectively.
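To make equations (5) and (6) concrete, the Matlab sketch below compares the explicit normal-equation solution with the backslash operator, which solves the same least squares problem through a QR factorisation for rectangular matrices. The regressor matrix, the "true" parameters and the noise level are stand-ins invented for the example, not lab data.

```matlab
% Illustrative comparison; the toy H and Y are assumptions, not lab data.
rng(2);
N = 200; I = 4;
H = randn(N, I + 1);                 % stand-in regressor matrix
theta0 = (1:I + 1).';                % hypothetical true parameters
Y = H * theta0 + 0.1 * randn(N, 1);  % noisy "measurements"

theta_ne = (H' * H) \ (H' * Y);      % normal equations, equation (5)
theta_bs = H \ Y;                    % QR-based least squares (backslash)

V    = mean(abs(Y - H * theta_bs).^2);   % cost function, equation (4)
Vaic = V * (1 + 2 * I / N);              % AIC cost function, equation (6)
```

For a well-conditioned `H` such as this one, both estimates agree to many digits. The difference is expected to show up mainly when `H' * H` is close to singular, since forming that product squares the condition number of the problem.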

Simulation
The transfer function $G_0(z)$ of the discrete-time system we will be simulating is:

$$G_0(z) = \frac{\sum_{k=0}^{n_b} b_k z^{-k}}{\sum_{k=0}^{n_a} a_k z^{-k}} \tag{7}$$

with $a = [a_0, a_1, \ldots, a_{n_a}]$, and similarly for $b$, the filter coefficients, which are:

a = [2.765e-001 -3.464e-001 6.141e-001 -4.371e-001 4.410e-001 -1.645e-001 9.9619e-002]
b = [8.002e-004 1.9427e-002 -4.5489e-002 1.245e-002 3.050e-002 -1.928e-002 1.902e-003]
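Before fitting FIR models, it can help to look at the first samples of the true impulse response $g_0(\tau)$ of this system. A minimal Matlab sketch, assuming the coefficients above are used with the `filter` function (which normalises by `a(1)`); the number of lags `L` and the variable names are arbitrary choices for the illustration:

```matlab
% Coefficients copied from the text; L and the variable names are assumptions.
a = [2.765e-001 -3.464e-001 6.141e-001 -4.371e-001 4.410e-001 -1.645e-001 9.9619e-002];
b = [8.002e-004 1.9427e-002 -4.5489e-002 1.245e-002 3.050e-002 -1.928e-002 1.902e-003];

L = 100;                       % number of lags to inspect (arbitrary)
impulse = [1; zeros(L, 1)];    % unit impulse at t = 0
g0 = filter(b, a, impulse);    % g0(1) is lag 0, g0(L + 1) is lag L

stem(0:L, g0);
xlabel('lag \tau'); ylabel('g_0(\tau)');
```

The plot gives a rough idea of how many lags carry a significant part of the response, which can later be compared with the model order selected by the cost functions.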

We will generate the input samples and noise samples via the Matlab randn function, and will generate two input-output data sets: one for model estimation and the other for model validation.

Note: Make sure that in your Matlab code the signal lengths, standard deviations and model order are set as variables and not hard coded, so that you can examine different settings easily. Comment your code sufficiently for later reference.

Type help randn to get help on the randn function, or similarly on any other function.

Task 1: data generation

- Using the randn function, generate two input signals $u_e(t)$ (estimation) and $u_v(t)$ (validation), of lengths 2000 and 11000, respectively. These lengths include $N_t = 1000$ transient points. The standard deviation of both signals is $\sigma_u = 1$.
- Similarly, using the randn function, generate two noise signals $v_e(t)$ and $v_v(t)$ of the same lengths but with a standard deviation of $\sigma_v = 0.5$.
- Filter both $u_e(t)$ and $u_v(t)$ via the filter function to generate the noise-free outputs $y_{0e}(t)$ and $y_{0v}(t)$: y0e = filter(b, a, ue), and similarly for y0v.
- Add the corresponding noise signals to $y_{0e}(t)$ and $y_{0v}(t)$ to obtain the final noisy outputs $y_e(t)$ and $y_v(t)$: ye = y0e + ve, and similarly for yv.

Note: In practical problems the validation set is selected (much) smaller than the estimation set. Here we selected many more validation points than estimation points in order to illustrate that the method works fine.

Task 2: impulse response estimation

Using the linear least squares solution (equation 5), we will now estimate the parameter vector $\theta$ (impulse response coefficients) with varying model orders, $I = 0, 1, \ldots, 100$. Set $I$ in your Matlab code as a variable so that it can be changed as required.

- For a given $I$, set up the observation matrix $H$ using the estimation input $u_e(t)$, and set up the output vector $Y$ using the noisy output $y_e(t)$. Don't forget to take the transient points into account.
- Obtain the linear least squares solution $\hat{\theta}$ (equation 5). To avoid numerical ill-conditioning problems, it is recommended to use the Matlab backslash operator (\) to obtain a least squares estimate: $\hat{\theta}$ = H \ Y. This is an estimate of the optimum impulse response coefficients for the given model order $I$.

Hint: You can construct the observation matrix $H$ for $I = 100$. When you need $H$ for some lower model order, e.g. $I = 10$, just use the first $I + 1 = 11$ columns of the full matrix, since $H$ has $I + 1$ columns.

Task 3: model selection

Having obtained an estimate of the model parameters ($\hat{\theta}$), we can now simulate outputs for both inputs $u_e(t)$ and $u_v(t)$ and evaluate the corresponding cost functions. This will give an indication of the quality of our model.

- Using the filter function, obtain the model outputs $\hat{y}_e(t)$ and $\hat{y}_v(t)$ as follows: $\hat{y}_e$ = filter($\hat{\theta}$, 1, ue) and $\hat{y}_v$ = filter($\hat{\theta}$, 1, uv). With these commands, you perform the convolution in (2).
- For both the estimation and validation data, evaluate the cost function (4):

$$V_e(\hat{\theta}, N_e) = \frac{1}{N_e} \sum_{t=0}^{N_e-1} |y_e(t) - \hat{y}_e(t)|^2$$

$$V_v(\hat{\theta}, N_v) = \frac{1}{N_v} \sum_{t=0}^{N_v-1} |y_v(t) - \hat{y}_v(t)|^2$$

- Also obtain the AIC cost function, which is based on the estimation cost function:

$$V_{AIC} = V_e(\hat{\theta}, N_e) \left( 1 + 2\,\frac{I}{N_e} \right)$$

- Now evaluate the three cost functions for increasing model order. In your Matlab code, this should be done via a for loop where $I$ goes from 0 to 100.
- Normalise $V_e$, $V_v$ and $V_{AIC}$ by dividing them by the noise variance $\sigma_v^2$, and plot the normalised cost functions against the model order.
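The steps of Tasks 1 to 3 could be put together roughly as in the Matlab sketch below. It is one possible arrangement rather than a reference solution: all variable names are illustrative, the regressor matrix is built with `toeplitz` (one of several options), and the random seed is fixed only to make the run repeatable.

```matlab
% Sketch of Tasks 1-3; all variable names are illustrative assumptions.
rng(1);                                  % reproducible run
a = [2.765e-001 -3.464e-001 6.141e-001 -4.371e-001 4.410e-001 -1.645e-001 9.9619e-002];
b = [8.002e-004 1.9427e-002 -4.5489e-002 1.245e-002 3.050e-002 -1.928e-002 1.902e-003];

Nt = 1000; Ne = 1000; Nv = 10000;        % transient, estimation, validation lengths
sigma_u = 1; sigma_v = 0.5; Imax = 100;  % settings kept as variables

% Task 1: data generation
ue = sigma_u * randn(Nt + Ne, 1);
uv = sigma_u * randn(Nt + Nv, 1);
ye = filter(b, a, ue) + sigma_v * randn(Nt + Ne, 1);
yv = filter(b, a, uv) + sigma_v * randn(Nt + Nv, 1);

% Task 2: regressor matrix for I = Imax, using only the last Ne samples
ie = (Nt + 1 : Nt + Ne).';
iv = (Nt + 1 : Nt + Nv).';
Hfull = toeplitz(ue(ie), ue(Nt + 1 : -1 : Nt + 1 - Imax));
Ye = ye(ie);

Ve = zeros(Imax + 1, 1); Vv = Ve; Vaic = Ve;
for I = 0:Imax
    H = Hfull(:, 1:I + 1);               % model order I uses I + 1 columns
    theta = H \ Ye;                      % least squares estimate
    % Task 3: simulate the model and evaluate the cost functions
    yhe = filter(theta, 1, ue);
    yhv = filter(theta, 1, uv);
    Ve(I + 1)   = mean(abs(ye(ie) - yhe(ie)).^2);
    Vv(I + 1)   = mean(abs(yv(iv) - yhv(iv)).^2);
    Vaic(I + 1) = Ve(I + 1) * (1 + 2 * I / Ne);
end

orders = (0:Imax).';
plot(orders, [Ve Vv Vaic] / sigma_v^2);
legend('V_e', 'V_v', 'V_{AIC}');
xlabel('model order I'); ylabel('normalised cost');
```

Keeping `sigma_v`, the lengths and `Imax` as variables at the top means the whole script can be rerun for another noise level by changing a single line.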

Task 4: reduced noise level

The above analysis has been performed for a noise standard deviation of $\sigma_v = 0.5$. We will now reduce the noise level by setting $\sigma_v = 0.1$ to investigate how the cost functions behave and how this affects the choice of the optimum model order.

- Repeat tasks 1, 2 and 3 for $\sigma_v = 0.1$.

Task 5: analysis

We now have three cost functions ($V_e$, $V_v$, $V_{AIC}$) which can be used to evaluate the goodness of our model.

- Which of the cost functions decrease monotonically with the model order? Is this expected?
- Which of the cost functions decrease in value and then begin to increase above a certain model order? This indicates that there is an optimum model order. Why might this be?
- In real-life applications, we may at times not have a validation data set. Which cost function can we then use to evaluate an optimum model order?
- What was the effect of reducing the noise level on the optimum model order?

Experimental measurement

We will now perform experimental measurements on a passive analogue bandpass filter. The central bandpass frequency can be adjusted, and the data acquisition has a sampling frequency of 8 kHz.

This experiment will be done with the help of a lab assistant

Task 6: data generation and acquisition

Similar to the simulation task, the experimental task is to acquire an estimation ($y_e(t)$ and $u_e(t)$) and a validation ($y_v(t)$ and $u_v(t)$) data set, and then estimate and validate an optimum finite impulse response model for the bandpass filter. The input will again be a random input signal generated using the randn Matlab function. You will do the Matlab coding on the computer by the measuring set-up.

- Using the randn function, generate two (zero mean) input signals $u_e(t)$ and $u_v(t)$, of lengths 2000 and 11000, respectively. These lengths include $N_t = 1000$ transient points. Set the standard deviation of both signals to $\sigma_u = 1$ (you might have to increase the $\sigma_u$ value if the noise level is too high).
- Save each input signal as a column vector on the local drive with some file name, e.g. input1.csv and input2.csv. To create a csv (comma separated value) file use the following code: save('input1.csv','u','-ascii'), where the input signal is the variable u.
- Apply $u_e(t)$ and $u_v(t)$ to the bandpass filter with some output noise and acquire the corresponding noisy output signals $y_e(t)$ and $y_v(t)$.
- Adjust (increase or decrease) the noise level on the noise generator and measure a new estimation and validation data set for this noise setting.

Task 7: data processing

- Repeat tasks 2, 3 and 5 for the acquired data. The normalisation of the cost functions need not be applied since the noise standard deviation is unknown.
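The file-saving step of Task 6 might look as follows in Matlab. This is a sketch under assumptions: the file is written to the temporary folder rather than the lab computer's local drive, the variable names are illustrative, and the `'-double'` option (which writes 16 significant digits instead of the default 8) is an optional addition not mentioned in the task.

```matlab
% Sketch only: file location and variable names are illustrative.
rng(3);
Nt = 1000; Ne = 1000; sigma_u = 1;
u = sigma_u * randn(Nt + Ne, 1);         % zero-mean random input, column vector

fname = fullfile(tempdir, 'input1.csv'); % hypothetical location for the file
save(fname, 'u', '-ascii', '-double');   % one sample per line, 16 digits

u_read = load(fname, '-ascii');          % read the file back as a check
max_err = max(abs(u_read - u));          % should be at rounding level
```

Since `u` is a single column, every line of the file holds one value, so the file contains no delimiters and can be read as a one-column csv file. Reading it back is a cheap way to confirm the length and orientation before taking it to the measurement set-up.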

For any questions please email and arrange a date and time with one of the lab assistants.

Anne Van Mulders [email protected]
Philippe Dreesen [email protected]
Konstantin Usevich [email protected]
