Fourier Transf Example
Objectives
Introduction
Time series analysis is concerned with data that consist of time-ordered sequences
of measurements on some phenomenon of interest. Examples include rainfall
and other weather measurements (with a strong emphasis on forecasting), signal
processing (physical and engineering sciences), financial data (e.g. stock market prices)
and biological data sets (e.g. incidence of disease in a population).
Spectral analysis can be applied to a time series to explore cyclical patterns within
the data. The purpose of the analysis is to decompose a complex time series with
cyclical components into a few underlying sinusoidal (sine and cosine) functions of
particular wavelengths. The term “spectrum” provides an appropriate metaphor for
the nature of this analysis. Suppose you study a beam of white light, which at first
looks like a random (white noise) accumulation of light at different wavelengths.
When the beam is passed through a prism, however, the different wavelengths or
cyclical components that make up “white” sunlight are separated. In fact, this technique
allows us to identify and distinguish between different sources of light. Thus, by identifying
the important underlying cyclical components, we learn something about the
phenomenon of interest. In essence, performing spectral analysis on a time series
is like putting the series through a prism in order to identify the wavelengths and
importance of the underlying cyclical components. A successful analysis
might uncover just a few recurring cycles of different lengths in a time series
that at first looked more or less like random noise.
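Periodic Function
$$Y(t) = R\cos(\omega t + \varphi) \qquad (1)$$
where R is the amplitude, $\omega$ is the angular frequency and $\varphi$ is the phase.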
In 1807, Fourier showed that any periodic signal can be decomposed into a series of
sinusoidal curves such as the one above, one for each possible frequency in
the signal. The signal in Figure 2 is the sum of the 4 sinusoidal curves in Figure 3 plus
a constant term.
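As a minimal sketch of this idea, the Python snippet below builds a signal as a constant term plus four cosine curves. The amplitudes, periods, phases and offset are illustrative assumptions, not the values behind the figures.

```python
import math

# Hypothetical components: (amplitude R, period T in weeks, phase phi in radians).
COMPONENTS = [(0.08, 104.0, 0.0), (0.06, 52.0, 1.0), (0.04, 26.0, 2.0), (0.02, 13.0, 0.5)]
OFFSET = 0.01  # hypothetical constant term


def signal(t):
    """Constant term plus the sum of the sinusoidal components at time t."""
    total = OFFSET
    for amplitude, period, phase in COMPONENTS:
        total += amplitude * math.cos(2 * math.pi * t / period + phase)
    return total


# 208 weekly values; every assumed period divides 208 exactly.
series = [signal(t) for t in range(208)]
print(len(series), round(sum(series) / len(series), 4))
```

Because each assumed period fits a whole number of times into the 208 weeks, the sinusoids average out and the mean of the series is just the constant term.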
[Figure 2 and Figure 3: amplitude plotted against time, 0 to 208.]
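Since:
$$\cos(a+b) = \cos(a)\cos(b) - \sin(a)\sin(b)$$
the sinusoid can be expanded:
$$Y(t) = R\cos(\omega t + \varphi)$$
$$Y(t) = R\,(\cos(\omega t)\cos(\varphi) - \sin(\omega t)\sin(\varphi))$$
By setting:
$$a = R\cos(\varphi), \qquad b = -R\sin(\varphi)$$
we obtain:
$$Y(t) = a\cos(\omega t) + b\sin(\omega t) \qquad (2)$$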
In this form, a and b hold information about the phase and amplitude. It is under this
form that the Fourier transform outputs results for each frequency. The output is
usually expressed as a complex number of the form a + bi. It is also worth
remembering that $R^2 = a^2 + b^2$.
(i) Prove the statement in the last sentence mathematically in your lab book.
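A numerical check is not a proof, but it can help to confirm the algebra before writing it up. The sketch below assumes the sign convention a = R cos(φ), b = −R sin(φ), so that Y(t) = a cos(ωt) + b sin(ωt); the values of R, ω and φ are hypothetical.

```python
import math

R, omega, phi = 2.0, 2 * math.pi / 52, 0.7  # hypothetical amplitude, angular frequency, phase

# Assumed convention: Y(t) = a*cos(omega*t) + b*sin(omega*t)
a = R * math.cos(phi)
b = -R * math.sin(phi)

# Both forms of the sinusoid should agree at every time point.
for t in range(0, 52, 13):
    polar = R * math.cos(omega * t + phi)
    rectangular = a * math.cos(omega * t) + b * math.sin(omega * t)
    assert math.isclose(polar, rectangular, abs_tol=1e-12)

# Recover the amplitude and phase from a and b.
R_back = math.hypot(a, b)        # sqrt(a^2 + b^2)
phi_back = math.atan2(-b, a)     # sign follows the assumed convention for b
print(R_back, phi_back)
```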
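The full decomposition is the Fourier series:
$$Y(t) = a_0 + \sum_{n=1}^{\infty} \left[ a_n\cos(\omega_n t) + b_n\sin(\omega_n t) \right] \qquad (3)$$
where $a_0$ is an offset (the mean value of Y(t)) and $\omega_2 = 2\omega_1$, $\omega_3 = 3\omega_1$, etc.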
Fortunately, time series data are usually not infinite. Hence for a 256 point time
series, the Fourier transform has 128 harmonics (i.e. it returns 128 complex numbers).
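For a signal that is periodic with period T:
$$Y(t+T) = Y(t)$$
The Fourier series for Y(t) is given by equation (3) above. The Fourier coefficients $a_n$ and $b_n$ are:
$$a_n = \frac{2}{T}\int_0^T Y(t)\cos(\omega_n t)\,dt$$
and,
$$b_n = \frac{2}{T}\int_0^T Y(t)\sin(\omega_n t)\,dt$$
where $\omega_n = 2\pi n/T$.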
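For sampled data the coefficients can be approximated by sums over the samples. The sketch below assumes the standard definitions (each coefficient is 2/T times the integral of the signal multiplied by a cosine or a sine of the n-th harmonic) and replaces each integral with a sum over 256 equally spaced samples. The test signal and its coefficients are hypothetical, chosen so that the answer is known in advance.

```python
import math

N = 256  # number of samples, taken as one full period T

# Hypothetical test signal with known coefficients: a0 = 5, a3 = 2, b3 = -1.
y = [5 + 2 * math.cos(2 * math.pi * 3 * t / N) - math.sin(2 * math.pi * 3 * t / N)
     for t in range(N)]


def fourier_coefficients(samples, n):
    """Approximate a_n and b_n by replacing the integrals with sums over the samples."""
    size = len(samples)
    a_n = 2 / size * sum(v * math.cos(2 * math.pi * n * t / size) for t, v in enumerate(samples))
    b_n = 2 / size * sum(v * math.sin(2 * math.pi * n * t / size) for t, v in enumerate(samples))
    return a_n, b_n


a0 = sum(y) / N  # the offset is the mean of the signal
a3, b3 = fourier_coefficients(y, 3)
print(round(a0, 6), round(a3, 6), round(b3, 6))
```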
Part 1. Manual calculation of a single Fourier coefficient in Excel
Here we will calculate the Fourier coefficients of the first harmonic of a time series
(1996-2001) of Cryptosporidium cases in England and Wales.
(ii) Plot a graph in Excel of cases per week against time in weeks
To simplify the calculation, the time series includes 256 data points, whereas 5
years really contain 260 weeks. (The Excel Fast Fourier Transform option used
later requires the number of data points to be a power of 2.)
Calculation of a single Fourier coefficient requires 4 steps: generating a sinusoidal
curve over the period to be tested, multiplication of the signal by the sine curve,
getting the area under the product curve and adjusting the phase to maximise the area
under the curve.
$$Y(t) = \cos(\omega t)$$
$$\omega = 2\pi/T$$
$$Y(t) = \cos(2\pi t/T)$$
Visual review of the data should indicate an annual variation. We start by determining
the Fourier coefficient that corresponds to 1 oscillation every year, so the period is
52 weeks. We define the period as a parameter in the spreadsheet, so that we can
modify it later.
The cosine curve is used here simply to have the curve starting with the crest at t=1.
The cell G4 being a parameter, we enter it using absolute references: $G$4
The area under the curve is an estimate of the contribution of oscillations at this
frequency to the signal. The integral of the product curve is the area underneath it, and
it can be approximated by summing the values of the entire series in column D.
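The same multiply-and-sum step can be sketched in Python. The weekly counts below are hypothetical stand-ins (a baseline plus an annual cycle), not the Cryptosporidium data, and the time index is assumed to run from week 1 to week 256.

```python
import math

PERIOD = 52.0  # weeks; the parameter to be varied later
N = 256

# Hypothetical weekly case counts: a baseline plus an annual cycle (NOT the real data).
cases = [60 + 25 * math.cos(2 * math.pi * (t - 14) / 52) for t in range(1, N + 1)]

# Reference cosine, the product curve, and its sum (the approximate area).
cosine = [math.cos(2 * math.pi * t / PERIOD) for t in range(1, N + 1)]
product = [c * w for c, w in zip(cases, cosine)]
area = sum(product)
print(round(area, 2))
```

Changing `PERIOD` plays the same role as editing the period parameter cell in the worksheet.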
Figure 3. A Graph showing the Product together with the area under the product
shaded black.
The black area is the area under the product curve. Since the cosine curve oscillates
between –1 and +1, the product curve also oscillates.
We arbitrarily started the sinusoidal curve at the origin. This step involves moving the
curve along the time axis in order to maximise the area under the product curve.
When the area is at its maximum the two signals are “in phase.” This can be done by trial
and error, or by using the Excel Solver.
The Microsoft Excel Solver:
Use Solver to determine the maximum or minimum value of one cell by changing other cells. For
example, the maximum profit you can generate by changing advertising expenditures. The cells you
select must be related through formulas on the worksheet. If not related, changing one cell will not
change the other.
To maximise the area we need to introduce into the formulas a new parameter for
displacing the sinusoidal curve, the lag, expressed in weeks.
To maximise the area manually, gradually increase the value in cell G3 (remember 1 unit is 1 week)
until you get the largest possible value in cell G4.
Alternatively, we can use the Solver to let Excel find the value of the lag that
maximises the area.
Excel will return the value for the area as 3703.9 with lag of 14.14. The area is the
value of R in equation (1).
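The search that the Solver performs can be imitated with a simple grid search over the lag. This is a stand-in for illustration only, not the Solver's own algorithm, and the weekly counts are hypothetical (an annual cycle that peaks 14 weeks in), so the numbers will differ from the worksheet.

```python
import math

PERIOD = 52.0
N = 256
# Hypothetical weekly counts with an annual cycle peaking 14 weeks in (NOT the real data).
cases = [60 + 25 * math.cos(2 * math.pi * (t - 14) / 52) for t in range(1, N + 1)]


def area(lag):
    """Sum of the product of the data with a cosine displaced by `lag` weeks."""
    return sum(c * math.cos(2 * math.pi * (t - lag) / PERIOD)
               for t, c in enumerate(cases, start=1))


# Try lags from 0 up to one full period in steps of 0.1 week and keep the best.
candidates = [step / 10 for step in range(int(PERIOD * 10))]
best_lag = max(candidates, key=area)
best_area = area(best_lag)
print(best_lag, round(best_area, 1))
```

The best lag lands close to, but not exactly on, the 14 weeks built into the data, because 256 weeks is not a whole number of 52-week cycles.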
Let's change the period to 26 weeks to find the Fourier coefficients at this
frequency.
This yields a frequency of 9.8 cycles over the 256 weeks (approximately 2 cycles per year for the 5 years of
data) and an area of –169.44. We need to maximise the area by varying the lag using
the Solver.
(v) Report the area (R value) and lag for the 26 week contribution. Discuss the
differences between the 26 and 52 week contributions.
Summary
Here we have calculated the Fourier coefficient, R, and the lag, φ, for two sinusoids
with periods of 26 and 52 weeks respectively. This calculation is straightforward but
would become rather cumbersome if we had to repeat it 128 times, once for each frequency
in the 256 data points. In the next part we will use Excel to do all the calculations at once.
Part 2: Computation of all Fourier coefficients using Excel Fast Fourier
Transform (FFT) add-in macro.
Computation
The analysis add-in macro should be loaded in order to use the FFT.
On the Tools menu select ‘Data Analysis’ then select ‘Fourier Analysis’
The Fourier transform dialog takes only 2 parameters: the input and output range:
Enter ‘$B$3:$B$258’ in ‘Input range’
‘$C$3’ in ‘Output range’
Click on OK
The output lists the coefficients expressed as complex numbers, starting at the cell
indicated for output range. The first coefficient (cell C3) does not have an imaginary
component. It corresponds to the sum of the data over the entire range of values. (a0
in equation ???). The next coefficient (cell C4) corresponds to 1 oscillation over the
256 data points (Period of 256 weeks). The next one (cell C5) corresponds to 2
oscillations (Period of 128 weeks). There are 256 coefficients, but the last 128
coefficients are mirror images of the first 128. Hence we will just consider the first
128.
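These two properties of the output can be checked with a small Python sketch. It uses a plain discrete Fourier transform written out directly rather than Excel's FFT routine, and the input signal is hypothetical (an offset plus 5 oscillations over the 256 points).

```python
import cmath
import math

N = 256
# Hypothetical signal: an offset plus 5 oscillations over the 256 points.
data = [60 + 25 * math.cos(2 * math.pi * 5 * t / N) for t in range(N)]


def dft(samples):
    """Direct discrete Fourier transform, O(N^2); fine for 256 points."""
    size = len(samples)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / size) for t, v in enumerate(samples))
            for k in range(size)]


coeffs = dft(data)

# Coefficient 0 is the sum of the data and has (numerically) no imaginary part.
print(round(coeffs[0].real, 6), round(sum(data), 6))
# Coefficient N - k is the complex conjugate (mirror image) of coefficient k.
print(abs(coeffs[N - 5] - coeffs[5].conjugate()) < 1e-6)
```

A direct transform is much slower than an FFT for long series, but for 256 points the difference is negligible and the code stays easy to follow.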
What we want to know is the period in weeks (or the frequency) at which
the signal oscillates most strongly. The periodogram is a
graph of the magnitude of the Fourier coefficient (R) against the period (T) of the
oscillations. To get the period and R we need to enter a few additional
values and formulas.
Enter:
‘Period’ in D2
‘Energy’ in E2
‘F’ in F2
‘1’ in cell F4
‘=F4+1’ in cell F5
Copy the formula down to cell F131 (cell F131 should have the value 128)
‘=256/F4’ in cell D4
The magnitude is the square root of the sum of the squares of the real and imaginary components of the
complex number.
Fourier Complex representation: a + bi
Magnitude is: $R = \sqrt{a^2 + b^2}$
Excel provides a function IMABS() which extracts the magnitude R directly.
Looking at the E column shows that the ???? Crypto analysis here
Note that the period of 51.2 weeks is the closest to a yearly cycle. It is not exactly 52 because the Fourier Transform
only uses periods that are whole fractions of 256 (here 256/5). This explains the discrepancy but does not cause problems in
interpreting the periodogram.
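The period and magnitude columns can be sketched in Python as follows. The weekly counts are hypothetical (a baseline plus a 52-week cycle), and the coefficients are computed with a direct discrete Fourier sum rather than Excel's routine; Python's built-in `abs()` on a complex number plays the role of IMABS().

```python
import cmath
import math

N = 256
# Hypothetical weekly counts with a 52-week cycle (NOT the real data).
data = [60 + 25 * math.cos(2 * math.pi * t / 52) for t in range(N)]


def coefficient(samples, k):
    """k-th discrete Fourier coefficient, computed directly."""
    size = len(samples)
    return sum(v * cmath.exp(-2j * math.pi * k * t / size) for t, v in enumerate(samples))


# One (period, magnitude) row per harmonic 1..128.
rows = [(N / k, abs(coefficient(data, k))) for k in range(1, N // 2 + 1)]
peak_period, peak_magnitude = max(rows, key=lambda row: row[1])
print(peak_period, round(peak_magnitude, 1))
```

Even though the cycle in the assumed data is exactly 52 weeks, the largest magnitude falls on the nearest available period, 256/5 weeks.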
Compare the magnitude R with the value we got for R in the previous section.
Plot the periodogram (magnitude on the vertical axis and period on the horizontal axis).
Report what the periodogram tells us about the relative importance of each of the
sinusoidal components.
Summary
Here, we have used Excel to perform a spectral analysis of our signal. We have built
the periodogram and shown which sinusoids are important.
Part 3: Applications of Fourier transform in disease surveillance
To get the best model fit to the data we minimise the sum of squares of the differences
between the signal (raw data) and the model. We will call this sum of squares
“S2”. The correlation coefficient R is the ratio of the covariance of the raw data with
the model to the product of their individual standard deviations:
$$R = \frac{\mathrm{cov}(\text{data}, \text{model})}{\sigma_{\text{data}}\,\sigma_{\text{model}}}$$
The value of $R^2$ gives the proportion of the overall variance (variation in the data)
accounted for by the model.
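A minimal sketch of these two quantities in Python is given below. The data and model values are made-up illustrative numbers, and R² is computed here as the squared correlation between data and model, which is an assumption about how the worksheet cell is defined.

```python
import math

# Hypothetical raw data and model values (illustrative numbers only).
data = [12.0, 18.0, 25.0, 21.0, 14.0, 9.0, 11.0, 17.0]
model = [13.0, 19.0, 23.0, 20.0, 15.0, 10.0, 12.0, 16.0]


def mean(values):
    return sum(values) / len(values)


# S2: sum of squared differences between data and model (the quantity to minimise).
s2 = sum((d - m) ** 2 for d, m in zip(data, model))

# Correlation between data and model, then squared.
# The 1/n factors in the covariance and standard deviations cancel, so they are omitted.
d_bar, m_bar = mean(data), mean(model)
covariance = sum((d - d_bar) * (m - m_bar) for d, m in zip(data, model))
spread = math.sqrt(sum((d - d_bar) ** 2 for d in data) * sum((m - m_bar) ** 2 for m in model))
r_squared = (covariance / spread) ** 2
print(s2, round(r_squared, 4))
```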
Enter ‘Model’ in C2
‘Diff^2’ in D2
‘Parameters’ in G2
‘a0’ in G3
‘R2’ in G5
‘S2’ in G6
‘Amp1’ in I3
‘Lag1’ in I4
‘Period1’ in I5
‘Amp2’ in I6
‘Lag2’ in I7
‘Period 2’ in I8
‘Amp3’ in I9
‘Lag3’ in I10
‘Period 3’ in I11
‘Amp4’ in I12
‘Lag4’ in I13
‘Period4’ in I14
Set cells H3, J3, J4, J6, J7, J9, J10, J12, J13 equal to ‘0’
We call the solver to optimise the parameters to get the best fit to the model.
Call the ‘Solver’ in the Tools menu
Enter $H$6 for ‘Set target cell’
Set ‘Equal to’ to ‘Min’
Enter ‘H3;J3;J4;J6;J7;J9;J10;J12;J13’ in ‘by changing cell’
Click on ‘Solve’ button
Click ‘OK’ to keep solver solution
Look at the R-squared value and note the variance that is explained by the model
We can set up confidence intervals to identify epidemic increases and decreases in the
raw data. The 95% confidence intervals are approximately the mean model value at a
particular time ± 1.96*(sum of squares)/(number of observations).
Set F3 ‘=C3+1.96*$H$6/256’
Copy the formula down into cells F4 to F258
Plot a graph showing the raw data together with the model and the confidence
intervals.
Report on any epidemic increases or decreases that you find.
Is there a way we could do this directly from the Fourier coefficients that would not
require the use of the Excel solver? If there is then explain how you would do it.
Finally, generate data from the Fourier model to predict infection rates in the
following year (2002). Plot this on a graph.