
Fourier Transf Example

The document outlines a practical exercise in advanced physics focusing on time series data analysis using Fourier techniques. It aims to teach modeling of time series data, advanced Excel skills, and application of these skills to determine epidemic thresholds in disease data. The exercise includes manual calculations of Fourier coefficients, the use of Excel's Fast Fourier Transform, and building a periodogram to analyze the significance of sinusoidal components in the data.

Uploaded by

Ovidiu Rotariu
Copyright
© © All Rights Reserved

PX3506 Advanced Practical Physics

Analysis of time series data using Fourier Analysis

Objectives

• To learn how to model time series data using Fourier techniques
• To gain advanced Excel skills using the Fourier Transform and Solver tools
• To calculate the Fourier coefficients and generate the periodogram of a periodic
time series
• To apply these skills to determine epidemic thresholds in human disease data

AS YOU PERFORM THIS PRACTICAL, THERE ARE A NUMBER OF
GRAPHS YOU NEED TO GENERATE AND QUESTIONS TO ANSWER.
THESE MUST BE ENTERED IN YOUR LABORATORY BOOK. THESE
HAVE BEEN REFERENCED IN THIS SCRIPT WITH ROMAN NUMERALS
AND WRITTEN IN ITALICS.

Introduction

Time series analysis is concerned with data which consist of time-ordered sequences
of measurements on some phenomenon of interest. Examples include rainfall
and other weather measurements (with a strong emphasis on forecasting), signal
processing (physical and engineering sciences), financial data (e.g. stock market prices)
and biological data sets (e.g. incidence of disease in a population).

Spectral analysis can be applied to a time series to explore cyclical patterns within
these data. The purpose of the analysis is to decompose a complex time series with
cyclical components into a few underlying sinusoidal (sine and cosine) functions of
particular wavelengths. The term “spectrum” provides an appropriate metaphor for
the nature of this analysis. Suppose you study a beam of white light, which at first
looks like a random (white noise) accumulation of light at different wavelengths.
When the beam is put through a prism, however, it separates into the different wavelengths or
cyclical components that make up “white” sunlight. In fact, via this technique we can
identify and distinguish between different sources of light. Thus, by identifying
the important underlying cyclical components, we learn something about the
phenomenon of interest. In essence, performing spectral analysis on a time series
is like putting the series through a prism in order to identify the wavelengths and
importance of the underlying cyclical components. As a result of a successful analysis
one might uncover just a few recurring cycles of different lengths in a time series
which at first looked more or less like random noise.

Periodic Function

The natural function to model a periodic component in a time series is:

$Y(t) = R\cos(\omega t + \phi)$    (1)

where:

• $\omega$ is the frequency of the periodic variation = $2\pi$/period
• R is the amplitude
• $\phi$ is the phase

Figure 1. Cosine curve representation of the periodic component of a time series

In 1807, Fourier showed that any periodic signal can be decomposed into a series of
sinusoidal curves such as the one above, one for each possible frequency in
the signal. The signal in Figure 2 is the sum of the 4 sinusoidal curves in Figure 3 plus
a constant term.
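
As a rough illustration of this idea, the Python sketch below builds a signal as the sum of 4 cosine curves plus a constant. The periods, amplitudes, phases and offset are assumed values chosen for illustration only; they are not the ones used to draw Figures 2 and 3.

```python
import numpy as np

t = np.arange(208)                    # weekly time axis (0 to 207)
periods = [52.0, 26.0, 17.3, 13.0]    # assumed periods in weeks
amps = [0.08, 0.04, 0.03, 0.02]       # assumed amplitudes R
phases = [0.0, 1.0, 2.0, 3.0]         # assumed phases in radians
offset = 0.01                         # assumed constant term

# One curve R*cos(omega*t + phi) per component, with omega = 2*pi/period
components = [R * np.cos(2 * np.pi * t / T + phi)
              for R, T, phi in zip(amps, periods, phases)]

# The signal is the constant plus the sum of the 4 curves
signal = offset + np.sum(components, axis=0)
print(signal[:5])
```

Plotting `signal` against `t` gives a curve that looks irregular even though it is built from only 4 regular components.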

Joseph Fourier (1768-1830)

French mathematician, known also as an Egyptologist and administrator, who exerted strong
influence on mathematical physics through his Théorie analytique de la chaleur (1822; The
Analytical Theory of Heat). He showed how the conduction of heat in solid bodies may be
analyzed in terms of infinite mathematical series now called by his name.

Figure 2. Random time series or structured data? (Amplitude plotted against Time.)

Figure 3. Sinusoidal decomposition of Figure 2. (Amplitude plotted against Time.)

Generally, a signal of n data points can be perfectly described by n/2 sinusoidal
curves (Fourier series).

The Fourier transform converts a signal expressed as a function of time to a function
of frequency. It enables transfer between time and frequency without loss of
information. For each frequency, the Fourier transform returns two Fourier
coefficients, from which we can get the corresponding amplitude and phase.

Since:

$\cos(x+y) = \cos(x)\cos(y) - \sin(x)\sin(y)$

$Y(t) = R\cos(\omega t + \phi)$

can be expressed in the form

$Y(t) = R(\cos(\omega t)\cos(\phi) - \sin(\omega t)\sin(\phi))$

By setting:

$a = R\cos(\phi)$ and $b = -R\sin(\phi)$

the expression becomes:

$Y(t) = a\cos(\omega t) + b\sin(\omega t)$    (2)

In this form, a and b hold the information about the phase and amplitude. It is in this
form that the Fourier transform outputs results for each frequency. The output is
usually expressed as a complex number of the form a + bi. It is also worth
remembering that $R^2 = a^2 + b^2$.

(i) Prove the statement in the last sentence mathematically in your lab book.
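
A numerical check is not a proof, but it can help to confirm the algebra before writing it up. The sketch below uses assumed example values for R, ω and φ, and takes a = R cos(φ) and b = −R sin(φ), the sign that makes the angle-addition expansion match equation (2) term by term.

```python
import numpy as np

# Assumed example values, for illustration only
R, omega, phi = 2.0, 2 * np.pi / 52, 0.7
t = np.arange(256)

a = R * np.cos(phi)
b = -R * np.sin(phi)   # minus sign comes from the angle-addition formula

lhs = R * np.cos(omega * t + phi)                      # amplitude and phase form
rhs = a * np.cos(omega * t) + b * np.sin(omega * t)    # form of equation (2)

print(np.allclose(lhs, rhs))            # the two forms agree at every t
print(np.isclose(R**2, a**2 + b**2))    # R^2 = a^2 + b^2
```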

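In general, an infinite time series is made up of an infinite number of “harmonics”:

$Y(t) = a_0 + \sum_{n=1}^{\infty} \left[ a_n\cos(\omega_n t) + b_n\sin(\omega_n t) \right]$    (3)

where $a_0$ is an offset (the mean value of Y(t)) and $\omega_2 = 2\omega_1$, $\omega_3 = 3\omega_1$, etc.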

Fortunately, time series data are usually not infinite. Hence for a 256 point time
series, the Fourier transform has 128 harmonics (i.e. it returns 128 complex numbers).

NEED TO PUT A BIT HERE ABOUT THE FOURIER INTEGRALS


Mathematical Annex:

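Assume Y(t) is a continuous, periodic function with period T. Then

Y(t+T) = Y(t)

The Fourier series for Y(t) is given by equation (3) above. The Fourier coefficients $a_n$ and $b_n$ are:

$a_n = \frac{2}{T} \int_0^T Y(t)\cos(\omega_n t)\,dt$

and,

$b_n = \frac{2}{T} \int_0^T Y(t)\sin(\omega_n t)\,dt$

where $\omega_n = 2\pi n/T$.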
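
The sketch below approximates the coefficient integrals by sums over one period, assuming the standard definitions a_n = (2/T)∫Y(t)cos(ω_n t)dt and b_n = (2/T)∫Y(t)sin(ω_n t)dt with ω_n = 2πn/T, and a_0 equal to the mean of Y(t). The test function and the name `fourier_coefficients` are made up for this example.

```python
import numpy as np

T = 52.0                       # assumed period
N = 1000                       # number of sample points in one period
t = np.arange(N) * T / N       # one full period, end point excluded

# Assumed test function with known coefficients: a0 = 3, a1 = 2, b2 = 0.5
Y = 3.0 + 2.0 * np.cos(2 * np.pi * t / T) + 0.5 * np.sin(2 * np.pi * 2 * t / T)

def fourier_coefficients(n):
    """Approximate a_n and b_n by a Riemann sum over one period."""
    w = 2 * np.pi * n / T
    a_n = 2.0 / N * np.sum(Y * np.cos(w * t))
    b_n = 2.0 / N * np.sum(Y * np.sin(w * t))
    return a_n, b_n

a0 = Y.mean()                  # the offset is the mean value of Y(t)
print(a0, fourier_coefficients(1), fourier_coefficients(2))
```

The sums should recover the coefficients that were used to build the test function, which is the same "multiply by a sinusoid and add up" idea used in Part 1.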

The Fourier Transform

Part 1. Manual calculation of a single Fourier coefficient in Excel

AS YOU PERFORM THIS PRACTICAL, THERE ARE A NUMBER OF
GRAPHS YOU NEED TO GENERATE AND QUESTIONS TO ANSWER.
THESE MUST BE ENTERED IN YOUR LABORATORY BOOK. THESE
HAVE BEEN REFERENCED WITH ROMAN NUMERALS AND WRITTEN IN
ITALICS IN THIS INSTRUCTION SHEET.

Here we will calculate the Fourier coefficients of the first harmonic of a time series
(1996-2001) of Cryptosporidium cases in England and Wales.

Cryptosporidium is a parasite which causes stomach infections. The usual route of
infection is by drinking contaminated water. In Aberdeen there was an outbreak, alleged to
be caused by contaminated water, in January – March 2002.

Load the file rawcrypto.xls into Excel


Activate Sheet 1 of the worksheet

(ii) Plot a graph in Excel of cases per week against time in weeks

To simplify the calculation, the time series includes 256 data points, whereas in 5
years there are really 260 weeks. (The Excel Fast Fourier Transform option, which you
will use later, requires the number of data points to be a power of 2.)
Calculation of a single Fourier coefficient requires 4 steps: generating a sinusoidal
curve with the period to be tested, multiplying the signal by this curve,
getting the area under the product curve, and adjusting the phase to maximise the area
under the curve.

1. Generating a sinusoidal curve of a given period to test

The equation of a simple sinusoidal curve is:

$Y(t) = \cos(\omega t)$

The frequency, $\omega$, can be expressed as:

$\omega = 2\pi/T$

where T is the period. Expressing the frequency in terms of the period yields:

$Y(t) = \cos(2\pi t/T)$

Visual review of the data should indicate an annual variation. We start by determining
the Fourier coefficient that corresponds to 1 oscillation every year. Thus the period is
52 weeks. We define the period as a parameter in the spreadsheet, so that we can
modify it later.

Enter the following cells in the spreadsheet:

‘Parameter’ in cell F2
‘Period’ in cell F4
‘52’ in cell G4
‘Frequency’ in cell F5
(the frequency is the number of oscillations over the entire 256-week series)
‘=256/$G$4’ in cell G5
‘Cosine’ in cell C2
‘=COS(2*PI()*$A3/$G$4)’ in cell C3
Copy the formula in C3 to the range C4 to C258

The cosine curve is used here simply to have the curve starting with the crest at t=1.
The cell G4 being a parameter, we enter it using absolute references: $G$4

(iii) Plot a graph showing the cosine curve.

2. Get the product of the signal with the sinusoidal curve

We enter a formula in column D to get the product of the two curves:

Enter the following values in the spreadsheet:


‘Product’ in cell D2
‘=C3*B3’ in cell D3
Copy the formula in D3 to the range D4 to D258

(iv) Superimpose on the graph plotted in (ii) a graph of the product.

3. Get the area under the product curve

The area under the curve is an estimation of the contribution of oscillations at this
frequency in the signal. The integral of the product curve is the area underneath it, and
it can be approximated by summing the values of the entire series in column D.

Figure 3. A Graph showing the Product together with the area under the product
shaded black.

The black area is the area under the product curve. Since the cosine curve oscillates
between –1 and +1, the product curve also oscillates.

Enter the following values in the spreadsheet:


‘Area’ in cell F6
‘=SUM(D3:D258)’ in cell G6
A value of –512.9 should be returned for the area
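
Steps 1 to 3 can also be sketched outside Excel. The Python below mirrors columns A to D and cell G6, but it uses made-up case counts in place of rawcrypto.xls, so the area it prints will not match the value above. The week numbering 1 to 256 for column A is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 257)                 # column A: weeks 1..256 (assumed numbering)

# Hypothetical stand-in for the case counts in column B
cases = 80 + 30 * np.cos(2 * np.pi * (t - 38) / 52) + rng.normal(0, 5, t.size)

period = 52.0                                  # cell G4
cosine = np.cos(2 * np.pi * t / period)        # column C: test curve
product = cosine * cases                       # column D: signal times test curve
area = product.sum()                           # cell G6: sum approximates the area

print(round(area, 1))
```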

4. Adjust the phase to maximise the area

We arbitrarily started the sinusoidal curve at the origin. This step involves moving the
curve along the time axis in order to maximise the area under the product curve.
When the area is at its maximum the two signals are “in phase”. This can be done by trial
and error, or by using the Excel Solver.
The Microsoft Excel Solver:
Use Solver to determine the maximum or minimum value of one cell by changing other cells. For
example, the maximum profit you can generate by changing advertising expenditures. The cells you
select must be related through formulas on the worksheet. If not related, changing one cell will not
change the other.

To maximise the area we need to introduce into the formulas a new parameter for
displacing the sinusoidal curve, the lag, expressed in weeks.

Enter the following values in the spreadsheet:


‘Lag’ in cell F3
‘0’ in cell G3
Modify the formula in cell C3:
‘=COS(2*PI()*($A3+$G$3)/$G$4)’ in cell C3
Copy the formula in C3 to the range C4 to C258

Since the lag is 0, the value of the area is not modified.

To maximise the area manually, gradually increase the value in cell G3 (remember 1 unit is 1 week)
until you get the largest possible value in cell G6.

However we can use the solver to let Excel get the optimum value for the lag to
maximise the area.

Call the ‘solver’ in the TOOL menu


Enter $G$6 for ‘Set the target cell’
Set ‘Equal to’ to Max
Enter ‘$G$3’ in ‘By changing cells’
Click on ‘Solve’ button
Click ‘OK’ to keep solver solution

Excel will return the value for the area as 3703.9 with lag of 14.14. The area is the
value of R in equation (1).
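
The same search can be sketched in Python. The example below tries lags in steps of 0.01 week, as one would by hand, and then compares the result with a closed form obtained from the cosine and sine sums. The data are hypothetical, so the numbers will differ from the Excel values above; the closed form is a standard trigonometric identity and not a description of how the Solver works.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1, 257)
# Hypothetical stand-in for the case counts
cases = 80 + 30 * np.cos(2 * np.pi * (t - 38) / 52) + rng.normal(0, 5, t.size)
period = 52.0

def area(lag):
    """Sum of the product of the data with a cosine shifted by `lag` weeks."""
    return np.sum(cases * np.cos(2 * np.pi * (t + lag) / period))

# Trial and error: evaluate the area on a fine grid of lags
lags = np.arange(0, period, 0.01)
areas = np.array([area(lag) for lag in lags])
best_lag = lags[np.argmax(areas)]

# Closed form: area(lag) = A*cos(theta) - B*sin(theta), theta = 2*pi*lag/period
A = np.sum(cases * np.cos(2 * np.pi * t / period))
B = np.sum(cases * np.sin(2 * np.pi * t / period))
max_area = np.hypot(A, B)                                   # largest possible area
lag_closed = (-np.arctan2(B, A) * period / (2 * np.pi)) % period

print(best_lag, areas.max())
print(lag_closed, max_area)
```

The two approaches should agree to within the grid spacing, which shows that the maximised area depends only on the cosine and sine sums at that period.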

Let's change the period to 26 weeks to find out the Fourier coefficients at this
frequency.

Reset the lag to ‘0’ in cell G3


Set the period to ‘26’ in cell G4

This yields a frequency of 9.8 (approximately 2 cycles per year for the 5 years of
data) and an area of –169.44. We need to maximise the area by varying the lag using
the solver.

Call the ‘solver’ in the TOOL menu


Enter $G$6 for ‘Set the target cell’
Set ‘Equal to’ to Max
Enter ‘$G$3’ in ‘By changing cells’
Click on ‘Solve’ button
Click ‘OK’ to keep solver solution

(v) Report the area (R value) and lag for the 26 week contribution. Discuss the
differences between the 26 and 52 week contributions.

Summary

Here we have calculated the Fourier coefficient, R, and the lag, $\phi$, for two sinusoids
with periods 26 and 52 weeks respectively. This calculation is straightforward but
would become rather cumbersome if we had to repeat it 128 times, once for each frequency
in the 256 data points. In the next part we will use Excel to do all the calculations at once.

Part 2: Computation of all Fourier coefficients using Excel Fast Fourier
Transform (FFT) add-in macro.

Computation

Load the file rawcrypto.xls into Excel


Activate Sheet 2 of the worksheet

The analysis add-in macro should be loaded in order to use the FFT.
On the Tools menu select ‘Data Analysis’ then select ‘Fourier Analysis’

The Fourier transform dialog takes only 2 parameters: the input and output range:
Enter ‘$B$3:$B$258’ in ‘Input range’
‘$C$3’ in ‘Output range’
Click on OK

The output lists the coefficients expressed as complex numbers, starting at the cell
indicated for the output range. The first coefficient (cell C3) does not have an imaginary
component. It is the sum of the data over the entire range of values, and so corresponds
to a0 in equation (3) multiplied by the number of data points. The next coefficient
(cell C4) corresponds to 1 oscillation over the 256 data points (period of 256 weeks).
The next one (cell C5) corresponds to 2 oscillations (period of 128 weeks). There are
256 coefficients, but the last 128 coefficients are mirror images of the first 128.
Hence we will just consider the first 128.
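
These properties of the output can be checked with any FFT routine. The sketch below uses NumPy's `numpy.fft.fft` on hypothetical data; the sign convention of the imaginary parts may differ from Excel's output, but the magnitudes and the mirror symmetry do not depend on it.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(256)
# Hypothetical 256-point series with 5 oscillations over the whole range
y = 80 + 30 * np.cos(2 * np.pi * 5 * t / 256) + rng.normal(0, 5, 256)

coeffs = np.fft.fft(y)              # 256 complex coefficients
print(len(coeffs))

# First coefficient: real, and equal to the sum of the data
print(np.isclose(coeffs[0].real, y.sum()), abs(coeffs[0].imag) < 1e-9)

# Coefficient k and coefficient 256-k are complex conjugates (mirror images)
k = np.arange(1, 128)
print(np.allclose(coeffs[k], np.conj(coeffs[256 - k])))
```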

Building the periodogram

The information that we want to get is for which period in weeks (or which
frequency) do we get the strongest oscillations in the signal. The periodogram is a
graph of the Magnitude of the Fourier coefficient (R) against the period (T) of the
oscillations. In order to get the period and R we need to enter a couple of additional
values and formulas.

Enter:
‘Period’ in D2
‘Energy’ in E2
‘F’ in F2
‘1’ in cell F4
‘=F4+1’ in cell F5
Copy the formula in F5 down to cell F131 (cell F131 should have the value 128)
‘=256/F4’ in cell D4
Copy the formula in D4 down to cell D131

The magnitude is the square root of the sum of the squares of the real and imaginary
components of the complex number.
Fourier complex representation: a + bi

Magnitude is:

$R = \sqrt{a^2 + b^2}$

Excel provides a function IMABS() which extracts the magnitude R directly.

Enter ‘=IMABS(C4)’ in cell E4
Copy the formula in E4 to the range E5:E131

Looking at the E column shows that the ???? Crypto analysis here
Note that 51.2 weeks is the period closest to a yearly cycle. It is not exactly 52 because
the Fourier Transform only uses periods that are whole-number fractions of 256 (here
256/5 = 51.2). This explains the discrepancy but does not cause problems in
interpreting the periodogram.
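
As a cross-check on the spreadsheet layout, the sketch below builds the same three columns (harmonic number, period and magnitude) in Python. The series is hypothetical, with assumed 52-week and 26-week cycles plus noise, and `np.abs` plays the role of IMABS().

```python
import numpy as np

rng = np.random.default_rng(3)
n = 256
t = np.arange(n)
# Hypothetical series: yearly and half-yearly cycles plus noise
y = (80 + 30 * np.cos(2 * np.pi * t / 52)
        + 10 * np.cos(2 * np.pi * t / 26)
        + rng.normal(0, 5, n))

coeffs = np.fft.fft(y)
k = np.arange(1, 129)             # column F: harmonic number 1..128
period = n / k                    # column D: period in weeks (256/F)
magnitude = np.abs(coeffs[k])     # column E: magnitude of each coefficient

strongest = k[np.argmax(magnitude)]
print(strongest, period[strongest - 1])
```

Because the only periods available are 256/k, a 52-week cycle in the data shows up in the nearest available period rather than at exactly 52 weeks.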

Compare the magnitude R with the value we got for R in the previous section.

Plot the periodogram (Magnitude on vertical axis and Period on horizontal axis)
Report what the periodogram tells us about the relative importance of each of the
sinusoidal components.

Summary

Here, we have used Excel to perform a spectral analysis of our signal. We have built
the periodogram and shown which sinusoids are important.

Part 3: Applications of Fourier transform in disease surveillance

In this part we will:

(1) Use the important oscillations in the Fourier series to model the Cryptosporidium
disease data.
(2) Produce confidence intervals on the data.
(3) Identify epidemic increases or decreases in the data.
(4) Use the Fourier model to predict what the incidence of the disease will be in the
following year.

Load the file rawcrypto.xls into Excel


Activate Sheet 3 of the worksheet

1. Use the important oscillations in the Fourier series to model the Cryptosporidium disease data.

At this point we need to introduce the cyclical contributions identified on the
periodogram in Part 2. From the periodogram, we can decide to introduce 4 cyclical
terms: the 52, 26, 17.3 and 13 week cyclical variations. Each is defined by its
amplitude and lag.

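To get the best model fit to the data we minimise the sum of squares of the differences
between the signal (raw data) and the model. We will call this sum of squares “S2”;
divided by the number of observations, it is the variance of the data about the model.
The regression coefficient R2 is the square of the correlation coefficient, that is, of
the ratio of the covariance of the raw data with the model to the product of their
individual standard deviations:

$R^2 = \left( \frac{\mathrm{cov}(Y, M)}{\sigma_Y \sigma_M} \right)^2$

where Y is the raw data and M is the model. The value of R2 gives the proportion of the
overall variance (variation in the data) accounted for by the model.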

Enter ‘Model’ in C2
‘Diff^2’ in D2
‘Parameters’ in G2
‘a0’ in G3
‘R2’ in G5
‘S2’ in G6

‘Amp1’ in I3
‘Lag1’ in I4
‘Period1’ in I5
‘Amp2’ in I6
‘Lag2’ in I7
‘Period 2’ in I8
‘Amp3’ in I9
‘Lag3’ in I10
‘Period 3’ in I11
‘Amp4’ in I12
‘Lag4’ in I13
‘Period4’ in I14

Set cells H3, J3, J4, J6, J7, J9, J10, J12, J13 equal to ‘0’

Set Cell J5 equal to ‘52’
Set Cell J8 equal to ‘26’
Set Cell J11 equal to ‘17.3’
Set Cell J14 equal to ‘13’

Set cell C3 equal to

‘=$H$3+$J$3*COS(2*PI()*((A3-$J$4)/$J$5))
+$J$6*COS(2*PI()*((A3-$J$7)/$J$8))
+$J$9*COS(2*PI()*((A3-$J$10)/$J$11))
+$J$12*COS(2*PI()*((A3-$J$13)/$J$14))’
Copy cell C3 over the range C4:C258

Set cell D3 equal to:

‘=(C3-B3)^2’
Copy cell D3 over the range D4:D258

Set H5 equal to:

‘=(COVARIANCE(C3:C258;B3:B258)/(STDEV(C3:C258)*STDEV(B3:B258)))^2’

Set H6 equal to:

‘=SUM(D3:D258)’

We call the solver to optimise the parameters to get the best fit of the model to the data.
Call the ‘Solver’ in the Tools menu
Enter $H$6 for ‘Set target cell’
Set ‘Equal to’ to Min
Enter ‘H3;J3;J4;J6;J7;J9;J10;J12;J13’ in ‘By changing cells’
Click on ‘Solve’ button
Click ‘OK’ to keep solver solution
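
For comparison, the sketch below fits the same kind of model in Python on hypothetical data. It swaps the Solver's iterative search for ordinary linear least squares: with the periods fixed, each term Amp*COS(2*PI()*(t-Lag)/Period) can be written as a cosine plus a sine of the same period, so the fit is linear in the unknowns. The helper name `design` and the data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(1, 257)
# Hypothetical raw data with yearly and half-yearly cycles
raw = (80 + 30 * np.cos(2 * np.pi * (t - 38) / 52)
          + 8 * np.cos(2 * np.pi * (t - 5) / 26)
          + rng.normal(0, 5, t.size))

periods = [52.0, 26.0, 17.3, 13.0]       # the 4 cyclical terms

def design(times):
    """Columns: constant, then a cosine and a sine for each period."""
    cols = [np.ones_like(times, dtype=float)]
    for T in periods:
        cols.append(np.cos(2 * np.pi * times / T))
        cols.append(np.sin(2 * np.pi * times / T))
    return np.column_stack(cols)

coef, *_ = np.linalg.lstsq(design(t), raw, rcond=None)
model = design(t) @ coef

a0 = coef[0]
amps = np.hypot(coef[1::2], coef[2::2])                           # Amp1..Amp4
lags = np.arctan2(coef[2::2], coef[1::2]) * np.array(periods) / (2 * np.pi)

S2 = np.sum((model - raw) ** 2)                # sum of squared differences
R2 = np.corrcoef(model, raw)[0, 1] ** 2        # squared correlation
print(a0, amps, lags, S2, R2)
```

A lag returned here may differ from a Solver lag by a whole number of periods, since shifting a cosine by one full period leaves it unchanged.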

Look at the R-squared value and note the variance that is explained by the model

Setting up 95% Confidence Intervals

We can set up confidence intervals to identify epidemic increases and decreases in the
raw data. The 95% confidence intervals are approximately the model value at a
particular time ± 1.96*SQRT((sum of squares)/(number of observations)).

Enter ‘95% CI’ in E2 and F2

Set E3 ‘=C3+1.96*SQRT($H$6/256)’
Copy the formula in E3 to the range E4 to E258

Set F3 ‘=C3-1.96*SQRT($H$6/256)’
Copy the formula in F3 to the range F4 to F258
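
The banding idea can be sketched in Python as follows. This sketch assumes the half-width of the band is 1.96 times the root-mean-square difference between data and model, i.e. 1.96*sqrt(S2/n); the model, the data and the injected outbreak week are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(1, 257)
model = 80 + 30 * np.cos(2 * np.pi * (t - 38) / 52)   # hypothetical fitted model
raw = model + rng.normal(0, 5, t.size)                 # hypothetical raw data
raw[100] += 40                                         # injected "outbreak" in week 101

S2 = np.sum((model - raw) ** 2)                 # sum of squared differences
half_width = 1.96 * np.sqrt(S2 / t.size)        # assumed 95% half-width
upper = model + half_width
lower = model - half_width

epidemic_high = t[raw > upper]     # weeks above the upper limit
epidemic_low = t[raw < lower]      # weeks below the lower limit
print(epidemic_high, epidemic_low)
```

Weeks listed in `epidemic_high` or `epidemic_low` are candidates for epidemic increases or decreases; with a 95% band, a few weeks are expected to fall outside by chance alone.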

Plot a graph showing the raw data together with the model and the confidence
intervals.
Report on any epidemic increases or decreases that you find
Is there a way we could do this directly from the Fourier coefficients that would not
require the use of the Excel solver? If there is then explain how you would do it.

Finally generate data from the Fourier model to predict infection rates in the
following year (2002). Plot this on a graph.
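
A prediction is simply the fitted model evaluated at later times. The sketch below does this for the 52 weeks after the 256-week series, using made-up parameter values in place of the ones returned by the Solver.

```python
import numpy as np

# Hypothetical fitted parameters: offset, then (amplitude, lag, period) per term
a0 = 80.0
terms = [(30.0, 38.0, 52.0), (8.0, 5.0, 26.0), (3.0, 2.0, 17.3), (2.0, 1.0, 13.0)]

def model(times):
    """Evaluate a0 + sum of Amp*cos(2*pi*(t - Lag)/Period) at the given times."""
    y = np.full(len(times), a0)
    for amp, lag, period in terms:
        y += amp * np.cos(2 * np.pi * (times - lag) / period)
    return y

future = np.arange(257, 257 + 52)     # the 52 weeks following the series
forecast = model(future)
print(forecast[:5])
```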
