L1Norm Genetic

1. The document discusses genetic algorithm and outlier-insensitive criterion function based techniques for estimating the frequencies of multiple sinusoidal signals with stationary errors and noise. 2. In simulations and real data analysis, the proposed genetic algorithm based robust frequency estimators were able to resolve frequencies with high accuracy. The genetic algorithm based least squares estimator provided efficient estimates in scenarios without outliers. 3. In the presence of outliers, the proposed robust methods performed well and seemed to have a fairly high breakdown point with respect to the level of outlier contamination. The methods do not depend on initial guess values required by other iterative frequency estimation methods.

ARTICLE IN PRESS

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence


journal homepage: www.elsevier.com/locate/engappai

Genetic algorithms based robust frequency estimation of sinusoidal signals with stationary errors

Amit Mitra *, Debasis Kundu
Department of Mathematics and Statistics, Indian Institute of Technology, Kanpur, Kanpur-208016, India

Article info

Article history:
Received 30 December 2008
Accepted 16 July 2009

Keywords:
Genetic algorithms
L1-norm estimator
Least median estimator
Least square estimator
Least trimmed estimator
Multiple sinusoidal model
Outlier-insensitive criterion

Abstract

In this paper, we consider the fundamental problem of frequency estimation of multiple sinusoidal signals with stationary errors. We propose a technique for the frequency estimation problem based on genetic algorithms and outlier-insensitive criterion functions. In the simulation studies and real life data analysis, it is observed that the proposed genetic algorithm based robust frequency estimators are able to resolve the frequencies of the sinusoidal model with a high degree of accuracy. Among the proposed methods, the genetic algorithm based least squares estimator, in the no-outlier scenario, provides efficient estimates, in the sense that their mean square errors attain the corresponding Cramér-Rao lower bounds. In the presence of outliers, the proposed robust methods perform quite well and seem to have a fairly high breakdown point with respect to the level of outlier contamination. The proposed methods do not depend significantly on the initial guess values required by other iterative frequency estimation methods.
© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Estimating the parameters of a multiple sinusoidal signals model embedded in additive noise is a fundamental problem in signal processing and in time series analysis. In several applications in signal processing (Kay, 1988; Stoica, 1993; Quinn and Hannan, 2001; Stoica and Moses, 2005) and time series analysis (Brillinger, 1987), the signals dealt with can be described by the following multiple sinusoidal model:

\[
y(t) = \sum_{k=1}^{M} \bigl(A_k \cos(\omega_k t) + B_k \sin(\omega_k t)\bigr) + e(t). \tag{1}
\]

Here the $y(t)$s are observed at the equidistant time points $t = 1, 2, \ldots, N$. The unknown parameters of the model are the frequencies $(\omega_1, \ldots, \omega_M)$ and the corresponding amplitudes $(A_1, \ldots, A_M)$ and $(B_1, \ldots, B_M)$. The $A_k$s and $B_k$s are arbitrary real numbers and the $\omega_k$s are distinct real numbers lying in $(0, \pi)$. The real valued additive noise sequence $\{e(t)\}$ is assumed to be stationary with finite variance $\sigma^2$ and it has the following form:

\[
e(t) = \sum_{i=1}^{P} \rho_i \, e(t - i) + \delta(t), \tag{2}
\]

$\{\delta(t)\}$ being a sequence of independent and identically distributed (i.i.d.) normal random variables with mean 0 and variance $\sigma^2$. Furthermore, $\{\rho_i\}_{i=1}^{P}$ are such that the sequence $\{e(t)\}$ is stationary. The particular case of $\rho_i = 0$ for all $i$ corresponds to the i.i.d. noise case. $M$, the number of sinusoidal components, is assumed to be known. Given a sample of size $N$, $\{y(1), \ldots, y(N)\}$, the problem is to estimate the unknown frequencies and the corresponding amplitudes.

The sinusoidal model (1) is used to describe and model many real life applications where periodic phenomena are present. The extraction of the frequencies of the sinusoidal signals model from time series data is a classical problem of ongoing interest in the literature of statistical signal processing (Mackisack et al., 1994; Kundu and Mitra, 1996; Kundu, 1997; Mitra and Kundu, 1997; Smyth and Hawkins, 2000; Nandi et al., 2002; Chan and So, 2004; Trapero et al., 2007; Bonaventura et al., 2007; Coluccio et al., 2008) and indeed has created interest among scientists from various diverse fields. There exists a vast amount of literature addressing the computational aspect of estimating the frequencies of the sinusoidal model as well as focusing on the theoretical behavior of the estimators. The most intuitive and natural approach is the least squares approach. A closely related approach is the approximate least squares estimators (ALSEs) approach, which is asymptotically equivalent to the least squares estimators (LSEs). Asymptotic properties of the ALSE and LSE are studied in detail in Walker (1971), Hannan (1971), Kundu (1993, 1997) and Kundu and Mitra (1996). It is well known that, although the LSEs are the most desired estimators from a theoretical point of view, obtaining the LSEs is numerically a very difficult problem (Kahn et al., 1993). It is observed that the least squares surface has local

* Corresponding author. Tel.: +91 512 2596064; fax: +91 512 2597500.
E-mail address: [email protected] (A. Mitra).

0952-1976/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.engappai.2009.07.001

Please cite this article as: Mitra, A., Kundu, D., Genetic algorithms based robust frequency estimation of sinusoidal signals with
stationary errors. Engineering Applications of Artificial Intelligence (2009), doi:10.1016/j.engappai.2009.07.001

minima spaced $O(N^{-1})$ apart, making the gradient based search methods of general non-linear optimization ineffective without excellent starting values. Several methods are available in the literature to obtain the LSEs efficiently, but unfortunately all the methods are quite sensitive to the initial value chosen. It is further observed in Rice and Rosenblatt (1988) that unless the frequencies are resolved at the first step with order $O(N^{-1})$, the failure to converge to the global minimum may give a very poor estimate of the amplitudes. Thus, fitting these multiple sinusoidal models can involve daunting computational difficulties. The problem can be further complicated in case outliers are present in the dataset.

In this paper, we develop genetic algorithm based frequency estimation methods optimizing outlier-insensitive criterion functions. The aim here is to find an algorithm, under the assumption of stationary additive noise, whose performance does not depend significantly on initial guess values (or intervals), and which also has a high breakdown point with respect to outliers present in the data. Recently, Smyth and Hawkins (2000) proposed an algorithm based on elemental sets for robust frequency estimation, under the assumption of independently and identically distributed (i.i.d.) normal random variables. Contrary to the remarks made in Smyth and Hawkins (2000) that genetic algorithms do not seem to be a suitable approach for this problem (under the i.i.d. setup), especially in the presence of outliers, we observe that the proposed genetic algorithm based methods perform quite satisfactorily even in the dependent noise structure.

The rest of the paper is organized as follows. In Section 2, we give the least squares and the L1-norm formulations of the frequency estimation problem. In Section 3, we give a brief review of the outlier-insensitive criterion functions. Section 4 presents the proposed genetic search based iterative algorithms for robust frequency estimation. The empirical studies, implementing the proposed algorithms, are presented in Section 5. Finally, the conclusions are discussed in Section 6.

2. Least squares and L1-norm estimators

The least squares estimators of the parameters for the model (1) are the minimizers of the criterion function:

\[
c(\boldsymbol{\omega}, \mathbf{A}, \mathbf{B}) = \sum_{t=1}^{N} \Bigl[ y(t) - \sum_{k=1}^{M} \bigl(A_k \cos(\omega_k t) + B_k \sin(\omega_k t)\bigr) \Bigr]^2, \tag{3}
\]

where $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_M)^T$ is the vector of frequencies and $\mathbf{A} = (A_1, \ldots, A_M)^T$ and $\mathbf{B} = (B_1, \ldots, B_M)^T$ are the amplitude vectors. Here 'T' denotes the transpose of a vector or of a matrix. The sinusoidal model with parameters estimated by minimizing (3) has the smallest least squares distance to the observed data. The $\boldsymbol{\omega}$, $\mathbf{A}$ and $\mathbf{B}$ obtained by minimizing (3) are called the non-linear least squares (NLS) estimators. When the noise $e(t)$ is white Gaussian, the NLS estimators are the same as the maximum likelihood estimators.

For the sinusoidal model (1), the criterion function (3) can conveniently be concentrated with respect to the conditionally linear parameters $\mathbf{A}$ and $\mathbf{B}$. Introducing the notations:

\[
\mathbf{Y} = [y(1), y(2), \ldots, y(N)]^T, \tag{4}
\]

\[
A(\boldsymbol{\omega}) =
\begin{bmatrix}
\cos(\omega_1) & \sin(\omega_1) & \cdots & \cos(\omega_M) & \sin(\omega_M) \\
\vdots & \vdots & & \vdots & \vdots \\
\cos(\omega_1 N) & \sin(\omega_1 N) & \cdots & \cos(\omega_M N) & \sin(\omega_M N)
\end{bmatrix}, \tag{5}
\]

\[
\mathbf{a} = [A_1, B_1, \ldots, A_M, B_M]^T, \tag{6}
\]

we can write $c(\boldsymbol{\omega}, \mathbf{A}, \mathbf{B})$ as

\[
c(\boldsymbol{\omega}, \mathbf{A}, \mathbf{B}) = \bigl(\mathbf{Y} - A(\boldsymbol{\omega})\mathbf{a}\bigr)^T \bigl(\mathbf{Y} - A(\boldsymbol{\omega})\mathbf{a}\bigr). \tag{7}
\]

With distinct frequencies, if $N \ge 2M$ the Vandermonde matrix $A(\boldsymbol{\omega})$ is of rank $2M$ and $\bigl(A(\boldsymbol{\omega})^T A(\boldsymbol{\omega})\bigr)^{-1}$ exists. We thus observe that the vectors $\boldsymbol{\omega}$ and $\mathbf{a}$ which minimize (3) are given by

\[
\hat{\boldsymbol{\omega}}_{LSE} = \arg\max_{\boldsymbol{\omega}} \Bigl[ \mathbf{Y}^T A(\boldsymbol{\omega}) \bigl(A(\boldsymbol{\omega})^T A(\boldsymbol{\omega})\bigr)^{-1} A(\boldsymbol{\omega})^T \mathbf{Y} \Bigr], \tag{8}
\]

\[
\hat{\mathbf{a}}_{LSE} = \bigl(A(\boldsymbol{\omega})^T A(\boldsymbol{\omega})\bigr)^{-1} A(\boldsymbol{\omega})^T \mathbf{Y} \,\Big|_{\boldsymbol{\omega} = \hat{\boldsymbol{\omega}}_{LSE}}. \tag{9}
\]

It is well known that the least squares estimators for this problem are optimal under various considerations on the noise sequence. It is observed that, under the i.i.d. assumption on the noise sequence, the estimators are strongly consistent (Kundu and Mitra, 1996) and asymptotically normal with a covariance matrix that coincides with the Cramér-Rao bound under the normality assumption. It is further observed that the LSEs under the dependent error structure are also strongly consistent and asymptotically normal (Kundu, 1993).

As an alternative to the above LSE formulation of the problem, we can use the L1-norm formulation, which is often used in the literature of robust regression. The L1-norm estimates of the parameters of the multiple sinusoidal model (1) are obtained by minimizing

\[
f(\boldsymbol{\omega}, \mathbf{A}, \mathbf{B}) = \sum_{t=1}^{N} \Bigl| y(t) - \sum_{k=1}^{M} \bigl(A_k \cos(\omega_k t) + B_k \sin(\omega_k t)\bigr) \Bigr|. \tag{10}
\]

The non-linear optimization problem (10) is solved using standard non-linear optimization routines in order to get the L1-norm estimates. The L1-norm estimators, also called the least absolute deviation (LAD) estimators, correspond to the maximum likelihood estimators under the assumption that the noise is i.i.d. with a double exponential distribution. In the recent signal processing literature, use of the modified simplex algorithm is proposed for obtaining the L1-norm estimates. The estimates are then computed using the Barrodale–Roberts modified simplex algorithm (Barrodale and Roberts, 1973, 1974). For a more detailed review of L1-norm techniques, readers are referred to Bloomfield and Steiger (1983).

3. Outlier-insensitive criterion functions

The conventional estimates that are found by the least squares criterion, i.e. minimizing the sum of squares of all the $N$ residuals, are motivated by the ideas of statistical efficiency. However, the estimates are inappropriate if some of the observations are contaminated. The L1-norm estimates are potentially better options in situations where the dataset contains outliers. Deviating from the use of the usual sum of squared errors or sum of absolute errors criterion functions, the literature of robust regression provides us with alternate criterion functions that are relatively insensitive to the presence of outliers in the data. The primary aim of these outlier-insensitive criteria is to protect the estimate from such outlier contamination.

Among the most widely used specialized outlier-insensitive criterion functions are the least trimmed (LT) sum and the least median (LM) criteria. We now formulate the criteria to be used in the robust frequency estimation methods proposed in this paper.

3.1. Least trimmed criterion

Let $e^2_{(1)} < e^2_{(2)} < \cdots < e^2_{(N)}$ be the $N$ ordered estimated squared residuals, for an estimated value of $\mathbf{A}$, $\mathbf{B}$, $\boldsymbol{\omega}$. The unordered $e(t)^2$s


for the model (1) are given by

\[
e(t)^2_{\mathbf{A}, \mathbf{B}, \boldsymbol{\omega}} = \Bigl( y(t) - \sum_{k=1}^{M} \bigl(A_k \cos(\omega_k t) + B_k \sin(\omega_k t)\bigr) \Bigr)^2, \quad t = 1, 2, \ldots, N. \tag{11}
\]

The least trimmed (sum of) squares (LTS) estimator, proposed in Rousseeuw (1984), is found by finding the parameters that satisfy

\[
\min_{\boldsymbol{\omega}, \mathbf{A}, \mathbf{B}} \sum_{t=1}^{h} e^2_{(t)}. \tag{12}
\]

It is well known (Rousseeuw, 1984) that the best robustness property is obtained when $h = N/2$, approximately. In this case a breakdown point of 50% is attained. Higher efficiency of the estimates is obtained with lower trimming proportions. We consider in the present paper a 50% trimming.

Alternatively, under the L1-norm estimation setup, we can similarly define a least trimmed (sum of) absolute (LTA) deviation estimator. The LTA deviation estimator is found by finding the parameters that satisfy

\[
\min_{\boldsymbol{\omega}, \mathbf{A}, \mathbf{B}} \sum_{t=1}^{h} |e_{(t)}|, \tag{13}
\]

where $|e_{(1)}| < |e_{(2)}| < \cdots < |e_{(N)}|$ are the $N$ ordered absolute residuals; the unordered $|e(t)|$s for model (1) are given by

\[
|e(t)| = \Bigl| y(t) - \sum_{k=1}^{M} \bigl(A_k \cos(\omega_k t) + B_k \sin(\omega_k t)\bigr) \Bigr|, \quad t = 1, 2, \ldots, N. \tag{14}
\]

Once again we consider a 50% trimming for the LTA based estimators in the present paper.

3.2. Least median criterion

The least median squares (LMS) estimator is obtained by finding the model parameters that minimize the $h$th-ordered squared residual, i.e. $e^2_{(h)}$, where $h$ is usually taken as $h = [N/2] + [(p+1)/2]$ and $p$ denotes the number of parameters in the model. This estimator was introduced in Rousseeuw (1984) (see also Rousseeuw, 1988; Rousseeuw and Leroy, 1987).

Similar to the LMS estimator, we can define the least median absolute (LMA) estimator as the estimator that is obtained by finding the parameters that minimize the $h$th-ordered absolute residual, i.e. $|e_{(h)}|$, with an appropriate choice of $h$.

4. Proposed robust frequency estimation methods

In this section, we present the proposed genetic algorithm based frequency estimation techniques for the multiple sinusoidal model.

We propose six different estimators based on the genetic algorithm and different criterion functions. These estimators are: (i) genetic algorithm based least square estimator (GA-LS), (ii) genetic algorithm based least trimmed square estimator (GA-LTS), (iii) genetic algorithm based least median square estimator (GA-LMS), (iv) genetic algorithm based L1-norm estimator (GA-L1), (v) genetic algorithm based least trimmed absolute deviation estimator (GA-LTA), (vi) genetic algorithm based least median absolute deviation estimator (GA-LMA).

We first present the algorithm for the GA-LS estimator. In the genetic search formulation of the GA-LS estimator of the parameters of the model (1), we take the objective function (8) as the fitness function in the genetic search setup and aim to find the optimum member through repeated applications of the three genetic operators of selection, crossover and mutation, over the successive generations. The parameter space $\Omega_{freq}$ for the frequency vector $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_M)^T$ of the sinusoidal model (1) is given by

\[
\Omega_{freq} = (0, \pi) \times (0, \pi) \times \cdots \times (0, \pi) \subset \mathbb{R}^{M}. \tag{15}
\]

We first obtain the binary chromosomal representation of the parameter space. We form, for any possible solution belonging to the original parameter space $\Omega_{freq}$, a binary string of length $M \times p$. Here $p$ denotes the length of the binary bit representation of any component of the parameter vector $\boldsymbol{\omega}$, i.e. for each of the unknown frequencies, we obtain a $p$-bit coded binary representation. It is however well known that ordinary binary coding can result in the search process being deceived, i.e. unable to efficiently locate the global minimum, due to large Hamming distances in the representational mapping between adjacent values (Hollstien, 1971). The Hamming distance between two binary strings is defined as the minimum number of bits that must be changed in order to convert one bit string into the other. In order to avoid the above-mentioned problem, a Gray coding of the original binary strings is adopted. The literature on GAs and their applications reports that Gray coding exhibits an accelerated convergence rate of the objective function, and provides better accuracy than the binary coded GA (Caruana and Schaffer, 1988; Yokose et al., 2000). The superior performance of a Gray coded GA is mainly attributed to the fact that Gray codes do not bias the searching direction, as happens with ordinary binary coding, which has a large Hamming distance between some adjacent values. A Gray code represents each number in the sequence of integers $\{0, 1, \ldots, 2^K - 1\}$ as a binary string of length $K$ in an order such that adjacent integers have Gray code representations that differ in only one bit position. Use of a Gray code thus allows going through the integer sequence by flipping just one bit at a time. This is called the adjacency property of Gray codes. A Gray code takes a binary sequence and shuffles it to form a new sequence with the adjacency property. We use here a Gray coding derived from the initial binary coding.

To initialize the genetic search, we populate an initial population of a pre-determined size. Each member of this initial population is a randomly chosen parameter vector $\boldsymbol{\omega}_0 \in \Omega_{freq}$, coded to get the chromosomal string representation of bit length $M \times p$. The ranking based fitness of each of the members of this initial population is evaluated according to the criterion (8). For a detailed discussion on various selection procedures, see for example Goldberg (1989). Using a stochastic sampling with replacement approach, we next populate the fit parents pool, the size of the pool depending on the generation gap. From the selected parent pool, we select pairs in order and apply a two-point crossover (with a pre-assigned crossover probability), exchanging genetic material of the parents to obtain new chromosomes. Crossover produces new individuals that have some parts of both parents' genetic material. An example of a multipoint crossover is illustrated in Fig. 1.

Mutation is applied on the mated chromosome strings with a low pre-assigned mutation probability. Mutation is considered to be the genetic operator that ensures that the probability of searching any given string will never be zero and thus has the effect of tending to inhibit the possibility of convergence of the GA to a local optimum. Mutation changes the genetic representation of the chromosomes according to a probabilistic rule. In the binary string representation, mutation will cause a single bit to change its state, i.e. $0 \to 1$ or $1 \to 0$.

An elitist strategy (De Jong, 1975; Thierens, 1997) is used to fill the generation gap while populating a new generation. Elitism encourages the inclusion of highly fit chromosome strings, from earlier generations, in the subsequent generations. The fractional difference between the


Fit Parent I:        10 10101 111111111111 0000000000 01
Fit Parent II:       01 01010 000000000000 1111111111 10

New Chromosome I:    10 01010 111111111111 1111111111 01
New Chromosome II:   01 10101 000000000000 0000000000 10

Fig. 1. Example of a 4-point crossover.
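
As a rough illustration of the chromosome-level operators, the following Python sketch reproduces the 4-point crossover of Fig. 1 together with a Gray coding and a mutation step. It is not taken from the paper: the function names, the cut positions (2, 7, 19 and 29, read off the segment lengths in Fig. 1), the choice of the reflected binary Gray code, the linear decoding rule and the per-bit mutation rule are all assumptions made for the example.

```python
import random


def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer (assumed variant)."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def decode_frequency(bits, lo, hi):
    """Map a Gray coded bit string linearly onto the interval [lo, hi]."""
    k = gray_to_binary(int(bits, 2))
    return lo + (hi - lo) * k / (2 ** len(bits) - 1)


def multipoint_crossover(p1, p2, cuts):
    """Exchange every second segment between two equal-length bit strings."""
    edges = [0] + sorted(cuts) + [len(p1)]
    c1, c2 = [], []
    for i, (a, b) in enumerate(zip(edges[:-1], edges[1:])):
        if i % 2 == 0:   # segment kept by each parent
            c1.append(p1[a:b])
            c2.append(p2[a:b])
        else:            # segment swapped between the parents
            c1.append(p2[a:b])
            c2.append(p1[a:b])
    return "".join(c1), "".join(c2)


def mutate(bits, prob, rng):
    """Flip each bit independently with probability prob (hypothetical rule)."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < prob else b for b in bits)


if __name__ == "__main__":
    # Parents of Fig. 1, written segment by segment.
    parent_1 = "10" + "10101" + "1" * 12 + "0" * 10 + "01"
    parent_2 = "01" + "01010" + "0" * 12 + "1" * 10 + "10"
    child_1, child_2 = multipoint_crossover(parent_1, parent_2, [2, 7, 19, 29])
    print(child_1)
    print(child_2)
    print(mutate(child_1, 0.01, random.Random(0)))
```

With the two parents of Fig. 1 and cuts after bit positions 2, 7, 19 and 29, `multipoint_crossover` returns the two new chromosomes shown in the figure. A two-point crossover, as used in the proposed algorithm, corresponds to passing two cut positions instead of four.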

number of chromosomes in the old population and the number of new chromosomes produced by selection and recombination is termed the generation gap. Under the elitist approach, a fraction (based on the value of the pre-determined generation gap) of the most-fit individuals is deterministically allowed to propagate through successive generations.

Since GA is a stochastic optimization algorithm, the application of conventional termination criteria becomes problematic in a GA based optimization procedure. We follow here the most commonly adopted practice, where the cycle of selection, crossover and mutation is carried on until a pre-determined number of generations has been completed or no better solution is found after a pre-determined number of successive generations have evolved, whichever is earlier. We walk through the GA steps repeatedly, until the termination criterion is reached.

After completion of each generation, we preserve the information regarding the most fit individual, i.e. the parameter vector that is the best solution for the optimization of (3) ((8) for frequency estimation), in that generation. The GA based least square (GA-LS) solution of $\boldsymbol{\omega}$, say $\hat{\boldsymbol{\omega}}_{GA\text{-}LSE}$, is the most-fit individual evolving among all the generations, at the point when the termination criterion is reached. Once we obtain $\hat{\boldsymbol{\omega}}_{GA\text{-}LSE}$, the estimates of the conditionally linear parameters, the amplitudes $\mathbf{a}$, may be obtained using (9).

The algorithmic steps for the proposed procedure are given below:

Step 1: Randomly initialize the initial population (of a pre-determined size) of chromosomes of Gray coded binary strings of length $M \times p$; each of these chromosomes is the coded binary representation of a possible solution for the least square frequency estimation problem.
Step 2: Decode the Gray coded binary strings using a linear scaling.
Step 3: Evaluate the objective function (8) for each of the decoded strings and obtain their fitness values using a ranking based approach. Preserve the information about the string with the highest fitness value.
Step 4: Using a stochastic sampling with replacement approach, populate the fit parents pool, the size of the pool depending on the generation gap.
Step 5: From the selected parents pool, select pairs in order and apply a two-point crossover (with a pre-assigned crossover probability), exchanging genetic material of the parents to obtain new chromosomes.
Step 6: Apply mutation on the mated chromosome strings with a small pre-assigned mutation probability.
Step 7: Use the elitist strategy to fill the generation gap.
Step 8: Repeat Steps 2–7 until the maximum number of generations is reached or no better solution is found over a pre-determined number of successive generations.
Step 9: $\hat{\boldsymbol{\omega}}_{GA\text{-}LSE}$ is the most-fit decoded string found among all the generations.
Step 10: Calculate the estimates of the conditionally linear parameters $\mathbf{a}$ through (9).

For the GA-LTS estimator, we consider the $3M$ dimensional model parameter vector $\boldsymbol{\eta} = (A_1, B_1, \omega_1, \ldots, A_M, B_M, \omega_M)$ and consider the objective function

\[
\min_{\boldsymbol{\omega}, \mathbf{A}, \mathbf{B}} \sum_{t=1}^{[N/2]} e^2_{(t)}, \tag{16}
\]

where $e^2_{(1)} < e^2_{(2)} < \cdots < e^2_{([N/2])}$ are the $[N/2]$ smallest squared residuals; the unordered squared residuals, the $e(t)^2$s, are given by (11). The parameter space $\Omega$ for the present setup for the model (1) is given by

\[
\Omega = (-\infty, \infty) \times (-\infty, \infty) \times (0, \pi) \times \cdots \times (-\infty, \infty) \times (-\infty, \infty) \times (0, \pi) \subset \mathbb{R}^{3M}. \tag{17}
\]

Similar to the GA-LS method, we first obtain the binary chromosomal representation of the parameter space $\Omega$. We form, for any possible solution belonging to the original parameter space $\Omega$, a binary string of length $3Mp$, where $p$ denotes the length of the binary bit representation of any component of the parameter vector $\boldsymbol{\eta}$.

The algorithmic steps for obtaining the GA-LTS estimator are similar to the steps followed to obtain the GA-LS estimates, with the difference that in Step 1 we now initialize the initial population with binary strings of length $3Mp$ and in Step 9 we obtain the GA-LTS estimates of the entire parameter vector.

For the GA-LMS estimator, the objective function in the GA-LTS setup is replaced by

\[
\min_{\boldsymbol{\omega}, \mathbf{A}, \mathbf{B}} e^2_{([N/2] + [(p+1)/2])}. \tag{18}
\]

The parameter space and the algorithmic steps remain the same as those of the GA-LTS estimator. For the L1-norm based estimators, namely the GA-L1 estimator, the GA-LTA estimator and the GA-LMA estimator, the parameter space remains (17). The objective function for the GA-L1 estimator is given by

\[
\min_{\mathbf{A}, \mathbf{B}, \boldsymbol{\omega}} \sum_{t=1}^{N} \Bigl| y(t) - \sum_{k=1}^{M} \bigl(A_k \cos(\omega_k t) + B_k \sin(\omega_k t)\bigr) \Bigr|. \tag{19}
\]

The algorithmic steps remain the same as the steps for obtaining the GA-LTS estimator. The objective function for the


GA-LTA estimator is given by

\[
\min_{\boldsymbol{\omega}, \mathbf{A}, \mathbf{B}} \sum_{t=1}^{[N/2]} |e_{(t)}|, \tag{20}
\]

where $|e_{(1)}| < |e_{(2)}| < \cdots < |e_{([N/2])}|$ are the $[N/2]$ smallest absolute deviations; the unordered $|e(t)|$s for model (1) are given by (14). Finally, for the GA-LMA estimator, the objective function for the GA procedure is given by

\[
\min_{\boldsymbol{\omega}, \mathbf{A}, \mathbf{B}} |e_{([N/2] + [(p+1)/2])}|. \tag{21}
\]

Table 1
Choice of genetic parameters for the simulations for model (22).

Genetic parameter                              Values
Number of chromosomes in one population        200
Bits of precision                              80
Coding                                         Gray coding
Scaling                                        Linear
Range of parameters for initial population     $\omega \in [0, \pi]$
Crossover probability                          0.70
Crossover method                               2-point
Mutation probability                           0.01
Elitism                                        Top 10%
Maximum number of generations                  200
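
The six objective functions used by the proposed estimators (least squares, L1-norm, and the trimmed and median criteria (12), (13) and (16)–(21)) differ only in how the residuals of model (1) are aggregated. The following Python sketch makes that explicit. It is an illustrative reading of the formulas rather than the authors' implementation: the function names `residuals` and `criterion`, the argument `h` (number of ordered residuals retained) and the way outliers are injected in the demonstration are assumptions of the example.

```python
import numpy as np


def residuals(y, omega, A, B):
    """y(t) - sum_k (A_k cos(w_k t) + B_k sin(w_k t)) for t = 1, ..., N."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, y.size + 1)[:, None]           # time points as a column
    omega, A, B = (np.atleast_1d(np.asarray(v, dtype=float)) for v in (omega, A, B))
    fit = (A * np.cos(omega * t) + B * np.sin(omega * t)).sum(axis=1)
    return y - fit


def criterion(e, kind, h=None, n_params=None):
    """Aggregate residuals e according to one of LS, L1, LTS, LTA, LMS, LMA."""
    e = np.asarray(e, dtype=float)
    n = e.size
    # Squared residuals for the LS family, absolute residuals otherwise.
    dev = np.sort(e ** 2 if kind in ("LS", "LTS", "LMS") else np.abs(e))
    if kind in ("LS", "L1"):
        return dev.sum()                             # full sum over all residuals
    if kind in ("LTS", "LTA"):
        h = n // 2 if h is None else h               # [N/2] smallest (50% trimming)
        return dev[:h].sum()
    if kind in ("LMS", "LMA"):
        if h is None:
            if n_params is None:
                raise ValueError("give h or n_params for the median criteria")
            h = n // 2 + (n_params + 1) // 2         # h = [N/2] + [(p+1)/2]
        return dev[h - 1]                            # h-th ordered residual
    raise ValueError("unknown criterion: " + str(kind))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, omega, phi, sigma = 100, 0.5, 0.1, 0.2
    t = np.arange(1, n + 1)
    signal = np.cos(omega * t + phi)
    y = signal + rng.normal(0.0, sigma, n)
    # Assumed contamination scheme: 30 randomly chosen points receive
    # noise with 100 times the standard deviation of the good observations.
    idx = rng.choice(n, size=30, replace=False)
    y[idx] = signal[idx] + rng.normal(0.0, 100 * sigma, 30)
    # cos(wt + phi) = cos(phi) cos(wt) - sin(phi) sin(wt)
    e = residuals(y, omega, np.cos(phi), -np.sin(phi))
    for kind in ("LS", "L1", "LTS", "LTA", "LMS", "LMA"):
        print(kind, criterion(e, kind, n_params=3))
```

Evaluated at the true parameters of a contaminated sample, the full sums (LS, L1) are dominated by the outlying residuals, whereas the trimmed and median criteria only see the smaller ordered residuals. In a genetic search, any of these functions could serve as the objective to be minimized over the decoded chromosomes.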
5. Simulation studies and real life data analysis

In this section, we apply the proposed GA based frequency estimation techniques to various simulated sinusoidal models. We also perform extensive simulation studies to investigate the possible effect of outliers present in the data. In the simulation studies, we consider both dependent and independent error structures. We report here the performance of the following estimators: (i) genetic algorithm based least square estimator (GA-LS), (ii) genetic algorithm based least trimmed square estimator (GA-LTS), (iii) genetic algorithm based least median square estimator (GA-LMS), (iv) genetic algorithm based L1-norm estimator (GA-L1), (v) genetic algorithm based least trimmed absolute deviation estimator (GA-LTA), (vi) genetic algorithm based least median absolute deviation estimator (GA-LMA). Real life data analysis using the proposed methods will also be presented.

5.1. Simulation results for independent error structure

In this subsection, we present the empirical studies for 1-component and 2-component simulated sinusoidal models with independent error structure. For the purpose of comparing the performance of the proposed robust methods with the elemental set based robust frequency estimates of Smyth and Hawkins (2000), we consider the same models as reported therein. We report the average estimates, the root mean square errors (RMSE) and the standard deviations (St. Dev.) over 100 simulation runs. The random numbers are generated using the MATLAB random number generator.

5.1.1. One sinusoid

We consider the following one-component sinusoidal model

\[
y(t) = \cos(\omega t + \phi) + e(t), \quad t = 1, 2, \ldots, N. \tag{22}
\]

The true value of the frequency of the simulation model is $\omega = 0.5$ and that of $\phi$ is 0.1. $e(t)$ is taken as an i.i.d. normal noise sequence, with mean zero and standard deviation $\sigma = 0.2$. The sample size is taken as 100. The Cramér-Rao bound, which is the same as the asymptotic variance of the LSE (Kundu and Mitra, 1996), for the frequency parameter is 9.6E−07. For each of the simulated datasets, we estimated the frequency using the methods described in Section 4. The particular choice of the genetic parameters for the genetic formulation setup for the simulation model is given in Table 1.

The trimmed proportions for GA-LTA are taken as 50% and 80% and for GA-LTS it is taken as 50%. The root mean square errors (RMSE), the average estimates and the standard deviations over 100 simulations for the frequency are computed for all the proposed methods. We also report, for comparison, the corresponding result of the best performing robust frequency estimate of Smyth and Hawkins (2000). We also investigate the performance of the proposed estimators when the data contains 30% outliers. Outliers were generated to have standard deviations 100 times that of the good observations. The outliers were assigned to a randomly selected subset of 30 observations. The results for the no outlier and the outlier scenarios are presented in Table 2.

Table 2
Simulation results for one cosine signal model (22).

Method         No outlier                        30% outlier
               Average   St. Dev.   RMSE         Average   St. Dev.   RMSE
GA-LS          0.5001    0.00099    0.00099      1.4750    0.79855    1.26025
GA-LTS         0.5001    0.00104    0.00105      0.5004    0.00116    0.00126
GA-LMS         0.5002    0.00127    0.00128      0.5008    0.00153    0.00172
GA-L1          0.5002    0.00101    0.00102      0.5004    0.00153    0.00158
GA-LTA (50)    0.5004    0.00121    0.00124      0.5004    0.00134    0.00141
GA-LTA (80)    0.5002    0.00103    0.00104      0.5002    0.00119    0.00120
GA-LMA         0.5004    0.00105    0.00109      0.5004    0.00112    0.00118

From the results for the non-outlier case, we observe that the proposed estimators perform quite well, even for the trimmed cases. Among the proposed estimators, GA-LS and GA-L1 perform the best. The performance of the GA-LS is almost fully efficient (98%) and better than the best performing estimators ELS-LS, LTS-L1-MM and LMS-L1-MM (efficiency 94%) reported in Smyth and Hawkins (2000). We further observe that the best performing method in the non-outlier case, the GA-LS method, fails completely in the presence of outliers. The performance of the GA-L1 method is still quite promising. However, much better results are obtained with the genetic algorithm based trimmed and least median criterion functions. The best results are obtained for the GA-LMA method. The performance of the GA-LTS with 50% trimming and the GA-LTA (80%) method is also quite encouraging. For the outlier scenario, the performances of the proposed GA-LMA and GA-LTA (80%) are better than the best performing method LTS-L1-MM (with St. Dev. and RMSE 0.00127) of Smyth and Hawkins (2000).

5.1.2. Two sinusoids

We consider the following two-component sinusoidal model

\[
y(t) = \cos(\omega_1 t + \phi_1) + \cos(\omega_2 t + \phi_2) + e(t), \quad t = 1, 2, \ldots, N, \tag{23}
\]

where we take the true values of the frequencies of the simulation model as $\omega_1 = 0.3$ and $\omega_2 = 0.7$ and those of the phases as $\phi_1 = 0.2$ and $\phi_2 = 0.1$. $e(t)$ is taken as an i.i.d. normal noise sequence, with mean zero and standard deviation $\sigma = 0.2$. The sample size is taken as 100. The Cramér-Rao bounds for the two frequency parameters are the same and equal to 9.6E−07. For each of the simulated datasets, we estimated the frequencies using the methods described in Section 4. The choice of the genetic parameters for the two-component sinusoidal model is similar to the ones mentioned for the one-component model (Table 1). However, to accommodate for higher-dimensional

Please cite this article as: Mitra, A., Kundu, D., Genetic algorithms based robust frequency estimation of sinusoidal signals with
stationary errors. Engineering Applications of Artificial Intelligence (2009), doi:10.1016/j.engappai.2009.07.001
ARTICLE IN PRESS
6 A. Mitra, D. Kundu / Engineering Applications of Artificial Intelligence ] (]]]]) ]]]–]]]

Table 3
Simulation results for the two cosine signals model (23).

Frequency  Method   No outlier                     30% outlier
                    Average  St. Dev.  RMSE        Average  St. Dev.  RMSE

ω = 0.7    GA-LS    0.6999   0.00075   0.00075     1.9237   0.77294   1.44734
           GA-LTS   0.6998   0.00123   0.00124     0.7000   0.00137   0.00137
           GA-LMS   0.6999   0.00108   0.00108     0.7000   0.00121   0.00121
           GA-L1    0.7001   0.00092   0.00092     0.6999   0.00151   0.00151
           GA-LTA   0.6999   0.00127   0.00127     0.6998   0.00138   0.00139
           GA-LMA   0.7001   0.00116   0.00116     0.6999   0.00128   0.00128

ω = 0.3    GA-LS    0.3002   0.00067   0.00069     0.9250   0.70135   0.93941
           GA-LTS   0.3004   0.00103   0.00106     0.3005   0.00113   0.00125
           GA-LMS   0.3006   0.00102   0.00107     0.3009   0.00114   0.00147
           GA-L1    0.3002   0.00081   0.00097     0.3008   0.00124   0.00148
           GA-LTA   0.3004   0.00123   0.00127     0.3008   0.00146   0.00169
           GA-LMA   0.3007   0.00110   0.00117     0.3009   0.00124   0.00153

parameter space, we form a larger chromosome pool (350) for each population. The root mean square errors (RMSE), the average estimates and the standard deviations over 100 simulations for all the frequencies are computed for all the proposed methods. Once again we also report, for comparison, the corresponding results of the best performing robust frequency estimate of Smyth and Hawkins (2000). Similar to the one-component model, we also investigate the performance of the proposed estimators when the data contain 30% outliers. Once again, outliers were generated to have standard deviations 100 times that of the good observations. The outliers were assigned to a randomly selected subset of 30 observations. The results for the outlier as well as the non-outlier cases are presented in Table 3.
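The simulation design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper used MATLAB, and it does not spell out exactly how the outliers were drawn, so the sketch assumes an outlier is a noise draw with standard deviation 100σ placed at randomly chosen time points. The function names `simulate_two_sinusoids` and `asymptotic_crb_frequency` are hypothetical. The second function evaluates the standard large-sample Cramér-Rao expression 24σ²/(A²N³) for the frequency of a real sinusoid, which agrees with the value 9.6E-07 quoted in the text for A = 1, σ = 0.2 and N = 100.

```python
import math
import random


def simulate_two_sinusoids(n=100, sigma=0.2, outlier_frac=0.3,
                           outlier_scale=100.0, seed=0):
    """Draw one dataset from model (23), optionally contaminated by outliers.

    Assumption (not stated explicitly in the paper): an outlier is a noise
    draw whose standard deviation is outlier_scale * sigma, placed at
    randomly chosen time points.
    """
    rng = random.Random(seed)
    w1, w2, phi1, phi2 = 0.3, 0.7, 0.2, 0.1    # true values from the text
    n_out = int(round(outlier_frac * n))
    outlier_idx = set(rng.sample(range(n), n_out))
    y = []
    for i in range(n):
        t = i + 1                               # t = 1, 2, ..., N
        sd = sigma * outlier_scale if i in outlier_idx else sigma
        signal = math.cos(w1 * t + phi1) + math.cos(w2 * t + phi2)
        y.append(signal + rng.gauss(0.0, sd))
    return y, sorted(outlier_idx)


def asymptotic_crb_frequency(amplitude, sigma, n):
    """Large-sample Cramer-Rao bound 24*sigma^2 / (A^2 * N^3) for the frequency."""
    return 24.0 * sigma ** 2 / (amplitude ** 2 * n ** 3)


y, idx = simulate_two_sinusoids()
print(len(y), len(idx))                         # 100 observations, 30 outliers
print(asymptotic_crb_frequency(1.0, 0.2, 100))  # approximately 9.6e-07
```

Repeating the draw with 100 different seeds and feeding each dataset to an estimator would give the averages, standard deviations and RMSEs of the kind reported in Table 3.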
From the results of the two-component model, we observe that for the no-outlier scenario, the GA-LS and GA-L1 methods perform the best. These estimators give super-efficient estimates in the sense that their MSEs are lower than the corresponding Cramér-Rao bounds. The performances of GA-LS (efficiency of 131% for the higher frequency and 124% for the lower frequency) and GA-L1 (efficiency of 113% for the higher frequency and 102% for the lower frequency) are much better than those of the elemental set based methods of Smyth and Hawkins (2000) (the reported maximum efficiency is 109% for the higher frequency and 104% for the lower frequency). The performances of the genetic algorithm based trimmed criterion function estimators are also reasonably good. The results for the two-component sinusoidal model with 30% outliers are qualitatively the same as the results for the one sinusoid. The results once again indicate satisfactory performance of the proposed robust estimators.

5.2. Simulation results for dependent error structure

In this subsection, we present the simulation studies for sinusoidal models with dependent error structure.

5.2.1. One sinusoid
We consider the following one-component sinusoidal model

y(t) = 5.0 cos(0.4t) + 4.0 sin(0.4t) + e(t), t = 1, 2, ..., N. (24)

The error structure of e(t) is taken as

e(t) = 0.3e(t − 1) + δ(t), (25)

where the δ(t) are an i.i.d. normal noise sequence, with mean zero and standard deviation σ. We consider two different values of σ, 0.01 and 0.05. The sample size is taken as 100. For each of the simulated datasets, we estimated the frequency using the methods described in Section 4. The choices of the genetic parameters for the genetic formulation setup for the simulation model are as in Table 1. The trimming proportions for GA-LTS and GA-LTA are taken as 80%. The root mean square errors (RMSE), the average frequency estimates and the associated standard deviations over 100 simulations are computed for all the proposed methods. The theoretical asymptotic standard deviation of the least squares estimator for σ = 0.01 is 1.044E-5 and that for σ = 0.05 is 5.218E-5. We next investigate the performance of the proposed estimators when the data contain 30% outliers, under the correlated error structure. Outliers were generated to have standard deviations 100 times that of the good observations and assigned to a randomly selected subset of 30 observations. Two representative plots of 30% outlier datasets in the dependent error setup and the corresponding GA-L1 fits of the data are given in Figs. 2 and 3. The results for the non-outlier as well as the outlier scenarios are presented in Table 4.

Fig. 2. A representative dataset of one-component dependent error (σ = 0.01) sinusoidal model containing 30% outlier and the corresponding GA-L1 fit for model (24).

We observe that the proposed methods are able to resolve the unknown frequency with a high level of accuracy for the dependent error structure as well. For the non-outlier scenario, GA-LS performs the best, closely followed by GA-L1. Even the least trimmed and least median based approaches provide fairly accurate estimates. From the simulations of the outlier study, we observe that the proposed robust frequency estimation methods perform quite well. While the performance of


the GA-LS deteriorates significantly as compared to the non-outlier case, the performance of GA-L1 and the least trimmed and least median approaches remains fairly stable even with 30% outlier contamination in the data. We had observed a similar pattern for independent errors also. The GA-L1 estimator performs the best in this situation. A further investigation reveals that for σ ≥ 0.1, the GA-LS totally breaks down, while the robust frequency estimators still continue to give reasonably good results.

5.2.2. Two sinusoids
We consider the following two-component sinusoidal model

y(t) = 1.0 cos(0.3t) + 1.5 sin(0.3t) + 2.5 cos(0.8t) + 2.0 sin(0.8t) + e(t), t = 1, 2, ..., n. (26)

The error structure of e(t) is taken as e(t) = 0.3e(t − 1) + δ(t),
where δ(t)s are i.i.d. normal random variables, with mean zero
and standard deviation σ. The sample size is taken as 75. We have
considered two different values of σ, 0.01 and 0.1. Performances of
the proposed estimators with 30% outlier contamination are also
investigated. As in the previous cases, outliers were generated to
have standard deviations 100 times that of the good observations
and are associated to a randomly selected subset. For each of the
simulated datasets, we estimated the frequencies using the
methods described in Section 4. The choice of the parameters
for the genetic formulation setup for the two-component
dependent error simulation model is similar to that of the two-
component independent error model. The results for the lower of
the two sinusoids are presented in Table 5 and the results for the
higher of the two sinusoids are presented in Table 6. A
representative plot of 30% outlier dataset in the two-component
dependent error setup is given in Fig. 4 and the data along with
the fit corresponding to GA-L1 solution is given in Fig. 5.
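A dataset from model (26) with the dependent error structure can be generated along the following lines. This is a sketch rather than the authors' procedure: the paper does not say how the error recursion was initialised, so the hypothetical helper `ar1_errors` assumes a start at zero followed by a discarded burn-in stretch, and `simulate_model_26` is likewise an illustrative name.

```python
import math
import random


def ar1_errors(n, sigma, rho=0.3, burn_in=200, seed=0):
    """Generate e(t) = rho * e(t-1) + delta(t), delta(t) ~ N(0, sigma^2).

    Assumption: the recursion starts at zero and a burn-in stretch is
    discarded so the returned values are close to stationary.
    """
    rng = random.Random(seed)
    e_prev = 0.0
    errors = []
    for step in range(burn_in + n):
        e_prev = rho * e_prev + rng.gauss(0.0, sigma)
        if step >= burn_in:
            errors.append(e_prev)
    return errors


def signal_26(t):
    """Noise-free part of model (26)."""
    return (1.0 * math.cos(0.3 * t) + 1.5 * math.sin(0.3 * t)
            + 2.5 * math.cos(0.8 * t) + 2.0 * math.sin(0.8 * t))


def simulate_model_26(n=75, sigma=0.01, seed=0):
    """Return y(1), ..., y(n) from model (26) with AR(1) errors."""
    e = ar1_errors(n, sigma, seed=seed)
    return [signal_26(i + 1) + e[i] for i in range(n)]


y = simulate_model_26(n=75, sigma=0.1, seed=42)
print(len(y))   # 75
```

For this recursion the stationary variance of e(t) is σ²/(1 − 0.3²), so the errors are slightly more variable than the innovations δ(t).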
Fig. 3. A representative dataset of one-component dependent error (σ = 0.05) sinusoidal model containing 30% outlier and the corresponding GA-L1 fit for model (24).

For the two-component model non-outlier cases, we observe that the GA-LS performs the best closely followed by GA-L1. The theoretical asymptotic standard deviation of the LSE of ω = 0.8 at σ = 0.01 is 9.738E-6 and that for σ = 0.1 is 9.738E-5 and the asymptotic standard deviation of the LSE of ω = 0.3 at σ = 0.01 is

Table 4
Simulation results for one sinusoid dependent error model (24).

σ      Method   No outlier                      30% outlier
                Average  St. Dev.   RMSE        Average  St. Dev.   RMSE

0.01   GA-LS    0.4000   1.009E-5   1.011E-5    0.4000   3.218E-4   3.227E-4
       GA-LTS   0.4000   4.363E-5   4.364E-5    0.4000   4.828E-5   4.842E-5
       GA-LMS   0.4000   5.433E-5   5.455E-5    0.4000   6.705E-5   6.709E-5
       GA-L1    0.4000   1.336E-5   1.339E-5    0.4000   1.849E-5   1.875E-5
       GA-LTA   0.4000   6.937E-5   6.999E-5    0.4000   7.003E-5   7.011E-5
       GA-LMA   0.4000   7.444E-5   7.447E-5    0.4000   7.460E-5   7.476E-5

0.05   GA-LS    0.4000   5.174E-5   5.282E-5    0.4000   1.355E-3   1.356E-3
       GA-LTS   0.4000   1.179E-4   1.179E-4    0.4000   1.556E-4   1.557E-5
       GA-LMS   0.4000   1.330E-4   1.331E-4    0.4000   1.432E-4   1.464E-4
       GA-L1    0.4000   5.543E-5   5.735E-5    0.4000   9.303E-5   9.309E-5
       GA-LTA   0.4000   1.086E-4   1.094E-4    0.4000   1.105E-4   1.112E-4
       GA-LMA   0.4000   1.219E-4   1.229E-5    0.4000   1.237E-4   1.249E-4

Table 5
Simulation results for the lower of the two sinusoids for model (26).

σ      Method   No outlier                      30% outlier
                Average  St. Dev.   RMSE        Average  St. Dev.   RMSE

0.01   GA-LS    0.3000   7.082E-5   7.117E-5    0.3000   1.296E-3   1.296E-3
       GA-LTS   0.3000   1.890E-4   1.918E-4    0.3000   4.922E-4   4.930E-4
       GA-LMS   0.2999   6.064E-4   6.161E-4    0.3000   7.171E-4   7.183E-4
       GA-L1    0.3000   9.723E-5   9.753E-5    0.3000   1.265E-4   1.290E-4
       GA-LTA   0.3000   2.703E-4   2.717E-4    0.3000   2.989E-4   2.997E-4
       GA-LMA   0.3000   6.398E-4   6.398E-4    0.3000   6.652E-4   6.658E-4

0.1    GA-LS    0.3000   6.021E-4   6.036E-4    0.3366   1.057E-1   1.119E-1
       GA-LTS   0.3000   6.930E-4   6.934E-4    0.3000   8.987E-4   8.987E-4
       GA-LMS   0.2999   8.027E-4   8.030E-4    0.3000   8.991E-4   8.996E-4
       GA-L1    0.3000   6.151E-4   6.157E-4    0.3000   7.816E-4   7.832E-4
       GA-LTA   0.3001   6.703E-4   6.800E-4    0.3001   7.870E-4   7.884E-4
       GA-LMA   0.3000   8.834E-4   8.365E-4    0.3000   8.966E-4   8.969E-4


Table 6
Simulation results for the higher of the two sinusoids for model (26).

σ      Method   No outlier                      30% outlier
                Average  St. Dev.   RMSE        Average  St. Dev.   RMSE

0.01   GA-LS    0.8000   3.650E-5   3.678E-5    0.8001   8.241E-4   8.312E-4
       GA-LTS   0.8000   9.397E-5   9.416E-5    0.8000   2.040E-4   2.044E-4
       GA-LMS   0.8001   2.153E-4   2.346E-4    0.8000   3.454E-4   3.485E-4
       GA-L1    0.8000   4.911E-5   4.911E-5    0.8000   6.465E-5   6.558E-5
       GA-LTA   0.8000   9.362E-5   9.363E-5    0.8000   9.448E-5   9.450E-5
       GA-LMA   0.8000   2.230E-4   2.236E-4    0.8000   3.909E-4   3.931E-4

0.1    GA-LS    0.8000   2.962E-4   2.963E-4    0.7992   1.256E-2   1.262E-2
       GA-LTS   0.8000   3.561E-4   3.571E-4    0.7999   5.707E-4   5.743E-4
       GA-LMS   0.8000   5.257E-4   5.265E-4    0.8001   6.121E-4   6.151E-4
       GA-L1    0.8000   3.440E-4   3.450E-4    0.8000   4.283E-4   4.298E-4
       GA-LTA   0.8000   3.710E-4   3.710E-4    0.8000   4.737E-4   4.738E-4
       GA-LMA   0.8000   5.117E-4   5.121E-4    0.8000   5.516E-4   5.519E-4

1.228E-5 and that for σ = 0.1 is 1.228E-4. The GA-LS almost attains these above-mentioned asymptotic values for the respective frequencies. The GA estimators based on least trimmed (GA-LTS and GA-LTA) approaches also provide reasonably accurate estimates. For the outlier contaminated data, the performance of the GA-LS deteriorates significantly, especially for the higher σ value. The GA-L1 and the GA trimmed and median based approaches appear to be fairly robust with respect to outliers in the data. Among the robust methods GA-L1 performs the best.

Fig. 4. A representative dataset of two-component, dependent error sinusoidal model (26) containing 30% outlier.

Fig. 5. A representative dataset of two-component dependent error sinusoidal model containing 30% outlier and the corresponding GA-L1 fit for model (26).

5.3. Real life data analysis

In this subsection, we present the real life data analysis results. Two different datasets, the ‘Circadian Rhythms’ data and the ‘Variable Star’ data, are considered for analysis.

5.3.1. Fitting Circadian Rhythms data

We consider the ‘Circadian Rhythm’ dataset. The data were collected at Princeton University in the late 1960s under the direction of Dr. C.S. Pittendrigh. In order to observe the periodicities in the behavior of Perognathus formosus (also called long-tail pocket mouse), a nocturnal mammal, the animal was given 8 days of 12 hours light and 12 hours darkness as an adjustment period, which was followed by about 73 days of constant darkness (Andrews and Herzberg, 1985). The data are temperature recordings made at 2-min intervals over 3 months. It is known that problems occurred during the experiment, associated with transient failures of the monitoring equipment and with imperfections in the data logging process. As a result, the data contain a good proportion of outliers. The data have been downloaded from http://www.statsci.org/data/general/pformosu.html.
For the analysis of the Circadian Rhythms dataset, we analyze 20-min averages of the temperatures. We fit a one-component sinusoid model of the form

y(t) = K + A cos(ωt) + B sin(ωt)

to the Circadian data using the GA-LTA (50% trimming) approach. The parameter initialization for the genetic search is made in the following ranges:

Parameter   Initialization range

K           [Median(y(1), ..., y(n)) − 50, Median(y(1), ..., y(n)) + 50]
A           [−100, 100]
B           [−100, 100]
ω           [0, π]

The final fitted model for the first 8 days of data, arrived at after 13 GA generations, is given below:

y(t) = 370.49 + 20.69 cos(0.08672t) + 12.12 sin(0.08672t).
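To see why a trimmed criterion can ignore gross outliers when fitting y(t) = K + A cos(ωt) + B sin(ωt), the following sketch evaluates a least trimmed absolute deviations (LTA) criterion and the ordinary least squares criterion on synthetic data. Several things here are assumptions: LTA is taken to mean the sum of the h smallest absolute residuals, "50% trimming" is read as keeping h = ⌈0.5n⌉ residuals, and the data are artificial (generated from the fitted coefficients quoted above, with 30% of the readings overwritten by zeros to mimic equipment failure), not the real Circadian series. The names `lta_criterion`, `ls_criterion` and `keep_frac` are hypothetical, and no genetic search is performed; only the objective that such a search would minimise is shown.

```python
import math


def model(t, K, A, B, w):
    """One-component sinusoid with a constant level K."""
    return K + A * math.cos(w * t) + B * math.sin(w * t)


def lta_criterion(params, times, y, keep_frac=0.5):
    """Sum of the h smallest absolute residuals, h = ceil(keep_frac * n)."""
    K, A, B, w = params
    abs_res = sorted(abs(yi - model(ti, K, A, B, w))
                     for ti, yi in zip(times, y))
    h = math.ceil(keep_frac * len(abs_res))
    return sum(abs_res[:h])


def ls_criterion(params, times, y):
    """Ordinary sum of squared residuals, for comparison."""
    K, A, B, w = params
    return sum((yi - model(ti, K, A, B, w)) ** 2
               for ti, yi in zip(times, y))


times = list(range(1, 201))
true_params = (370.49, 20.69, 12.12, 0.08672)    # coefficients quoted in the text
y = [model(t, *true_params) for t in times]
for i in range(len(y)):
    if i % 10 < 3:                               # hypothetical: 30% failed readings
        y[i] = 0.0

wrong_params = (370.49, 20.69, 12.12, 0.12)      # deliberately wrong frequency
print(lta_criterion(true_params, times, y))      # 0.0: the outliers are trimmed away
print(lta_criterion(wrong_params, times, y))     # strictly positive
print(ls_criterion(true_params, times, y))       # dominated by the 60 outliers
```

Because the trimmed sum only involves the best-fitting half of the observations, the criterion is still minimised at the generating parameters even though 30% of the readings are grossly wrong, whereas the least squares criterion at the same parameters is dominated by the outliers.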


The plot of the fitted and the observed data with outliers is given in Fig. 6.

Fig. 6. Circadian data fitting using GA-LTA.

It is obvious from the data plot that the dataset contains a large number of outliers, but the fitted curve successfully ignores them and follows the periodic pattern closely. The less obvious outliers, closer to the fitted curve, also do not distort the data fit. We get similar results using the other proposed outlier-insensitive robust frequency estimation techniques. Fig. 7 gives the fit of the data using the GA-LMA estimator. The fitted model under this approach is

y(t) = 369.84 + 19.01 cos(0.08711t) + 12.51 sin(0.08711t).

Fig. 7. Circadian data fitting using GA-LMA.

Similar fits are observed for other methods, except the GA-LS method, which fails completely. Considering the same dataset, it is reported in Smyth and Hawkins (2000) that the fitted frequency for the first 8 days is 0.87273, which is very close to our frequency estimates.

5.3.2. Fitting Variable Star data
The variable star dataset is another important and very frequently used dataset. The determination of the periodicities of a variable star and the shape of its light curve is important in studies of stellar structure and evolution. The relationship between the period and magnitude is used to determine distances on a cosmic scale, for example. The data in this example give observations on the magnitude of a variable star, made from the Mount Stromlo Observatory near Canberra in Australia over a period of about 250 days (Reimenn, 1994). Magnitudes were recorded separately for the blue and red bands. Observation times were irregularly spaced depending on the conditions of the sky and the observation schedule. We consider here the analysis of the blue band measurements. A number of observations were considered to be unreliable due to observation conditions. The data have been downloaded from http://www.statsci.org/data/oz/ceph2.html.
We fit a two-component sinusoid model of the form

y(t) = K + A1 cos(ω1 t) + B1 sin(ω1 t) + A2 cos(ω2 t) + B2 sin(ω2 t)

using the GA-LTA (50% trimming) approach. The initial population is populated from the following ranges of the respective parameters:

Parameter   Initialization range

K           [Median(y(1), ..., y(n)) − 0.5, Median(y(1), ..., y(n)) + 0.5]
A1, A2      [−0.5, 0.5]
B1, B2      [−0.5, 0.5]
ω1, ω2      [0, π]

The final fitted model is

y(t) = 0.01992 − 0.0128 cos(0.1247t) + 0.2032 sin(0.1247t) + 0.4613 cos(0.2467t) − 0.0532 sin(0.2467t).

The plot of the observed data and the fitted model is given in Fig. 8. We observe from the plot that the fit ignores the outliers and is able to trace the correct sinusoidal pattern.

Fig. 8. Variable star (blue band) data fitting using GA-LTA.

Smyth and Hawkins (2000) considered the same dataset for testing the usefulness of their robust frequency estimation technique. For implementation of their method, which requires time points to be equidistant, the data were first interpolated linearly onto an equally spaced grid of time points of the same length. No such preprocessing of the data is required for implementation of our methods. The estimated frequencies reported in Smyth and Hawkins (2000) are 0.126 and 0.253, which once again are close to our frequency estimates.
The GA-LTA estimates of the frequencies 0.1247 and 0.2461 correspond to periods of 50 and 25 days. The star is therefore determined to be periodic with a period of about 50 days.

6. Conclusion

In this paper, we propose genetic algorithm based robust frequency estimation techniques for multiple sinusoidal models with correlated error structures. The proposed methods use genetic search technique for optimizing various outlier-insensitive


criterion functions. The methods do not require the data points to be equidistant or the noise sequence to have an independent Gaussian structure, which is otherwise required in other robust frequency estimation techniques for this model (see for example Smyth and Hawkins, 2000). Furthermore, the GA based robust frequency estimation techniques search a population of possible optimal solutions in parallel and do not require derivative information or other auxiliary information; only the levels of fitness influence the direction of search. Another advantage of using the proposed methods is that, since they are based on genetic algorithms, they use probabilistic transition rules and have a potentially high chance of converging to the optimal solution.
In the simulation studies and real life data analysis, it is observed that the proposed genetic algorithm based robust frequency estimators, optimizing outlier-insensitive criteria, are able to resolve frequencies of the sinusoidal model with a high degree of accuracy and provide robust estimates with a reasonably high breakdown point.

Acknowledgement

The work is supported by Department of Science & Technology, Government of India, Grant no. SR/S4/MS:374/06.

References

Andrews, D.F., Herzberg, A.M., 1985. In: Data: A Collection of Problems from Many Fields for the Student and Research Worker. Springer, New York.
Barrodale, I., Roberts, F.D.K., 1973. An improved algorithm for discrete L1 linear approximations. SIAM Journal of Numerical Analysis 10, 839–848.
Barrodale, I., Roberts, F.D.K., 1974. Solution of an overdetermined system of equations in the L1 norm. Communications of the ACM 17, 319–320.
Bloomfield, P., Steiger, W.L., 1983. Least Absolute Deviations: Theory, Applications, and Algorithms. Birkhauser, Boston, Mass.
Bonaventura, A., Coluccio, L., Fedele, G., 2007. Frequency estimation of multi-sinusoidal signal by multiple integrals. In: IEEE International Symposium on Signal Processing and Information Technology, pp. 564–569.
Brillinger, D.R., 1987. Fitting cosines: some procedures and some physical examples. In: MacNeill, B., Umphrey, G.J. (Eds.), Applied Probability and Stochastic Process and Sampling Theory. D. Reidel Publishing Company, USA, pp. 75–100.
Caruana, R.A., Schaffer, J.D., 1988. Representation and hidden bias: Gray vs. binary coding. In: Proceedings of the Sixth International Conference Machine Learning, pp. 153–161.
Chan, K.W., So, H.C., 2004. Accurate frequency estimation for real harmonic sinusoids. IEEE Signal Processing Letters 11 (7), 609–612.
Coluccio, L., Eisinberg, A., Fedele, G., 2008. A property of the elementary symmetric functions on the frequencies of sinusoidal signals. Issue Series Title: Signal Processing, doi:10.1016/j.sigpro.2008.10.021.
De Jong, K.A., 1975. An analysis of the behavior of a class of genetic adaptive systems. Ph.D. Thesis, University of Michigan.
Goldberg, D.E., 1989. In: Genetic Algorithms in Search, Optimization and Machine Learning. Pearson Education Inc.
Hannan, E.J., 1971. Non-linear time series regression. Journal of Applied Probability 8, 767–780.
Hollstien, R.B., 1971. Artificial genetic adaptation in computer control systems. Doctoral Dissertation, University of Michigan, Dissertation Abstracts International, 32(3), 1510B, University Microfilms No. 71-23,773.
Kahn, M., Osborne, M.R., Smyth, G.K., 1993. On the consistency of Prony’s method and related algorithms. Journal of Computational and Graphical Statistics 1, 329–349.
Kay, S.M., 1988. Modern Spectral Estimation: Theory and Applications. Prentice-Hall, New York.
Kundu, D., 1993. Asymptotic theory of least-squares estimators of a particular non-linear regression model. Statistics & Probability Letters 18, 13–17.
Kundu, D., 1997. Asymptotic theory of least-squares estimators of sinusoidal signals. Statistics 30 (3), 221–238.
Kundu, D., Mitra, A., 1996. Asymptotic theory of the least-squares estimators of a nonlinear time series regression model. Communications in Statistics, Theory Methods 25, 133–141.
Mackisack, M.S., Osborne, M.R., Smyth, G.K., 1994. A modified Prony algorithm for estimating sinusoidal frequencies. Journal of Statistical Computation and Simulation 49, 111–124.
Mitra, A., Kundu, D., 1997. Consistent method of estimating sinusoidal frequencies: a non-iterative approach. Journal of Statistical Computation and Simulation 58, 171–194.
Nandi, S., Iyer, S., Kundu, D., 2002. Estimating the frequencies in presence of heavy tail errors. Statistics and Probability Letters 58 (3), 265–282.
Quinn, B.G., Hannan, E.J., 2001. In: The Estimation and Tracking of Frequency. Cambridge University Press, Cambridge.
Reimenn, J.D., 1994. Frequency estimation using unequally-spaced astronomical data. Ph.D. Thesis, University of California, Berkeley.
Rice, J.A., Rosenblatt, M., 1988. On frequency estimation. Biometrika 75, 477–484.
Rousseeuw, P.J., 1984. Least-median of squares regression. Journal of American Statistical Association 79, 871–880.
Rousseeuw, P.J., 1988. Robust estimation and identifying outliers. In: Wadsworth, H.M. (Ed.), Handbook of Statistical Methods for Engineers and Scientists. McGraw-Hill, New York (Chapter 17).
Rousseeuw, P.J., Leroy, A.M., 1987. In: Robust Regression and Outlier Detection. Wiley, New York.
Smyth, G.K., Hawkins, D.M., 2000. Robust frequency estimation using elemental sets. Journal of Computational and Graphical Statistics 9, 196–214.
Stoica, P., 1993. List of references on spectral line analysis. Signal Processing 31 (3), 329–340.
Stoica, P., Moses, R., 2005. Spectral Analysis of Signals. Prentice-Hall, Upper Saddle River, NJ.
Thierens, D., 1997. Selection schemes, elitist recombination, and selection intensity. In: Back, T. (Ed.), Proceedings of the Seventh International Conference on Genetic Algorithms. San Francisco, USA, pp. 152–159.
Trapero, J.R., Sira-Ramirez, H., Batlle, V.F., 2007. An algebraic frequency estimator for a biased and noisy sinusoidal signal. Signal Processing 87 (6), 1188–1201.
Walker, A.M., 1971. On the estimation of the Harmonic components in a time series with Stationary residuals. Biometrika 58, 21–26.
Yokose, Y., Cingoski, V., Kaneda, K., Yamashita, H., 2000. Performance comparison between gray coded and binary coded genetic algorithms for inverse shape optimization of magnetic devices. Applied Electromagnetics (pp. 115–120), 115–120.
