UNIT-III
Time/Space methods - Fourier Transform, PSD - Wavelets - Parametric methods - AR, MA, ARMA models - PCA - Linear and Nonlinear Features.
1. Introduction:
Brain-computer interfaces (BCIs) are control and communication systems based on the acquisition and processing of brain signals to control a computer or an external device. A BCI typically recognizes events acquired through different neuroimaging methods, of which electroencephalography (EEG) is the most widely used. Feature extraction from EEG signals is crucial to the classification performance of a BCI system.
Features are extracted from the raw data by means of the PSD and the alpha and beta band powers. Even though the results show the discriminative ability of such features, it is worth noting the variability across subjects; it is also important to determine the most informative electrodes for each task.
Feature extraction is the process of transforming raw data into features that can be used for further analysis. In the context of image processing, this involves converting pixels into a form that a machine learning model can understand and utilize, typically resulting in a feature vector that encapsulates the essential aspects of the input data.
Figure 3: Feature extraction is the thread that weaves raw data into patterns of insight.
Process: It involves transforming raw data (like pixels in an image) into a set of usable features.
In deep learning, this is typically done through a series of convolutional layers.
Layers Involved: Early layers of a convolutional neural network (CNN) capture basic features
like edges and textures, while deeper layers capture more complex features like patterns or
specific objects.
Output: The output is a high-dimensional vector or set of vectors that succinctly represent the
important aspects of the input data.
1. Reducing the Complexity of Data: Raw data, like the pixel values of an image, are often too voluminous and complex for direct analysis. Feature extraction distills this data into a more manageable form, retaining only the most relevant information.
2. Improving Model Performance: Feature extraction simplifies the input to machine learning models. By providing a clear, concise representation of the data, it allows models to learn more effectively and efficiently.
3. Facilitating Transfer Learning: In the realm of deep learning, pre-trained models on extensive
datasets (like ImageNet) serve as powerful feature extractors. These pre-trained models can be
repurposed for various tasks, significantly reducing the time and resources required for model training.
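For illustration (a hedged sketch, not part of the original notes), a pre-trained CNN can be reused as a fixed feature extractor along the lines described above; the specific network (MobileNetV2), input size, and placeholder image below are assumptions.

```python
import numpy as np
import tensorflow as tf

# Assumed example: a pre-trained ImageNet CNN reused as a fixed feature extractor
base = tf.keras.applications.MobileNetV2(weights="imagenet",
                                         include_top=False,   # drop the classifier head
                                         pooling="avg",       # global average -> feature vector
                                         input_shape=(224, 224, 3))

image = np.random.rand(1, 224, 224, 3).astype("float32")      # placeholder image batch
image = tf.keras.applications.mobilenet_v2.preprocess_input(image * 255.0)

features = base.predict(image, verbose=0)
print(features.shape)          # (1, 1280): a compact feature vector for the input image
```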
Challenges:
1. Balancing Complexity and Performance: A significant challenge is balancing the complexity of the
feature extractor with the computational resources available. More complex models may offer better
feature extraction but at the cost of increased computational demands.
2. Generalization: Another challenge is ensuring that feature extractors generalize well to new, unseen
data. This is particularly important in applications like autonomous vehicles and medical image
analysis, where errors can have serious consequences.
********************************************************************
Explain Fourier Transform method used in feature extraction
*********************************************************************
2. Fourier Transform:
• The Fourier Transform is a powerful tool in feature engineering, widely used in fields such as signal processing, image analysis, and data science.
• Its significance lies in transforming time- or space-based signals into the frequency domain, offering a different perspective from which to analyze and process data.
• Here we explore the concept of the Fourier Transform and its application in feature engineering.
Equation:
The continuous Fourier Transform of a signal x(t) is given by:
$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$$
Where:
X(f) is the Fourier Transform of x(t),
x(t) is the input signal in the time domain,
f is the frequency, and
j is the imaginary unit.
Block Diagram:
The block diagram for the Fourier Transform involves taking the input signal x(t) and passing it
through a Fourier Transform block. The output is X(f), which represents the signal in the frequency
domain.
+--------+ +----------------------+
| x(t) | ----> | Fourier Transform | ----> X(f)
+--------+ +----------------------+
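As a quick illustration (a minimal sketch, not taken from the original notes), the discrete analogue of this transform can be computed with NumPy's FFT; the sampling rate, test signal, and variable names below are assumptions for demonstration only.

```python
import numpy as np

# Assumed example: one EEG-like channel sampled at 256 Hz
fs = 256                      # sampling frequency in Hz (assumption)
t = np.arange(0, 4, 1 / fs)   # 4 seconds of data
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)  # 10 Hz alpha-like rhythm + noise

# Discrete Fourier Transform of the real-valued signal
X = np.fft.rfft(x)                       # complex spectrum X(f)
f = np.fft.rfftfreq(x.size, d=1 / fs)    # corresponding frequency axis in Hz

# Magnitude spectrum: one simple frequency-domain feature per bin
magnitude = np.abs(X)
print("Frequency of the largest peak:", f[np.argmax(magnitude)], "Hz")
```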
************************************************************************
Explain the significance and importance of Power spectral density in feature
extraction of EEG signals
************************************************************************
3. Power Spectral Density (PSD):
Equation (periodogram form):
$$S(e^{j\omega}) = \frac{1}{2\pi N}\left|\sum_{n=1}^{N} x_n\, e^{-j\omega n}\right|^{2}$$
or
Equation:
The Power Spectral Density (for a signal x(t)) is often defined as the Fourier Transform of
the autocorrelation function Rxx(τ):
$$S_{xx}(f) = \int_{-\infty}^{\infty} R_{xx}(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$
Where:
S_xx(f) is the PSD, and
R_xx(τ) is the autocorrelation function of x(t).
Block Diagram:
The block diagram for PSD involves computing the autocorrelation function Rxx(τ) and then
passing it through a Fourier Transform block.
The distribution of average power of a signal x(t) in the frequency domain is called the
power spectral density (PSD) or power density (PD).
A Power Spectral Density (PSD) is the measure of signal's power content versus
frequency.
The power spectral density (PSD), which represents the power distribution of an EEG series in the frequency domain, can be used, for example, to evaluate abnormalities of the Alzheimer's disease (AD) brain.
The power spectral density (PSD) or power spectrum represents the proportion of the total
signal power contributed by each frequency component of a voltage signal.
It is computed from the DFT as the mean squared amplitude of each frequency component,
averaged over the n samples in the digitised record.
The PSD is a real, not a complex, quantity, expressed in terms of squared signal units per
frequency units and can be plotted as a single graph.
The relationship between power spectral density and frequency is that each element of the
PSD is a measure of the signal power contributed by frequencies within a band of width ∆f
centred on the frequency k ∆f.
The variance of the original digitised record can be computed from the integral of the PSD.
EEG relative power can be calculated by comparing the power values of specific frequency bands in the EEG data with the power values of a control group. To calculate EEG relative power, the program POTENCOR uses Fourier analysis to separate the frequency components and computes the normalized data for relative power.
PSD is a good tool for stationary signal processing and is suitable for narrowband signals. It is a common signal processing technique that distributes the signal power over frequency and shows the strength of the energy as a function of frequency.
The power spectral density can be calculated using the Welch and Burg methods to extract features from the filtered data.
A. Welch Method
Generally, the Welch method of PSD estimation can be described by the equations below: the power spectral density of each segment (the periodogram) is defined first, and the Welch power spectrum is then obtained as the average of the periodograms over all segments.
$$P_i(f) = \frac{1}{MU}\left|\sum_{n=0}^{M-1} x_i(n)\, w(n)\, e^{-j 2\pi f n}\right|^{2}$$
$$P_{welch}(f) = \frac{1}{L}\sum_{i=0}^{L-1} P_i(f)$$
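The sketch below (an illustrative assumption, not part of the original notes) shows how a Welch PSD estimate and simple band-power features might be obtained with SciPy; the sampling rate, band limits, and placeholder signal are assumptions.

```python
import numpy as np
from scipy.signal import welch

fs = 256                                  # assumed sampling frequency (Hz)
eeg = np.random.randn(10 * fs)            # placeholder for one filtered EEG channel

# Welch estimate: average periodograms of 2-s windowed segments with 50% overlap
f, pxx = welch(eeg, fs=fs, nperseg=2 * fs, noverlap=fs)

# Band-power features obtained by integrating the PSD over each band
def band_power(f, pxx, lo, hi):
    mask = (f >= lo) & (f < hi)
    return np.trapz(pxx[mask], f[mask])

alpha = band_power(f, pxx, 8, 13)   # alpha band (8-13 Hz)
beta = band_power(f, pxx, 13, 30)   # beta band (13-30 Hz)
print("alpha power:", alpha, "beta power:", beta)
```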
B. Burg Method
The Burg method minimizes the forward and backward prediction errors while satisfying the Levinson-Durbin recursion.
With a higher model order, the accuracy of the Burg method decreases and false peaks may be inferred in the spectrum.
The Burg method is highly suitable for short data records, as it can generate accurate predictions and always produces a stable model.
Overall, the Burg PSD estimate is computed from the following equation:
$$P_{burg}(f) = \frac{\hat{E}_p}{\left|1 + \sum_{k=1}^{p} \hat{a}_p(k)\, e^{-j 2\pi f k}\right|^{2}}$$
On the whole, the effects on PSDs suggest that researchers should be careful when choosing the EEG transformation and the time window, since these seemed to have the largest effects on PSDs. The choices of artifact removal, filter, and PSD estimation method may have less effect on PSDs and can possibly be ignored in trial-to-trial studies.
Figure 5. Flow diagram of the processing methods carried out to estimate powers and phases for four frequency bands. The selection choices of five methods (highlighted with diamond shapes) were explored: artifact removal, electroencephalogram (EEG) transformation, filtering, time-window selection, and power spectral density (PSD) estimation. The estimated powers and phases were used to find the correlation between the choices.
***************************************************************************
Explain Wavelet method in detail and how it is used to extract feature from EEG.
***************************************************************************
4. Wavelet Transform:
Equation:
The Continuous Wavelet Transform (CWT) is given by:
$$W(a,b) = \int_{-\infty}^{\infty} x(t)\, \frac{1}{\sqrt{a}}\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt$$
Where:
W(a,b) is the wavelet transform,
x(t) is the input signal,
ψ∗(t) is the complex conjugate of the wavelet function,
a is the scale parameter, and
b is the translation parameter.
Block Diagram:
The block diagram for the Continuous Wavelet Transform involves scaling and translating the
wavelet function to analyze the input signal at different scales and positions.
+--------+       +------------------------+
| x(t)   | ----> |  Continuous Wavelet    | ----> W(a, b)
+--------+       |      Transform         |
                 +------------------------+
• Each of these methods has its own strengths and weaknesses in different applications.
• The Fourier Transform is excellent for analyzing the frequency content of a signal.
• The PSD gives information about the distribution of power with respect to frequency, and the Wavelet Transform is valuable for analyzing signals in both the time and frequency domains simultaneously, making it useful for non-stationary signals.
• The Wavelet Transform is suitable for non-stationary signals and has an advantage over spectral analysis.
• For the time-frequency representation of a signal, the wavelet is an effective method. The important feature of the WT is that it provides accurate frequency information at low frequencies and accurate time information at high frequencies.
• This property is important in biomedical applications, because most signals in the biomedical field contain high-frequency components of short duration and low-frequency components of long duration. The WT provides a multiresolution analysis of non-stationary signals, as shown in the figure.
• Here g[n] is the high-pass filter and h[n] is the low-pass filter. The WT is most suitable for locating transient events and has an advantage over spectral analysis. Here the EEG signal is decomposed into detail levels D1-D4.
• The wavelet overcomes the limitations of the short-time Fourier transform (STFT). In severely ill patients, detecting brain disorders using conventional methods is very inconvenient.
• The frequency content of the EEG signal provides more useful information than the time domain alone.
The mother wavelet ψ(n) is convolved with the signal x(n), where a is called the scale coefficient and b the shift coefficient. The choice of mother wavelet is important, because once it is fixed the signal can be analyzed over all possible coefficients a and b.
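As an illustrative sketch (assumed, not from the original notes), a discrete wavelet decomposition of an EEG channel into approximation and detail levels D1-D4 might be computed with PyWavelets; the wavelet family, level count, and placeholder signal are assumptions.

```python
import numpy as np
import pywt

fs = 256                               # assumed sampling frequency (Hz)
eeg = np.random.randn(10 * fs)         # placeholder for one EEG channel

# 4-level discrete wavelet decomposition with a Daubechies-4 mother wavelet
coeffs = pywt.wavedec(eeg, 'db4', level=4)
cA4, cD4, cD3, cD2, cD1 = coeffs       # approximation + detail coefficients D4..D1

# Simple wavelet features: energy of each decomposition level
energies = {name: np.sum(c ** 2) for name, c in
            zip(['A4', 'D4', 'D3', 'D2', 'D1'], coeffs)}
print(energies)
```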
*********************************************************************************
Explain the parametric methods AR, MA and ARMA models in detail, as used for analysis of features in EEG.
*********************************************************************************
5. Parametric Methods
• AR, MA, ARMA, and ARIMA models are used to forecast the observation at time (t+1) based on the historical data recorded at previous time points for the same observation.
• However, it is necessary to make sure that the time series is stationary over the historical observation period.
• If the time series is not stationary, a differencing operation can be applied to the records to check whether the resulting series is stationary over the period.
LTI system model
• In the model given below, the random signal x[n] is observed. Given the observed signal x[n], the goal is to find a model that best describes the spectral properties of x[n] under the following assumptions:
• x[n] is WSS (wide-sense stationary) and ergodic.
• The input signal to the LTI system is white noise following a Gaussian distribution with zero mean and variance σ².
• The LTI system is BIBO (bounded-input bounded-output) stable.
In this model, the input to the LTI system is white noise following a Gaussian distribution with zero mean and variance σ². The power spectral density (PSD) of the noise w[n] is
$$S_{ww}(e^{j\omega}) = \sigma^{2}$$
The noise process drives the LTI system with frequency response H(e^{jω}), producing the signal of interest x[n]. The PSD of the output process is therefore
$$S_{xx}(e^{j\omega}) = \sigma^{2}\,\left|H(e^{j\omega})\right|^{2}$$
Three cases are possible, given the nature of the transfer function of the LTI system under investigation.
5.1. Autoregressive (AR) models (all-poles model)
$$H(e^{j\omega}) = \frac{1}{\sum_{k=0}^{N} a_k\, e^{-jk\omega}}, \qquad a_0 = 1$$
The transfer function H(e^{jω}) is an all-pole transfer function (when the denominator goes to zero, the transfer function goes to infinity, creating peaks in the spectrum). Poles are best suited to model resonant peaks in a given spectrum. At the peaks, the poles are close to the unit circle. This model is well suited for modeling peaky spectra.
5.2. Moving Average (MA) models (all-zeros model)
In the MA model, the present output sample x[n] is determined by the present source input w[n] and the past M samples of the source input. The difference equation that characterizes this model is given by
x[n] = b0 w[n] + b1 w[n-1] + b2 w[n-2] + … + bM w[n-M]
Here, the LTI system is a Finite Impulse Response (FIR) filter. This is evident from the fact that no feedback from output to input is involved in the above equation. The frequency response of the FIR filter is well known:
$$H(e^{j\omega}) = \sum_{k=0}^{M} b_k\, e^{-jk\omega}$$
The transfer function H(e^{jω}) is an all-zero transfer function (when the numerator is set to zero, the transfer function goes to zero, creating nulls in the spectrum). Zeros are best suited to model sharp nulls in a given spectrum.
The MA component thus captures the shocks or unexpected events in the past that are still affecting the series.
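For illustration only (assumed, not from the original notes), an MA(q) process can be simulated by FIR-filtering white noise, matching the difference equation above; the coefficients below are arbitrary.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)          # white Gaussian input noise w[n]

b = [1.0, 0.6, 0.3]                    # assumed MA coefficients b0, b1, b2 (order q = 2)
x = lfilter(b, [1.0], w)               # x[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2]

print(x[:5])
```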
Combined Models:
Often, these models are combined to model and forecast time series data more effectively:
ARMA (Autoregressive Moving Average): This model combines both AR and MA
components.
ARIMA (Autoregressive Integrated Moving Average): This model adds an “I” (integrated)
component, which involves differencing the series to make it stationary before applying an
ARMA model.
Both AR and MA models (and their combinations) are foundational in time series forecasting, and
their applicability depends on the characteristics of the data and the nature of the underlying
processes generating the time series.
$$H(e^{j\omega}) = \frac{\sum_{k=0}^{M} b_k\, e^{-jk\omega}}{\sum_{k=0}^{N} a_k\, e^{-jk\omega}}, \qquad a_0 = 1$$
The transfer function H(ejɷ) is a pole-zero transfer function. It is best suited for modelling complex
spectra having well defined resonant peaks and nulls.
In the AR model, the present output sample x[n] and the past N output samples determine the source input w[n]. The difference equation that characterizes this model is given by
x[n] + a1 x[n-1] + a2 x[n-2] + … + aN x[n-N] = w[n]
The model can be viewed from another perspective, where the input noise w[n] is viewed as an error: the difference between the present output sample x[n] and the sample predicted from the previous N output samples. Let us term this the "AR model error". Rearranging the difference equation,
$$w[n] = x[n] - \left(-\sum_{k=1}^{N} a_k\, x[n-k]\right)$$
• The summation term inside the brackets is the output sample predicted from the past N output samples, and the difference between it and x[n] is the error w[n].
• Least-squares estimates of the coefficients ak are found by evaluating the first derivative of the squared error with respect to ak and equating it to zero to find the minimum.
• From the equation above, w²[n] is the squared error that we wish to minimize. Here, w²[n] is a quadratic function of the unknown model parameters ak.
• Quadratic functions have a unique minimum; therefore it is easy to find the least-squares estimates of ak by minimizing w²[n].
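As a minimal sketch of this least-squares idea (assumed, not from the original notes), the AR coefficients of a signal can be estimated by regressing each sample on its N previous samples; the model order and test signal below are assumptions.

```python
import numpy as np

def ar_coefficients(x, order):
    """Least-squares estimate of AR coefficients a_1..a_N
    for the model x[n] + a_1 x[n-1] + ... + a_N x[n-N] = w[n]."""
    # Regression matrix of past samples, one row per predicted sample
    rows = [x[n - order:n][::-1] for n in range(order, len(x))]
    X_past = np.array(rows)                    # shape (len(x)-order, order)
    target = x[order:]
    # x[n] = -(a_1 x[n-1] + ... + a_N x[n-N]) + w[n]
    coeffs, *_ = np.linalg.lstsq(X_past, target, rcond=None)
    return -coeffs                             # returns a_1..a_N

# Assumed example: AR(2) process x[n] = 0.75 x[n-1] - 0.5 x[n-2] + w[n]
rng = np.random.default_rng(1)
w = rng.standard_normal(5000)
x = np.zeros_like(w)
for n in range(2, len(w)):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + w[n]

print(ar_coefficients(x, order=2))   # expected close to [-0.75, 0.5]
```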
ARMA model error and minimization
$$w[n] = x[n] - \left(-\sum_{k=1}^{N} a_k\, x[n-k] + \sum_{k=1}^{M} b_k\, w[n-k]\right)$$
Now, the predictor (terms inside the brackets) considers weighted combinations of past values of
both input and output samples.
The squared error w²[n] is NOT a quadratic function of the parameters, and we have two sets of unknowns, ak and bk. Therefore no unique solution may be available to minimize this squared error; multiple minima pose a difficult numerical optimization problem.
ARMA Process
The ARMA process of order (p, q) is obtained by combining an MA(q) process and an AR(p) process. That is, it contains p AR terms and q MA terms and is given by
$$Y_n = \sum_{k=1}^{p} \alpha_k\, Y_{n-k} + \sum_{k=0}^{q} \beta_k\, W_{n-k}, \qquad n \ge 0$$
A structural representation of the ARMA process is a combination of the AR and MA structures.
One of the advantages of ARMA is that a stationary random sequence (or time series) may be more
adequately modeled by an ARMA model involving fewer parameters than a pure MA or AR process
alone. Since E[W[n − k]] = 0 for k = 0, 1, 2, …, q, it is easy to show that E[Y[n]] = 0. Similarly, it can be shown that the variance of Y[n] can be expressed in terms of the model parameters and the correlation functions of the process.
Thus, the variance is obtained as the weighted sum of the autocorrelation function evaluated at
different times and the weighted sum of various crosscorrelation functions at different times. Finally, it
can be shown that the transfer function of the linear system defined by the ARMA(p, q) is given by
$$H(\Omega) = \frac{\sum_{k=0}^{q} \beta_k\, e^{-j\Omega k}}{1 - \sum_{k=1}^{p} \alpha_k\, e^{-j\Omega k}}$$
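In practice (a hedged sketch, not part of the original notes), ARMA(p, q) parameters for a stationary EEG segment can be estimated with statsmodels by fitting an ARIMA model with zero differencing; the order and placeholder data below are assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
x = rng.standard_normal(1024)            # placeholder for a stationary EEG segment

# ARMA(p, q) is ARIMA(p, d=0, q); here an assumed order (4, 0, 2)
model = ARIMA(x, order=(4, 0, 2))
result = model.fit()

print("AR coefficients:", result.arparams)   # estimated alpha_k
print("MA coefficients:", result.maparams)   # estimated beta_k
```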
***************************************************************************
Explain the concept of Principal component Analysis in detail for feature selection in BCI
****************************************************************************
6. Principal Component Analysis (PCA):
• PCA is a multivariate analytical method based on a linear transformation that is often used to reduce the dimensionality of data, to extract significant information from large data sets, to analyze variable structures, etc.
• The PCA method has been used for dimensionality reduction of EEG signals. Since the spatial resolution of the EEG signal is poor, considering all channels for feature extraction only increases the computational burden.
1. First, the data are standardized.
2. Then the covariance matrix is computed to capture the relationships between the variables.
3. Next, the eigenvectors and eigenvalues of this matrix are found.
4. Lastly, we quantify the importance of these relationships using the eigenvalues and keep the important principal components.
Step by step, Principal Component Analysis unveils the hidden layers of data complexity,
simplifying the intricate to reveal the essential.
Algorithm
Principal Component Analysis (PCA) is a widely used technique in data analysis and dimensionality reduction. It helps in identifying the most significant patterns and reducing the complexity of high-dimensional data while preserving its essential information. The steps involved in performing PCA are explained below.
SCORE MATRIX GENERATION
Step 1: Data Collection and Standardization Before applying PCA, gather your data. Ensure that
your data is numeric, as PCA is primarily suited for numerical data. If your data has categorical
variables, you may need to preprocess them.
Next, standardize the data. Standardization is important because PCA is sensitive to the
scales of variables. Standardization transforms the data so that each variable has a mean of 0 and a
standard deviation of 1. This ensures that all variables are on the same scale.
Step 2: Covariance Matrix Calculation The first step in PCA is to compute the covariance matrix
of the standardized data. The covariance matrix represents the relationships between variables.
Each element in the matrix represents the covariance between two variables.The formula for the
covariance between two variables X and Y is:
$$\mathrm{Cov}(X,Y) = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$
Where:
N is the number of data points,
Xi and Yi are individual data points, and
X̄ and Ȳ are the means of variables X and Y, respectively.
The covariance matrix is a square matrix, with each element representing the covariance between
two variables.
Step 3: Eigenvalue and Eigenvector Computation After computing the covariance matrix, the
next step is to find the eigenvalues and eigenvectors of this matrix.
These are crucial in determining the principal components.Eigenvalues (λ) and eigenvectors (v) are
obtained by solving the following equation:
Covariance Matrix × v = λ × v
In this equation, λ represents the eigenvalue, and v represents the eigenvector. You’ll have as many
eigenvalues and eigenvectors as the number of variables in your data.
Step 4: Sorting and Selecting Principal Components The eigenvalues represent the amount of
variance in the data that each eigenvector explains. To reduce the dimensionality, sort the
eigenvalues in descending order.
The eigenvector corresponding to the largest eigenvalue explains the most variance and is the first
principal component. The second largest eigenvalue corresponds to the second principal
component, and so on.
Typically, you’ll select a subset of the top eigenvalues/eigenvectors that explain most of the
variance in the data while reducing the dimensionality. You can decide on the number of principal
components to keep based on a variance explained threshold (e.g., 95% of the total variance).
Step 5: Data Transformation To reduce the dimensionality of your data, create a projection
matrix using the selected eigenvectors (principal components). This matrix represents the
transformation needed to project the data into the new reduced-dimensional space.
Multiply the standardized data by this projection matrix to obtain the new data in the principal component space.
Step 6: Interpretation and Analysis Once the data is transformed, you can interpret the principal
components and their relationships to the original variables. This is crucial for understanding the
most significant patterns in the data.
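A minimal NumPy sketch of these six steps is given below (illustrative only; the data shape and the 95% variance threshold are assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal((200, 8))            # assumed: 200 samples x 8 EEG features

# Step 1: standardize (zero mean, unit standard deviation per variable)
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Step 2: covariance matrix of the standardized data
cov = np.cov(z, rowvar=False)

# Step 3: eigenvalues and eigenvectors (symmetric matrix -> eigh)
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: sort in descending order and keep components explaining ~95% of variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95) + 1)

# Step 5: project the standardized data onto the top-k principal components
projection = eigvecs[:, :k]                     # d x k projection matrix
scores = z @ projection                         # transformed data (score matrix)

print("components kept:", k, "score matrix shape:", scores.shape)
```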
Explain how PCA is used for linear and non-linear component Analysis in detail
******************************************************************************
In linear PCA, the d-dimensional mean vector and the d × d covariance (scatter) matrix of the data are computed first. Next the eigenvectors and eigenvalues are computed and sorted in order of decreasing eigenvalue. Call these eigenvectors e1 with eigenvalue λ1, e2 with eigenvalue λ2, and so on, and choose the k eigenvectors having the largest eigenvalues.
Often there will be just a few large eigenvalues, and this implies that k is the inherent dimensionality of the subspace governing the "signal", while the remaining d − k dimensions generally contain noise.
Next we form a d × k matrix A whose columns consist of the k eigenvectors. The representation of the data by principal components consists of projecting the data onto the k-dimensional subspace according to x' = Aᵗ(x − m), where m is the sample mean.
AUTO ENCODER
Each pattern of the data set is presented to both the input and output layers, and the full network is trained by gradient descent on a sum-squared-error criterion, for instance by backpropagation.
It can be shown that this representation minimizes a squared-error criterion. After the network is trained, the top layer is discarded and the linear hidden layer provides the principal components.
FIGURE 16. A three-layer neural network with linear hidden units, trained to be an auto-encoder, develops an internal representation that corresponds to the principal components of the full data set. The transformation F is a linear projection onto a k-dimensional subspace denoted Г(F2).
• Principal component analysis yields a k-dimensional linear subspace of the feature space that best represents the full data according to a minimum squared-error criterion.
• If the data represent complicated interactions of features, then the linear subspace may be a poor representation and nonlinear components may be needed.
• A neural network approach to such nonlinear component analysis employs a network with five layers of units, as shown in Fig. 17.
• The middle layer consists of k < d linear units, and it is here that the nonlinear components will be revealed. It is important that the two other internal layers have nonlinear units.
• The entire network is trained, using the techniques described above, as an auto-encoder or auto-associator; that is, each d-dimensional pattern is presented as both the input and the target (desired) output. (A minimal code sketch of such a network is given at the end of this discussion.)
• When trained using a sum-squared-error criterion, such a network readily learns the auto-encoder problem. The top two layers of the trained network are then discarded, and the rest is used for nonlinear component analysis.
• For each input pattern x, the outputs of the k units of the three-layer network correspond to the
nonlinear components.
• We can understand the function of the full five-layer network in terms of two successive mappings, F1 followed by F2. As Fig. 17 illustrates, F1 is a projection from the d-dimensional input onto a k-dimensional nonlinear subspace, and F2 is a mapping from that subspace back to the full d-dimensional space.
• There are often multiple local minima in the error surface associated with the five-layer network, and we must take care to set an appropriate number k of units.
• Recall that in (linear) principal component analysis, the number of components k could be chosen based on the spectrum of eigenvalues.
• If the eigenvalues are ordered by magnitude, any significant drop between successive values indicates a "natural" dimension of the subspace. Likewise, suppose five-layer networks are trained with different numbers k of units in the middle layer.
FIGURE 17. A five-layer neural network with two layers of nonlinear units (e.g. sigmoidal), trained to be an auto-encoder, develops an internal representation that corresponds to the nonlinear components of the full data set. The process can be viewed in feature space (at the right). The transformation F1 is a nonlinear projection onto a k-dimensional subspace, denoted Г(F2). Points in Г(F2) are mapped via F2 back to the d-dimensional space of the original data. After training, the top two layers of the net are removed and the remaining three-layer network maps inputs x to the space Г(F2).
FIGURE 18. Features from two classes are shown, along with nonlinear components of the full data set. Apparently, these classes are well separated along the line marked z2, but the large noise causes the largest nonlinear component to lie along z. Pre-processing by keeping merely the largest nonlinear component would retain the "noise" and discard the "signal", giving poor recognition. The same defect can arise in linear principal components, where the coordinates are linear and orthogonal.
• Assuming poor local minima have been avoided, the training error will surely decrease for successively larger values of k.
• If the improvement from k to k + 1 is small, this may indicate that k is the "natural" dimension of the subspace at the network's middle layer.
• We should not conclude that principal component analysis or nonlinear component analysis is always beneficial for classification.
• If the noise is large compared to the difference between categories, then component analysis will find the directions of the noise rather than the signal, as illustrated in Fig. 18.
• In such cases we seek to ignore the noise and instead extract the directions that are indicative of the categories, a technique considered next.
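Referring back to the five-layer auto-encoder described above, the following is a minimal Keras sketch (entirely illustrative; the layer sizes, activations, and placeholder data are assumptions, not the original authors' implementation).

```python
import numpy as np
import tensorflow as tf

d, k, hidden = 8, 2, 16                        # assumed: input dim, nonlinear components, hidden units
X = np.random.randn(500, d).astype("float32")  # placeholder data set

# Five layers: input -> nonlinear -> linear bottleneck (k units) -> nonlinear -> output
inputs = tf.keras.Input(shape=(d,))
h1 = tf.keras.layers.Dense(hidden, activation="sigmoid")(inputs)   # nonlinear layer
code = tf.keras.layers.Dense(k, activation=None)(h1)               # linear middle layer: nonlinear components
h2 = tf.keras.layers.Dense(hidden, activation="sigmoid")(code)     # nonlinear layer
outputs = tf.keras.layers.Dense(d, activation=None)(h2)            # reconstruction of the input

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")          # squared-error criterion
autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0) # target output = input (auto-association)

# "Discard the top two layers": keep only the mapping from the input to the code layer
encoder = tf.keras.Model(inputs, code)
nonlinear_components = encoder.predict(X, verbose=0)
print(nonlinear_components.shape)              # (500, k)
```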
UNIT-III
TWO MARK Questions and Answers
5. What is the primary purpose of utilizing the Fourier Transform in the context of Brain-
Computer Interface (BCI)?
The Fourier Transform is used in BCI to convert neural signals from the time domain into the
frequency domain. By doing so, it allows the analysis and extraction of frequency components
present in the signal.
This transformation is crucial for identifying and understanding different brain activities that
manifest at specific frequency bands, aiding in feature extraction for BCI applications.
Interpretation: The PSD indicates the strength of the signal at different frequencies, helping to identify dominant frequency components and patterns in the data.
7. Explain the significance of Power Spectral Density (PSD) in BCI feature extraction.
Power Spectral Density (PSD) provides information about the distribution of signal power
across different frequencies.
In BCI, PSD is essential for identifying frequency-specific characteristics of neural signals.
It helps in extracting features related to different mental states or tasks by highlighting the
power variations within specific frequency bands.
PSD is particularly useful for discerning patterns associated with cognitive processes, making
it a valuable tool in BCI feature extraction.
8. How do wavelets contribute to feature extraction in BCI, and what distinguishes them from
Fourier Transform?
Wavelets are employed in BCI for both time and frequency analysis. Unlike the Fourier Transform, which provides a fixed resolution in the frequency domain, wavelets offer variable resolution, allowing the analysis of both time and frequency characteristics of the signal.
12. What is the primary purpose of using AutoRegressive (AR) models in BCI?
AutoRegressive (AR) models in BCI are employed to capture the temporal dependencies within
neural signals. By representing each data point as a linear combination of its previous values, AR
models help in understanding and extracting the dynamic aspects of brain activity over time.
13. How does Moving Average (MA) contribute to feature extraction in BCI?
Moving Average (MA) in BCI serves the purpose of smoothing time-series data. It calculates
the average of consecutive data points, reducing noise and highlighting underlying trends.
MA is often used in BCI feature extraction to enhance the signal quality and improve the
interpretability of neural activity patterns.
16. Provide an example of a linear feature and a nonlinear feature used in BCI.
Principal Component Analysis (PCA) is a statistical method used for analyzing and simplifying the structure of high-dimensional data by transforming it into a new coordinate system whose axes are called the principal components. The goal of PCA is to identify the directions, or principal components, along which the data varies the most.
17. What are the consequences of Principal Component Analysis operating through linear transformations?
PCA operates through linear transformations. It seeks to find a set of orthogonal axes (principal
components) along which the data has the maximum variance.Each principal component is a
linear combination of the original features.
The principal components are orthogonal, meaning they are uncorrelated. This is a consequence of
the linear transformation applied during PCA.