Disant Upadhyay (2023), 1, 1–10

PROJECT 4

Principal Component Analysis for Noise Reduction and Fraudulent Activity Detection in Scientific Data
Disant Upadhyay*
Memorial University of Newfoundland
*Corresponding author. Email: [email protected]

Abstract
In modern data analysis, the presence of noise poses significant challenges, often compromising the
accuracy of insights and concealing critical underlying signals. Principal Component Analysis (PCA)
has emerged as a potent technique for extracting valuable information from contaminated data, with
widespread applications in various domains, including the identification of fraudulent activities. This
research article delves into the utilization of PCA for noise removal in a simulated data matrix, which is
intentionally crafted using well-structured matrix functions to incorporate noise. We employ a threshold-
based denoising strategy using Singular Value Decomposition and rigorously assess its effectiveness under
varying noise intensities. Our findings underscore the prowess of PCA in mitigating noise and augmenting
the accuracy of data analysis. Moreover, we contextualize our results within the realm of real-world
applications and highlight promising avenues for future research in this dynamic field.

Contents

1 Introduction
2 Methodology and Data Simulation
2.1 Data Simulation
2.2 PCA for Noise Reduction
2.3 Singular Values Analysis and Identifying Relevant PCA Components
2.4 Thresholding Criteria and Extracting Clean Data
3 Effects of ϵ on Threshold-Based Denoising and Accuracy
4 Discussion
4.1 Simulating Noisy Data Matrices
4.2 Identifying Relevant PCA Components
4.3 Threshold-Based Denoising Using Singular Value Decomposition
4.4 Evaluating the Impact of Noise Intensity on PCA Performance
4.5 Limitations of the Study and Future Directions
4.6 Real-World Applications
5 Conclusion
Acknowledgement
References
A Visualizing Noisy and Original Transformations of Function-Based Matrices
B Singular Value Analysis for Identifying Relevant PCA Components
C Threshold-Based Denoising of Matrices Using Singular Value Decomposition
D Denoising and Correlation Analysis of Matrix Rows Using Thresholded SVD

1. Introduction
In the field of data analysis, noise is an ever-present challenge that can obscure the underlying signal
and hinder the accuracy of the analysis. One powerful technique for extracting critical information
from contaminated data or identifying fraudulent activities is Principal Component Analysis (PCA)
(Elhaik 2022). This method can reduce noise and improve the accuracy of data analysis by
identifying and retaining only the most significant variations in the data (Kurita 2020).
In this research article, we investigate the application of PCA in removing noise from data and
improving the accuracy of data analysis, addressing the thesis question: How can PCA be applied
to remove noise from data and improve the accuracy of data analysis? To this end, we simulate a
noisy data matrix V using carefully designed matrix functions Φ, Σ, and Ψ, and apply PCA-based
thresholding criteria to extract the clean signal from the contaminated data (Do principal component
analysis regression eliminate noise in the data set?). Through our analysis, we demonstrate the effectiveness
of PCA in reducing noise and improving the accuracy of data analysis (Pca based image denoising).
The remainder of this article is organized as follows. In the subsequent section, we describe our
methods for simulating the noisy data matrix V and present our results in detail. We explain the
process of creating a noisy data matrix V such that U = ΦΣΨ^T and V = U + ϵη, where η is a random
matrix with standard normal entries and ϵ is a scalar parameter. Additionally, we outline how
we sample these functions for a set of values of x to simulate noisy data V, and how we focus on
removing the noise using the PCA method. The following sections delve further into the application
of PCA, highlighting its power in extracting critical information from contaminated data (Zhang
et al. 2017).

2. Methodology and Data Simulation

In this section, we outline the methodology employed to investigate the effectiveness of PCA in noise
reduction and the accuracy improvement of data analysis. We will describe the process of simulating
the noisy data matrix V and the clean signal matrix U using carefully designed matrix functions Φ,
Σ, and Ψ.

2.1 Data Simulation

Our approach begins with the creation of a noisy data matrix V ∈ R^{n×n}, defined as V = U + ϵη,
where U = ΦΣΨ^T, and η ∈ R^{n×n} is a random matrix with a standard normal distribution. The scalar
parameter ϵ controls the level of noise introduced into the data. In this study, we set n = 600 and
ϵ = 1.

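The matrix functions Φ(x), Σ, and Ψ(x) are defined as follows:

$$
\Phi(x) = \begin{bmatrix} \cos(17x)\,e^{-x^2} \\ \sin(11x) \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} 2 & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix}, \qquad
\Psi(x) = \begin{bmatrix} \sin(5x)\,e^{-x^2} \\ \cos(13x) \end{bmatrix},
$$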

where x ranges from –3 to 3. To simulate the noisy data matrix V, we sample these functions for a
set of n equally spaced values of x between –3 and 3.
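
As a quick check on this construction, the following minimal sketch samples Φ and Ψ on the grid and confirms that U has numerical rank 2, which is expected because U is built through the 2×2 matrix Σ. The function name `build_clean_matrix`, the seed, and the use of `np.random.default_rng` are illustrative assumptions and are not taken from the code in Appendix A.

```python
import numpy as np

def build_clean_matrix(n):
    """Sample Phi and Psi at n equally spaced points on [-3, 3] and form U."""
    x = np.linspace(-3, 3, n)
    # Each column of phi / psi holds one of the two component functions.
    phi = np.column_stack([np.cos(17 * x) * np.exp(-x**2), np.sin(11 * x)])
    psi = np.column_stack([np.sin(5 * x) * np.exp(-x**2), np.cos(13 * x)])
    sigma = np.diag([2.0, 0.5])
    return phi @ sigma @ psi.T  # shape (n, n)

n, epsilon = 600, 1.0
U = build_clean_matrix(n)
rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
V = U + epsilon * rng.standard_normal((n, n))

rank_U = np.linalg.matrix_rank(U)
print(U.shape, V.shape, rank_U)
```

The low rank of U is what makes the later thresholding step plausible: all of the clean signal lives in two singular directions, while the noise is spread over all n of them.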

Figure 1. U and V visualized

Using the provided Python code in Appendix A, we generate the matrices U and V, and
subsequently plot them side-by-side to visualize the differences and the presence of noise in the data
(Figure 1). This visual comparison allows us to observe the impact of noise on the data and sets the
stage for applying PCA to remove the noise and improve the accuracy of data analysis.

2.2 PCA for Noise Reduction

In the following sections, we will discuss the application of PCA for noise reduction in the simulated
data. We will describe the PCA-based thresholding criteria used to extract the clean signal from
the contaminated data, and we will present our results in detail. Through our analysis, we aim to
demonstrate the effectiveness of PCA in reducing noise and improving the accuracy of data analysis.
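
Because the analysis that follows works with singular values rather than an explicit covariance matrix, it may help to recall how the two are related. The sketch below checks numerically, on a small random matrix, that the squared singular values of a matrix A equal the eigenvalues of AᵀA, which is why selecting singular values amounts to selecting principal components. The matrix `A` and its size are arbitrary assumptions; textbook PCA would normally also centre the columns first, a step that the code in Appendix A does not include.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))  # arbitrary small example matrix

singular_values = np.linalg.svd(A, compute_uv=False)  # returned in descending order
eigenvalues = np.linalg.eigvalsh(A.T @ A)[::-1]        # eigvalsh is ascending, so reverse

match = np.allclose(singular_values**2, eigenvalues)
print(match)
```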

2.3 Singular Values Analysis and Identifying Relevant PCA Components

In this section, we analyze the singular values of the clean signal matrix U and the noisy data matrix
V in relation to our thesis statement, which focuses on the application of PCA for noise reduction and
improved data analysis accuracy. By comparing their strongest singular values, we aim to identify
which PCA components are most likely relevant to the clean data. This analysis provides insights
into the effectiveness of PCA in isolating the significant variations in the data and facilitating noise
reduction.
We compute the singular values of U and V using Python code provided in Appendix B.
Subsequently, we compare the top 10 singular values of U and V in a table (Table 1).

Table 1. Comparison of the top 10 singular values of U and V

Index   U                V
0       1.51 × 10^2      1.55 × 10^2
1       1.25 × 10^2      1.30 × 10^2
2       5.74 × 10^-14    4.85 × 10^1
3       5.01 × 10^-14    4.80 × 10^1
4       4.76 × 10^-14    4.78 × 10^1
5       4.26 × 10^-14    4.74 × 10^1
6       4.13 × 10^-14    4.73 × 10^1
7       3.62 × 10^-14    4.71 × 10^1
8       3.24 × 10^-14    4.68 × 10^1
9       2.99 × 10^-14    4.65 × 10^1

Upon inspection of the table, we observe a clear distinction between the first two singular values
and the rest. The first two singular values of both U and V are considerably larger than the others.
For U, the remaining values are of order 10^-14, which is zero to machine precision, whereas for V
they sit near 4.7 × 10^1 and reflect the added noise.
This suggests that the first two PCA components capture the most significant variations in the data
and are most likely relevant to the clean data.
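
One way to make this distinction quantitative is to look for the largest ratio between consecutive singular values. The sketch below does this on a freshly simulated V. The helper name `count_dominant`, the decision to inspect only the top 10 values, and the random seed are assumptions made for illustration and are not part of Appendix B.

```python
import numpy as np

def count_dominant(singular_values, top=10):
    """Return how many leading singular values sit above the largest relative gap."""
    s = np.asarray(singular_values)[:top]
    ratios = s[:-1] / s[1:]  # ratio of each value to the next one
    return int(np.argmax(ratios)) + 1

n, epsilon = 600, 1.0
x = np.linspace(-3, 3, n)
phi = np.array([np.cos(17 * x) * np.exp(-x**2), np.sin(11 * x)])
psi = np.array([np.sin(5 * x) * np.exp(-x**2), np.cos(13 * x)])
U = phi.T @ np.diag([2.0, 0.5]) @ psi
V = U + epsilon * np.random.default_rng(0).standard_normal((n, n))

V_s = np.linalg.svd(V, compute_uv=False)
n_components = count_dominant(V_s)
print(n_components)
```

Restricting the search to the leading values matters here: the smallest singular values of a noisy square matrix can be very close to zero, so ratios taken over the full spectrum would be dominated by the tail rather than by the signal-to-noise gap.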
Having identified the relevant PCA components, we will proceed to the next section, where
we will explore the application of PCA-based thresholding criteria to extract the clean signal from
the contaminated data, further demonstrating the effectiveness of PCA in noise reduction and data
analysis accuracy improvement.

2.4 Thresholding Criteria and Extracting Clean Data

Figure 2. Visualization of the thresholded matrices Ũ and Ṽ

Building upon the identification of relevant PCA components in the previous section, we now
focus on applying a thresholding criterion to extract the clean data from the noisy data matrix V.
This thresholding criterion is based on setting all singular values smaller than a tolerance τ to zero.
We then compute Ũ = ΦΣ̃Ψ^T as the clean data, where Σ̃ is the thresholded singular value matrix.
In the data science literature, the threshold τ is known to have an optimal value given by
τ = (4/√3)√n ϵ. Using this optimal value, we apply the thresholding criterion to our data, as shown
in the Python code provided in Appendix C. The resulting thresholded matrices, Ũ and Ṽ, are
visualized in Figure 2.
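
The thresholding step described above can be sketched as a small function. This is a minimal illustration under the stated setup (a square matrix and a known noise level ϵ). The function name `hard_threshold_denoise` and the random seed are hypothetical choices, and the sketch is not the code from Appendix C.

```python
import numpy as np

def hard_threshold_denoise(M, tau):
    """Zero every singular value of M that does not exceed tau and rebuild the matrix."""
    left, s, right_t = np.linalg.svd(M, full_matrices=False)
    keep = s > tau
    # Scale the kept left singular vectors by their singular values, then recombine.
    return (left[:, keep] * s[keep]) @ right_t[keep, :], int(keep.sum())

n, epsilon = 600, 1.0
x = np.linspace(-3, 3, n)
phi = np.array([np.cos(17 * x) * np.exp(-x**2), np.sin(11 * x)])
psi = np.array([np.sin(5 * x) * np.exp(-x**2), np.cos(13 * x)])
U = phi.T @ np.diag([2.0, 0.5]) @ psi
V = U + epsilon * np.random.default_rng(0).standard_normal((n, n))

tau = (4 / np.sqrt(3)) * np.sqrt(n) * epsilon
V_tilde, kept = hard_threshold_denoise(V, tau)

# Frobenius-norm distance to the clean matrix, before and after denoising.
err_noisy = np.linalg.norm(V - U)
err_denoised = np.linalg.norm(V_tilde - U)
print(kept, err_noisy, err_denoised)
```

Comparing the two Frobenius errors gives a single number to accompany the visual comparison in Figure 2.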
As shown in Figure 2, the thresholded matrices demonstrate the effectiveness of the PCA-based
thresholding criterion in reducing the noise present in the data. Comparing Ũ and Ṽ, we can observe
similarities, suggesting that the clean data has been successfully extracted from the noisy data.
In the next section, we will further validate the effectiveness of our PCA-based thresholding
criterion by analyzing the correlation between a specific row of the clean data matrix U and the
thresholded matrix Ṽ.

3. Effects of ϵ on Threshold-Based Denoising and Accuracy

In this section, we investigate the effects of changing the noise parameter ϵ on the threshold-based
denoising process and the resulting accuracy. We repeat the denoising process using ϵ = 0.5 and
compare a row of U with the corresponding row of Ṽ, visually using a plot and numerically using the
correlation coefficient.

Figure 3. Plot of row 300 of U and Ṽ, showing the effects of changing ϵ on the denoising process and accuracy.

The Python code for this analysis can be found in Appendix D. As shown in Figure 3, we plot
row 300 of U and Ṽ, and calculate the correlation between these rows using the np.corrcoef
function. The resulting correlation value indicates the level of accuracy in the denoising process, and
any changes in ϵ can be visually and numerically assessed through the plot and correlation value,
respectively.
By analyzing the effects of ϵ on the denoising process, we can gain a deeper understanding of
the relationship between noise level and the performance of PCA-based denoising techniques.
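
To look at more than one noise level, the comparison can be wrapped in a loop. The sketch below repeats the denoising for a few values of ϵ and reports the correlation for row 300. The list of ϵ values, the seed, and the helper name `row_correlation` are assumptions for illustration; only ϵ = 1 and ϵ = 0.5 are used in this article.

```python
import numpy as np

def row_correlation(U, eta, epsilon, row=300):
    """Denoise V = U + epsilon * eta by hard thresholding and correlate one row with U."""
    n = U.shape[0]
    V = U + epsilon * eta
    tau = (4 / np.sqrt(3)) * np.sqrt(n) * epsilon
    left, s, right_t = np.linalg.svd(V)
    s_tilde = np.where(s > tau, s, 0.0)
    V_tilde = (left * s_tilde) @ right_t  # same as left @ diag(s_tilde) @ right_t
    return np.corrcoef(U[row, :], V_tilde[row, :])[0, 1]

n = 600
x = np.linspace(-3, 3, n)
phi = np.array([np.cos(17 * x) * np.exp(-x**2), np.sin(11 * x)])
psi = np.array([np.sin(5 * x) * np.exp(-x**2), np.cos(13 * x)])
U = phi.T @ np.diag([2.0, 0.5]) @ psi
eta = np.random.default_rng(0).standard_normal((n, n))  # one noise draw reused for every epsilon

correlations = {eps: row_correlation(U, eta, eps) for eps in (0.25, 0.5, 1.0)}
for eps, corr in correlations.items():
    print(f"epsilon = {eps}: correlation = {corr:.4f}")
```

Reusing the same noise matrix for every ϵ isolates the effect of the noise scale from the effect of a different random draw.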

4. Discussion
In this research article, we have conducted an in-depth exploration of Principal Component Analysis
(PCA) as a powerful tool for reducing noise in data analysis. We examined various aspects of PCA and
assessed its performance in extracting critical information from contaminated data using a carefully
designed simulation study. The following subsections discuss our findings in detail, providing insight
into the strengths, limitations, and future directions for PCA as a noise reduction technique.

4.1 Simulating Noisy Data Matrices

We began our investigation by simulating a noisy data matrix V using matrix functions Φ, Σ, and
Ψ. We created the matrices U and V such that U = ΦΣΨ^T and V = U + ϵη. The noise level was
controlled by the scalar parameter ϵ. This simulation allowed us to systematically study the effects of
noise on PCA’s performance in a controlled environment. Our side-by-side visualization of matrices
U and V revealed the significant differences introduced by the noise term ϵη.

4.2 Identifying Relevant PCA Components

After simulating the noisy data matrix, we proceeded to calculate the singular values of U and V.
We compared the strongest ten singular values to identify which PCA components were most likely
relevant to the clean data. Our analysis suggested that the first two PCA components had the most
significant impact on the clean data. This finding highlights the importance of focusing on the most
dominant components when using PCA for noise reduction in data analysis.

4.3 Threshold-Based Denoising Using Singular Value Decomposition

To further examine PCA's noise reduction capabilities, we implemented a threshold-based denoising
strategy using Singular Value Decomposition (SVD). We computed Ũ and Ṽ as the clean data
matrices by setting all singular values smaller than a tolerance τ to zero. The threshold τ was selected
as (4/√3)√n ϵ, an optimal value known in the data science literature. Our visual comparison of the
thresholded matrices Ũ and Ṽ demonstrated the effectiveness of PCA in recovering the clean data
from the noisy data matrix.
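
For the parameters used in this study, the threshold can be evaluated directly. The worked values below use only n = 600 and the two noise levels considered here, and compare the result for ϵ = 1 with the singular values of V reported in Table 1.

```latex
\begin{align*}
\tau &= \frac{4}{\sqrt{3}}\,\sqrt{n}\,\epsilon
      = \frac{4}{\sqrt{3}}\,\sqrt{600}\,\epsilon
      = 4\sqrt{200}\,\epsilon \approx 56.57\,\epsilon \\
\epsilon = 1:&\quad \tau \approx 56.57, \qquad
      4.85 \times 10^{1} < \tau < 1.30 \times 10^{2} \\
\epsilon = 0.5:&\quad \tau \approx 28.28
\end{align*}
```

With ϵ = 1 the threshold falls between the third and second singular values of V, so exactly the two leading components survive the thresholding.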

4.4 Evaluating the Impact of Noise Intensity on PCA Performance

We further investigated the impact of noise intensity on PCA’s performance by repeating the
thresholding process with a lower noise level (ϵ = 0.5). Our comparison of a row of U and Ṽ, both
visually and numerically through correlation analysis, showed a high degree of similarity between
the original and denoised data. This result indicates that PCA is effective in mitigating the
effects of noise and improving the accuracy of data analysis at both noise levels considered (ϵ = 1 and ϵ = 0.5).

4.5 Limitations of the Study and Future Directions

While our simulation study has demonstrated the robustness of PCA as a noise reduction technique
in data analysis, there are some limitations to our approach. First, our study is based on a controlled
simulation environment, which may not fully capture the complexity and variability of real-world
data. Second, the choice of matrix functions Φ, Σ, and Ψ may affect the generalizability of our
findings. Moreover, we have used a specific thresholding criterion based on the literature; other
thresholding strategies might lead to different results.
To address these limitations, future research could investigate the performance of PCA in handling
more complex data structures or explore alternative denoising techniques. Additionally, the impact
of different thresholding criteria on the effectiveness of PCA for noise reduction could be examined.

4.6 Real-World Applications

Our simulation study has significant implications for various real-world applications, such as scientific
measurements and the identification of fraudulent activities. In many scientific fields, data often
contain noise due to measurement errors or other factors. Our results show that PCA can be a
valuable tool for extracting critical information from contaminated data, enabling researchers to
obtain more accurate and reliable insights.
In the context of fraud detection, PCA can be employed to identify anomalous patterns in large
datasets where fraudulent activities may be hidden. By reducing noise and focusing on the most
relevant components, PCA can help reveal underlying structures that may indicate potential fraud.
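
As one hypothetical illustration of this idea, and not a method evaluated in this article, records can be scored by how poorly a low-rank PCA model reconstructs them, with the largest residuals flagged for review. Everything in the sketch below is an assumption made for illustration: the synthetic data, the rank `k = 2`, the number and position of the injected anomalies, and the helper name `residual_scores`.

```python
import numpy as np

def residual_scores(X, k):
    """Distance of each row of X from the subspace spanned by its top-k principal directions."""
    centered = X - X.mean(axis=0)
    _, _, right_t = np.linalg.svd(centered, full_matrices=False)
    components = right_t[:k]  # top-k principal directions, shape (k, d)
    reconstruction = centered @ components.T @ components
    return np.linalg.norm(centered - reconstruction, axis=1)

rng = np.random.default_rng(0)
n_rows, n_cols, k = 200, 50, 2

# Synthetic "normal" records: rank-2 structure plus small measurement noise.
X = 5 * rng.standard_normal((n_rows, k)) @ rng.standard_normal((k, n_cols))
X += 0.1 * rng.standard_normal((n_rows, n_cols))

# Inject three records that deviate from the low-rank structure.
injected = np.array([17, 64, 151])
X[injected] += rng.standard_normal((3, n_cols))

scores = residual_scores(X, k)
flagged = np.sort(np.argsort(scores)[-3:])  # indices of the three largest residuals
print(flagged)
```

In practice the number of components and the cut-off for flagging would have to be chosen from the data, and a large residual only marks a record as unusual, not as fraudulent.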

5. Conclusion
In this research article, we have thoroughly investigated the effectiveness of Principal Component
Analysis (PCA) as a noise reduction technique in data analysis. Our simulation-based approach
aimed to demonstrate PCA’s ability to extract critical information from contaminated data and its
potential to identify fraudulent activities. The thesis of our study posited that PCA would be effective
in removing noise from a data matrix and could potentially be employed in real-world applications.
Our findings support the thesis, showcasing that PCA is robust and capable of recovering clean
data from noisy data matrices. We observed the importance of focusing on the most dominant
components when using PCA for noise reduction, as these components have the most significant
impact on the clean data.
Our threshold-based denoising strategy effectively demonstrated that PCA could mitigate the
effects of noise, improving the accuracy of data analysis even when the noise level varies. This
observation aligns with our initial thesis, further substantiating the utility of PCA in real-world
applications, such as scientific measurements and fraud detection.
In conclusion, the results of our investigation confirm the correctness of our thesis and showcase
the potential of PCA as a valuable tool for extracting critical information from contaminated data.
Our study offers valuable insights into the effectiveness and robustness of PCA for noise reduction in
data analysis, making it applicable in a wide range of scenarios. While there are limitations to our
approach, the overall outcomes emphasize the importance of PCA in dealing with noisy data and its
applicability in various real-world situations.

Acknowledgement
This article is the fourth and final capstone project for the course Technical Writing taught by
Jabrul Alam at the Memorial University of Newfoundland.

References
Do principal component analysis regression eliminate noise in the data set? https://stats.stackexchange.com/questions/304449/do-principal-component-analysis-regression-eliminate-noise-in-the-data-set.
Elhaik, Eran. 2022. Principal component analyses (PCA)-based findings in population genetic studies are highly biased and
must be reevaluated. Scientific Reports.
Kurita, Takio. 2020. Principal component analysis (PCA). In Computer vision. SpringerLink.
Pca based image denoising. https://aircconline.com/sipij/V3N2/3212sipij18.pdf.
Zhang, Xiaoming, Xuefeng Zhang, Xiaohong Zhang, and Xiaodong Zhang. 2017. Random noise suppression algorithm for
seismic signals based on modified principal component analysis. Wireless Personal Communications.

A. Visualizing Noisy and Original Transformations of Function-Based Matrices

import numpy as np
import matplotlib.pyplot as plt


def generate_matrices(n, epsilon):
    """Generate U and V matrices."""
    # Define x with n linearly spaced values between -3 and 3
    x = np.linspace(-3, 3, n)

    # Define phi, sigma, and psi matrices
    phi = np.array([np.cos(17 * x) * np.exp(-x**2), np.sin(11 * x)])
    sigma = np.array([[2, 0], [0, 1 / 2]])
    psi = np.array([np.sin(5 * x) * np.exp(-x**2), np.cos(13 * x)])

    # Generate random noise matrix eta with dimensions n x n
    eta = np.random.randn(n, n)

    # Compute U and V matrices
    U = phi.T @ sigma @ psi
    V = U + epsilon * eta

    return U, V


def plot_matrices(U, V):
    """Plot U and V side-by-side."""
    fig, ax = plt.subplots(1, 2)
    ax[0].imshow(U)
    ax[0].set_title('U')
    ax[1].imshow(V)
    ax[1].set_title('V')
    plt.show()


def main():
    # Define matrix size and noise level
    n = 600
    epsilon = 1
    # Set random seed for reproducibility
    np.random.seed(42)

    # Generate U and V matrices
    U, V = generate_matrices(n, epsilon)

    # Plot U and V matrices
    plot_matrices(U, V)


main()

B. Singular Value Analysis for Identifying Relevant PCA Components

import numpy as np
import pandas as pd

# U and V are assumed to be in scope as the clean and noisy matrices from
# Appendix A, for example via U, V = generate_matrices(n, epsilon).

# Compute singular values of U and V without computing the singular vectors
U_s = np.linalg.svd(U, compute_uv=False)
V_s = np.linalg.svd(V, compute_uv=False)

# Create a DataFrame containing the first 10 singular values of U and V
df = pd.DataFrame({'U': U_s[:10], 'V': V_s[:10]})
print(df)

# Set the number of PCA components to be considered
n_components = 2

# Inform the user about the number of relevant PCA components
print(f'The first {n_components} PCA components are most likely relevant to the clean data.')

C. Threshold-Based Denoising of Matrices Using Singular Value Decomposition

import numpy as np
import matplotlib.pyplot as plt

# U and V are assumed to be in scope as the matrices from Appendix A;
# n and epsilon take the values used there.
n = U.shape[0]
epsilon = 1

# Calculate the threshold value (tau) based on matrix size (n) and noise level (epsilon)
tau = (4 / np.sqrt(3)) * np.sqrt(n) * epsilon

# Compute singular value decompositions of U and V matrices
U_d, U_s, U_v = np.linalg.svd(U)
V_d, V_s, V_v = np.linalg.svd(V)

# Apply the threshold value (tau) to the singular values of U and V matrices
U_s_tilde = np.where(U_s > tau, U_s, 0)
V_s_tilde = np.where(V_s > tau, V_s, 0)

# Reconstruct the denoised matrices U_tilde and V_tilde using the thresholded singular values
U_tilde = U_d @ np.diag(U_s_tilde) @ U_v
V_tilde = V_d @ np.diag(V_s_tilde) @ V_v

# Create subplots for U_tilde and V_tilde
fig, ax = plt.subplots(1, 2)

# Plot the denoised U_tilde matrix
ax[0].imshow(U_tilde)
ax[0].set_title('U_tilde')

# Plot the denoised V_tilde matrix
ax[1].imshow(V_tilde)
ax[1].set_title('V_tilde')

plt.show()

D. Denoising and Correlation Analysis of Matrix Rows Using Thresholded SVD

import numpy as np
import matplotlib.pyplot as plt

# U is assumed to be in scope as the clean matrix from Appendix A.
n = U.shape[0]

# Set a new noise level (epsilon)
epsilon = 0.5

# Generate the noisy matrix V with the new noise level.
# eta is local to generate_matrices in Appendix A, so a fresh noise matrix is drawn here.
eta = np.random.randn(n, n)
V = U + epsilon * eta

# Calculate the threshold value (tau) based on matrix size (n) and the new noise level (epsilon)
tau = (4 / np.sqrt(3)) * np.sqrt(n) * epsilon

# Compute singular value decomposition of the noisy matrix V
V_d, V_s, V_v = np.linalg.svd(V)

# Apply the threshold value (tau) to the singular values of V
V_s_tilde = np.where(V_s > tau, V_s, 0)

# Reconstruct the denoised matrix V_tilde using the thresholded singular values
V_tilde = V_d @ np.diag(V_s_tilde) @ V_v

# Plot row 300 of U and V_tilde
plt.plot(U[300, :], label='U')
plt.plot(V_tilde[300, :], label='V_tilde')

# Add legend to the plot
plt.legend()

# Calculate the correlation coefficient between row 300 of U and V_tilde
corr = np.corrcoef(U[300, :], V_tilde[300, :])

# Print the correlation coefficient
print(f'Correlation between row 300 of U and V_tilde: {corr[0, 1]}')

plt.show()

How Efficient Is Jacobi Iteration in Solving Linear Systems
No ratings yet
How Efficient Is Jacobi Iteration in Solving Linear Systems
7 pages
The WISC-V
No ratings yet
The WISC-V
16 pages
How To Pitch To Investors 1709668578
100% (1)
How To Pitch To Investors 1709668578
17 pages
NSF Proposal
100% (2)
NSF Proposal
22 pages
17 Equations That Changed The World - Business Insider
No ratings yet
17 Equations That Changed The World - Business Insider
10 pages
L Hiller: L Isaacson Experimental Music
No ratings yet
L Hiller: L Isaacson Experimental Music
216 pages
Zuell Open-Ended Questions
100% (1)
Zuell Open-Ended Questions
10 pages
The Science of Social Networks
100% (4)
The Science of Social Networks
47 pages
1016 Hanshaw Road Ithaca, Ny 14850 October 10, 2015
No ratings yet
1016 Hanshaw Road Ithaca, Ny 14850 October 10, 2015
14 pages
Time Table Scheduling Using Genetic Algorithm
No ratings yet
Time Table Scheduling Using Genetic Algorithm
68 pages
Analysis As A Tool in Mathematical Physics - in Memory of Boris Pavlov
No ratings yet
Analysis As A Tool in Mathematical Physics - in Memory of Boris Pavlov
635 pages
Yokoyama, Introduction To Probability Theory (Probability and Statistics: The Logic of Chance)
No ratings yet
Yokoyama, Introduction To Probability Theory (Probability and Statistics: The Logic of Chance)
21 pages
Design Science Research in Information Systems
No ratings yet
Design Science Research in Information Systems
62 pages
How To Ace Calculus
100% (2)
How To Ace Calculus
4 pages
Field Guide For Investigating Internal Corrosion of Pipelines
100% (2)
Field Guide For Investigating Internal Corrosion of Pipelines
110 pages
Why Probability Probably Doesn't Exist (But It Is Useful To Act Like It Does)
No ratings yet
Why Probability Probably Doesn't Exist (But It Is Useful To Act Like It Does)
11 pages
STAR CCM+ - Introductory Training - 2023 - Pipe in Duct - 4 4
No ratings yet
STAR CCM+ - Introductory Training - 2023 - Pipe in Duct - 4 4
44 pages
Probability & Probability Distribution
No ratings yet
Probability & Probability Distribution
45 pages
Quality MGMT Control 17
0% (1)
Quality MGMT Control 17
69 pages
Previewpdf
No ratings yet
Previewpdf
46 pages
Boolean Expressions
No ratings yet
Boolean Expressions
17 pages
Decision Making
No ratings yet
Decision Making
59 pages
Efficient True Random Number Generation
No ratings yet
Efficient True Random Number Generation
104 pages
Language Change: The Tree Model and Wave Model - Chanelle Katsidzira
100% (1)
Language Change: The Tree Model and Wave Model - Chanelle Katsidzira
16 pages
Quiz About Moon
No ratings yet
Quiz About Moon
1 page
Thinking Critically About Critical Thinking
No ratings yet
Thinking Critically About Critical Thinking
3 pages
PRAXIS ESP Packer Installation Instructions
No ratings yet
PRAXIS ESP Packer Installation Instructions
11 pages
7 Leading Machine Learning Use Cases
No ratings yet
7 Leading Machine Learning Use Cases
11 pages
241 Survey Research
No ratings yet
241 Survey Research
28 pages
The Matrix Cook Book
No ratings yet
The Matrix Cook Book
71 pages
Math For Data Science 6001 Lecture 1
No ratings yet
Math For Data Science 6001 Lecture 1
12 pages
Neil Falkner - The Fundamentals of Higher Mathematics-XanEdu (2021)
No ratings yet
Neil Falkner - The Fundamentals of Higher Mathematics-XanEdu (2021)
191 pages
(Vijayan Sugumaran, Arun Kumar Sangaiah, Arunkumar
100% (1)
(Vijayan Sugumaran, Arun Kumar Sangaiah, Arunkumar
379 pages
SCULPFUN S30 Series User Manual
100% (1)
SCULPFUN S30 Series User Manual
72 pages
Dynamic Programming Examples
No ratings yet
Dynamic Programming Examples
2 pages
Aerospace Engineering
No ratings yet
Aerospace Engineering
10 pages
Project Euler Solutions
No ratings yet
Project Euler Solutions
17 pages
Proofs Techniques
No ratings yet
Proofs Techniques
5 pages
Electronotes AN-355, October 2003
No ratings yet
Electronotes AN-355, October 2003
26 pages
Saptarishis Astrology Vol 8 Jun 2010 Part 1
No ratings yet
Saptarishis Astrology Vol 8 Jun 2010 Part 1
342 pages
Aprelim Exam Review MD 1st Sem 22 - 23
100% (1)
Aprelim Exam Review MD 1st Sem 22 - 23
2 pages
Creating Reverb Effects Using Granular Synthesis
No ratings yet
Creating Reverb Effects Using Granular Synthesis
6 pages
Weak Emergence
No ratings yet
Weak Emergence
28 pages
Electronotes AN-348, July 1998
No ratings yet
Electronotes AN-348, July 1998
9 pages
Dsa Notes Unit 5
No ratings yet
Dsa Notes Unit 5
21 pages
Emptiness and Category Theory
No ratings yet
Emptiness and Category Theory
14 pages
The Art of Asking Powerful Questions in The World of Systems
No ratings yet
The Art of Asking Powerful Questions in The World of Systems
9 pages
Safety Data Sheet Conbextra 621: 1 Identification of The Substance/Preparation and of The Company/Undertaking
No ratings yet
Safety Data Sheet Conbextra 621: 1 Identification of The Substance/Preparation and of The Company/Undertaking
5 pages
Combinatorics PDF
No ratings yet
Combinatorics PDF
137 pages
Minsky y Papert
No ratings yet
Minsky y Papert
77 pages
Setting The Stage: Quality Basics Quality Advocates Quality Improvement: Problem Solving
100% (1)
Setting The Stage: Quality Basics Quality Advocates Quality Improvement: Problem Solving
0 pages
Qualitative Research BRM
No ratings yet
Qualitative Research BRM
28 pages
Five Models of Software Engineering
No ratings yet
Five Models of Software Engineering
8 pages
Gödel's Incompleteness Theorems
No ratings yet
Gödel's Incompleteness Theorems
6 pages
Summary Introductory Linear Algebra
No ratings yet
Summary Introductory Linear Algebra
17 pages
4.method Statement-Excavation & Backfilling
No ratings yet
4.method Statement-Excavation & Backfilling
10 pages
Elementary Workbook Unit7
No ratings yet
Elementary Workbook Unit7
8 pages
A Detailed Lesson Plan in SNED 132
No ratings yet
A Detailed Lesson Plan in SNED 132
14 pages
Schneider Solar Catalog 2015
No ratings yet
Schneider Solar Catalog 2015
65 pages
JHA For Excavation - Jackhammer
No ratings yet
JHA For Excavation - Jackhammer
4 pages
Probability and Statistics
No ratings yet
Probability and Statistics
110 pages
(XXXX) Bright, W. An Introduction To Scientific Research, Harvard University
No ratings yet
(XXXX) Bright, W. An Introduction To Scientific Research, Harvard University
6 pages
Blindsight: Notes and References: (Longwinded Version)
No ratings yet
Blindsight: Notes and References: (Longwinded Version)
32 pages
A Leisurely Look at The Bootstrap, The Jackknife, and Cross-Validation (1983 13s) - BRADLEY EFRON
No ratings yet
A Leisurely Look at The Bootstrap, The Jackknife, and Cross-Validation (1983 13s) - BRADLEY EFRON
13 pages
Advanced Soil Mechanics Assignment 2018
No ratings yet
Advanced Soil Mechanics Assignment 2018
43 pages
Ce 304 Lesson 6
No ratings yet
Ce 304 Lesson 6
5 pages


2 Methodology and Data Simulation

3 Effects of ϵ on Threshold-Based Denoising and Accuracy

A Visualizing Noisy and Original Transformations of Function-Based Matrices

B Singular Value Analysis for Identifying Relevant PCA Components

C Threshold-Based Denoising of Matrices Using Singular Value Decomposition

D Denoising and Correlation Analysis of Matrix Rows Using Thresholded SVD

2. Methodology and Data Simulation

2.1 Data Simulation

The matrix functions Φ(x), Σ, and Ψ(x) are defined as follows:

Figure 1. U and V visualized

2.2 PCA for Noise Reduction

2.3 Singular Values Analysis and Identifying Relevant PCA Components

Table 1. Comparison of the top 10 singular values of U and V

2.4 Thresholding Criteria and Extracting Clean Data

Figure 2. Visualization of the thresholded matrices Ũ and Ṽ

3. Effects of ϵ on Threshold-Based Denoising and Accuracy

4.1 Simulating Noisy Data Matrices

4.2 Identifying Relevant PCA Components

4.3 Threshold-Based Denoising Using Singular Value Decomposition

4.4 Evaluating the Impact of Noise Intensity on PCA Performance

4.5 Limitations of the Study and Future Directions

4.6 Real-World Applications

A. Visualizing Noisy and Original Transformations of Function-Based Matrices

def generate_matrices(n, epsilon):
    # Define x with n linearly spaced values
    # Define phi, sigma, and psi matrices
    # Generate random noise matrix eta with dimensions n x n
    # Compute U and V matrices

# Generate U and V matrices
# Plot U and V matrices
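
Only the comment outline of this listing survives here, so the following is a minimal sketch of the simulation step rather than the original code. It assumes the clean matrix is the product U = ΦΣΨᵀ and the noisy matrix is V = U + ϵη; both formulas, the `rank` and `seed` parameters, and the sine/cosine choices for Φ and Ψ are hypothetical stand-ins, because the definitions of Φ(x), Σ, and Ψ(x) are not reproduced in this text.

```python
import numpy as np


def generate_matrices(n, epsilon, rank=3, seed=0):
    """Return a clean low-rank matrix U and a noisy copy V (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Define x with n linearly spaced values
    x = np.linspace(0.0, 1.0, n)

    # HYPOTHETICAL phi, sigma, psi: placeholders, not the paper's definitions
    k = np.arange(1, rank + 1)
    phi = np.sin(np.pi * np.outer(x, k))   # shape (n, rank)
    psi = np.cos(np.pi * np.outer(x, k))   # shape (n, rank)
    sigma = np.diag(1.0 / k)               # shape (rank, rank)

    # Random Gaussian noise matrix eta with dimensions n x n
    eta = rng.standard_normal((n, n))

    # ASSUMED forms: U = phi sigma psi^T, V = U + epsilon * eta
    U = phi @ sigma @ psi.T
    V = U + epsilon * eta
    return U, V


U, V = generate_matrices(n=100, epsilon=0.1)
print(U.shape, V.shape)
```

With `epsilon=0` the two matrices coincide, which gives a quick sanity check before the noise level is varied.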

B. Singular Value Analysis for Identifying Relevant PCA Components

# Compute singular values of U and V matrices without computing
# Create a DataFrame containing the first 10 singular values of
# Set the number of PCA components to be considered
# Inform the user about the number of relevant PCA components
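
The comparison outlined above can be sketched as follows. The function name `top_singular_values` and the rank-1 test matrix are hypothetical; the listing mentions a DataFrame, whereas this sketch returns a plain NumPy array to keep its dependencies minimal. Passing `compute_uv=False` to `np.linalg.svd` returns the singular values alone, without the singular vectors.

```python
import numpy as np


def top_singular_values(U, V, k=10):
    """Return a (k, 2) array: column 0 for U, column 1 for V."""
    s_U = np.linalg.svd(U, compute_uv=False)  # sorted in descending order
    s_V = np.linalg.svd(V, compute_uv=False)
    return np.column_stack([s_U[:k], s_V[:k]])


# Hypothetical stand-in data: a rank-1 clean matrix plus Gaussian noise
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
U = np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x))
V = U + 0.01 * rng.standard_normal((n, n))

table = top_singular_values(U, V)
for i, (su, sv) in enumerate(table, start=1):
    print(f"{i:2d}  U: {su:.3e}  V: {sv:.3e}")
```

In this toy case the clean matrix has a single non-negligible singular value, while the noisy matrix shows a floor of small values after the first; the position of that drop is what suggests how many components to keep.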

C. Threshold-Based Denoising of Matrices Using Singular Value Decomposition

# Compute singular value decompositions of U and V matrices
# Apply the threshold value (tau) to the singular values of U
# Reconstruct the denoised matrices U_tilde and V_tilde using
# Create subplots for U_tilde and V_tilde
# Plot the denoised U_tilde matrix
# Plot the denoised V_tilde matrix
ax[1].set_title('V_tilde')
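
A minimal sketch of the thresholding and reconstruction steps follows, with plotting omitted. It assumes hard thresholding, in which singular values at or below τ are set to zero and the rest are kept unchanged; the listing does not state whether hard or soft thresholding was used. The function name, the test matrix, and the value `tau = 1.0` are hypothetical.

```python
import numpy as np


def svd_threshold_denoise(M, tau):
    """Zero every singular value <= tau and rebuild the matrix (hard threshold)."""
    left, s, right_t = np.linalg.svd(M, full_matrices=False)
    s_kept = np.where(s > tau, s, 0.0)
    # Scale the columns of `left` by the kept singular values, then recombine
    return (left * s_kept) @ right_t


# Hypothetical stand-in data: a rank-1 clean matrix plus Gaussian noise
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
U = np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x))
V = U + 0.01 * rng.standard_normal((n, n))

V_tilde = svd_threshold_denoise(V, tau=1.0)
err_noisy = np.linalg.norm(V - U)          # Frobenius norm of the noise
err_denoised = np.linalg.norm(V_tilde - U)
print(f"error before: {err_noisy:.4f}, after: {err_denoised:.4f}")
```

The threshold must sit between the noise floor and the smallest singular value carrying signal; if it is set too high, genuine components are discarded along with the noise.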

D. Denoising and Correlation Analysis of Matrix Rows Using Thresholded SVD

# Generate the noisy matrix V with the new noise level
# Calculate the threshold value (tau) based on matrix size (n)
# Compute singular value decomposition of the noisy matrix V
# Apply the threshold value (tau) to the singular values of V
# Reconstruct the denoised matrix V_tilde using the thresholded
# Plot row 300 of U and V_tilde matrices
# Add legend to the plot
# Calculate the correlation coefficient between row 300 of U
# Print the correlation coefficient
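
The row-wise comparison can be sketched as below, again without plotting. The formula that derives τ from the matrix size n is not recoverable from this text, so τ is passed in as a plain number; the helper names, the rank-1 test matrix, the noise level, and the row index 10 (in place of row 300 of the larger original matrix) are all hypothetical. Hard thresholding is assumed, as the listing does not specify the variant.

```python
import numpy as np


def svd_hard_threshold(M, tau):
    """Zero singular values <= tau and reconstruct (assumed hard threshold)."""
    left, s, right_t = np.linalg.svd(M, full_matrices=False)
    return (left * np.where(s > tau, s, 0.0)) @ right_t


def row_correlation(A, B, row):
    """Pearson correlation between the same row of two matrices."""
    return float(np.corrcoef(A[row], B[row])[0, 1])


# Hypothetical stand-in data and parameters
rng = np.random.default_rng(0)
n, epsilon, tau, row = 50, 0.2, 5.0, 10
x = np.linspace(0.0, 1.0, n)
U = np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x))
V = U + epsilon * rng.standard_normal((n, n))

V_tilde = svd_hard_threshold(V, tau)
corr_noisy = row_correlation(U, V, row)
corr_denoised = row_correlation(U, V_tilde, row)
print(f"row {row}: noisy r = {corr_noisy:.4f}, denoised r = {corr_denoised:.4f}")
```

Repeating this for several values of `epsilon` gives a simple curve of correlation against noise intensity, which is the kind of evaluation the section headings describe.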
