Literature Review 2.1 Current Anti-Money Laundering (AML) and Fraud Detection Systems
The results chapter presents the performance of the machine learning models using detailed
metrics such as precision, recall and AUC-ROC, and compares the proposed framework with
conventional approaches to illustrate its benefits.
The final chapter summarises the key findings, discusses the project's contributions to
fraud detection, and makes recommendations for future research and real-world
applications. It also discusses the challenges faced during the project and potential
improvements.
The document concludes with a full reference list of all cited work, formatted in the
Harvard referencing style. Supplementary materials such as technical diagrams, code
snippets, and additional datasets are included in the appendices.
This structure is intended to guide the reader coherently through the project's
development, from identification of the problem, through development of the solution, to
its evaluation.
2. Literature Review
2.1 Current Anti-Money Laundering (AML) and Fraud Detection Systems
Fraud detection and anti-money laundering (AML) systems are key to the integrity of
financial institutions, but they have many limitations. Conventional systems depend heavily
on rule-based frameworks that use fixed thresholds and static rules to detect unusual
transactions. These systems are simple but inflexible, and cannot evolve with the changing
tactics of fraudsters. Agorbia-Atta and Atalor (2024) state that these systems frequently
fail to recognize increasingly sophisticated schemes because they rely on historical
patterns rather than behaviors, and so cannot keep up with new types of fraud. As a result,
traditional AML systems are reactive rather than proactive, which leaves financial
institutions vulnerable.
Another key problem is that these systems are prone to generating very high false positive
rates. For example, analysts working with rule-based systems are likely to spend inordinate
amounts of time investigating 'suspicious' transactions that turn out to be entirely
legitimate. According to Agorbia-Atta and Atalor (2024), these inefficiencies not only
burden compliance teams but also increase operational costs. Mallidi and Zagabathuni (2021)
reached similar conclusions: rule-based models perform poorly on complex datasets in which,
for example, fraudulent transactions are rare relative to legitimate ones, leading to a
large and highly imbalanced set of flagged cases.
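To make the false-positive problem concrete, the following minimal sketch applies a single static threshold rule to a handful of made-up transactions. The threshold value, the field names and the data are illustrative assumptions, not figures taken from the cited studies.

```python
# Illustrative only: the threshold, field names and transactions are assumptions.
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    is_fraud: bool  # ground-truth label, known only after investigation


AMOUNT_THRESHOLD = 10_000.0  # hypothetical static rule: flag anything above this


def flag(tx: Transaction) -> bool:
    """Static rule: flag a transaction when its amount exceeds the threshold."""
    return tx.amount > AMOUNT_THRESHOLD


def rule_outcomes(transactions):
    """Count what the static rule flags correctly, flags wrongly, and misses."""
    flagged = [tx for tx in transactions if flag(tx)]
    return {
        "flagged": len(flagged),
        "false_positives": sum(1 for tx in flagged if not tx.is_fraud),
        "true_positives": sum(1 for tx in flagged if tx.is_fraud),
        "missed_fraud": sum(
            1 for tx in transactions if tx.is_fraud and not flag(tx)
        ),
    }


transactions = [
    Transaction(12_500.0, False),  # large but legitimate payment
    Transaction(15_000.0, False),  # large but legitimate payment
    Transaction(11_200.0, False),  # large but legitimate payment
    Transaction(18_000.0, True),   # fraud above the threshold
    Transaction(9_900.0, True),    # fraud kept just below the threshold
    Transaction(250.0, False),     # ordinary small payment
]

outcomes = rule_outcomes(transactions)
print(outcomes)
```

In this toy data, three of the four flagged transactions are legitimate, while the fraudulent transaction kept just under the threshold is never flagged. This is the combination of analyst workload and blind spots that the paragraph above describes.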
These limitations call for more adaptive and scalable solutions. This project attempts to
address these gaps by using machine learning and big data technologies to deliver a
framework that combines robust predictive performance with operational efficiency.
Hadoop, with its distributed file system (HDFS), is well suited to storing large-scale
transaction data of this kind. Through batch processing it can aggregate historical data,
which is critical for detecting long-term fraud patterns. Apache Spark complements Hadoop,
using its in-memory computing capability to provide real-time data processing. Combined,
these tools provide a unified framework for the simultaneous analysis of historical and
real-time financial data streams.
Another strength of big data platforms is that they can be used together with machine
learning models. For example, Spark's MLlib library provides tree ensembles such as Random
Forest for large tabular data, and XGBoost can be run on Spark through its own Spark
integration rather than through MLlib itself. This integration improves the speed and
accuracy of fraud detection, because machine learning models can be trained and deployed at
scale without loss of performance.
Despite their many benefits, big data platforms also pose challenges, particularly with
regard to regulatory compliance and data privacy. The General Data Protection Regulation
(GDPR) imposes strict requirements on the storage and processing of sensitive financial
data, and therefore demands robust security measures. According to Rabhi and Berry (2024),
it is important to address these concerns so that big data technologies for fraud detection
are used ethically.
This project addresses the scalability and real-time processing limitations of current
systems by incorporating big data analytics into its framework, with the aim of providing
fast and accurate fraud detection.
Random Forest (RF) and XGBoost are among the most widely used algorithms for fraud
detection because of their flexibility and speed. Random Forest is an ensemble method that
uses multiple decision trees to classify transactions; it is robust against overfitting and
handles high-dimensional data well. Mallidi and Zagabathuni (2021) showed that RF
outperforms traditional models in terms of precision and recall and is therefore suited to
datasets with imbalanced classes.
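Because the comparison above is made in terms of precision and recall rather than accuracy, a short worked sketch may help show why those metrics matter for imbalanced classes. The confusion-matrix counts below are invented for illustration and do not come from Mallidi and Zagabathuni (2021) or from this project's results.

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Hypothetical dataset: 10,000 transactions, of which 100 (1%) are fraudulent.

# A model that labels every transaction as legitimate.
always_legit = precision_recall_accuracy(tp=0, fp=0, fn=100, tn=9_900)

# A model that catches 80 of the 100 frauds at the cost of 40 false alarms.
useful_model = precision_recall_accuracy(tp=80, fp=40, fn=20, tn=9_860)

for name, (p, r, a) in [("always legit", always_legit),
                        ("useful model", useful_model)]:
    print(f"{name}: precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")
```

The first model reaches 99% accuracy while detecting no fraud at all, whereas the second differs from it by less than half a percentage point in accuracy but recovers 80% of the fraudulent transactions. Precision and recall expose that difference; accuracy alone largely hides it.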
Although these algorithms perform well in fraud detection, they need careful preprocessing
to handle imbalanced data. The Synthetic Minority Oversampling Technique (SMOTE) is an
important tool for addressing class imbalance during training (Rai et al., 2024): it
ensures that the minority class, here fraudulent transactions, is sufficiently represented.
In this way, the models do not become biased towards the majority class and can detect rare
fraudulent activity more reliably.
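The core idea behind SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbours. The function name `smote_like`, the two-feature toy data and the choice of k are assumptions made for illustration. This is a simplified sketch of the idea, not the implementation used in this project or in any particular library.

```python
import math
import random


def smote_like(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating between a sample
    and one of its k nearest minority-class neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical fraud samples described by (amount, transactions per hour).
fraud = [(900.0, 2.0), (950.0, 3.0), (1200.0, 1.0), (1100.0, 2.5)]
new_fraud = smote_like(fraud, n_synthetic=6)
print(new_fraud)
```

Oversampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class distribution.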
In addition, ML models can be scaled. Integrating these algorithms with big data platforms
such as Apache Spark allows huge transaction datasets to be processed in real time, which
is indispensable to modern financial systems. Spark is convenient for ML tasks because its
MLlib library distributes training and prediction across the cluster in parallel, reducing
processing time.
Though promising, ML models also come with problems. Performance can be compromised by
overfitting, in which a model learns noise rather than meaningful patterns. Moreover,
implementing ML algorithms requires substantial computational resources as well as domain
expertise. Keskar (2020) highlighted that choosing good hyperparameters and cross-validating
carefully are crucial to getting the best out of a model.
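One way to cross-validate carefully on imbalanced fraud data is to stratify the folds, so that every fold contains the same share of fraudulent transactions. The sketch below is a minimal, library-free illustration of that idea. The choice of stratification, the function name `stratified_k_fold` and the toy labels are assumptions for this example and are not taken from Keskar (2020).

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs in which each fold
    keeps roughly the same class proportions as the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets a share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, validation


# Hypothetical labels: 10 fraudulent (1) and 90 legitimate (0) transactions.
labels = [1] * 10 + [0] * 90
splits = list(stratified_k_fold(labels, k=5))

for train, validation in splits:
    fraud_in_fold = sum(labels[i] for i in validation)
    print(len(train), len(validation), fraud_in_fold)
```

Each candidate hyperparameter setting would then be trained on the training indices and scored on the validation indices of every fold, with the scores averaged before settings are compared.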
This project harnesses the strengths of Random Forest, XGBoost and other ML models to
create a robust fraud detection framework. By combining these algorithms with big data
technologies, the proposed system aims to achieve high accuracy, scalability and efficiency
in detecting fraudulent transactions.