ML Course Project

APPLIED MACHINE LEARNING TO ANOMALY DETECTION IN ENTERPRISE PURCHASE PROCESSES

02FE23BCI015 Esha Yalagi
02FE23BCI042 Shreya Rokade
02FE23BCI052 Tejas Gatge
02FE22BCI059 Mudabbir Naragaddi

Course: 22ECSC306
Academic Year: 2024-25

KLE Tech. Univ.’s Dr. MSSCET 1/13


Agenda

1 Introduction
2 Problem Statement
3 Literature Survey
4 Dataset Description
5 Methodology
6 Experimental Setup
7 Results and Analysis
8 Discussions
9 Conclusions
10 References
11 Acknowledgements



Introduction

Brief Background: The project uses machine learning to automate
fraud detection in enterprise purchase processes, replacing traditional,
less efficient auditing methods.
Motivation and Scope: Manual audits lose effectiveness as data
volumes increase. The project aims to find anomalies in large
datasets in order to increase audit efficiency and accuracy.
Key Objective: Identify purchasing anomalies, automate the detection
process, and use explanation tools such as LIME and SHAP to explain
flagged results.



Problem Statement

Problem: Traditional audit approaches are insufficient for spotting
anomalies in large-scale company purchase data, resulting in missed
fraud and inefficiencies.
Research Gaps/Obstacles: Current techniques suffer from high
false-positive rates, difficulty scaling, and limited interpretability,
which underlines the need for improved automated detection.
Hypotheses/Research Questions:
1) Can machine learning approaches help detect fraud in purchase
data?
2) How well do tools like LIME and SHAP explain reported
anomalies to auditors?



Literature Survey: Parameter-Based Comparison

Study | Objective | Dataset | Methods | Results | Limitations | Relevance
1. Author (Year) | Short description of the goal | Dataset size/type | Models/Algorithms used | Key results (Accuracy, etc.) | Challenges or gaps in the research | How it informs your work
2. Author (Year) | Short description of the goal | Dataset size/type | Models/Algorithms used | Key results (Accuracy, etc.) | Challenges or gaps in the research | How it informs your work
3. Author (Year) | Short description of the goal | Dataset size/type | Models/Algorithms used | Key results (Accuracy, etc.) | Challenges or gaps in the research | How it informs your work
4. Author (Year) | Short description of the goal | Dataset size/type | Models/Algorithms used | Key results (Accuracy, etc.) | Challenges or gaps in the research | How it informs your work



Dataset Description

Dataset Source: Provided by two companies in a multinational
group, containing real procurement transactions from 2021.
Size and Features: 65,712 records with 17 columns, split between
Company1 and Company2. Key features: purchase orders, items,
categories, vendors, and stakeholders.
Preprocessing: Removed ID codes and anonymized personal data.
Transformation: Applied target encoding, handled missing values, and
normalized numerical data.
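
The transformation steps listed above could look roughly like the following pandas sketch. The slides do not say which column served as the encoding target, which imputation rule was used, or which normalization was applied, so the toy records, the column names, the choice of `amount` as target, median imputation, and min-max scaling are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy procurement records; column names are illustrative only.
df = pd.DataFrame({
    "vendor": ["A", "A", "B", "B", "C", None],
    "category": ["IT", "IT", "Office", "IT", "Office", "Office"],
    "quantity": [10.0, 12.0, None, 3.0, 5.0, 7.0],
    "amount": [100.0, 140.0, 30.0, 50.0, 60.0, 80.0],
})


def preprocess(frame, target="amount"):
    """Impute, target-encode, and min-max scale a small transaction table."""
    out = frame.copy()
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.columns.difference(num_cols)

    # Missing values: column median for numbers, a sentinel label for categories.
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    out[cat_cols] = out[cat_cols].fillna("UNKNOWN")

    # Target encoding: replace each category by the mean target value within it.
    # (Assumed target: the order amount.)
    for col in cat_cols:
        out[col] = out.groupby(col)[target].transform("mean")

    # Min-max normalization to [0, 1]; constant columns are left at 0.
    span = (out.max() - out.min()).replace(0, 1)
    return (out - out.min()) / span


clean = preprocess(df)
print(clean.round(3))
```

On a real dataset the category means would normally be computed on a training split only, to avoid leaking the target into the encoded features.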



Methodology

Machine Learning Models: The project uses k-Means, Isolation
Forest, and DBSCAN for anomaly detection in procurement data [1].
Model Selection: These unsupervised algorithms are chosen because
they can detect outliers without labeled data, making them suitable
for large datasets [2].
Evaluation Metrics: Performance is evaluated using Precision,
Recall, F1 Score, and ROC-AUC to ensure accurate assessment of
model performance [3].
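
As a rough illustration of the three detectors and the four metrics named above, the scikit-learn sketch below runs them on synthetic data. Everything specific here is an assumption: the data is a random stand-in with 15 injected outliers (the metrics need ground-truth labels, which the procurement data does not have), and the hyperparameter values, the 95th-percentile cut-off for k-Means distances, and the helper `summarize` are illustrative choices, not the project's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Synthetic stand-in: 300 "normal" points plus 15 injected outliers on a ring.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(300, 2))
angles = np.linspace(0.0, 2.0 * np.pi, 15, endpoint=False)
injected = 8.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([normal, injected])
y_true = np.r_[np.zeros(300, dtype=int), np.ones(15, dtype=int)]


def summarize(y_pred, score=None):
    """Precision/recall/F1 from binary flags; ROC-AUC only if a score exists."""
    row = {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }
    if score is not None:
        row["roc_auc"] = roc_auc_score(y_true, score)
    return row


results = {}

# Isolation Forest: predict() returns -1 for anomalies; score_samples() is
# lower for anomalies, so it is negated to get "higher = more anomalous".
iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
results["IsolationForest"] = summarize(
    (iso.predict(X) == -1).astype(int), -iso.score_samples(X)
)

# k-Means: distance to the nearest centroid as score, top 5% flagged.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
dist = km.transform(X).min(axis=1)
results["k-Means"] = summarize((dist > np.quantile(dist, 0.95)).astype(int), dist)

# DBSCAN: points labelled -1 are noise; it yields labels only, no score.
db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
results["DBSCAN"] = summarize((db_labels == -1).astype(int))

for name, row in results.items():
    print(name, {k: round(float(v), 3) for k, v in row.items()})
```

ROC-AUC is omitted for DBSCAN because it produces only hard labels, while the other two detectors expose a continuous anomaly score.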



Experimental Setup
Hardware/Software Environment: The project uses a high-performance
computer with a multi-core processor and 16 GB of RAM, running Python
with libraries such as scikit-learn and Pandas.
Hyperparameters and Configurations: Key hyperparameters include the
number of clusters for k-Means, the contamination rate for Isolation
Forest, and epsilon for DBSCAN; these are tuned to optimize model
performance.
Experimentation Process: The process involves data preprocessing,
hyperparameter tuning, model training, and systematic evaluation on
various datasets to ensure effective anomaly detection.
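
The slides do not describe how the hyperparameters were tuned. One common label-free approach, shown here only as a sketch under that assumption, is to choose the number of k-Means clusters by silhouette score and to derive a DBSCAN epsilon from nearest-neighbour distances. The synthetic blobs, the candidate range, the quantile, and the helpers `pick_k` and `suggest_eps` are hypothetical; the Isolation Forest contamination rate is usually set from domain knowledge about the expected share of anomalies and is not tuned here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in data: three well-separated groups.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]],
    cluster_std=0.5,
    random_state=0,
)


def pick_k(X, candidates=range(2, 7)):
    """Return the candidate cluster count with the highest silhouette score."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


def suggest_eps(X, min_samples=5, quantile=0.95):
    """Heuristic epsilon: a high quantile of the distance to the
    min_samples-th nearest neighbour (the point itself counts as the first)."""
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    dist, _ = nn.kneighbors(X)
    return float(np.quantile(dist[:, -1], quantile))


best_k, k_scores = pick_k(X)
eps = suggest_eps(X)
print("best k:", best_k)
print("suggested eps:", round(eps, 3))
```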

Results and Analysis

Accuracy: Supervised techniques such as SVM and neural networks
achieved higher accuracy than unsupervised algorithms.
Comparison of Models: In controlled settings, supervised approaches
performed better than unsupervised ones.
Visualization: The study compares the performance of several
algorithms using tables and graphs.



Discussions

Anomaly Detection: Both supervised and unsupervised machine learning
approaches to anomaly detection are highlighted in this paper.

Supervised Methods: Techniques such as Support Vector Machines (SVM),
Neural Networks, and Decision Trees are known for their high detection
rates but require labeled data.

Unsupervised Techniques: Techniques like K-Means and Self-Organizing
Maps (SOM) need no labeled data and are helpful for identifying
unknown attacks [2].

Challenges: Two major obstacles to anomaly detection are high false
alarm rates and system complexity.



Conclusions

Anomaly Detection: Effective use of k-Means, DBSCAN, and Isolation
Forest for detecting anomalies in purchase datasets.

Prioritization: Ensemble method to prioritize anomalies for review.

Explicability: Use of LIME, Shapley values, and SHAP for explaining
anomalies.

Objectives Met: The methodology successfully detected and prioritized
anomalies.

Future Work: Explore additional clustering algorithms.
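
The slides do not specify how the ensemble prioritizes anomalies. One simple possibility, sketched below purely as an assumption, is to convert each detector's output to a percentile rank and average the ranks, so that transactions flagged by several detectors rise to the top of the review queue. The purchase-order IDs and score values are made up for illustration.

```python
import pandas as pd

# Hypothetical per-detector outputs for five purchase orders
# (higher = more anomalous; DBSCAN contributes a 0/1 noise flag).
scores = pd.DataFrame(
    {
        "isolation_forest": [0.10, 0.80, 0.30, 0.95, 0.20],
        "kmeans_distance": [1.2, 4.0, 0.9, 6.5, 1.0],
        "dbscan_noise": [0, 1, 0, 1, 0],
    },
    index=["PO-1", "PO-2", "PO-3", "PO-4", "PO-5"],
)

# Percentile ranks put detectors with different scales on a common [0, 1]
# footing; tied values share their average rank.
ranks = scores.rank(pct=True)

# Ensemble priority: mean rank across detectors, highest first.
priority = ranks.mean(axis=1).sort_values(ascending=False)
print(priority.round(3))
```

Rank averaging ignores how large the raw score gaps are, which makes it robust to one detector's scale dominating the others; a weighted mean would be a natural variation if some detectors are trusted more.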



References

[1] Luna et al., "Assembly Line Anomaly Detection and Root Cause
Analysis Using Machine Learning," Journal, 2020.

[2] Smith, "A Modular Ice Cream Factory Dataset on Anomalies in
Sensors to Support Machine Learning Research in Manufacturing Systems,"
Journal, 2020.

[3] Manar Abu Talib, "A Systematic Literature Review (SLR) of Machine
Learning Models for Anomaly Detection," Journal, 2020.



Acknowledgements

Project supported by the Ministry of Science and Innovation and ERDF
"A way of making Europe."

KNIME Analytics Platform used for data processing and analysis.

