EMPOWERING FINANCIAL SECURITY:
DETECTING FRAUDULENT TRANSACTIONS USING ADVANCED MACHINE LEARNING TECHNIQUES AND PREDICTIVE ANALYTICS
PRESENTED BY: ARUN KUMAR R
DATA SCIENCE TRAINEE, LEARNBAY

AGENDA
1. Introduction to the problem
2. Data Collection and Preprocessing
3. Exploratory Data Analysis
4. Model Selection and Evaluation
5. Results and Conclusion
6. Q&A

1. INTRODUCTION
According to the Market Statsville Group (MSG), the global e-commerce fraud prevention market is expected to grow from USD 38,714.0 million in 2022 to USD 303,870.4 million by 2033, a CAGR of 20.6% from 2023 to 2033.
Indian banks reported losses of Rs 4.69 lakh crore from around 65,017 frauds between June 1, 2014, and March 31, 2023.
In FY2023, the banking system recorded 13,530 fraud cases. Almost 49 per cent of these (6,659 cases) were in the digital payment (card/internet) category.
India lost at least Rs 100 crore every day to bank fraud or scams over the past seven years.
In financial year 2023, the Reserve Bank of India (RBI) reported more than 13 thousand bank fraud cases across India. The total value of bank frauds decreased from 1.38 trillion Indian rupees to 302 billion Indian rupees.

10 TYPES OF BANKING FRAUDS IN INDIA
1. Phishing: creating fake websites to gather important information
2. Vishing: fraudsters call customers, posing as a bank or other institution, to gather information
3. Frauds using online sales platforms
4. Frauds due to the use of unknown/unverified mobile apps
5. ATM card skimming
6. Frauds using screen-sharing/remote-access apps
7. SIM swap or SIM cloning
8. Frauds by compromising credentials through search-engine results
9. Scams through QR code scans
10. Impersonation on social media

PROBLEM STATEMENT
Develop a machine learning model to detect potentially fraudulent transactions based on the provided features. The dataset contains information about various transactions, including account age, payment method, time of transaction, and category. The goal is to build a classification model that accurately classifies transactions as either legitimate or potentially fraudulent.

DATA DICTIONARY
accountAgeDays: The number of days the account has been active.
numItems: The number of items associated with the account.
localTime: Some measure of time, possibly in hours or a similar unit.
paymentMethod: The method used for payment (e.g., PayPal, store credit, credit card).
paymentMethodAgeDays: The number of days since the current payment method (e.g., PayPal, credit card) was linked to the account.
isWeekend: A binary indicator of whether the transaction occurred on a weekend (1 for yes, 0 for no).
Category: The category of the transaction (e.g., electronics, shopping, food).
Label (target column): A binary label (0 for legitimate, 1 for potentially fraudulent).

DATA STRUCTURE
No_of_columns – 8 Nos
No_of_Rows – 38662 Nos
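The structural figures above (row and column counts), together with the duplicate, missing-value and label counts used in the cleaning step, can be reproduced with a small profiling pass. This is a minimal sketch in plain Python, assuming the data is a CSV whose header uses the data-dictionary column names, that the target column is named 'label', and that missing values are stored as empty cells. profile_csv is a hypothetical helper and the sample rows are made up; with pandas, df.shape, df.duplicated().sum(), df.isna().sum() and df['label'].value_counts() give equivalent counts.

```python
import csv
import io
from collections import Counter


def profile_csv(text: str, label_col: str = "label") -> dict:
    """Summarise a CSV string: shape, duplicate rows, missing cells per column, label counts."""
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    rows = [tuple(record[c] for c in columns) for record in reader]

    # A row counts as a duplicate for every occurrence after its first one.
    duplicates = sum(n - 1 for n in Counter(rows).values())

    # Assumption: empty cells represent missing values.
    missing = {c: sum(1 for r in rows if r[i] == "") for i, c in enumerate(columns)}

    # Class distribution of the target column (hypothetically named "label").
    label_idx = columns.index(label_col)
    label_counts = dict(Counter(r[label_idx] for r in rows))

    return {
        "n_rows": len(rows),
        "n_columns": len(columns),
        "duplicates": duplicates,
        "missing": missing,
        "label_counts": label_counts,
    }


# Made-up sample rows using the column names from the data dictionary.
sample = """accountAgeDays,numItems,localTime,paymentMethod,paymentMethodAgeDays,isWeekend,Category,label
29,1,4.74,paypal,28.2,0,shopping,0
29,1,4.74,paypal,28.2,0,shopping,0
725,1,4.97,storecredit,0.0,1,electronics,0
1,1,4.92,creditcard,0.0,,,1
"""

profile = profile_csv(sample)
print(profile["n_rows"], profile["n_columns"], profile["duplicates"])  # 4 8 1
print(profile["missing"]["isWeekend"], profile["missing"]["Category"])  # 1 1
print(profile["label_counts"])  # {'0': 3, '1': 1}
```

Counting every repeat after the first occurrence matches how "duplicate rows" is usually reported, so the figure can be compared directly with the duplicate count on the next slides.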
DATA DISTRIBUTION

2. DATA CLEANING AND PREPROCESSING
Duplicate values
Treating missing values
Encoding
Outlier treatment
Feature scaling
Imbalanced data treatment – Random Over Sampler

DUPLICATE VALUES
3033 duplicate rows (7.73% of total data)
Built two models, one with and one without the duplicate rows.

MISSING VALUES
The variables 'isWeekend' and 'Category' have 560 and 95 missing values respectively.
The missing values of 'isWeekend' coincide with the rows whose 'label' is fraud (1). Filling them with 0 or 1 (weekday or weekend) would therefore create an artificial pattern tied to the label and produce a misleading model, so this variable was dropped.
The missing values of 'Category' were imputed with the mode.

ENCODING
The variables 'Category' and 'paymentMethod' are categorical. They were encoded with a One Hot Encoder, and the duplicate (redundant) column was dropped.

OUTLIER TREATMENT
The variables 'numItems' and 'paymentMethodAgeDays' have outliers. Since these outliers represent natural variation in the population, they were left as is.

FEATURE SCALING
The variables 'accountAgeDays' and 'paymentMethodAgeDays' range up to 2000. Since these values have no fixed upper limit, I scaled the dataset using standardization.

IMBALANCED DATASET
The dependent variable 'label' has 38661 zeros and 560 ones, a huge imbalance (98.57% vs 1.43%).
The SMOTE method was used to balance the data.

EXPLORATORY DATA ANALYSIS

MODEL SELECTION
After splitting the data into training and test sets, I built models with almost all of the common classification algorithms and chose the one with the highest accuracy: the Random Forest (RF) model.

MODEL EVALUATION
Model / Metric | Accuracy | Precision | Recall | F1 score
Training | 1.00 | 1.00 | 1.00 | 1.00
Test | 1.00 | 1.00 | 1.00 | 1.00
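All four metrics in the table above derive from confusion-matrix counts, with fraud (label 1) as the positive class. Below is a minimal sketch in plain Python, assuming binary 0/1 labels; Confusion, confusion and metrics are hypothetical helpers, not part of any library (scikit-learn's accuracy_score, precision_score, recall_score and f1_score compute the same quantities).

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts with label 1 (fraud) as the positive class."""
    tp: int  # fraud predicted as fraud
    fp: int  # legitimate predicted as fraud (Type I error)
    fn: int  # fraud predicted as legitimate (Type II error)
    tn: int  # legitimate predicted as legitimate


def confusion(y_true, y_pred) -> Confusion:
    """Count the four outcomes for paired 0/1 true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def metrics(c: Confusion) -> dict:
    """Accuracy, precision, recall and F1; any zero denominator yields 0.0."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 6 legitimate and 4 fraudulent transactions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
print(confusion(y_true, y_pred))  # Confusion(tp=3, fp=1, fn=1, tn=5)
print(metrics(confusion(y_true, y_pred)))
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}

# An "always legitimate" baseline on the class counts from the imbalance slide.
baseline = metrics(Confusion(tp=0, fp=0, fn=560, tn=38661))
print(round(baseline["accuracy"], 4), baseline["recall"])  # 0.9857 0.0
```

The baseline line applies the same formulas to a trivial model that labels every transaction as legitimate, using the 38661 legitimate and 560 fraudulent rows reported earlier: it reaches about 98.57% accuracy while catching no fraud at all (recall 0.0). This is why recall, i.e. the false-negative count, is the more informative number on imbalanced fraud data.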
Here, our focus should be on Type II errors, i.e. false negatives (fraudulent transactions classified as legitimate). The model makes fewer false negatives than false positives.

The ROC curve gives ROC-AUC scores of 1.00 and 0.99 for the training and test data, with an area under the curve (AUC) of 0.99.
To check for overfitting, I ran cross-validation on this RF model.

RESULTS
The final accuracy after cross-validation: 99.63% and 99.31%.
Business impact: the model could help prevent losses of crores of rupees for our bank's customers.

CONCLUSION
Summary: successfully implemented the bank fraud detection model.
Future work:
1. Integrate the model with real-time data by deploying it in the cloud.
2. Explore anomaly detection models.

THANK YOU