
Problem Statement - Data Analytics

The Bajaj Finserv Health Datathon challenges participants to develop algorithms for detecting fraudulent insurance claim documents, focusing on various types of forgeries such as scribbling, digital manipulation, and whitener use. The project emphasizes creating a user-friendly interface for document uploads and visualizing detected forgeries while ensuring efficiency and scalability. Participants can earn bonus points for additional features like language analysis and reducing false positives, with the freedom to choose their technology stack and datasets.

Uploaded by

jaimaabharati102
Copyright
© All Rights Reserved

Bajaj Finserv Health - Datathon

Problem Statement

Description

The challenge is to detect potentially fraudulent insurance claim documents, such as medical
invoices, prescriptions, and lab test reports, which are received from various providers and
customers. Fraudulent claimants exploit identical digital or printed templates with minor
modifications, making standard document comparison ineffective due to background variations.
To streamline processes and improve efficiency, we aim to develop a robust algorithm to identify
instances of potential fraud by automating background noise removal, enabling document
standardization and accurate content comparison.

The objective of this problem statement is to develop a comprehensive forgery detection
algorithm for printed or handwritten scanned documents and digitally generated documents. The
tool will focus on detecting and highlighting four main types of document forgery: scribbling or
overwriting, digital forgery, data manipulation, and whitener-based manipulation.
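The background-removal and comparison idea in the description can be illustrated with a minimal sketch. It assumes a scanned page is already available as a grid of grayscale values (0 to 255) and that the two pages are aligned and the same size; neither assumption comes from the problem statement. Otsu's threshold is one possible way to separate ink from paper, chosen here only for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def otsu_threshold(image: Image) -> int:
    """Return the gray level that best separates dark ink from light paper."""
    hist = [0] * 256
    for row in image:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue  # no pixels this dark yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: larger means a cleaner ink/paper split.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(image: Image) -> Image:
    """Map each pixel to 1 (ink) or 0 (paper), discarding the paper shade."""
    level = otsu_threshold(image)
    return [[1 if px <= level else 0 for px in row] for row in image]


def changed_pixels(doc_a: Image, doc_b: Image) -> List[Tuple[int, int]]:
    """List (row, col) positions where the binarized pages disagree."""
    bin_a, bin_b = binarize(doc_a), binarize(doc_b)
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(bin_a, bin_b))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]
```

Because each page is thresholded against its own histogram, a white scan and a grey scan of the same template compare as identical, while an added or removed stroke shows up as a changed pixel. Real scans would also need deskewing and alignment, which this sketch leaves out.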

Key Objectives

1. Participants can choose to work on one or more of the types of forgery below.

• Scribbling or Overwriting Detection (20 Points)
o Develop an algorithm to detect regions where critical data, such as date,
customer name, amount, and invoice number, has been scribbled or
overwritten on the document.
• Digital Forgery Detection (20 Points)
o Implement a mechanism to identify digitally edited or tampered regions
within the document. This includes detecting parts of the document edited
using image editing applications and identifying entirely digitally created
documents.
• Data Manipulation Detection (20 Points)
o Develop an algorithm to detect data manipulation, where certain parts of
the document have been added or removed, specifically focusing on critical
fields like amounts, dates, and other important information.
• Whitener Detection (20 Points)
o Implement an algorithm to detect areas where manipulation has occurred
using a whitener, aiming to identify portions of the document that have
been altered using correction fluids or similar methods.
• Any further type of forgery/tampering detection can fetch you 30 extra points.
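As one illustration of the detection tasks above, whitener detection might start from a simple heuristic: correction fluid tends to leave patches that are brighter than the surrounding paper and nearly uniform. The sketch below assumes a grayscale page given as a grid of values from 0 to 255; the block size and both thresholds are illustrative guesses, not values from the problem statement, and the function name is hypothetical.

```python
from statistics import mean, median, pvariance
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def find_whitener_blocks(
    image: Image,
    block: int = 4,
    brightness_margin: int = 15,
    max_variance: float = 10.0,
) -> List[Tuple[int, int]]:
    """Return (top, left) corners of blocks that look like correction fluid."""
    # Use the page-wide median as an estimate of the normal paper shade.
    paper_level = median(px for row in image for px in row)
    height, width = len(image), len(image[0])
    flagged = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [
                image[r][c]
                for r in range(top, top + block)
                for c in range(left, left + block)
            ]
            # A candidate patch is clearly brighter than the paper and flat.
            is_brighter = mean(pixels) >= paper_level + brightness_margin
            is_flat = pvariance(pixels) <= max_variance
            if is_brighter and is_flat:
                flagged.append((top, left))
    return flagged
```

A heuristic like this would likely flag glossy paper or overexposed scan regions as well, so in practice it could serve as a candidate generator in front of a trained classifier rather than as the final decision.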

2. Visualization of Detected Forgeries (20 Points)
a. Create a visualization interface that highlights the detected regions of forgery,
their type and accuracy/confidence. The visualization should clearly distinguish
the types of forgery detected.
3. Efficiency and Scalability (20 Points): Ensure that the solution is efficient and scalable,
capable of processing various types of documents with differing complexities.

4. User-Friendly Interface (10 Points): Design an intuitive and user-friendly interface for
users to upload documents, view the detected forgeries, and access the visualized
results.
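For the visualization requirement in item 2, one possible starting point is a small record per detected region that carries its type, bounding box, and confidence, plus a conversion into drawing instructions for whatever front end is chosen. Everything named below (the `Detection` class, the colour table, the `to_overlay` function) is a hypothetical illustration, not part of the problem statement.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical colour per forgery type so overlays are visually distinct.
TYPE_COLOURS: Dict[str, str] = {
    "scribbling": "red",
    "digital_forgery": "blue",
    "data_manipulation": "orange",
    "whitener": "green",
}


@dataclass
class Detection:
    forgery_type: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    confidence: float  # 0.0 to 1.0


def to_overlay(detections: List[Detection]) -> List[dict]:
    """Convert detections into drawing instructions, most confident first."""
    overlays = []
    for det in sorted(detections, key=lambda d: d.confidence, reverse=True):
        overlays.append({
            "box": det.box,
            # Unknown types fall back to a single shared colour.
            "colour": TYPE_COLOURS.get(det.forgery_type, "purple"),
            "label": f"{det.forgery_type} ({det.confidence:.0%})",
        })
    return overlays
```

Keeping the drawing instructions as plain dictionaries means the same output could feed a web canvas, a desktop viewer, or an image-annotation library without changing the detection code.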

Bonus Points (20 Points):


5. Language and Font Analysis: Implement a feature to detect inconsistencies in
language, font styles, or character sizes within the document, which may indicate
potential forgeries.
6. False Positive Reduction: Implement methods to minimize false positives in forgery
detection, ensuring high precision and reliability.
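For false positive reduction, two common post-processing steps are dropping low-confidence detections and suppressing near-duplicate boxes that overlap a stronger detection. The sketch below shows both, along with the precision measure mentioned above. The thresholds of 0.5 are illustrative assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)
Scored = Tuple[Box, float]       # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def filter_detections(
    detections: List[Scored],
    min_confidence: float = 0.5,
    max_overlap: float = 0.5,
) -> List[Scored]:
    """Keep confident detections and drop boxes that duplicate a stronger one."""
    kept: List[Scored] = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if score < min_confidence:
            continue  # too uncertain to report
        if all(iou(box, kept_box) <= max_overlap for kept_box, _ in kept):
            kept.append((box, score))
    return kept


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged regions that were real forgeries."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0
```

Raising `min_confidence` generally trades recall for precision, so the threshold would need tuning against a labelled validation set rather than being fixed in advance.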

7. Dataset:
a. Participants are encouraged to create their own dataset or use any publicly
available datasets for training and testing their forgery detection algorithms.
A few sample documents are shared for reference.

8. Technology Stack:
a. Participants are free to choose any open-source tools, programming languages,
frameworks, or libraries for development. The solution should be deployable on
a standard machine or as a web application.
9. Ethical Considerations:
a. Ensure that the solution adheres to ethical guidelines, respects privacy, and
does not promote harmful or unethical use.

Sample Input and Output

Input 1:

Output 1:

Input 2:

Output 2:
