Problem Statement - Data Analytics
Problem Statement
The challenge is to detect potentially fraudulent insurance claim documents, such as medical
invoices, prescriptions, and lab test reports, which are received from various providers and
customers. Fraudulent claimants reuse identical digital or printed templates with minor
modifications, and background variations make standard document comparison ineffective.
To streamline processes and improve efficiency, we aim to develop a robust algorithm to identify
instances of potential fraud by automating background noise removal, enabling document
standardization and accurate content comparison.
Description
The objective of this problem statement is to develop a comprehensive forgery detection
algorithm for scanned documents (printed or handwritten) and digitally generated documents. The
tool will focus on detecting and highlighting four main types of document forgery: scribbling or
overwriting, digital forgery, data manipulation, and whitener-based manipulation.
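As an illustration of why background removal matters before comparison, the following is a minimal sketch, not a prescribed method. It assumes the two documents are already aligned, the same size, and available as grayscale pixel grids (lists of lists, 0 = black, 255 = white). Otsu thresholding is used here as one possible way to separate ink from paper; the brief does not name a technique. All function names (`otsu_threshold`, `binarize`, `changed_pixels`, `render`) are hypothetical helpers introduced for this example.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def otsu_threshold(image: Image) -> int:
    """Return the Otsu threshold t; pixels with value <= t count as ink."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_lo, sum_lo = 0, 0
    for t in range(256):
        weight_lo += hist[t]            # number of pixels with value <= t
        sum_lo += t * hist[t]
        if weight_lo == 0:
            continue                    # no dark pixels yet
        weight_hi = total - weight_lo   # number of pixels with value > t
        if weight_hi == 0:
            break                       # every pixel is in the dark class
        mean_lo = sum_lo / weight_lo
        mean_hi = (sum_all - sum_lo) / weight_hi
        # Between-class variance (up to a constant factor).
        between = weight_lo * weight_hi * (mean_lo - mean_hi) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image: Image) -> Image:
    """Map each pixel to 1 (ink) or 0 (paper), discarding background shade."""
    t = otsu_threshold(image)
    return [[1 if value <= t else 0 for value in row] for row in image]


def changed_pixels(a: Image, b: Image) -> List[Tuple[int, int]]:
    """List (row, col) positions where two same-size grids differ."""
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(a, b))
        for c, (pa, pb) in enumerate(zip(row_a, row_b))
        if pa != pb
    ]


def render(mask: Image, ink: int, paper: int) -> Image:
    """Hypothetical helper: paint an ink mask with given ink/paper shades."""
    return [[ink if cell else paper for cell in row] for row in mask]


# Same template; the second copy has one extra ink pixel at (2, 3).
template = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
tampered = [row[:] for row in template]
tampered[2][3] = 1

doc_a = render(template, ink=40, paper=200)   # scan on grayish paper
doc_b = render(tampered, ink=10, paper=255)   # clean digital copy

raw_diff = changed_pixels(doc_a, doc_b)                       # all 24 pixels
clean_diff = changed_pixels(binarize(doc_a), binarize(doc_b))  # [(2, 3)]
```

In this toy case a raw pixel comparison flags every position, because the paper and ink shades differ between the two copies, while the comparison after binarization isolates the single modified pixel. Real scans would also need alignment, deskewing, and noise handling, none of which is attempted here.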
1. Participants can choose to work on one or more of the forgery types below.
4. User-Friendly Interface (10 Points): Design an intuitive and user-friendly interface for
users to upload documents, view the detected forgeries, and access the visualized
results.
7. Dataset:
a. Participants are encouraged to create their own dataset or use any publicly
available datasets for training and testing their forgery detection algorithms.
A few sample documents are shared for reference.
8. Technology Stack:
a. Participants are free to choose any open-source tools, programming languages,
frameworks, or libraries for development. The solution should be deployable on
a standard machine or as a web application.
9. Ethical Considerations:
a. Ensure that the solution adheres to ethical guidelines, respects privacy, and
does not promote harmful or unethical use.
Input 1:
Output 2: