DeepSeek Email Classification Overview
1. Project Overview
This project processes emails (.eml/.msg) to classify their request types and extract key details
(amount, date, deal name) using a fine-tuned DeepSeek AI model. It also applies OCR to extract
text from attachments.
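In this project the key details are extracted by the fine-tuned model. To make the expected output concrete, here is a rule-based sketch that pulls the same three fields with regular expressions. This is a swapped-in technique, not the project's method. The input formats it recognizes (a USD/$ amount, an ISO `YYYY-MM-DD` date, a `Deal name:` line) and the `KeyDetails` and `extract_key_details` names are assumptions. A check like this could also be used to sanity-check model output during testing.

```python
import datetime as dt
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Assumed input formats (illustrative only, not the project's rules):
#   amounts like "USD 1,250.00" or "$1250", ISO dates like "2024-03-15",
#   and a line such as "Deal name: Project Atlas".
AMOUNT_RE = re.compile(r"(?:USD|\$)\s?(\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?)")
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
DEAL_RE = re.compile(r"^Deal(?: name)?:[ \t]*(.+?)[ \t]*$", re.IGNORECASE | re.MULTILINE)


@dataclass
class KeyDetails:
    amount: Optional[Decimal]
    date: Optional[dt.date]
    deal_name: Optional[str]


def extract_key_details(text: str) -> KeyDetails:
    """Return the first amount, ISO date, and deal name found in text (None if absent)."""
    amount_m = AMOUNT_RE.search(text)
    date_m = DATE_RE.search(text)
    deal_m = DEAL_RE.search(text)
    return KeyDetails(
        # Strip thousands separators before converting to an exact Decimal.
        amount=Decimal(amount_m.group(1).replace(",", "")) if amount_m else None,
        date=dt.date(*map(int, date_m.groups())) if date_m else None,
        deal_name=deal_m.group(1) if deal_m else None,
    )
```

ISO-looking but invalid dates such as `2024-13-45` raise `ValueError` from `datetime.date`, which a real pipeline would need to handle.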
Objective: Automate the classification and processing of financial service emails to improve
operational efficiency and reduce manual effort.
Actors:
Preconditions:
Workflow:
Postconditions:
Components:
2. Preprocessing Layer
3. Classification Layer
o Uses a fine-tuned DeepSeek LLM to classify emails into predefined request types
and sub-types.
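A minimal sketch of the preprocessing step for .eml input, assuming only Python's standard `email` package: it separates the headers, the body text, and the attachments that would be passed on to OCR. `ParsedEmail`, `parse_eml`, and `build_sample_eml` are hypothetical names, not part of the project. Outlook .msg files are OLE compound documents that the standard library cannot read, so they would need a third-party parser (for example the `extract-msg` package), which is not shown.

```python
from dataclasses import dataclass, field
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import List, Tuple


@dataclass
class ParsedEmail:
    subject: str
    sender: str
    body: str
    # (filename, raw bytes) pairs that an OCR step could consume.
    attachments: List[Tuple[str, bytes]] = field(default_factory=list)


def parse_eml(raw: bytes) -> ParsedEmail:
    """Split a raw .eml message into headers, body text, and attachments."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Prefer the plain-text body and fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    attachments = [
        (part.get_filename() or "unnamed", part.get_payload(decode=True) or b"")
        for part in msg.iter_attachments()
    ]
    return ParsedEmail(
        subject=str(msg["Subject"] or ""),
        sender=str(msg["From"] or ""),
        body=body,
        attachments=attachments,
    )


def build_sample_eml(subject: str, body: str, pdf_bytes: bytes, filename: str) -> bytes:
    """Build a test .eml carrying one PDF attachment (hypothetical test helper)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "client@example.com"
    msg["To"] = "operations@example.com"
    msg.set_content(body)
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf", filename=filename)
    return msg.as_bytes()
```

`build_sample_eml` is included so the parser can be exercised with a generated test email that carries a PDF attachment.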
2. File Structure
GenAIEmailClassification/
│-- data/            # Folder containing sample email data (.eml, .msg) and attachments
│   ├── sample1.eml
│   ├── sample2.msg
│   ├── attachments/
│   │   ├── invoice1.pdf
│   │   ├── receipt2.png
│-- trained_model/   # Trained model files (weights, tokenizer, configuration)
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── tokenizer.json
3. File Descriptions
• data/: Contains sample .eml and .msg emails with attachments for testing.
• api.py: Implements a FastAPI web service that classifies emails and extracts text.
• trained_model/: Stores the trained model files (weights, tokenizer, and configuration).
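The upload checks that an endpoint like the one in api.py might run before classification can be sketched without FastAPI itself. The function and exception names here are hypothetical. The extension check mirrors the supported .eml/.msg formats, and the .msg check relies on the fact that Outlook .msg files are OLE2 compound documents, which begin with the signature bytes `D0 CF 11 E0 A1 B1 1A E1`.

```python
from pathlib import PurePath

# Outlook .msg files are OLE2 compound documents, which start with this 8-byte signature.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
SUPPORTED_SUFFIXES = (".eml", ".msg")


class InvalidUpload(ValueError):
    """Raised when an uploaded file is not a usable .eml or .msg email (hypothetical)."""


def validate_upload(filename: str, data: bytes) -> str:
    """Return "eml" or "msg" for an acceptable upload, otherwise raise InvalidUpload."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise InvalidUpload(f"unsupported file type: {filename!r}")
    if not data:
        raise InvalidUpload("empty file")
    # A file named .msg that lacks the OLE2 header is treated as an incorrect format.
    if suffix == ".msg" and not data.startswith(OLE2_SIGNATURE):
        raise InvalidUpload(".msg file is not an OLE2 compound document")
    return suffix[1:]
```

Inside a FastAPI route, `InvalidUpload` would typically be converted into an `HTTPException`; this sketch makes no assumption about which status code api.py actually returns.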
The model is trained to classify emails into the following request types and their
sub-types:
o Account Reconciliation
o Transaction Correction
o Fee Adjustments
o Internal Transfer
o External Transfer
o Asset Consolidation
o Account Closure
o Final Settlement
o Loan Modification
o Agreement Renewal
o Invoice Payment
o Penalty Fees
o Service Charges
o Customer Deposits
o Refund Processing
o Vendor Payments
o Loan Disbursements
o Customer Withdrawals
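Because the classifier is an LLM, its free-text output has to be mapped back onto this fixed label set. The sketch below assumes a generative model that is prompted to answer in JSON. The prompt wording, the `request_type` key, the flat label list (sub-types omitted), and the `"Unknown"` fallback are all assumptions rather than the project's actual interface.

```python
import json

# Flat label list copied from the request types above; sub-types are not modeled here.
REQUEST_TYPES = [
    "Account Reconciliation", "Transaction Correction", "Fee Adjustments",
    "Internal Transfer", "External Transfer", "Asset Consolidation",
    "Account Closure", "Final Settlement", "Loan Modification",
    "Agreement Renewal", "Invoice Payment", "Penalty Fees",
    "Service Charges", "Customer Deposits", "Refund Processing",
    "Vendor Payments", "Loan Disbursements", "Customer Withdrawals",
]
_LOOKUP = {label.casefold(): label for label in REQUEST_TYPES}


def build_prompt(email_text: str) -> str:
    """Hypothetical instruction prompt asking the model for a one-label JSON answer."""
    options = "\n".join(f"- {label}" for label in REQUEST_TYPES)
    return (
        "Classify the email into exactly one request type from this list:\n"
        f"{options}\n"
        'Answer only with JSON: {"request_type": "<type>"}\n\n'
        f"Email:\n{email_text}"
    )


def parse_prediction(raw_output: str) -> str:
    """Map the model's raw text to a known label, or "Unknown" if it does not match."""
    try:
        label = json.loads(raw_output).get("request_type", "")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object (e.g. a list or a number).
        return "Unknown"
    return _LOOKUP.get(str(label).strip().casefold(), "Unknown")
```

Case-insensitive matching keeps a slightly mis-cased answer such as `fee adjustments` usable, while anything outside the list is flagged as `Unknown` for manual review.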
5. Test Steps
1. Environment Setup
• Activate the virtual environment: venv\Scripts\activate (Windows) or source venv/bin/activate (macOS/Linux)
• cd scripts
• Collect sample .eml and .msg files representing different request types.
• Ensure that some test files contain attachments (PDFs, images) for OCR testing.
• Send a sample email to the /classify-email endpoint:
  curl -X 'POST' \
    'http://127.0.0.1:8000/classify-email' \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -F 'file=@sample_email.eml'
• Verify that the response includes a classified request type and extracted details.
• Ensure extracted details (amount, date, deal name) are correctly identified.
• Check the logs to confirm that both .eml and .msg files are processed correctly.
• Test error handling by submitting unsupported file types and malformed email files.
• To fine-tune the model, run: python finetune.py
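For scripted tests, the curl request in the steps above can be reproduced from Python using only the standard library. This sketch assumes the API is running locally on port 8000 and reads the upload from a form field named `file`, as the curl example does. `build_multipart` and `classify_email` are hypothetical helpers.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

# Endpoint taken from the curl example; assumes the API runs locally on port 8000.
API_URL = "http://127.0.0.1:8000/classify-email"


def build_multipart(field: str, filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode one file field as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex  # random boundary, practically never present in the file
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def classify_email(path: str, url: str = API_URL) -> dict:
    """POST an .eml/.msg file to the running API and return the decoded JSON reply."""
    file_path = Path(path)
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

`classify_email("data/sample1.eml")` returns the parsed JSON response; the field names to assert on (request type, amount, date, deal name) depend on what api.py actually returns.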