DeepSeek Email Classification Overview

The DeepSeek Email Classification project automates the classification and processing of financial service emails by using a fine-tuned AI model and OCR for attachment processing. It involves a user interface for uploading emails, a preprocessing layer for text extraction, and a classification layer that predicts request types and extracts key details. The system aims to enhance operational efficiency by accurately classifying emails and providing structured insights for further processing.

Uploaded by

priyadarsini.tripathy9

DeepSeek Email Classification & OCR - Documentation

1. Project Overview

This project processes emails (.eml/.msg) to classify their request types and extract key details
(amount, date, deal name) using a fine-tuned DeepSeek AI model. It also applies OCR for
attachment processing.

Objective: Automate the classification and processing of financial service emails to improve
operational efficiency and reduce manual effort.

Actors:

• End Users: Operations teams handling financial transactions.

• System: The AI-powered email classification tool.

Preconditions:

• User provides .eml or .msg files as input.

• Trained model is available for classification.

Workflow:

1. User uploads an email file via API or dashboard.

2. Email text extraction is performed.

3. Classification model predicts request type & sub-type.

4. Key details (amount, date, deal name) are extracted.

5. Results are stored and displayed in the dashboard.

6. User exports results if needed.

Postconditions:

• Email is classified accurately.

• Extracted details are saved in the database.

• User gets structured insights for further processing.

Architecture diagram

Components:

1. User Interface (API & Dashboard)

o A FastAPI-based API that accepts .eml and .msg email files.

o A Blazor-based dashboard for viewing classification results.

2. Preprocessing Layer

o Extracts email body content from .msg and .eml files.

o Uses OCR (Tesseract) to extract text from attachments (PDF, images).

o Converts extracted text into a structured format.
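As an illustration of the body-extraction step for .eml files, the following is a minimal sketch that uses only Python's standard email package. The function name extract_eml_text and the shape of the returned dictionary are assumptions made for this example and are not taken from the project's utils.py. Parsing .msg files needs a third-party library, and OCR of the attachments is a separate step; neither is shown here.

```python
from email import message_from_bytes, policy


def extract_eml_text(raw: bytes) -> dict:
    """Parse raw .eml bytes into a structured dict (hypothetical helper)."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # Collect attachment names so a later OCR step knows what to process.
    attachments = [part.get_filename() for part in msg.iter_attachments()]

    return {
        "subject": msg["subject"] or "",
        "sender": msg["from"] or "",
        "body": body.strip(),
        "attachments": attachments,
    }
```

A caller would typically read the uploaded file in binary mode and pass the bytes straight to this function.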

3. Classification Layer

o Uses a fine-tuned DeepSeek LLM to classify emails into predefined request types
and sub-types.

o Extracts key details such as amount, date, and deal name.

o Computes a confidence score for classification.
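The document does not say how the confidence score is computed. One common choice, shown here purely as an assumption, is to take the softmax probability of the top-scoring class. The sketch assumes the model exposes one raw score (logit) per request type; the function names are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits: list[float], labels: list[str]) -> tuple[str, float]:
    """Return the top label and its probability as a confidence score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

A score produced this way lies strictly between 0 and 1, so a threshold can be used to route low-confidence emails to manual review.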

4. Storage & Data Handling

o Saves classified emails and extracted details into a PostgreSQL database.

o Detects duplicate emails to prevent redundant processing.
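The duplicate-detection mechanism is not described in this document. A simple approach, sketched below as an assumption, is to hash a normalized form of the sender, subject, and body and to treat a repeated hash as a duplicate. The in-memory set stands in for a uniqueness check against the PostgreSQL database; all names are hypothetical.

```python
import hashlib


def email_fingerprint(sender: str, subject: str, body: str) -> str:
    """Hash normalized fields so trivially re-sent copies map to one key."""
    normalized = "\n".join(
        " ".join(field.lower().split()) for field in (sender, subject, body)
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateDetector:
    """In-memory stand-in for a uniqueness check against the database."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_duplicate(self, sender: str, subject: str, body: str) -> bool:
        key = email_fingerprint(sender, subject, body)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False
```

Lowercasing and collapsing whitespace means that copies differing only in case or spacing are treated as the same email; a stricter or looser normalization may suit real data better.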

5. Model Training & Fine-tuning

o Uses a dataset (email_training_data.json) for supervised fine-tuning.

o Fine-tunes the DeepSeek model for improved classification accuracy.

o Saves the trained model (fine_tuned_model/) for inference.

2. File Structure

GenAIEmailClassification/
├── data/                                   # Sample email data (.eml, .msg) and attachments
│   ├── sample1.eml
│   ├── sample2.msg
│   └── attachments/
│       ├── invoice1.pdf
│       └── receipt2.png
├── scripts/                                # Core scripts
│   ├── model.py                            # Model definition & loading
│   ├── finetune.py                         # Script for fine-tuning the model
│   ├── api.py                              # FastAPI implementation for classification & OCR
│   ├── utils.py                            # Helper functions for email processing
│   ├── deepseek_email_classification.py    # Classification logic (renamed)
│   └── extract_key_details.py              # OCR & data extraction logic
├── trained_model/                          # Trained model files
│   ├── config.json
│   ├── pytorch_model.bin
│   └── tokenizer.json
├── test/                                   # Testing scripts & results
│   ├── test_samples/                       # Sample emails for testing
│   └── test_results.csv                    # Output file with classification results
├── requirements.txt                        # Dependencies for installation
├── README.md                               # Documentation
└── Deepseek Test Steps.docx                # Test steps document

3. File Descriptions

• data/: Contains sample .eml and .msg emails with attachments for testing.

• scripts/: Houses all core scripts.

o model.py: Loads the trained model for email classification.

o finetune.py: Fine-tunes the model with labeled training data.

o api.py: Implements a FastAPI web service to classify emails and extract text.

o utils.py: Helper functions for processing emails.

o deepseek_email_classification.py: The core classification logic.

o extract_key_details.py: Extracts important details like amount, date, and deal name using OCR.

• trained_model/: Stores the trained model files (weights, tokenizer, and configuration).

• test/: Contains test samples and the results of classification.

• requirements.txt: Lists dependencies required for running the project.

• README.md: Main documentation file with installation and usage instructions.

• Deepseek Test Steps.docx: A step-by-step guide for testing the solution.
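To make the role of extract_key_details.py more concrete, here is a rule-based sketch that pulls an amount and a date out of text such as OCR output. It is not the project's implementation. The patterns are assumptions (dollar or USD amounts, ISO or slash-separated dates), the function name is hypothetical, and deal-name extraction is left out because it depends on domain-specific naming.

```python
import re

# Assumed formats: "$1,250.00" or "USD 1250" amounts; ISO or slash-separated dates.
AMOUNT_RE = re.compile(
    r"(?:\$|USD\s?)(\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?)"
)
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4})\b")


def extract_amount_and_date(text: str) -> dict:
    """Return the first amount (as float) and first date (as text), or None."""
    amount_match = AMOUNT_RE.search(text)
    date_match = DATE_RE.search(text)
    amount = (
        float(amount_match.group(1).replace(",", "")) if amount_match else None
    )
    date = date_match.group(1) if date_match else None
    return {"amount": amount, "date": date}
```

The date is returned as written rather than parsed, since a string like 3/5/2024 is ambiguous between day-first and month-first conventions.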

4. Request Types, Definitions & Subtypes

The model is trained to classify emails into the following request types along with their
subtypes:

1. Adjustment - Emails related to adjustments in financial transactions.

o Account Reconciliation

o Transaction Correction

o Fee Adjustments

2. AU Transfer - Requests for transferring assets under management.

o Internal Transfer

o External Transfer

o Asset Consolidation

3. Closing Notice - Notifications regarding closing of a deal or account.

o Account Closure

o Final Settlement

o Loan Closure Notice

4. Commitment Change - Requests to modify financial commitments.

o Credit Line Adjustment

o Loan Modification

o Agreement Renewal

5. Fee Payment - Emails related to processing fee payments.

o Invoice Payment

o Penalty Fees

o Service Charges

6. Money Movement Inbound - Requests concerning inbound fund transfers.

o Customer Deposits

o Wire Transfers Received

o Refund Processing

7. Money Movement Outbound - Requests concerning outbound fund transfers.

o Vendor Payments

o Loan Disbursements

o Customer Withdrawals
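The taxonomy above can be written down as a lookup table, which is handy for checking that a predicted type and sub-type pair is one the model was trained on. The labels are copied from the list above; the variable and function names are hypothetical.

```python
# Request types and sub-types, copied from the list above.
REQUEST_TYPES: dict[str, list[str]] = {
    "Adjustment": ["Account Reconciliation", "Transaction Correction", "Fee Adjustments"],
    "AU Transfer": ["Internal Transfer", "External Transfer", "Asset Consolidation"],
    "Closing Notice": ["Account Closure", "Final Settlement", "Loan Closure Notice"],
    "Commitment Change": ["Credit Line Adjustment", "Loan Modification", "Agreement Renewal"],
    "Fee Payment": ["Invoice Payment", "Penalty Fees", "Service Charges"],
    "Money Movement Inbound": ["Customer Deposits", "Wire Transfers Received", "Refund Processing"],
    "Money Movement Outbound": ["Vendor Payments", "Loan Disbursements", "Customer Withdrawals"],
}


def is_valid_prediction(request_type: str, sub_type: str) -> bool:
    """True only if the sub-type belongs to the given request type."""
    return sub_type in REQUEST_TYPES.get(request_type, [])
```

A check like this catches outputs where the model pairs a valid sub-type with the wrong parent type, for example Loan Modification under Fee Payment.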

5. Test Steps

1. Environment Setup

• Ensure the Python environment is set up with required dependencies.

• Activate the virtual environment (if applicable):

• source venv/bin/activate (Linux/Mac)

• venv\Scripts\activate (Windows)

• Install dependencies if not already installed:

• pip install -r requirements.txt

2. Running the API

• Navigate to the API script location:

• cd scripts

• Start the FastAPI server using Uvicorn:

• uvicorn api:app --reload

• Verify that the server is running at http://127.0.0.1:8000/docs.

3. Preparing Test Data

• Collect sample .eml and .msg files representing different request types.

• Ensure that some test files contain attachments (PDFs, images) for OCR testing.

4. Uploading Emails for Classification via API

• Use Postman or curl to send a request to the API endpoint:

    curl -X 'POST' \
      'http://127.0.0.1:8000/classify-email' \
      -H 'accept: application/json' \
      -H 'Content-Type: multipart/form-data' \
      -F 'file=@sample_email.eml'

• Verify that the response includes a classified request type and extracted details.
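For scripted test runs, the same upload can be made from Python. The sketch below uses only the standard library and builds the multipart/form-data body by hand. The endpoint path and the form field name file come from the request shown above; the function names are hypothetical, and the response is returned as raw bytes because its schema is not documented here.

```python
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def classify_email(path: str, url: str = "http://127.0.0.1:8000/classify-email") -> bytes:
    """POST the file at `path` to the endpoint and return the raw response body."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, classify_email("sample_email.eml") sends the same request as the command-line example.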

5. Testing OCR Extraction via API

• Submit emails with PDF or image attachments.

• Confirm extracted text from images/PDFs is included in the response.

6. Validating API Responses

• Check classification accuracy against expected request types.

• Ensure extracted details (amount, date, deal name) are correctly identified.

• Log results in a CSV file for analysis.
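The accuracy check can be reduced to a comparison of expected and predicted request types. The sketch below is one simple way to compute it; the function names are hypothetical, and it assumes one expected label per test file.

```python
def classification_accuracy(expected: list[str], predicted: list[str]) -> float:
    """Fraction of emails whose predicted request type matches the expected one."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)


def mismatches(files: list[str], expected: list[str], predicted: list[str]) -> list[tuple[str, str, str]]:
    """List (file, expected, predicted) for every misclassified email."""
    return [(f, e, p) for f, e, p in zip(files, expected, predicted) if e != p]
```

Reviewing the mismatch list alongside the overall figure shows which request types are being confused with each other.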

7. Testing api.py End-to-End

• Run the API and upload test emails.

• Check logs and ensure proper processing of .eml and .msg files.

• Verify OCR extraction and classification outputs.

• Test error handling for unsupported file types and incorrect formats.
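For the unsupported-file-type cases, it helps to state precisely which inputs should be accepted. The sketch below is an assumption about how such a check might look, based only on the two formats this document names (.eml and .msg); it is not taken from api.py.

```python
from pathlib import PurePath

# The two input formats named in this document.
SUPPORTED_SUFFIXES = {".eml", ".msg"}


def is_supported_email_file(filename: str) -> bool:
    """Accept only .eml and .msg, ignoring the case of the extension."""
    return PurePath(filename).suffix.lower() in SUPPORTED_SUFFIXES
```

Useful negative test inputs include a PDF sent directly, a file with no extension, and a file whose extension is correct but whose content is not a valid email.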

8. Performance & Error Handling Tests

• Test handling of unsupported file formats.

• Assess response time for various email sizes.

• Verify API stability with multiple concurrent requests.
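Response time under concurrent load can be measured with a small thread-pool harness. In the sketch below, the call argument is any function that performs one request (for example, an upload to the classification endpoint); the harness name and the default of 8 workers are assumptions chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def time_concurrent_calls(
    call: Callable[[int], object], n_requests: int, workers: int = 8
) -> list[float]:
    """Run call(i) for each i on a thread pool; return per-call latencies in seconds."""

    def timed(i: int) -> float:
        start = time.perf_counter()
        call(i)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed, range(n_requests)))
```

Any exception raised by a request propagates out of the harness, so a failure under load surfaces as an error rather than a missing measurement.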

9. Logging & Exporting Results

• Collect API responses and store them in test_results.csv.

• Review and analyze the accuracy of classification and extraction.
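Writing the collected results to CSV takes only the standard csv module. The column names below are assumptions for illustration; the document does not specify the layout of test_results.csv.

```python
import csv

# Hypothetical column layout for test_results.csv.
FIELDNAMES = ["file", "expected_type", "predicted_type", "amount", "date", "deal_name"]


def write_results(rows: list[dict], stream) -> None:
    """Write one header row and one row per result; unknown keys are ignored."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
```

To write to disk, open the target with open("test_results.csv", "w", newline="") and pass the file object as stream.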

10. Model Fine-Tuning Validation

• Run the fine-tune script using:

• python finetune.py

• Re-test classification accuracy after fine-tuning.

• Ensure the newly trained model is used in model.py.

11. Final Review & Documentation

• Verify all functionalities work as expected.

• Update documentation with any additional findings or improvements needed.
