BestPracticesGuide DUPT
Mastering the stages: Best practices for successful
implementation of Document Understanding projects
This guide equips you with the knowledge and tips to navigate the various stages of a
Document Understanding project with confidence.
Step-by-step guide
This is a step-by-step guide you can use when starting a new engagement with a client. Following these steps will help you
avoid some of the friction and issues that can appear later on.
The assumption is that you are an Automation Developer with Document Understanding knowledge.
1. Make sure the required software is installed:
• UiPath Studio
• Optional software:
2. Request that the machine you're working on be correctly configured with the following:
• Developer, Test, and Production environments accessible from your machine. (Depending on needs)
• The machine meets the minimum hardware and software requirements listed here. (Note: have at least
16 GB of RAM for a development machine; high-density ones might require even more.)
• UiPath Orchestrator (folder admin rights; the ability to add/modify assets, queues, storage buckets, etc.)
3. Discuss the solution architecture with the client:
• Attended or Unattended automation?
• How many instances of AI Center do they want/have? (QA/Dev/Prod or simply have one AI Center
instance for all and use public endpoints?)
4. Make sure there is a Non-Disclosure Agreement (NDA) in place and you can exchange documents.
5. Request to see the documents you will need to extract the data from and:
• Get a general idea of what needs to be extracted from the documents; more in-depth documentation will
come later. (Step 1, Point 3)
• Make sure you get the best possible quality of documents (examples of bad documents: stamps over
the text that needs to be extracted, resolution so low that not even a human can read it, handwritten text
over signatures, etc.)
• Determine which solution (Form, RegEx, ML) you will be using to extract the required data.
6. If you are using an ML model, request 15 documents per vendor to use for training.
7. Make sure that you, or the PM on the project, are using this template (or an updated version of it) for tracking the
progress of the project.
Step 5 can take the client a while to complete. Request it as early as possible so that you are not delayed
too much. Insist on receiving at least 15 documents per vendor.
• Note: The walk-through of the to-be process is proposed by the BA and SA and approved by the
project stakeholders.
• What’s the accuracy rate?
• What made them choose to switch to UiPath Document Understanding now - was there a specific event?
3. Gather more data about the documents and try to answer these questions:
• Can they provide scanned samples for all expected quality levels?
• Can an input file also contain different document types (e.g., a receipt and an ID card within a single
file)?
• Is the volume uniform across vendors, or does a subset cover a large portion? (E.g., 80% of documents
come from 10 vendors.)
• Focus on the large volumes? (Generally, you should, but it might differ from use case to use case)
• Focus on easiest to automate? (Generally, you should, but it might differ from use case to use case)
• Is there a requirement to extract non-text information (logos, check boxes, barcodes, etc.)?
• Is the customer willing to provide sufficient sample documents for all doc
types/templates/vendors/layouts/languages they want automated?
• Are there any documents containing multiple languages at the same time?
• How many instances of AI Center do they want/have? (QA/Dev/Prod or simply have one AI Center
instance for all and use public endpoints?)
6. Try to manage expectations, because the process outcome is heavily dependent on the customer providing
sufficient document samples (15 per vendor minimum):
• What does the customer expect from UiPath's Document Understanding product?
• What are the SLAs or timelines required for processing the documents?
• This distinction can help later, as saying the “OCR has an error rate of 50%” is completely different from “The robot
has an error rate of 50%”.
Generally, you should always focus on the large volume documents and the ones that are easier to
automate to give the best ROI to the customer.
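The OCR-vs-robot distinction above can be made concrete with a bit of arithmetic. The sketch below is illustrative only: it assumes character-level OCR errors are independent (real engines violate this), but it shows why a small character error rate compounds into a much larger field- and document-level error rate.

```python
# Illustrative only: why an "OCR error rate" and a "robot error rate" are
# not the same number. Assumes character errors are independent.

def field_accuracy(char_accuracy: float, field_length: int) -> float:
    """Probability that every character in a field is read correctly."""
    return char_accuracy ** field_length

# A 99% character-accurate OCR reading a 20-character field:
acc = field_accuracy(0.99, 20)
print(f"Field-level accuracy: {acc:.1%}")  # roughly 81.8%

# If ten such fields must all be correct for straight-through processing:
doc_acc = acc ** 10
print(f"Document-level accuracy: {doc_acc:.1%}")
```

So a 1% OCR character error rate can translate into most documents needing at least one human correction, which is a very different message to give the customer.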
• Document Types
• Taxonomy - group as many document types under the same taxonomy as possible
• Templates
• Languages
• Scans/Digital docs
• Success Criteria
• Others
• When going through the As-Is and To-Be processes, always try to find out which BREs (business rule
exceptions) are possible.
• Note: The client will make it sound like there are not many. Do not take this at face value. Ask, as if you were a
new hire, what would happen if you clicked a random button, what happens if a field is missing, and so
on.
4. The end-to-end solution will have a minimum of 3 processes, but can contain more:
• Dispatcher
• DU Process
• Performer
• Other processes can include Error Validation, API Interrogations, and others
5. Determine the business rules that can be used for automatic validation:
• Can a field be computed dynamically? (For instance, in an invoice table we can compute Line Amount
dynamically: Line Amount = Quantity * Unit Price)
• Can we send from a dispatcher/outside system extra data to the DU Process for automatic/rule-based
validation?
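The computed-field idea above can be sketched as a simple rule check. This is a minimal illustration, not UiPath API code; the field names, tolerance, and row data are invented.

```python
# Rule-based check for the computed field named in the text:
# Line Amount = Quantity * Unit Price.

def line_amount_is_valid(quantity: float, unit_price: float,
                         line_amount: float, tolerance: float = 0.01) -> bool:
    """True when the extracted line amount matches the computed one
    within a small rounding tolerance."""
    return abs(quantity * unit_price - line_amount) <= tolerance

rows = [
    {"Quantity": 3, "UnitPrice": 19.99, "LineAmount": 59.97},  # consistent
    {"Quantity": 2, "UnitPrice": 10.00, "LineAmount": 25.00},  # inconsistent
]
for row in rows:
    ok = line_amount_is_valid(row["Quantity"], row["UnitPrice"], row["LineAmount"])
    print(row, "->", "auto-validate" if ok else "send to human validation")
```

Rules like this let a subset of documents skip the human-in-the-loop entirely, which is where most of the ROI comes from.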
6. Organize the taxonomies of the different document types under proper Groups and Categories
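As a rough sketch of that organization: a taxonomy can be thought of as Groups containing Categories, which contain Document Types with their fields. All names below are invented for illustration.

```python
# Hypothetical taxonomy layout: Group -> Category -> Document Type -> fields.
taxonomy = {
    "Financial": {                        # Group
        "Incoming": {                     # Category
            "Invoice": ["VendorName", "InvoiceNumber", "TotalAmount"],
            "Receipt": ["VendorName", "Date", "TotalAmount"],
        },
    },
    "Identity": {
        "Personal": {
            "ID Card": ["FullName", "DateOfBirth", "DocumentNumber"],
        },
    },
}

# Grouping related document types under shared Groups/Categories keeps the
# taxonomy navigable as the number of document types grows.
for group, categories in taxonomy.items():
    for category, doc_types in categories.items():
        print(group, "/", category, "->", list(doc_types))
```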
7. Try different OCR engines and see which gives the best results. Start with UiPath OCR and, if the results are not as
good as needed, try:
• Google / Microsoft OCRs (these require an extra license purchased from the respective vendor)
• Omnipage
• UiPath CJK (for Chinese, Japanese, Korean)
• Tesseract
8. Decide how the DU Process will be triggered:
• An Integration Service triggers the DU Process when a file is placed in a specific location.
• Other solutions.
9. Document Classification:
• Is it required?
• Determine the business rules for sending the document to the human-in-the-loop for classification
validation.
• For trainable classifiers determine the input training set and the feedback loop.
10. Data Extraction:
• Using more than one extractor is also an option, but it will consume more AI Units.
• For trainable extractors, consider the initial training & feedback loop.
• Post-processing for invoices/other document types should be done here, with help from the rules
gathered in Step 2, Point 5.
• Determine the business rules and thresholds for sending the document to the human-in-the-loop for
extraction validation.
• The recommendation is to serialize the extracted data, store it (storage bucket, NAS, Data Service), and send
in the queue item for the Performer only a reference to the storage location.
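The serialize-and-reference recommendation can be sketched as follows. A local temp directory stands in for the real storage bucket/NAS/Data Service, and the queue-item shape is an assumption for illustration.

```python
# "Store the payload, queue only a reference": the extracted data is written
# to storage and the queue item carries just a pointer to it.
import json
import tempfile
import uuid
from pathlib import Path

def store_extraction_result(data: dict, base_dir: Path) -> str:
    """Serialize the extracted data and return a reference to it."""
    ref = f"{uuid.uuid4()}.json"
    (base_dir / ref).write_text(json.dumps(data))
    return ref

base = Path(tempfile.mkdtemp())
reference = store_extraction_result(
    {"InvoiceNumber": "INV-001", "TotalAmount": 59.97}, base)

# The queue item stays tiny: only the reference travels through the queue.
queue_item = {"SpecificContent": {"ResultReference": reference}}

# The Performer later dereferences it:
payload = json.loads(
    (base / queue_item["SpecificContent"]["ResultReference"]).read_text())
print(payload["InvoiceNumber"])
```

This keeps queue items well under any size limit regardless of how many fields or table rows were extracted.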
13. Decide how the solution will be tested:
14. Reporting:
• Our recommendation would be to use Insights with some OOTB dashboards, with custom additions
where needed by the business.
• Daily/weekly reports for technical reasons (issue monitoring, etc.), depending on need.
1. Build the basic process as soon as possible:
• The process should be up and running within the first couple of days after you get the required access. This
will help in testing the infrastructure and making sure everything works (with DU Process sample
data).
• Changes to the process for the client's specific needs will be made after completing the Data Gathering
(Step 1) and Process Design (Step 2) steps.
• Having the basic process will help you with the testing of the ML model for extraction/classification.
2. Set up a meeting with the client SMEs and teach them how to label data in the AI Center Data Labeling tool:
• Remember to verify their work as well, as the golden rule of ML states: garbage in → garbage out.
3. Build the first model as soon as possible and test it out. Ideally, you would have a model trained on at least 750
documents (50+ vendors).
• Labelling should be done by people with a good understanding of the documents and data.
• When data corresponding to a field appears multiple times, tag all instances (even on different pages)
• Label diverse data; have sufficient sample size for all document layouts.
• When using user-validated data exported by the robot (via train ML extractor), add it to the existing
dataset for the model; don't create new datasets.
• If the results are not promising, add more data (either more vendors, or more data for specific vendors
where it had more issues).
4. If the results are still not good enough, investigate the documents and the OCR:
• See if there are any OCR errors, and check whether other OCR engines would perform better. (You can
configure different OCRs in Document Manager.)
• Stamps/handwriting over the field we actually want to extract. (Not much one can do here,
except to keep this fact in mind and inform the client about the issue.)
5. If the model is decent (85%+ accuracy), continue with this model until UAT/Hypercare, depending on client
feedback:
• If the model is decent enough and the human-in-the-loop (HITL) rate is not too high, continue with this
model before retraining, as retraining can take a while.
7. Develop the Dispatcher if present. (It can be an Integration Service or API call as well).
8. Develop the DU Process:
• Use LINQ to calculate the confidence level for data extraction. It is much faster than iterating through all the
fields with a For Each.
• Do not pass extracted data to other processes via Arguments or Queue Items. Extracted data can easily
exceed the maximum size allowed by Arguments/Queues.
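The confidence-check recommendation above boils down to one declarative expression over the fields. Since UiPath workflow code is VB.NET/C#, the Python below is only an analogy for the LINQ approach, and the field structure is an assumption, not the actual ExtractionResult schema.

```python
# Declarative confidence check: collect every field below the threshold in
# one expression instead of a loop with an accumulating variable.
fields = [
    {"Name": "InvoiceNumber", "Confidence": 0.98},
    {"Name": "TotalAmount",   "Confidence": 0.74},
    {"Name": "DueDate",       "Confidence": 0.91},
]

THRESHOLD = 0.80

low_confidence = [f["Name"] for f in fields if f["Confidence"] < THRESHOLD]
needs_validation = bool(low_confidence)

print("Send to Validation Station:", low_confidence)
```

In the workflow itself this would be a single LINQ expression over the extracted fields (e.g. a `Where` on the confidence followed by `Any`) rather than a For Each with a counter.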
9. Develop the Error handling mechanism. (What happens with the BREs and System Exceptions?).
10. Develop the Performer. (As you know, some systems can be very complex, or you can have a lot of edge
cases. This is why this step usually takes the longest).
• Load testing, if possible (create 100 queue items, start the trigger, and see how everything unfolds).
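For the load test above, the sketch below builds 100 queue-item payloads in the general shape of Orchestrator's AddQueueItem request body. The queue name and file references are invented, and authentication plus the actual POST calls are deliberately left out.

```python
# Prepare 100 queue-item payloads for a load test. Only the payloads are
# built here; submitting them to Orchestrator is out of scope.
def build_queue_item(queue_name: str, file_ref: str) -> dict:
    return {
        "itemData": {
            "Name": queue_name,
            "SpecificContent": {"FileReference": file_ref},
        }
    }

items = [build_queue_item("DU_LoadTest", f"sample_{i:03}.pdf") for i in range(100)]
print(len(items), "queue items prepared")
```

Submitting a burst like this and then starting the trigger shows how the whole chain (dispatcher, DU Process, Performer, retries) behaves under realistic volume.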
2. Set up a call with the customer to set expectations for UAT and teach them how to use the process.
3. Hotfix the critical issues. Any major changes will require a Change Request.