Document Automation - Data Extraction Using Generative Ai 2024-06-13-20-53-14
All other customer or partner trademarks or registered trademarks are owned by those companies.
The information contained in this documentation is proprietary and confidential. Your use of this information
and Automation Anywhere Software products is subject to the terms and conditions of the applicable End-
User License Agreement and/or Nondisclosure Agreement and the proprietary and restricted rights notices
included therein.
You may print, copy, and use the information contained in this documentation for the internal needs of
your user base only. Unless otherwise agreed to by Automation Anywhere and you in writing, you may
not otherwise distribute this documentation or the information contained here outside of your organization
without obtaining Automation Anywhere’s prior written consent for each such distribution.
Examples and graphics are for representation purposes only and may not accurately reflect your specific
instance. We do not assume responsibility for their maintenance or accuracy.
Generative AI models can produce errors and/or misrepresent the information they generate. It is advisable
to verify the accuracy, reliability, and completeness of the content generated by the AI model.
Automation 360
Document Automation - Data extraction using generative AI
Document Automation for Automation 360 Cloud and On-Premises provides a generative AI (GenAI) capability
that extracts data from unstructured and semi-structured documents without prior training. Create a
learning instance with the GenAI capability to process documents in English using a large language model
(LLM).
Note: Generative AI models can produce errors and/or misrepresent the information they
generate. It is advisable to verify the accuracy, reliability, and completeness of the content
generated by the AI model.
Benefits
Improve extraction accuracy in a learning instance by using the Search query for generative AI model
feature when you define form and table fields. Document Automation provides a default query based on the
selected field, and you can customize it. The query is sent to the GenAI model, which enables data
extraction from different document types without prior training. Use this capability to improve your
document processing.
A video demonstrates the enhancements to Document Automation that enable users to extract data from
tables using natural language queries.
When you create a learning instance for unstructured documents (such as Contracts, Agreements, Reports,
Letters, and Emails), the GenAI-driven data extraction capability is automatically selected. While defining the
Form fields and Table fields for your learning instance, you can use the Search query for generative
AI model option to customize your data extraction request.
For example, for an address field, the default query is: ‘What is the Property Address?’. You
can customize this query for more focused extraction, for example: ‘What is the full Property Address with
city, state and zip code?’
When a document is processed using this learning instance, the GenAI capability extracts the complete
address instead of just the street name and number. You define the search query in the model only once;
for every document processed using this model, the data is then extracted with no additional
configuration.
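
The relationship between a field, its default query, and a customized query can be illustrated with a short Python sketch. This is only a conceptual model, not the Document Automation API: the FieldDefinition class, its custom_query attribute, and the search_query method are hypothetical names introduced for illustration, and the default query template is assumed from the 'What is the Property Address?' example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldDefinition:
    """Hypothetical stand-in for a form field in a learning instance."""
    name: str
    # Assumption: a customized query is stored once with the field
    # and reused for every document processed afterwards.
    custom_query: Optional[str] = None

    def search_query(self) -> str:
        # Use the customized query when one is defined; otherwise fall
        # back to a default query derived from the field name.
        if self.custom_query:
            return self.custom_query
        return f"What is the {self.name}?"


default_field = FieldDefinition("Property Address")
custom_field = FieldDefinition(
    "Property Address",
    custom_query="What is the full Property Address with city, state and zip code?",
)

print(default_field.search_query())
print(custom_field.search_query())
```

In this sketch the query belongs to the field definition rather than to any single document, which mirrors the point that a query is configured once and then applies to every document processed with the same learning instance.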
When you create a learning instance for semi-structured documents (such as Invoices, Purchase orders, and
User-defined documents) or supply-chain documents (such as Waybill, Bill of Lading, Arrival Notice, and
Packing Lists), you can use the GenAI-driven data extraction capability in addition to the native extraction
capability based on user-provided updates in the Validator.
Important: Privacy Notice: When the generative AI capability is selected, the query is sent to
a third-party service. Currently, the data is sent to Microsoft Azure OpenAI service, which is located
in the US and EU regions. If you do not want your data sent to a third-party service, we
recommend not using the unstructured and semi-structured document types that use the
generative AI feature out of the box.