Building A Smart Safety Data Sheet Parser Using NLP Lab

Uploaded by

Blidisel Alin
Building a Smart Safety Data Sheet Parser Using NLP Lab

Alin Blidisel
Platform Engineering Lead, Wisecube.AI

Patrick Salomé
Co-Founder and Chief Product Officer at BluePallet
About WISECUBE and BLUEPALLET

Wisecube delivers robust and scalable Machine Learning solutions with comprehensive architecture support, serving as a reliable partner for end-to-end technology empowerment.

BluePallet is a digital platform that combines data and machine learning algorithms to facilitate easy interaction between chemical sellers and buyers.

The key challenge was to automate the exhaustive processing and interpretation of complex chemical files, a crucial step to drive efficiency and innovation in the chemical industry.
The Problem
SDS/TDS/PDS sheet parsing involves extracting and interpreting information from chemical data sheets. This is often difficult due to the large amount of complex data found in such sheets.
Challenges in Parsing Chemical Data Sheets
❑ Complicated entity extraction.

❑ Noisy data resulting in inaccurately parsed output.

❑ Lack of context.

❑ Intricate and highly technical information that can be difficult to interpret.

❑ Different types of data formats and document structures present further challenges.

❑ Data extraction requires high precision to ensure accuracy and safety.
Efficiently parsing Safety Data Sheets (SDS)
❖ Complex Layouts and Formats

❖ Handling Varied Data Types

❖ Inconsistent Terminology and Language

❖ Data Quality and Completeness Issues

❖ Processing OCR data

❖ Large Volume of Documents

❖ Contextual Understanding

❖ Evolution over Time


Data Parsing Strategy

❖ Data annotation (NLP Lab)

❖ Entity Recognition and Extraction

❖ Noise Handling and Data Cleansing

❖ Contextual Understanding

❖ Parsing Speed and Efficiency

❖ Human-in-the-Loop Validation
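
The noise handling and entity extraction steps above can be illustrated with a small rule-based sketch. It is not the Wisecube implementation: the function names are hypothetical, and the choice of CAS Registry Numbers as the target entity is an assumption made for illustration. The sketch cleans OCR-style text, finds candidates with a regular expression, and keeps only those that pass the standard CAS check-digit rule.

```python
import re
import unicodedata

# CAS Registry Number shape: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def clean_ocr_text(text: str) -> str:
    """Normalize Unicode, unify dash variants, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    # OCR often emits typographic dashes where an ASCII hyphen was printed.
    text = re.sub(r"[\u2010-\u2015\u2212]", "-", text)
    # Remove spaces that appear around hyphens between digits.
    text = re.sub(r"(?<=\d)\s*-\s*(?=\d)", "-", text)
    return re.sub(r"\s+", " ", text).strip()


def is_valid_cas(candidate: str) -> bool:
    """Check digit = sum(digit * position from the right) mod 10."""
    digits = candidate.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def find_cas_numbers(text: str) -> list[str]:
    """Return unique, checksum-valid CAS numbers in order of appearance."""
    found: list[str] = []
    for match in CAS_PATTERN.finditer(clean_ocr_text(text)):
        candidate = match.group(0)
        if is_valid_cas(candidate) and candidate not in found:
            found.append(candidate)
    return found
```

A checksum rule like this rejects many OCR misreads before they reach a model or a human reviewer, which is one inexpensive way to serve the precision requirement listed among the challenges.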
Wisecube Smart Parser
❖ Is a cutting-edge solution that brings NLP Lab into the process by which unstructured data is parsed and transformed into structured formats.

❖ Utilizes sophisticated machine learning algorithms to automatically parse unstructured data without requiring extensive human intervention.


Wisecube Smart Parser Architecture

❖ Parser mechanism

➢ Textract

➢ Tika

➢ NER

➢ LLM

➢ Confidence computation

❖ SmaSchema with detected fields and confidence
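
One possible shape for a parser output that carries detected fields together with their confidence is sketched below. The slides do not show the actual schema, so every class name, field name, and value here is a hypothetical placeholder.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DetectedField:
    name: str          # hypothetical field key, e.g. "product_name"
    value: str
    confidence: float  # consolidated confidence in [0.0, 1.0]
    sources: list[str] = field(default_factory=list)  # which extractors agreed


@dataclass
class ParsedSheet:
    document_id: str
    fields: list[DetectedField] = field(default_factory=list)

    def add_field(self, name: str, value: str,
                  confidence: float, sources: list[str]) -> None:
        """Append a detected field, rejecting out-of-range confidence."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self.fields.append(DetectedField(name, value, confidence, list(sources)))

    def to_json(self) -> str:
        """Serialize the whole sheet, nested fields included."""
        return json.dumps(asdict(self), indent=2)


# Illustrative values only; they are not taken from a real data sheet run.
sheet = ParsedSheet(document_id="example-sds-001")
sheet.add_field("product_name", "Ethanol", 0.93, ["ner", "llm"])
sheet.add_field("cas_number", "64-17-5", 0.88, ["rules"])
print(sheet.to_json())
```

Keeping the contributing sources next to each value makes it easier to audit why a field received its confidence.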


Wisecube Smart Parser Architecture using Amazon MQ
Computing Confidence Level
❖ Combines the data from various algorithms such as Tma, spaCy, and ChatGPT.

❖ General guideline

➢ Generate Predictions

➢ Translate Predictions to Confidence Scores

➢ Combine Confidence Scores

■ Simple Averaging

■ Weighted Averaging

■ Stacking

➢ Confidence Level Estimation (a consolidated confidence level that represents the predictions' certainty)
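
The two simpler combination strategies in the guideline above can be sketched as follows. The extractor names, scores, and weights are assumptions chosen for illustration, not values from the Wisecube system. Stacking is not shown because it requires training a meta-model on labeled examples.

```python
def _validate(scores: dict[str, float]) -> None:
    """Reject empty input and scores outside [0, 1]."""
    if not scores:
        raise ValueError("at least one confidence score is required")
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must be between 0 and 1")


def simple_average(scores: dict[str, float]) -> float:
    """Every extractor contributes equally."""
    _validate(scores)
    return sum(scores.values()) / len(scores)


def weighted_average(scores: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Extractors that are trusted more get a larger weight."""
    _validate(scores)
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical per-extractor confidences for a single detected field.
scores = {"ner": 0.90, "llm": 0.60, "rules": 0.75}
weights = {"ner": 2.0, "llm": 1.0, "rules": 1.0}

consolidated_simple = simple_average(scores)
consolidated_weighted = weighted_average(scores, weights)
```

With these inputs the weighted result is higher than the simple average, because the extractor with the largest weight also reports the highest score.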


Wisecube/BluePallet Integration Architecture
How Smart Parser Works
❖ Data Annotation using John Snow Labs (NLP Lab)

❖ Integration of Tika for document analysis

❖ Using AWS Textract

❖ Large Language Models (LLMs)

❖ Named Entity Recognition (NER) Models

❖ NLP Lab NER models using annotations

❖ Human-in-the-Loop (HITL) Validation
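
A common way to wire in the human-in-the-loop step is to route low-confidence extractions to a reviewer and accept the rest automatically. The sketch below assumes that approach. The `Extraction` type, the `route_for_review` function, and the 0.8 threshold are all hypothetical and would need tuning against real review outcomes.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    field_name: str
    value: str
    confidence: float  # consolidated confidence in [0.0, 1.0]


def route_for_review(
    extractions: list[Extraction],
    threshold: float = 0.8,  # assumed cut-off, not a value from the slides
) -> tuple[list[Extraction], list[Extraction]]:
    """Split extractions into (auto-accepted, needs human review)."""
    accepted: list[Extraction] = []
    needs_review: list[Extraction] = []
    for item in extractions:
        if item.confidence >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


# Illustrative values only.
batch = [
    Extraction("product_name", "Ethanol", 0.95),
    Extraction("flash_point", "13 C", 0.55),
    Extraction("cas_number", "64-17-5", 0.80),
]
accepted, needs_review = route_for_review(batch)
```

Corrections made by reviewers can be fed back as new annotations, which is where a step like this would connect to the NLP Lab annotation workflow.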


Development for Smart Parser

1. DEVELOPMENT
Python
2. FRAMEWORK
Flask
3. TOOLS
AWS Textract
Amazon MQ
ChatGPT
AWS S3
Tika
LLM and NER Models
Defined Rules
NLP Lab (JSL)
4. API
GraphQL API (Java + Kotlin)
Sample response including confidence level
Automatic Integration with JSL
Why Natural Language Processing (NLP) Labs
By integrating NLP Labs' APIs into your applications, you can leverage their sophisticated models and vast computational resources, giving you more time to focus on your application's specific business logic.
RAPID DEVELOPMENT

LOWER COSTS

CONTINUAL UPDATES

SECURITY AND PRIVACY

WIDE RANGE OF SERVICES

SCALABILITY

EASY INTEGRATION
Wisecube Smart Parser DEMO
Q&A
