Generate Insights From Unstructured Financial Data

This document outlines a backend storage architecture for an AI system that analyzes financial documents. Data is preprocessed through steps such as stop-word removal, lemmatization, and vectorization, then converted to embeddings and stored in a vector database so that related information can be retrieved quickly by embedding similarity. Knowledge graphs are also built by extracting entities with named entity recognition and storing the relationships between them in a graph database, enabling insights to be generated from the unstructured financial data.

Uploaded by

Azharudeen salim
Generate insights from unstructured financial data
1. Sources
1.1. Earnings Call Transcripts

1.2. News Articles

1.3. Investor Reports

2. End Product
2.1. UI Interface for Users

2.2. Backend Storage Architecture

2.3. ML Models
3. Steps for Individual Document
3.1. On a daily basis, get the latest Articles / Reports / Transcripts

3.2. Data Cleaning

3.2.1. Remove Stop words

3.2.2. Lemmatization

3.2.3. Normalize Data

3.2.4. Vectorization

3.2.5. Store to Vector DB
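The cleaning steps in 3.2 can be sketched in a few lines of plain Python. The stop-word list below is a toy placeholder (a real pipeline would use spaCy's or NLTK's full list, and proper lemmatization rather than simple lowercasing):

```python
import re

# Illustrative stop-word list only; swap in spaCy's or NLTK's full list in practice.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on"}

def clean_text(text: str) -> list[str]:
    """Lowercase, strip punctuation (normalization), and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(clean_text("Revenue in the APAC region is growing on a yearly basis."))
```

The resulting token list is what would then be vectorized and written to the vector DB.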

3.3. Overall Sentiment

3.3.1. Traditional NLP Approaches

3.3.1.1. Low cost, low accuracy

3.3.1.2. Limitations

3.3.1.2.1. High number of training documents needed

3.3.1.2.1.1. Mitigation

3.3.1.2.1.1.1. Transfer Learning

3.3.1.2.1.2. Number of labeled documents needed

3.3.1.2.1.2.1. 2,000–3,000

3.3.2. OpenAI / Pre-trained LLM

3.3.2.1. High cost, high accuracy

3.3.2.2. Limitations?

3.3.2.2.1. Token Limits

3.3.2.2.1.1. GPT-3.5: 8,192 tokens (~6,000 words)

3.3.2.2.1.2. Mitigation

3.3.2.2.1.2.1. Vectorization
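One common mitigation for the token limit above is to split a long transcript into overlapping chunks before sending each to the model. The sketch below counts words for simplicity; a production system would count actual tokens with a tokenizer such as tiktoken:

```python
def chunk_words(text: str, max_words: int = 6000, overlap: int = 200) -> list[str]:
    """Split a long transcript into overlapping word windows, each small
    enough to fit a model's context limit (~6,000 words for an 8k-token model)."""
    words = text.split()
    chunks = []
    step = max_words - overlap  # slide forward, keeping `overlap` words of context
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary visible to both chunks, at the cost of some duplicated tokens.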

3.4. Data processing

3.4.1. NER

3.4.1.1. What are we identifying ?

3.4.1.1.1. Products / Product Lines

3.4.1.1.2. People Names

3.4.1.1.2.1. Designation

3.4.1.1.3. Geographical Information

3.4.1.1.4. Companies

3.4.1.2. How are we doing it?

3.4.1.2.1. REBEL Model

3.4.1.2.2. AWS Comprehend

3.4.1.2.3. spaCy

3.4.2. Identifying Topics / Keywords

3.4.2.1. Rule Based Approach

3.4.2.2. Train Model based on Labeled Data
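The rule-based approach in 3.4.2.1 can be as simple as matching a curated keyword lexicon per topic. The topics and keyword lists below are illustrative placeholders, not a recommended lexicon:

```python
# Hypothetical topic lexicon; a real one would be curated by analysts.
TOPIC_KEYWORDS = {
    "guidance": {"guidance", "outlook", "forecast"},
    "margins": {"margin", "margins", "profitability"},
    "supply_chain": {"supply", "logistics", "inventory"},
}

def tag_topics(text: str) -> set[str]:
    """Return every topic at least one of whose keywords appears in the text."""
    words = set(text.lower().split())
    return {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}

print(tag_topics("Management raised full-year guidance despite inventory pressure"))
```

A trained model (3.4.2.2) would replace the hand-written lexicon once enough labeled documents are available.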

3.5. Context Aware Sentiment Analysis

3.5.1. How?
3.6. Document Summary

3.6.1. How?

3.6.1.1. GPT based Approach

3.6.1.2. seq-to-seq model
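Alongside a GPT-based or seq-to-seq summarizer, a frequency-based extractive baseline is a cheap sanity check. This is a crude sketch, not a substitute for either model:

```python
import re
from collections import Counter

def extractive_summary(text: str, n_sentences: int = 1) -> list[str]:
    """Score each sentence by the summed corpus frequency of its words,
    then return the top-scoring n sentences in their original order."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    freqs = Counter(re.findall(r"[a-z]+", text.lower()))
    scored = [(sum(freqs[w] for w in re.findall(r"[a-z]+", s.lower())), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(sorted(scored, reverse=True)[:n_sentences], key=lambda t: t[1])
    return [s for _, _, s in top]

print(extractive_summary("Revenue grew. Revenue grew fast. Costs fell."))
```

Sentences containing the document's most repeated words win, which is exactly the kind of shortcut an abstractive model is meant to improve on.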

3.7. Key Questions asked

3.7.1. Rule-based approach to identify and list the questions asked in the call

3.7.1.1. Click on Question to navigate to Response
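A first-pass rule for 3.7.1 is simply "a question is a sentence ending in '?'". Real earnings-call parsing would also use speaker turns (analyst vs. management) to anchor each question to its response:

```python
import re

def extract_questions(transcript: str) -> list[str]:
    """Naive rule: split into sentences and keep those ending in '?'."""
    sentences = re.split(r"(?<=[.?!])\s+", transcript.strip())
    return [s for s in sentences if s.endswith("?")]

call = "Thanks for taking my question. How should we think about margins? Sure."
print(extract_questions(call))
```

Storing the character offset of each matched question would support the click-to-navigate behaviour described above.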

3.8. Network Graph of Related Parties

3.8.1. How?

3.8.1.1. Neo4j

4. At Aggregate Level
4.1. Identify Trending Topics

4.2. Identify Risks

5. Document Q&A
5.1. How scalable is it?

5.1.1. Depends on Vector Store

5.1.2. Dependency on OpenAI token limits when passing in vectors


6. LLM & Prompting
6.1. Why is it Needed?

6.2. Preprocessing Required

6.2.1. Chains & Agents

6.2.2. Chunking

6.2.3. Vector Store

6.2.4. Fine tuning the LLM

6.2.4.1. Adjusting Parameters

6.2.4.2. Prompt Templates

6.2.4.3. Add more Layers by Transfer Learning

6.2.4.3.1. Not possible for OpenAI

6.2.4.3.2. Can be done for Falcon 30B?

7. Knowledge Graph
7.1. Why do we need it?

7.1.1. To identify links between information based on probability / repeated occurrence

7.2. Clean Data and Create Graph

7.2.1. REBEL

7.2.1.1. Performs NER and converts the extracted relations to triplets

7.3. Storing the Graph Information & Visualizing it Interactively

7.3.1. Neo4j

7.3.2. Graph DB + PyViz
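The (head, relation, tail) triplets that REBEL emits map directly onto a graph store. Before reaching Neo4j, a minimal in-memory sketch (with hypothetical triplets) looks like this:

```python
from collections import defaultdict

# Hypothetical triplets in REBEL's (head, relation, tail) shape.
triplets = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Acme Corp", "headquartered_in", "Singapore"),
    ("Widget Inc", "produces", "Sensors"),
]

# Adjacency list keyed by entity; Neo4j stores the same structure as
# nodes connected by typed relationships.
graph = defaultdict(list)
for head, rel, tail in triplets:
    graph[head].append((rel, tail))

def neighbors(entity: str) -> list[tuple[str, str]]:
    """All (relation, tail) pairs leaving an entity."""
    return graph[entity]

print(neighbors("Acme Corp"))
```

Repeated occurrences of the same triplet across documents can be counted on the edge, which is what makes the probability / repeated-occurrence linking in 7.1.1 possible.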

8. LLama Index?
9. Vector Store
9.1. Why do we need it?

9.1.1. To store embeddings

9.1.2. Links between Embeddings to give back related Information Quickly

9.2. Preprocessing Required

9.2.1. Converting Data to Embeddings

9.2.1.1. Internally handled by Vector Store Creation

9.2.1.2. Word Embedding

9.2.1.3. Glove?

9.2.1.4. Internally handled by OpenAI / Hugging Face

9.3. Options Available

9.3.1. Chroma

9.3.1.1. Chroma, in the context of vectorization, refers to an AI-native open-source vector database focused on developer productivity and happiness. It is designed to store and retrieve vector representations of data
efficiently. Chroma allows you to create collections of documents and perform similarity searches to find similar documents based on their vector representations.

9.3.2. Pinecone

9.3.3. Faiss
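Whichever option is chosen, the core operation they all provide is nearest-neighbour search over embeddings. Stripped of indexing tricks, a brute-force sketch (toy 3-dimensional vectors and hypothetical document IDs) shows what "give back related information quickly" means:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy store: document id -> embedding. Chroma / Pinecone / Faiss hold thousands
# of high-dimensional vectors and use ANN indexes instead of this linear scan.
store = {
    "q3_transcript": [0.9, 0.1, 0.0],
    "merger_news":   [0.1, 0.9, 0.2],
    "q2_transcript": [0.8, 0.2, 0.1],
}

def top_k(query: list[float], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query embedding."""
    return sorted(store, key=lambda d: cosine(query, store[d]), reverse=True)[:k]

print(top_k([1.0, 0.0, 0.0]))
```

The linear scan is O(n) per query; the listed vector DBs exist precisely to replace it with sub-linear approximate search.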
