unstructured.io

Software Development

San Francisco, CA 14,609 followers

Get your data RAG-ready. #ETLforLLMs

View all 66 employees

About us

At Unstructured, we're on a mission to give organizations access to all their data. We know the world runs on documents—from research reports and memos, to quarterly filings and plans of action. And yet, 80% of this information is trapped in inaccessible formats leading to inefficient decision-making and repetitive work. Until now. Unstructured captures this unstructured data wherever it lives and transforms it into AI-friendly JSON files for companies who are eager to fold AI into their business.

Website: http://www.unstructured.io/
Industry: Software Development
Company size: 11-50 employees
Headquarters: San Francisco, CA
Type: Privately Held
Founded: 2022
Specialties: NLP, natural language processing, data, unstructured, LLM, Large Language Model, AI, artificial intelligence, RAG, Database, Machine Learning, Open Source, API, Preprocessing Pipeline, Machine Learning Pipeline, and Data Pipeline

Locations

Primary

San Francisco, CA, US

Updates

unstructured.io

14,609 followers
1mo
🎉 We’re Live: Unstructured Serverless API is Here!

We’re excited to announce that Unstructured Serverless API delivers:
💥 Simplified Onboarding and User Dashboard: Easily manage your keys and billing options, and monitor usage through an intuitive dashboard.
💥 New Per-Page Pricing: Enjoy reduced costs with a transparent and predictable pricing model.
💥 Improved Processing Throughput and Latency: Our latest generation of file transformation pipelines delivers a 5x speedup over our previous API.
💥 Enhanced Extraction Performance: Our new document transformation models deliver industry-leading extraction performance for over 25 file types.
💥 Revamped Documentation: We’ve completely rewritten our documentation, making it easier than ever to render your data RAG-ready.

👉 Sign up in seconds and get started today for FREE: https://lnkd.in/djRT-R_n
#WhateverItIsWeCanStructureIt

unstructured.io

14,609 followers
1d
🆕 📢 New in documentation! The Ingestion page goes over the tools and options available to you for ingesting batches of files with Unstructured. Learn how to process files from various source locations, including cloud storage and local directories, and send the processed data to target destination locations. https://lnkd.in/eg9iqzFD

Overview

docs.unstructured.io
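
The batch flow that post describes (read files from a source location, process them, write the results to a destination) can be pictured with a small standard-library sketch. It does not use Unstructured's ingestion tools or API. The function names `partition_text_file` and `ingest_directory`, the element fields, and the split-on-blank-lines rule are all assumptions made for illustration.

```python
import json
from pathlib import Path


def partition_text_file(path: Path) -> list[dict]:
    """Split a plain-text file into paragraph elements (hypothetical schema)."""
    text = path.read_text(encoding="utf-8")
    # Assumption: a blank line separates paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"type": "Paragraph", "text": p, "metadata": {"filename": path.name, "index": i}}
        for i, p in enumerate(paragraphs)
    ]


def ingest_directory(source: Path, destination: Path) -> list[Path]:
    """Process every .txt file under `source`; write one JSON file per input."""
    destination.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(source.rglob("*.txt")):
        out_path = destination / f"{path.stem}.json"
        elements = partition_text_file(path)
        out_path.write_text(json.dumps(elements, indent=2), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling `ingest_directory(Path("docs"), Path("out"))` would leave one JSON file in `out` for each text file found under `docs`. Files that share a stem in different subdirectories would overwrite each other in this sketch; a real pipeline would need a collision-safe naming scheme.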

unstructured.io

14,609 followers
1d Edited
Have you tried our new Serverless API yet? It's FREE to get started with killer features:
✔ Faster processing so you’re not stuck waiting
✔ Better data extraction that gets the details right
✔ Documentation that’s way easier to follow
✔ A dashboard that’s actually easy to use
✔ Clear pricing: you pay by the page, no guessing

And, to top it off, you get up to 1000 pages a day on us for the first 14 days. https://lnkd.in/e8MmX3ug
#WhateverItIsWeCanStructureIt
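
As a quick worked example of the trial allowance in that post, the sketch below multiplies out the two stated figures (1000 pages a day, 14 days). The function names are hypothetical, and the rule that unused daily pages do not roll over is an assumption, not something the post states.

```python
FREE_PAGES_PER_DAY = 1000  # figure quoted in the post
FREE_TRIAL_DAYS = 14       # figure quoted in the post


def max_free_pages(pages_per_day: int = FREE_PAGES_PER_DAY,
                   days: int = FREE_TRIAL_DAYS) -> int:
    """Upper bound on free pages across the whole trial."""
    return pages_per_day * days


def free_pages_used(daily_page_counts: list[int],
                    pages_per_day: int = FREE_PAGES_PER_DAY,
                    days: int = FREE_TRIAL_DAYS) -> int:
    """Pages covered by the allowance for a given daily workload.

    Assumption: each day is capped separately and unused pages do not carry over.
    """
    return sum(min(count, pages_per_day) for count in daily_page_counts[:days])


print(max_free_pages())  # 14000
```

Under the no-rollover assumption, a workload of 1500 pages on day one and 200 on day two would have 1200 pages covered.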
unstructured.io

14,609 followers
3d
If you missed Tuesday's webinar on Llama3.1 evaluation for RAG, you can check out the recording at https://lnkd.in/gpw9AgBy or the notebook at https://lnkd.in/gXg2ZGqj for a quick look at the results! Thanks for an insightful webinar, Nina Lopatina and Yujian Tang OSS4AI

Llama 3.1 vs Llama 3 for RAG performance on Unstructured Data

https://www.youtube.com/

unstructured.io

14,609 followers
5d
⏰ New blog post: “Build a RAG chatbot for your personal ebook collection”

There’s more to unstructured data than PDFs. In our latest blog post, you’ll learn how to:
✅ build an unstructured data ETL pipeline for EPUB files with Unstructured Serverless API,
✅ use MongoDB Atlas as a vector store and search index,
✅ orchestrate RAG with LangChain using the Llama 3.1 model pulled locally with Ollama,
✅ create an intuitive and interactive chatbot UI with Streamlit.

Check it out: https://lnkd.in/e2nidMWf

Build a RAG chatbot to chat with your digital book collection – Unstructured

unstructured.io
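
The retrieval step in a pipeline like the one that post outlines can be pictured with a toy example. The sketch below is not taken from the blog post and makes no Unstructured, MongoDB Atlas, LangChain, or Ollama calls. As a stated simplification it swaps real embeddings and a vector store for bag-of-words cosine similarity over an in-memory list, and every function name in it is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In a real pipeline the chunks would come from the document-processing step and the prompt would be sent to the language model; here both ends are left out so the ranking logic stays visible.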

unstructured.io

14,609 followers
5d
If you are looking for a multi-agent framework for GraphRAG with unstructured documents, check out this recent cookbook from CAMEL-AI.org, combining Mistral AI Large 2 + Embed, Neo4j for a knowledge graph database, Qdrant for a vector database, and unstructured.io for preparing your unstructured data.
CAMEL-AI.org

560 followers
5d

🚀 GraphRAG: Boost Retrieval Accuracy and RAG Performance! Check out the diagram and tech stack below to see how this works.
🔹 CAMEL-AI.org: A multi-agent system framework that facilitates the workflow and provides agents such as Knowledge Graph Agent and Chat Agent.
🔹 Mistral AI: Provides the LLM Mistral Large 2 and an embedding model, Mistral Embed.
🔹 Neo4j: Knowledge graph database.
🔹 Qdrant: Vector database.
🔹 unstructured.io: Gets the data ready for RAG.

Thanks to Wendong Fan and Guohao Li for their contributions to this. 🤝
Thanks to Sophia Yang, Ph.D. and Mistral AI for collaborating on this project. 🤝

Try it out for yourself via this cookbook: https://lnkd.in/g26U57NJ

unstructured.io

14,609 followers
1w
LangChain simplifies the creation of advanced RAG pipelines, while Unstructured processes all kinds of unstructured data and delivers it right into your preferred vector store. 🤝 Together, they provide a powerful toolkit for building any RAG application you can think of!
LangChain

248,985 followers
1w

Unstructured 🤝 LangChain With our latest partner package with unstructured.io, you can easily process a variety of file types into documents, which can be used for vector-store retrieval. The `langchain-unstructured == 0.1.0` package contains a production-ready hosted API from Unstructured, plus their open source local file processing. See the docs: https://lnkd.in/gbkK_TDb
unstructured.io

14,609 followers
1w
How significant is the improvement of Llama3.1 over Llama3 in retrieval-augmented generation (RAG) tasks with unstructured text? Our small-scale experiment indicates substantial enhancements in faithfulness, answer similarity, and answer correctness. To hear about the results and methodology, join Nina Lopatina and Yujian Tang for a Tuesday tech talk at OSS4AI, July 30 at 9 am PT, registration link below. Or try out the linked colab notebook on any URL of your choice, and share your results in a comment below!
📆: https://lnkd.in/gViBb62u
📓: https://lnkd.in/gXg2ZGqj
tech stack: #RAG #Llama3.1 #Ragas #LangChain #GPT4o #Unstructured

Llama 3.1 vs Llama 3 for RAG performance on Unstructured Data · Luma

lu.ma

unstructured.io

14,609 followers
1w
If you want to turn unstructured documents into a knowledge graph at production scale, check out this Neo4j + Unstructured blog by Neo4j's Fanghua (Joshua) Yu and summary by Daniel Bukowski ✍ : https://lnkd.in/gVrwTJZV
Daniel Bukowski
1w Edited

Can you turn a structured document into a knowledge graph? Yes -- and here's how.

2023: Everyone was "chatting with a PDF." It was fun. It demonstrated what RAG was. But that's it.

2024: Here are 1,000 PDFs. We need a production-level RAG application that can identify nuanced differences about important policy or customer-related data.

This is the most common RAG use case I have seen in 2024 -- a huge pile of documents about a topic with very high accuracy expectations. Knowledge graphs can certainly help, but what about getting high-quality data out of the documents themselves? That's where purpose-built tools like unstructured.io come in.

My colleague Fanghua (Joshua) Yu has laid out a roadmap for integrating the output from unstructured.io into a Neo4j knowledge graph. Unstructured is a powerful open source library, API, and now document preprocessing commercial startup. Sure it can extract text and headers from documents. But it also excels at tables and images which are much more difficult to accurately preprocess.

In addition to defining an approach to this common challenge, my colleague Joshua also provides a notebook with code to convert these elements into a free instance of Neo4j. Everything is there to try it yourself, for free! You can find Joshua's full post on the Neo4j Developer Blog, which includes links to his GitHub repo, here: https://buff.ly/4cTpwQJ

Parsing collections of complex PDFs and loading them into a knowledge graph is one of the most common projects I have seen in 2024. Kudos to my colleague Joshua for his work to demonstrate how it can be done.

Are there any other tools or approaches that have worked well with PDFs? Share what you like in the comments.

Follow me Daniel Bukowski for daily posts about the intersection of graphs, data science, and GenAI.
#neo4j #unstructured #graphrag #aura #llm

unstructured.io reposted this

Brian S. Raymond

ETL for LLMs
1w Edited
I'm thrilled to share our latest piece, which dives deep into the challenges and opportunities in harnessing human-generated data with Generative AI (GenAI).

We are at a critical inflection point in technological development, reminiscent of the transformative emergence of the Modern Data Stack a decade ago. Back then, enterprises poured resources into ETL tools, data warehouses, and BI tools to mine value from the vast volumes of structured data they generated daily. Fast forward to today, and we find ourselves at a similar juncture with GenAI. Enterprises are now grappling with the task of leveraging their exponentially larger (4-5x) stores of unstructured data in tandem with large language models (LLMs).

The complexity of this endeavor cannot be overstated. The diverse array of file formats, document layouts, and the complex "cocktails" of models required to render data "RAG ready" present formidable challenges. Yet, with these challenges come extraordinary opportunities. Data scientists and data engineers stand on the brink of unlocking this new category of data.

The potential for growth and advancement with GenAI is immense, but it requires effortlessly and rapidly joining human-generated data with LLMs. This is an exciting era of discovery and we're just getting started.

unstructured.io

14,609 followers
2w Edited

💡 GenAI is poised to transform business operations, from marketing and customer service to product development and back office automation. Yet, a significant gap exists between the hype and ROI. In a recent Forbes article, Brian S. Raymond, CEO of Unstructured, highlights this challenge, emphasizing, "As important as the algorithms are, they’re only as good as the data available to them."

Key takeaways:
💡 GenAI is transforming marketing, customer service, HR, supply chain, and regulatory compliance.
💡 Despite its potential, many companies struggle to leverage their unstructured data.
💡 Overcoming this requires investment in GenAI-native preprocessing tools and skilled data engineering teams.

GenAI's promise is immense, but unlocking its full potential hinges on rendering unstructured data GenAI-ready. Read the full article: https://lnkd.in/eqGZV_Rs

Council Post: How Accessing Unstructured Data Can Accelerate AI ROI And Improve Business Efficiency

social-www.forbes.com


Funding

unstructured.io 3 total rounds

Last Round

Series B Apr 14, 2024

US$ 40.0M

Investors

Menlo Ventures + 8 Other investors

See more info on crunchbase

Employees at unstructured.io

James Reid

Head of BizOps at Unstructured

John Newton

Co-Founder of Alfresco and Documentum. 40 years in Digital Transformation.

Robin Vasan

Enterprise Seed / Early Stage Investor

Rakesh Patel

AI/ML Product Leader

Similar pages

Primer.ai

Contextual AI

LlamaIndex

LangChain

Cleanlab

Pinecone

Qdrant

Yurts

Perplexity

Weaviate