ToroDB
Open-source, MongoDB-compatible database,
built on top of PostgreSQL
Álvaro Hernández <aht@torodb.com>
ToroDB @NoSQLonSQL
DEMO!
About *8Kdata*
● Research & Development in databases
● Consulting, Training and Support in PostgreSQL
● Founders of PostgreSQL España, the 3rd largest PUG (PostgreSQL User Group) in the world (>400 members as of today)
● About myself: CTO at 8Kdata:
@ahachete
http://linkd.in/1jhvzQ3
www.8kdata.com
ToroDB in one slide
● Document-oriented, JSON, NoSQL db
● Open source (AGPL)
● MongoDB compatibility (wire protocol level)
● Uses PostgreSQL as a storage backend
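Wire-protocol compatibility means ToroDB has to understand the same binary messages a mongod receives. As an illustration only (this is not ToroDB's code), every MongoDB wire-protocol message starts with a 16-byte header of four little-endian int32 fields: messageLength, requestID, responseTo and opCode. The sketch below parses that header; the class name WireHeader is hypothetical and only a few opCode constants are listed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: parses the standard 16-byte MongoDB message header. */
final class WireHeader {
    static final int OP_REPLY = 1;
    static final int OP_INSERT = 2002;
    static final int OP_QUERY = 2004;

    final int messageLength; // total message size in bytes, header included
    final int requestID;     // identifier chosen by the sender
    final int responseTo;    // requestID this message answers (0 for requests)
    final int opCode;        // message type, e.g. OP_QUERY

    private WireHeader(int messageLength, int requestID, int responseTo, int opCode) {
        this.messageLength = messageLength;
        this.requestID = requestID;
        this.responseTo = responseTo;
        this.opCode = opCode;
    }

    /** Reads the header from the first 16 bytes of a raw message. */
    static WireHeader parse(byte[] raw) {
        if (raw.length < 16) {
            throw new IllegalArgumentException("need at least 16 bytes, got " + raw.length);
        }
        // All header integers in the wire protocol are little-endian int32.
        ByteBuffer buf = ByteBuffer.wrap(raw, 0, 16).order(ByteOrder.LITTLE_ENDIAN);
        return new WireHeader(buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt());
    }

    public static void main(String[] args) {
        // Build a fake header for a 58-byte OP_QUERY with requestID 7.
        ByteBuffer out = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(58).putInt(7).putInt(0).putInt(OP_QUERY);
        WireHeader h = parse(out.array());
        // prints length=58 requestID=7 opCode=2004
        System.out.println("length=" + h.messageLength + " requestID=" + h.requestID
                + " opCode=" + h.opCode);
    }
}
```

A real server would go on to read messageLength bytes in total and dispatch on opCode; that part is left out here.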
Why relational databases: technical perspective
● The document model is very appealing to many. But every document database was built from scratch
● DRY (Don't Repeat Yourself): why not reuse relational databases? They are proven, durable, concurrent and flexible
● Why not build on a relational database like PostgreSQL?
ToroDB tables structure
ToroDB storage
● Data is stored in tables. No blobs
● JSON documents are split by hierarchy levels into “subdocuments”, which contain no nested structures. Each subdocument level is stored separately
● Subdocuments are classified by “type”. Each “type” maps to a different table
ToroDB storage (II)
● A “structure” table keeps the subdocument “schema”
● Keys in JSON are mapped to attributes, which retain the original name
● Tables are created dynamically and transparently to match the exact types of the documents
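The “one table per type” rule can be pictured as a registry keyed by a subdocument's type. In this sketch, which is our own assumption and not ToroDB's actual algorithm, a type is the sorted set of (key, column type) pairs of a subdocument's scalar attributes, each unseen type gets the next name t_1, t_2, ..., and the generated CREATE TABLE uses made-up column types. Keys are quoted so they keep their original names, and the did and index columns mirror the ones in the storage internals tables. TypeRegistry and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one table per distinct subdocument type. */
final class TypeRegistry {
    private final Map<String, String> tableBySignature = new HashMap<>();
    final List<String> createdDdl = new ArrayList<>();

    /** Made-up mapping from Java values to PostgreSQL column types. */
    static String sqlType(Object v) {
        if (v instanceof Integer || v instanceof Long) return "bigint";
        if (v instanceof Double) return "double precision";
        if (v instanceof Boolean) return "boolean";
        if (v instanceof String) return "text";
        throw new IllegalArgumentException("unsupported value: " + v);
    }

    /** Scalar attributes only, sorted by key; nested objects are separate levels. */
    static TreeMap<String, String> columns(Map<String, Object> subdoc) {
        TreeMap<String, String> cols = new TreeMap<>();
        for (Map.Entry<String, Object> e : subdoc.entrySet()) {
            if (!(e.getValue() instanceof Map)) {
                cols.put(e.getKey(), sqlType(e.getValue()));
            }
        }
        return cols;
    }

    /** Returns the table for this subdocument's type, creating one for unseen types. */
    String tableFor(Map<String, Object> subdoc) {
        TreeMap<String, String> cols = columns(subdoc);
        String signature = cols.toString();          // e.g. {a=bigint, b=text}
        String table = tableBySignature.get(signature);
        if (table == null) {
            table = "t_" + (tableBySignature.size() + 1);
            tableBySignature.put(signature, table);
            createdDdl.add(ddl(table, cols));
        }
        return table;
    }

    /** CREATE TABLE text; keys are quoted so they keep their original names. */
    static String ddl(String table, Map<String, String> cols) {
        StringBuilder sb = new StringBuilder("CREATE TABLE ").append(table)
                .append(" (did integer, \"index\" integer");
        for (Map.Entry<String, String> c : cols.entrySet()) {
            sb.append(", \"").append(c.getKey().replace("\"", "\"\"")).append("\" ")
              .append(c.getValue());
        }
        return sb.append(")").toString();
    }

    static Map<String, Object> doc(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
        return m;
    }

    public static void main(String[] args) {
        TypeRegistry reg = new TypeRegistry();
        System.out.println(reg.tableFor(doc("a", 42, "b", "hello world!"))); // t_1
        System.out.println(reg.tableFor(doc("j", 42)));                      // t_2
        System.out.println(reg.tableFor(doc("a", 21, "b", "hello")));        // t_1 (same type)
        reg.createdDdl.forEach(System.out::println);
    }
}
```

The two {a, b} subdocuments share a table because their types match, which is the same grouping the demo shows for t_1.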
ToroDB storage internals
{
"name": "ToroDB",
"data": {
"a": 42, "b": "hello world!"
},
"nested": {
"j": 42,
"deeper": {
"a": 21, "b": "hello"
}
}
}
ToroDB storage internals
The document is split into the following subdocuments:
{ "name": "ToroDB", "data": {}, "nested": {} }
{ "a": 42, "b": "hello world!"}
{ "j": 42, "deeper": {}}
{ "a": 21, "b": "hello"}
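The split above can be sketched as a recursive walk: each level keeps its scalar values, gets an empty {} placeholder for every nested object, and each nested object is emitted as its own subdocument, parents before children. This is a minimal sketch assuming documents are Java Maps with insertion-ordered keys; DocumentSplitter is a hypothetical name, arrays are not handled, and the output is printed in Java's Map notation rather than JSON.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: split a nested document into flat subdocuments. */
final class DocumentSplitter {

    /** Returns one subdocument per hierarchy level, parents before children. */
    static List<Map<String, Object>> split(Map<String, Object> doc) {
        List<Map<String, Object>> out = new ArrayList<>();
        splitInto(doc, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void splitInto(Map<String, Object> doc, List<Map<String, Object>> out) {
        Map<String, Object> level = new LinkedHashMap<>();
        out.add(level);                                   // this level precedes its children
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (e.getValue() instanceof Map) {
                level.put(e.getKey(), new LinkedHashMap<String, Object>()); // "{}" placeholder
                splitInto((Map<String, Object>) e.getValue(), out);         // nested level
            } else {
                level.put(e.getKey(), e.getValue());      // scalars stay in this level
            }
        }
    }

    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
        return m;
    }

    /** The example document from the slides. */
    static Map<String, Object> example() {
        return obj("name", "ToroDB",
                   "data", obj("a", 42, "b", "hello world!"),
                   "nested", obj("j", 42, "deeper", obj("a", 21, "b", "hello")));
    }

    public static void main(String[] args) {
        // {name=ToroDB, data={}, nested={}}, {a=42, b=hello world!},
        // {j=42, deeper={}}, {a=21, b=hello}
        split(example()).forEach(System.out::println);
    }
}
```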
ToroDB storage internals
select * from demo.t_3
┌─────┬───────┬────────────────────────────┬────────┐
│ did │ index │ _id │ name │
├─────┼───────┼────────────────────────────┼────────┤
│ 0 │ ¤ │ x5451a07de7032d23a908576d │ ToroDB │
└─────┴───────┴────────────────────────────┴────────┘
select * from demo.t_1
┌─────┬───────┬────┬──────────────┐
│ did │ index │ a │ b │
├─────┼───────┼────┼──────────────┤
│ 0 │ ¤ │ 42 │ hello world! │
│ 0 │ 1 │ 21 │ hello │
└─────┴───────┴────┴──────────────┘
select * from demo.t_2
┌─────┬───────┬────┐
│ did │ index │ j │
├─────┼───────┼────┤
│ 0 │ ¤ │ 42 │
└─────┴───────┴────┘
ToroDB storage internals
select * from demo.structures
┌─────┬────────────────────────────────────────────────────────────────────────────┐
│ sid │ _structure │
├─────┼────────────────────────────────────────────────────────────────────────────┤
│ 0 │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │
└─────┴────────────────────────────────────────────────────────────────────────────┘
select * from demo.root;
┌─────┬─────┐
│ did │ sid │
├─────┼─────┤
│ 0 │ 0 │
└─────┴─────┘
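Reading a document back reverses the split: demo.root gives the document's structure id (sid), the structure says which table holds each level ("t") and, when present, which index value selects the row ("i"), and nested keys are reattached recursively. The sketch follows our reading of the example: we assume a missing "i" means the row whose index is ¤, which we treat as null, and the tables are modeled as in-memory maps instead of SQL queries. Rebuild and its helpers are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: reassemble a document from the structure and type tables. */
final class Rebuild {
    // In-memory stand-ins for demo.t_N (keyed "table/did/index"), demo.structures, demo.root.
    static final Map<String, Map<String, Object>> ROWS = new HashMap<>();
    static final Map<Integer, Map<String, Object>> STRUCTURES = new HashMap<>();
    static final Map<Integer, Integer> ROOT = new HashMap<>();

    static {
        ROWS.put(key(3, 0, null), obj("_id", "x5451a07de7032d23a908576d", "name", "ToroDB"));
        ROWS.put(key(1, 0, null), obj("a", 42, "b", "hello world!"));
        ROWS.put(key(1, 0, 1), obj("a", 21, "b", "hello"));
        ROWS.put(key(2, 0, null), obj("j", 42));
        STRUCTURES.put(0, obj("t", 3, "data", obj("t", 1),
                "nested", obj("t", 2, "deeper", obj("i", 1, "t", 1))));
        ROOT.put(0, 0);   // document 0 uses structure 0
    }

    /** Row key; a null index stands for the "no index" marker shown in the tables. */
    static String key(int table, int did, Integer index) {
        return table + "/" + did + "/" + index;
    }

    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
        return m;
    }

    /** Looks up the document's structure, then rebuilds it level by level. */
    static Map<String, Object> load(int did) {
        Integer sid = ROOT.get(did);
        if (sid == null) throw new IllegalArgumentException("unknown did " + did);
        return build(STRUCTURES.get(sid), did);
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> build(Map<String, Object> node, int did) {
        int table = (Integer) node.get("t");        // "t": which t_N table holds this level
        Integer index = (Integer) node.get("i");    // "i": index value; absent means null
        Map<String, Object> row = ROWS.get(key(table, did, index));
        if (row == null) throw new IllegalStateException("missing row " + key(table, did, index));
        Map<String, Object> doc = new LinkedHashMap<>(row);
        for (Map.Entry<String, Object> e : node.entrySet()) {
            if (e.getValue() instanceof Map) {      // every other key is a nested level
                doc.put(e.getKey(), build((Map<String, Object>) e.getValue(), did));
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // {_id=x5451a07de7032d23a908576d, name=ToroDB, data={a=42, b=hello world!},
        //  nested={j=42, deeper={a=21, b=hello}}}
        System.out.println(load(0));
    }
}
```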
ToroDB storage and I/O savings
ToroDB requires 29% - 68% of the storage used by MongoDB 2.6
The software
ToroDB is written in Java and is compatible with Java 6 and above. It has been tested on Oracle's JVM, and we will also test and verify it on Azul's JVM. It is currently a standalone JAR file, but will also be offered as an EAR for easy deployment to application servers.
Going beyond MongoDB
MongoDB brought the document model and several features that many love. But can we go further than that? Can't the foundation of relational databases provide a basis for offering new features on a NoSQL, document-like, JSON database?
Going beyond MongoDB
● Avoid schema repetition. Query-by-type
● Cheap single-node durability
● “Clean” reads
● Atomic bulk operations
● Highest concurrency
The schema-less fallacy
{
"name": "Álvaro",
"surname": "Hernández",
"height": 200,
"hobbies": [
"PostgreSQL", "triathlon"
]
}
metadata → Isn't that... schema?
The schema-less fallacy: BSON
metadata → Isn't that... schema?
{
"name": (string) "Álvaro",
"surname": (string) "Hernández",
"height": (number) 200,
"hobbies": {
"0": (string) "PostgreSQL",
"1": (string) "triathlon"
}
}
The schema-less fallacy
● It's not schema-less
● It is “attached-schema”
● It carries an overhead which is not 0
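The overhead can be made concrete by measuring how much of a BSON encoding is spent on metadata, i.e. the one-byte type tag and the null-terminated key name that precede every value. The sketch follows the BSON specification's layouts for doubles, strings, embedded documents and arrays (an array is a document keyed "0", "1", ...). BsonSize is a hypothetical helper, every Java Number is encoded as a double as a simplification, and other BSON types are not supported.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: sizes a BSON encoding and counts its per-element metadata. */
final class BsonSize {
    long total;     // bytes of the whole encoded document
    long metadata;  // bytes spent on type tags and key names

    static BsonSize measure(Map<String, ?> doc) {
        BsonSize s = new BsonSize();
        s.total = s.document(doc);
        return s;
    }

    /** document ::= int32 length, elements, 0x00 terminator */
    private long document(Map<String, ?> doc) {
        long size = 4 + 1;
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            long meta = 1 + cstring(e.getKey());    // type tag + key name + 0x00
            metadata += meta;
            size += meta + value(e.getValue());
        }
        return size;
    }

    @SuppressWarnings("unchecked")
    private long value(Object v) {
        if (v instanceof Number) return 8;                          // 0x01 double (simplification)
        if (v instanceof String) return 4 + cstring((String) v);    // 0x02 int32 length + UTF-8 + 0x00
        if (v instanceof Map) return document((Map<String, ?>) v);  // 0x03 embedded document
        if (v instanceof List) {                                    // 0x04 array: keys "0", "1", ...
            List<?> list = (List<?>) v;
            Map<String, Object> asDoc = new LinkedHashMap<>();
            for (int i = 0; i < list.size(); i++) asDoc.put(Integer.toString(i), list.get(i));
            return document(asDoc);
        }
        throw new IllegalArgumentException("unsupported value: " + v);
    }

    /** UTF-8 bytes plus the trailing 0x00. */
    private static long cstring(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length + 1;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "\u00C1lvaro");
        doc.put("surname", "Hern\u00E1ndez");
        doc.put("height", 200);
        doc.put("hobbies", List.of("PostgreSQL", "triathlon"));
        BsonSize s = measure(doc);
        System.out.println(s.total + " bytes, of which " + s.metadata
                + " are type tags and key names");
    }
}
```

As a sanity check, { "hello": "world" } comes out at 22 bytes, 7 of which are the type tag and the key "hello". Every document in a collection pays this cost again, even when thousands of them share the same keys.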
Schema-attached repetition
{ "a": 1, "b": 2 }
{ "a": 3 }
{ "a": 4, "c": 5 }
{ "a": 6, "b": 7 }
{ "b": 8 }
{ "a": 9, "b": 10 }
{ "a": 11, "b": 12, "j": 13 }
{ "a": 14, "c": 15 }
Counting “document types” in collections of millions of documents: at most, 1000s of different types
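Counting “document types” boils down to grouping documents by their set of keys. A minimal sketch over the eight documents above, taking a type to be the sorted key set (ignoring value types is our simplification); TypeCounter is a hypothetical name. It also counts how many key names are written when each document carries its own schema, versus once per type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: group documents into "types" by their set of keys. */
final class TypeCounter {

    /** Type -> number of documents of that type, in first-seen order. */
    static Map<Set<String>, Integer> countTypes(List<Map<String, Integer>> docs) {
        Map<Set<String>, Integer> counts = new LinkedHashMap<>();
        for (Map<String, Integer> d : docs) {
            counts.merge(new TreeSet<>(d.keySet()), 1, Integer::sum);
        }
        return counts;
    }

    /** Key names written when every document carries its own schema. */
    static int keysStoredPerDocument(List<Map<String, Integer>> docs) {
        return docs.stream().mapToInt(Map::size).sum();
    }

    /** Key names written when each type's schema is stored once. */
    static int keysStoredPerType(Map<Set<String>, Integer> types) {
        return types.keySet().stream().mapToInt(Set::size).sum();
    }

    static final List<Map<String, Integer>> EXAMPLE = List.of(
            Map.of("a", 1, "b", 2), Map.of("a", 3), Map.of("a", 4, "c", 5),
            Map.of("a", 6, "b", 7), Map.of("b", 8), Map.of("a", 9, "b", 10),
            Map.of("a", 11, "b", 12, "j", 13), Map.of("a", 14, "c", 15));

    public static void main(String[] args) {
        Map<Set<String>, Integer> types = countTypes(EXAMPLE);
        System.out.println(types);  // {[a, b]=3, [a]=1, [a, c]=2, [b]=1, [a, b, j]=1}
        System.out.println(keysStoredPerDocument(EXAMPLE) + " key names vs "
                + keysStoredPerType(types) + " when stored once per type");  // 15 vs 9
    }
}
```

Eight documents collapse into five types here; with millions of documents and only thousands of types, the gap between the two counts grows accordingly.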
Schema-attached repetition
How data is stored in a schema-less database
This is how we store it in ToroDB
ToroDB: query “by structure”
● ToroDB is effectively partitioning by type
● Structures (schemas, partitioning types) are cached in ToroDB memory
● Queries only scan a subset of the data
● Negative queries (those no cached structure can match) are served directly from memory
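Because every structure is cached, the set of tables worth scanning can be decided before touching any data. A toy sketch, assuming a type is just its key set and the query is a single “field exists” filter; StructureCache, tablesToScan and the table names are hypothetical, and a real planner would also have to check value types and operators. The five registered types are the key sets of the schema-attached repetition example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: pick the tables a "field exists" query must scan. */
final class StructureCache {
    // Cached in memory: table name -> keys that documents of that type contain.
    private final Map<String, Set<String>> keysByTable = new LinkedHashMap<>();

    void register(String table, Set<String> keys) {
        keysByTable.put(table, Set.copyOf(keys));
    }

    /** Only tables whose type has the field can hold a match; others are skipped. */
    List<String> tablesToScan(String field) {
        List<String> tables = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : keysByTable.entrySet()) {
            if (e.getValue().contains(field)) tables.add(e.getKey());
        }
        return tables;  // empty: a negative query, answered without reading any table
    }

    public static void main(String[] args) {
        StructureCache cache = new StructureCache();
        cache.register("t_1", Set.of("a", "b"));
        cache.register("t_2", Set.of("a"));
        cache.register("t_3", Set.of("a", "c"));
        cache.register("t_4", Set.of("b"));
        cache.register("t_5", Set.of("a", "b", "j"));
        System.out.println(cache.tablesToScan("j"));  // [t_5]
        System.out.println(cache.tablesToScan("c"));  // [t_3]
        System.out.println(cache.tablesToScan("z"));  // []
    }
}
```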
Cheap single-node durability
● Without journaling, MongoDB is neither durable nor crash-safe
● MongoDB requires “j: true” for true single-node durability. But who guarantees it is used consistently? Who uses it by default?
● j:true creates I/O storms equivalent to SQL CHECKPOINTs
“Clean” reads
Oh really?
“Clean” reads
http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior
“MongoDB will allow clients to read the results of a write operation before the write operation returns.”
“If the mongod terminates before the journal commits, even if a write returns successfully, queries may have read data that will not exist after the mongod restarts.”
“Other database systems refer to these isolation semantics as read uncommitted.”
“Clean” reads
Thus, MongoDB suffers from dirty reads, or what might better be called “tainted reads”.
What about $snapshot? Nope:
“The snapshot() does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.”
http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors
ToroDB: going beyond MongoDB
● Cheap single-node durability
PostgreSQL is 100% durable with its default settings (fsync and synchronous_commit enabled). And it's cheap (it doesn't cause I/O storms)
● “Clean” reads
Cursors in ToroDB run in repeatable read, read-only mode:
globalCursorDataSource.setTransactionIsolation("TRANSACTION_REPEATABLE_READ");
globalCursorDataSource.setReadOnly(true);
Atomic operations
● MongoDB has no support for atomic bulk insert/update/delete operations
● Not even with $isolated:
“Prevents a write operation that affects multiple documents from yielding to other reads or writes […] You can ensure that no client sees the changes until the operation completes or errors out. The $isolated isolation operator does not provide “all-or-nothing” atomicity for write operations.”
http://docs.mongodb.org/manual/reference/operator/update/isolated/
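The difference between “all-or-nothing” atomicity and best-effort bulk writes can be shown with a toy in-memory store. This is an illustrative model, not MongoDB's or ToroDB's code: in atomic mode every write is staged and published only if all of them succeed, while continueOnError (named after the ContinueOnError flag ToroDB accepts) keeps the writes that succeeded. A duplicate _id stands in for any write error; BulkInsert is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical): atomic vs continue-on-error bulk inserts. */
final class BulkInsert {
    final Map<Integer, String> committed = new LinkedHashMap<>();  // visible documents by _id

    /** Inserts (_id, value) pairs and returns the _ids that failed. */
    List<Integer> insertAll(List<Map.Entry<Integer, String>> batch, boolean continueOnError) {
        Map<Integer, String> staged = new LinkedHashMap<>();  // not visible to anyone yet
        List<Integer> failed = new ArrayList<>();
        for (Map.Entry<Integer, String> doc : batch) {
            Integer id = doc.getKey();
            if (committed.containsKey(id) || staged.containsKey(id)) {
                failed.add(id);                 // duplicate _id: this write fails
                if (!continueOnError) {
                    return failed;              // all-or-nothing: drop every staged write
                }
            } else {
                staged.put(id, doc.getValue());
            }
        }
        committed.putAll(staged);               // publish the surviving writes
        return failed;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> batch =
                List.of(Map.entry(2, "b"), Map.entry(1, "duplicate"), Map.entry(3, "c"));

        BulkInsert atomic = new BulkInsert();
        atomic.committed.put(1, "a");
        atomic.insertAll(batch, false);
        System.out.println(atomic.committed.keySet());   // [1]

        BulkInsert lenient = new BulkInsert();
        lenient.committed.put(1, "a");
        lenient.insertAll(batch, true);
        System.out.println(lenient.committed.keySet());  // [1, 2, 3]
    }
}
```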
High concurrency
● MMAPv1 is still collection-locked
● WiredTiger is document-locked
● But they are still exclusive locks (MMAP). Most relational databases have MVCC, which means readers and writers can work at the same time almost without conflicts
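A toy model shows why MVCC lets readers and writers proceed without waiting for each other: writers add new versions tagged with a commit counter, and a reader only sees versions committed before its snapshot, which is also what keeps a repeatable-read cursor's view stable. This is a deliberately simplified, single-threaded sketch (a global counter, no aborts, no cleanup of old versions) and not how PostgreSQL implements MVCC; ToyMvcc is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy, single-threaded model of MVCC snapshots (not PostgreSQL's implementation). */
final class ToyMvcc {
    /** One committed version of a value. */
    static final class Version {
        final long commitTxid;
        final String value;

        Version(long commitTxid, String value) {
            this.commitTxid = commitTxid;
            this.value = value;
        }
    }

    private final Map<String, List<Version>> versions = new HashMap<>();
    private long lastCommitted = 0;

    /** A writer commits a new version; older versions stay available to old snapshots. */
    void write(String key, String value) {
        lastCommitted++;
        versions.computeIfAbsent(key, k -> new ArrayList<>()).add(new Version(lastCommitted, value));
    }

    /** A snapshot is just "everything committed so far". */
    long snapshot() {
        return lastCommitted;
    }

    /** Newest version committed at or before the snapshot, or null if none. */
    String read(String key, long snapshot) {
        String visible = null;
        for (Version v : versions.getOrDefault(key, List.of())) {  // kept in commit order
            if (v.commitTxid <= snapshot) visible = v.value;
        }
        return visible;
    }

    public static void main(String[] args) {
        ToyMvcc db = new ToyMvcc();
        db.write("x", "v1");
        long cursor = db.snapshot();                     // a repeatable-read cursor starts
        db.write("x", "v2");                             // a writer commits meanwhile
        System.out.println(db.read("x", cursor));        // v1: the cursor's view is unchanged
        System.out.println(db.read("x", db.snapshot())); // v2: a new snapshot sees the commit
    }
}
```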
ToroDB: going beyond MongoDB
● Atomic bulk operations
By default, bulk operations in ToroDB are atomic. Use the flag ContinueOnError: 1 to perform non-atomic bulk operations
● Highest concurrency
PostgreSQL uses MVCC. Readers and writers do not block each other. Writers block other writers only on the same record
ToroDB: Developer Preview
● ToroDB launched in October 2014 as a Developer Preview, with support for CRUD and most of the SELECT API
● github.com/torodb
● RERO (Release Early, Release Often) policy. Comments, feedback, patches... greatly appreciated
● AGPLv3
ToroDB: Developer Preview
● Clone the repo and build with Maven
● Or download the JAR:
http://maven.torodb.com/jar/com/torodb/torodb/0.20/torodb.jar
● Usage:
java -jar torodb-0.20.jar --help
java -jar torodb-0.20.jar -d dbname -u dbuser -P 27017
Connect with the normal mongo console!
ToroDB: Community Response
ToroDB: Roadmap
● Current Developer Preview is single-node
● Version 1.0:
➔ Expected Q4 2015
➔ Production-ready
➔ MongoDB Replication support
➔ Very high compatibility with Mongo API
ToroDB: Development priorities
#1 Offer a MongoDB-like experience on top of existing IT infrastructure, such as relational databases and app servers
#2 Go beyond current MongoDB features, e.g. in ACID and concurrency
#3 Great performance
ToroDB: Experimental research directions
● Use columnar storage (CitusDB)
● Use Postgres-XL as a backend. This requires us to distribute ToroDB's cache (ehcache, Hazelcast)
● Use pg_shard for sharding
Big Data speaking mongo: Vertical ToroDB
What if we use CitusData's cstore to store the JSON documents?
Big Data speaking mongo: Vertical ToroDB
Vertical ToroDB requires 1.17% - 20.26% of the storage used by MongoDB 2.6