Alexei Krasner
Nov 2015
PostgreSQL as an Alternative to MSSQL
What is PostgreSQL
▪ Powerful, open source object-relational database system.
▪ More than 15 years of active development and a strong reputation.
▪ Runs on all major operating systems (Linux, Unix, Mac OS,
Windows…).
▪ Enterprise class database.
▪ Large and responsive community.
▪ Winner of the 2015 Database Trends and Applications Readers
Choice:
– The most advanced open source database.
– Best relational database.
Let's Start With Standards
▪ Fully ACID compliant.
▪ Includes most of SQL:2008 data types along with storage of
binary objects.
▪ Conforms to the ANSI-SQL:2008 standard:
– Full support for subqueries (including sub-selects).
– Read-Committed and serializable transaction isolation levels.
– Full support for Primary Keys, Foreign Keys, Joins, Views, Triggers, Stored
Procedures, Constraints (check, unique and not null) and Cascading.
– Fully relational system catalog – multiple schemas per database.
▪ Native programming interfaces: Java, .NET, C/C++, Perl,
Python, ODBC
Continue With a Little Splurging
▪ Multi-Version Concurrency Control (MVCC).
▪ Asynchronous Replication, Load Balancing and Online/Hot Backups with Point
in Time Recovery.
▪ Write Ahead Logging – fault tolerance.
▪ Performance:
– Sophisticated Query Planner/Optimizer.
– Compound, Unique, Partial and functional indexes.
▪ Supports:
– International character sets, multi-byte encodings, Unicode, locale awareness.
– Built-in Types – Geospatial, XML, JSON/JSONB, Ranges and Arrays!
– NoSQL – Key-Value store with incredible performance and Full Text Search.
▪ Highly customizable and extensible.
Before We Dive – Generalized Search Tree (GiST)
▪ Advanced indexing system – different sorting and searching
algorithms:
– B-tree, B+-tree, R-tree, Partial Sum trees, ranked B+-trees etc.
– API for creating custom data types and extensible query methods for
search.
▪ Decide WHAT to persist, HOW to persist and a way to SEARCH
for it.
▪ Goes beyond the general search algorithms of standard B-trees and
R-trees.
▪ Foundation for many public projects – OpenFTS and PostGIS
Features Deep Dive
▪ MVCC
▪ Partitioning
▪ Useful Data Types
– Date and Time
– Interval
– Array
– Ranges
– JSON
– HSTORE
– XML
▪ PostGIS – Geographic
▪ Full Text Search
▪ Server Side Programming
▪ Backup and Restore
▪ High Availability, Load Balancing and Replication
– Sharding
▪ Big Data Readiness
Multi Version Concurrency Control - MVCC
▪ Reads should never block writes and vice
versa.
▪ Each transaction sees a snapshot of data
(version).
– Protection from viewing inconsistency –
transaction isolation.
▪ Avoidance of explicit locking solutions –
minimize lock contention.
▪ Table/row-level locking is still
available, although proper use of MVCC
generally performs better.
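A toy model can make the snapshot idea concrete. The Python sketch below is a deliberate simplification and not PostgreSQL's implementation: the class `ToyMVCCTable` and its single logical row are hypothetical, and the `xmin`/`xmax` fields are only named after PostgreSQL's system columns. It shows a reader keeping its snapshot while a writer adds a new row version, with neither side taking a lock.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass
class RowVersion:
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that replaced it, if any


class ToyMVCCTable:
    """One logical row stored as a chain of versions (hypothetical toy model)."""

    def __init__(self) -> None:
        self.versions: List[RowVersion] = []
        self.committed: Set[int] = set()
        self.next_txid = 1

    def begin(self) -> int:
        txid = self.next_txid
        self.next_txid += 1
        return txid

    def commit(self, txid: int) -> None:
        self.committed.add(txid)

    def snapshot(self) -> FrozenSet[int]:
        # A snapshot remembers which transactions had committed when it was taken.
        return frozenset(self.committed)

    def write(self, txid: int, value: str) -> None:
        # Writers never overwrite in place: the live version is marked as
        # replaced and a new version is appended.
        for version in self.versions:
            if version.xmax is None:
                version.xmax = txid
        self.versions.append(RowVersion(value, xmin=txid))

    def read(self, snapshot: FrozenSet[int]) -> Optional[str]:
        # A version is visible if its creator is in the snapshot and its
        # replacer (if any) is not.
        for version in self.versions:
            created = version.xmin in snapshot
            replaced = version.xmax is not None and version.xmax in snapshot
            if created and not replaced:
                return version.value
        return None
```

A reader that took its snapshot before a later update keeps seeing the old value, even after the updating transaction commits; a snapshot taken afterwards sees the new value.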
Partitioning – Table Inheritance
▪ Support of basic table partitioning via the table inheritance
concept.
– Includes known partitioning benefits:
▪ Improved heavy load query performance (on a single partition).
▪ Sequential scan of a partition instead of index usage.
▪ Bulk loads and deletes accomplished by adding or removing partitions.
▪ Infrequently used data can be migrated to a cheaper/slower storage solution.
– Range Partitioning:
▪ Table partitioned into “ranges” defined by a single key column or a set of key columns (e.g.
dates).
– List Partitioning:
▪ Table partitioned into a list of discrete values as partitioning keys.
– A hundred or so partitions is an acceptable limit; thousands of partitions will
severely harm performance.
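The two mechanics behind range partitioning (routing a new row to exactly one partition, and skipping partitions a query cannot match) can be sketched in a few lines of Python. This is only an illustration of the idea: the partition names and the monthly bounds are made up, and in PostgreSQL's inheritance-based scheme the routing is typically done by a trigger or rule and the skipping by constraint exclusion on CHECK constraints.

```python
from datetime import date

# Hypothetical monthly partitions: name -> (inclusive lower, exclusive upper)
PARTITIONS = {
    "measurements_2015_10": (date(2015, 10, 1), date(2015, 11, 1)),
    "measurements_2015_11": (date(2015, 11, 1), date(2015, 12, 1)),
    "measurements_2015_12": (date(2015, 12, 1), date(2016, 1, 1)),
}


def route_insert(day):
    """Return the single partition whose range contains the key."""
    for name, (lower, upper) in PARTITIONS.items():
        if lower <= day < upper:
            return name
    raise ValueError("no partition accepts {}".format(day))


def partitions_to_scan(query_from, query_to):
    """Return the partitions overlapping the half-open range [query_from, query_to)."""
    return [
        name
        for name, (lower, upper) in PARTITIONS.items()
        if lower < query_to and query_from < upper
    ]
```

A query restricted to a few days in November only has to touch `measurements_2015_11`, which is where the "query a single partition" benefit comes from.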
Useful Data Types
▪ Date and Time – Date, Time, Timestamp and Timestamp with
time zone.
– Converted to and from Unix time.
– Supports the INTERVAL type.
– Very convenient casting and conversion to text.
– Efficient searching and sorting (including time zone/offset handling).
▪ INTERVAL – representation of a period of time.
– Negative interval values are possible (e.g. a year ago).
– Intuitive arithmetic and persistence of time durations.
– Easy casting and converting to relevant types.
– Efficient searching and sorting on intervals.
Useful Data Types Cont.
▪ Array – supported as a first-class datatype (actual field in a
table).
– Can contain any datatype (including nested arrays).
– Parameters to functions as an array.
– Usages – function results, aggregations, getting/setting arrays of data in/from
the application.
▪ Range – supported as a first-class datatype.
– Store a range over timestamps, dates, integers or numerics as a single data value.
– Possible dedicated indexes to support queries utilizing ranges.
– Exposed methods to define custom ranges.
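What "a range as a single data value" buys you is that containment and overlap become operations on the value itself. The Python sketch below mimics that with a hypothetical `HalfOpenRange` class (inclusive lower bound, exclusive upper bound); it is an analogy for the semantics of PostgreSQL's `@>` (contains) and `&&` (overlaps) range operators, not a binding to them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HalfOpenRange:
    """Hypothetical stand-in for an integer range: [lower, upper)."""

    lower: int  # inclusive
    upper: int  # exclusive

    def contains(self, value: int) -> bool:
        # Same idea as: range @> value
        return self.lower <= value < self.upper

    def overlaps(self, other: "HalfOpenRange") -> bool:
        # Same idea as: range && other_range
        return self.lower < other.upper and other.lower < self.upper


# Example: two room bookings expressed as hour ranges
morning = HalfOpenRange(9, 12)
lunch = HalfOpenRange(12, 13)
late_morning = HalfOpenRange(11, 13)
```

Because the upper bound is exclusive, `morning` and `lunch` touch without overlapping, while `morning` and `late_morning` do overlap.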
Useful Data Types Cont.
▪ JSON – full support along with large dedicated set of utility
functions.
– Known JSON/JSONB benefits – data transfer and integration standard.
– Transformation from/to types and tables.
– Retrieval and construction of JSON data.
– Parsing, casting and conversion.
▪ HSTORE – Fast key-value store as a datatype.
– NoSQL capabilities – flexibility of schema-less data store.
– Still ACID compliant.
– Interchange data between JSON and HSTORE.
Useful Data Types Cont.
▪ XML – Supported as a first-class datatype.
– Well-formedness checks and type-safe operations.
– Querying using XPath.
– Producing XML content, Predicates, Processing, Mapping tables to XML
etc.
PostGIS
▪ Fully featured, reliable geospatial database project based on GiST
(following ISO/OGC standards).
▪ SQL types and functions to manage vector geometries (spatial data).
▪ Capabilities:
– Support for three dimensional data.
– Support for geospatial formats (KML, GeoJSON)
– Processing and analytics functions for vector and raster data.
– Map “rastering” and geo queries.
– Geo searches and reverse geo searches.
▪ Hugely popular and respected extension module – comparable to ArcGIS.
Full Text Search
▪ Online indexing of data and relevance ranking for database
searches.
▪ Good Enough:
– Stemming
– Ranking
– Multilingual
– Fuzzy searches (misspellings).
– Accent handling.
Server Side Programming
▪ Super Extensible – functions, data types, procedural
languages, operators, aggregates etc.
– Embedding functions and stored procedures using procedural
languages: PL/pgSQL, PL/Tcl, PL/Perl, PL/Python.
▪ Triggers – tables, views and foreign tables.
▪ Event Triggers – database global trigger.
▪ Rule System – Query modification based on given rules.
Backup and Restore
▪ Extremely flexible dump utility – migration, replication and
backups become more reliable, controllable and
configurable.
– Compressed format or plain SQL (human readable).
– Single table or whole database cluster.
▪ Approaches:
– SQL Dump – file with generated SQL commands. On restore the backed
up commands will be replayed.
– File system level backup – direct copy of PostgreSQL data files. Restore
will include reattaching the data files.
– Continuous archiving – backing up Write Ahead Log (WAL) files. On
restore log commands will be replayed.
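The continuous-archiving approach boils down to "start from a base copy, then replay the log up to the moment you want". The Python sketch below illustrates only that idea, under loud assumptions: the function `restore`, the dictionary standing in for the database, and the `(timestamp, key, value)` record layout are all invented for the example and bear no relation to the real WAL format.

```python
def restore(base_backup, wal_records, recover_until=None):
    """Rebuild state from a base backup plus a time-ordered change log.

    base_backup:   dict standing in for the copied data files
    wal_records:   list of (timestamp, key, value); value None means "delete"
    recover_until: replay stops after this timestamp (point-in-time recovery)
    """
    state = dict(base_backup)  # never modify the backup itself
    for timestamp, key, value in wal_records:
        if recover_until is not None and timestamp > recover_until:
            break  # everything later is intentionally left out
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


backup = {"a": 1}
log = [(1, "b", 2), (2, "a", None), (3, "c", 3)]
```

Replaying the whole log yields the latest state, while passing `recover_until=1` stops just before the delete at timestamp 2, which is how an accidental change can be rolled back to a known-good moment.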
High Availability, Load Balancing and Replication
Feature | Shared Disk Failover | File System Replication | Transaction Log Shipping | Trigger-Based Master-Standby Replication | Statement-Based Replication Middleware | Asynchronous Multimaster Replication | Synchronous Multimaster Replication
Most Common Implementation | NAS | DRBD | Streaming Repl. | Slony | pgpool-II | Bucardo |
Communication Method | shared disk | disk blocks | WAL | table rows | SQL | table rows | table rows and row locks
No special hardware required | | X | X | X | X | X | X
Allows multiple master servers | | | | | X | X | X
No master server overhead | X | | X | | X | |
No waiting for multiple servers | X | | with sync off | X | | X |
Master failure will never lose data | X | X | with sync on | | X | | X
Standby accept read-only queries | | | with hot | X | X | X | X
Per-table granularity | | | | X | | X | X
No conflict resolution necessary | X | X | X | X | | | X
Sharding and Replication
▪ Pure Sharding:
– pg_shard – popular sharding extension for PostgreSQL.
▪ Running on Linux!
– BDR/UDR Project – Bi-Directional Replication which adds multi-master
replication to PostgreSQL.
▪ Running on Linux! A Windows port is not expected in the near future.
▪ Fork of the main PostgreSQL source.
– Postgres-XL – all-purpose, fully ACID, open source scale-out database solution.
▪ Running on Linux!
▪ Fork of the main PostgreSQL source.
Sharding and Replication Cont.
▪ Via Replication:
– Hot Standby – offloading read load from the Master to the slaves (horizontal
scale).
– Streaming (or Bucardo, or another option) replication to slaves.
– Load balancing: “write” queries go to the Master, “read” queries to the slaves.
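The read/write split above can be pictured as a small routing rule in front of the servers. The Python sketch below is a hypothetical illustration (the class `ToyRouter` and the server names are invented, and it is not how pgpool-II or any other middleware is implemented): statements starting with SELECT rotate across the standbys, everything else goes to the master. Real routing needs more care, for example `SELECT ... FOR UPDATE` or a SELECT calling a function that writes must still reach the master.

```python
import itertools


class ToyRouter:
    """Send reads to standbys in round-robin order and writes to the master."""

    def __init__(self, master, standbys):
        self.master = master
        # cycle() would never yield on an empty list, so keep None in that case
        self._standbys = itertools.cycle(standbys) if standbys else None

    def route(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._standbys is not None:
            return next(self._standbys)
        return self.master


router = ToyRouter("master", ["standby1", "standby2"])
```

With no standbys configured the router falls back to the master for every statement, so the same application code works before and after replicas are added.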
PostgreSQL and Big Data
▪ PostgreSQL was used a decade before Hadoop launched, for large
data volumes and complex analytics (as the only purely open source option).
▪ Today heavily used in mid-sized warehouses and data-marts (1-10
TB).
▪ Source code base for many big data systems:
– Netezza (IBM).
– Greenplum (Pivotal) – Open Source Massively Parallel Data Warehouse.
– PipelineDB – open source, run SQL queries continuously on streaming data.
– EnterpriseDB and CitusDB (commercial license) – fully scaled out Postgres.
– Redshift (Amazon).
▪ The PostgreSQL project continuously provides new features and better
performance to support big data usage.
PostgreSQL and Big Data – Features
▪ Serious NoSQL database competitor.
– JSONB advanced features and ongoing massive development plan.
– Extensions that provide NoSQL-like APIs.
▪ Faster Sorts – text and long numeric sorting improvements.
▪ TABLESAMPLE – returns a pseudo-random sample of rows,
giving a glimpse of the data for further analysis.
▪ Cubes, Rollups and Grouping Sets – summarizing and
exploring huge data sets in the OLAP way.
▪ BRIN indexes – very small and cheap to maintain; suited to TB-size
tables on incrementally increasing value fields (like timestamps or
integers).
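Why BRIN fits incrementally increasing columns is easier to see with a toy version. A BRIN (Block Range Index) keeps a small summary, such as the minimum and maximum value, for each range of table blocks instead of one entry per row. The Python sketch below is an assumption-heavy illustration, not PostgreSQL's code: the function names and the "rows per range" grouping are invented, and a flat list stands in for the table.

```python
def build_block_range_summary(values, rows_per_range):
    """Summarize consecutive chunks of rows as (start_row, min, max)."""
    summary = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summary.append((start, min(chunk), max(chunk)))
    return summary


def ranges_to_scan(summary, low, high):
    """Return start rows of chunks that may hold values in [low, high]."""
    return [
        start
        for start, chunk_min, chunk_max in summary
        if chunk_min <= high and low <= chunk_max
    ]


# Values that grow with insertion order, like a timestamp or serial id
ordered = list(range(100, 200))
ordered_summary = build_block_range_summary(ordered, 25)
```

With ordered data each chunk covers a narrow slice of values, so a query for 130 to 140 only needs one of the four chunks. If the same values were stored in random order, every chunk's minimum and maximum would span nearly the whole domain and almost nothing could be skipped.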
PostgreSQL and Big Data – Features Cont.
▪ Foreign Data Wrappers – linking external data, which can then be queried
as if it were local, for hybrid solutions.
– Foreign schema import.
– JOIN pushdowns.
▪ Vacuum (garbage collection of deleted rows) – became parallel via a
multi-process mode (maintaining several large tables at once).
▪ Scaling UP – multicore scalability improvements.
▪ Scaling UP – Multicore scalability improvements.
Enterprise Wise
▪ Open Source
▪ Reliability
▪ Authentication
▪ Logging
▪ Documentation
▪ Support
▪ Maintenance
Open Source
▪ Available under an open source license – the PostgreSQL
License.
▪ Using, modifying and distributing in any open/closed form.
▪ Extending and patching the relational database per
project/client etc.
▪ Variety of modules, extensions and tools based on its open
source license.
Reliability
▪ PostgreSQL is relatively bug-free (compared to MSSQL).
▪ Very large community reporting bugs and providing fixes/workarounds.
▪ Constantly growing community.
Authentication
▪ Trust Authentication.
▪ Password Authentication.
▪ GSSAPI/SSPI Authentication – using Kerberos.
▪ Ident Authentication.
▪ Peer Authentication.
▪ LDAP Authentication.
▪ RADIUS Authentication.
▪ Certificate Authentication.
▪ Pluggable Authentication Modules.
Logging
▪ Logs in one place.
– Unlike MSSQL – error logs, event log, profiler log, agent log…
▪ Easily configurable logging level.
▪ Easily redirected to CSV files and loaded into tables.
▪ Easily redirected to the System Log or Windows Event Log.
▪ Logs are human readable and of great value to sysadmins.
Documentation
▪ There is nothing more to add than a link:
http://www.postgresql.org/docs/
Support
▪ Community-based support, which also appears to be fast.
▪ Numerous companies specializing in enterprise support:
http://www.postgresql.org/support/professional_support/
▪ Enterprise database management companies like:
EnterpriseDB
▪ Total Cost of Ownership is significantly lower even with
enterprise support. (Based on reports. e.g. Gartner 2015).
vs.
MySQL
▪ Fully ACID compliant!
▪ Subqueries and Joins.
▪ Better locking mechanism.
▪ JSON/JSONB support.
▪ NoSQL and Key-Value store.
▪ Advanced GIS abilities.
▪ Full Text Search abilities.
▪ Advanced and attractive data types.
▪ Far better and more useful extensibility patterns.
▪ Licensing issues.
vs.
PostgreSQL
▪ Partitioning based on table inheritance
(Pros. and Cons.)
▪ Can be overkill for simple read-heavy
workloads (improved in newer
versions).
▪ Replication and Clustering (especially
multi-master). Not “there” yet, but on the
right track.
▪ Popularity – not as popular as MySQL (for
example) but gaining popularity constantly,
unlike MySQL.
▪ Expertise issues – different syntax and
administration (compared to MSSQL).
THANK
YOU
