0% found this document useful (0 votes)
77 views

PGSQL Oracle

PostgreSQL is similar to Oracle in supporting common table expressions, window functions, and recursive queries similarly to Oracle's CONNECT BY clause. However, PostgreSQL's procedural language PL/PgSQL is less powerful than Oracle's PL/SQL and requires rewrite of Oracle code. PostgreSQL's partitioning and optimizer also lag behind Oracle's capabilities. Porting Oracle applications to PostgreSQL requires significant changes to data models, queries, and application code.

Uploaded by

znothz
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
77 views

PGSQL Oracle

PostgreSQL is similar to Oracle in supporting common table expressions, window functions, and recursive queries similarly to Oracle's CONNECT BY clause. However, PostgreSQL's procedural language PL/PgSQL is less powerful than Oracle's PL/SQL and requires rewrite of Oracle code. PostgreSQL's partitioning and optimizer also lag behind Oracle's capabilities. Porting Oracle applications to PostgreSQL requires significant changes to data models, queries, and application code.

Uploaded by

znothz
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

PostgreSQL and Oracle

Introduction
Recently, driven by the ever more exorbitant prices of Oracle RDBMS components, many companies started looking for an alternative platform that would help them decrease the license cost of the Oracle RDBMS. Of course, as it became customary these days, the first place to look for an alternative is the open source community. Open source community maintains several open source databases, the most popular among them are MySQL and PostgreSQL. With Oracle Corp. acquisition of SUN Microsystems, Oracle also acquired MySQL. There are other versions, like MariaDB and Drizzle, but the general feeling is that MySQL is no longer a serious alternative. That leaves us with PostgreSQL. This article strives to clarify PostgreSQL strengths and weaknesses for such purpose. I will refrain from giving a verdict of the database suitability as a general Oracle replacement. It depends on the particular project and the nature of the database that is being replaced. Overall impression is that Oracle is an 18wheeler, while PostgreSQL is a half ton truck. Each is useful its own area, but one cannot realistically expect to replace one by another.

Characteristics of PostgreSQL
PostgreSQL is, in a way, very similar to Oracle. That similarity is intentional, as there are several commercial products which use that similarity for marketing purposes. The most frequent complaints of the Oracle developers, when trying out open source databases, are the problems with the CONNECT BY clause, subquery factoring and analytic functions, also known as window functions to PostgreSQL developers. PostreSQL supports subquery factoring or Common Table Expressions (CTE), known to us oraclites as WITH clause. It also supports very decent set of window functions like row_number, ntile, rank, dense_rank, lead, lag and some others. Furthermore, recursive WITH is a very good replacement for Oracle's CONNECT BY query. Incidentally, Oracle supports recursive WITH queries as of version 11.2. Below are both Oracle and PostgreSQL versions of the same recursive query:
Oracle 11.2: 1 with e(empno,mgr,lev) as ( 2 select empno,mgr,0 from emp where empno=7839 3 union all 4 select emp.empno,emp.mgr,e.lev+1 5 from emp,e 6 where emp.mgr=e.empno) 7* select * from e SQL> / EMPNO MGR LEV

7839 0 7566 7839 1 7698 7839 1 7782 7839 1 7499 7698 2 7521 7698 2 7654 7698 2 7788 7566 2 7844 7698 2 7900 7698 2 7902 7566 2 7934 7782 2 7369 7902 3 7876 7788 3 14 rows selected.

Oracle requires UNION ALL and prohibits the use of LEVEL keyword, so I replaced it by LEV. The result is essentially the same as if we did a self join with CONNECT BY prior empno= mgr. PostgreSQL can do the same thing:
PostgreSQL 9.0: mgogala=# with recursive e(empno,mgr,level) as ( mgogala(# select empno,mgr,0 from emp where empno=7839 mgogala(# union mgogala(# select emp.empno,emp.mgr,e.level+1 mgogala(# from emp,e mgogala(# where emp.mgr=e.empno) mgogala-# select * from e; empno | mgr | level -------+------+-----7839 | | 0 7566 | 7839 | 1 7698 | 7839 | 1 7782 | 7839 | 1 7499 | 7698 | 2 7521 | 7698 | 2 7654 | 7698 | 2 7788 | 7566 | 2 7844 | 7698 | 2 7900 | 7698 | 2 7902 | 7566 | 2 7934 | 7782 | 2 7369 | 7902 | 3 7876 | 7788 | 3 (14 rows) mgogala=#

PostgreSQL doesn't object to my using the word level in the query, it's not a keyword in PostgreSQL. It also doesn't require UNION ALL, like Oracle. The query is essentially the same, which simplifies porting all those proprietary CONNECT BY constructs to PostgreSQL. Similarity between the SQL syntax of Oracle and PostgreSQL is remarkable and should be of great help to Oracle developers, tasked with developing an application using PostgreSQL database.

PL/PgSQL Procedural extension


PostgreSQL procedural extension, PL/PgSQL is very similar to Oracle's own PL/SQL, but much smaller. Again, this similarity in both name and syntax is intentional. However, differences are very significant and there is no hope of using Oracle PL/SQL objects in the Postgres database. A complete rewrite is usually required. This rewrite is , however, helped to a significant degree by the similarity of the two languages. Here are some of the most notable differences. PL/PgSQL functions cannot contain COMMIT statement and procedures or packages are not supported, only functions. Until recently, when DO statement was introduced in PostgreSQL 9.0, it wasn't possible to execute an anonymous PL/PgSQL. PostgreSQL, however, is multi-lingual. It is possible to use Perl or Python as internal languages and write functions or triggers in those languages. There is no collection of administrative and utility packages like DBMS_LOB, DBMS_SPACE, UTL_FILE or UTL_MAIL. PL PgSQL is not as powerful as PL/SQL, but it can definitely be used for enforcing a complex business logic. On the other hand, coding rules are slightly different from Oracle. Below is an actual trigger function that was used for partitioning in a live, production database:
CREATE FUNCTION moreover_insert_trgfn() RETURNS trigger LANGUAGE plpgsql AS $$ DECLARE V_MONTH SMALLINT; V_YEAR VARCHAR(4); V_INS VARCHAR(512):='insert into moreover.moreover_documents_y'; V_VALS VARCHAR(128):=' VALUES($1.*)'; BEGIN SELECT EXTRACT(MONTH FROM NEW.created_at) INTO V_MONTH; SELECT EXTRACT(YEAR FROM NEW.created_at) INTO V_YEAR; IF (V_MONTH>=10) THEN V_INS:=V_INS||V_YEAR||'m'||V_MONTH||V_VALS; ELSE V_INS:=V_INS||V_YEAR||'m0'||V_MONTH||V_VALS; END IF; execute V_INS USING NEW; RETURN NULL; END; $$

This function computes the partition into which the row needs to be inserted, based on the month and year in the CREATED_AT field in the incoming row and executes insert using the variable NEW as a bind value. Oracle is much more rigorous when it comes to coding and doesn't easily allow tricks like this. PostgreSQL doesn't compile PL/PgSQL code, it stores it as a text literal. The code is parsed for each session that executes it and cached in the private memory of the parsing process. That means that there is no code sharing. If one process parses a PL/PgSQL function, other processes will not benefit from it because PostgreSQL doesn't have anything even remotely like Oracle shared pool. That applies to all procedural extension to PostgreSQL, not just PL/PgSQL. In other words, database functions written in Perl or Python are also stored as text literals, parsed as needed and cached on per process basis.

Partitioning
Partitioning in PostgreSQL leaves much to be desired. It is essentially the same technology used in Oracle7, with partition views. In PostgreSQL, partitioning is based on the inheritance mechanism. There is one parent table and partitions are separate tables that inherit the table structure from the parent. Rows are routed into different partitions using triggers. PostgreSQL can do partition elimination, but only during the parse phase. That means that it will not do partition pruning with an already parsed SQL and a bind variable. Also, there is no global indexes, global constraints or any other mechanism except of the very limited partition pruning. There is no global statistics, so the optimizer cannot do anything on the table level, it has to descend to partition level and optimize the SQL on per partition basis. Frequently, optimizer is not up to the task, so using partitions results in horribly inefficient queries that take a very long time to complete. Partitioning can be useful, but it requires a very careful design, which involves careful data model consideration and extremely careful query design. There is no parallel query or parallel DML, not even for the partitioned tables.

PostgreSQL optimizer
PostgreSQL has a very sophisticated cost based optimizer, somewhat different from Oracle's. In addition to the value histograms, that both databases collect, PostgreSQL also compiles the list of most frequent column values. The number of the most frequent values and the size of the histograms is determined by the default_statistics_target parameter and can range from 0 to 10000, much greater range than Oracle's. In version 9, it is possible to set the histogram size for each table column separately, using the ALTER TABLE statement. The same option is also present in Oracle 11g, which allows the DBA to set the statistics preferences on the table level. As all cost based optimizers, PostgreSQL optimizer is also sometimes counter intuitive and stubbornly refuses to use an index. To alleviate the pain, database designers have made available a myriad of set session mode options which turn on and off particular access methods. There are no hints, which is a very big problem for porting projects. If a query starts misbehaving, there is no way to just make a decision for the optimizer to use the particular index, one has to re-engineer sizable portion of the data model, just to get things running. This makes porting projects prohibitively expensive as one should expect to be forced to change the data model and essentially rewrite the application from scratch. This is particularly big problem with Java ORM software, like Hibernate, which tends to generate cumbersome and complex queries, which can be very tricky to optimize, even under the best of circumstances. Also, the lack of hints makes the application much harder to maintain. Every application contains SQL that will start causing troubles over time, because of the underlying data model changes or simply because of the growth. Oracle allows the developers to quickly plug in such holes with hints, without necessarily having to re-engineer the entire data model. PostgreSQL doesn't allow that choice. Performance problems are usually addressed by fixing the underlying data model, which can be prohibitively expensive and require prolonged downtime. Optimizer will not use indexes in case of running a parsed query with bind variables because it doesn't know anything about the values of the bind variables. The attitude of the PostgreSQL designers toward hints is extremely negative and it is unlikely that this very useful tool will be provided anytime soon. It is a curiosity, because almost all other relational databases support hints in one form or another, except the PostgreSQL. Performance of complex queries, involving many tables, can deteriorate dramatically, especially when ported from Oracle, where such queries are customary. The enmity of the PostgreSQL developers to hints is almost religious in nature. Curiously, some of the most vocal opponents of hints

work for EnterpriseDB, whose flagship product is a commercial fork of PostgreSQL which supports hints. PostgreSQL also doesn't support index organized tables or clusters. It does support bitmap indexes, but this is a murky subject which is usually avoided like a plague by the PostgreSQL experts. Until version 9, the general advice was that bitmap indexes are used at one's own risk. There is no star schema or star transformation or any other tool to help with data warehouse type tasks. PostgreSQL optimizer is geared toward OLTP type applications which are not overly complex and which do not include large tables, tens of gigabyte in size, or even larger. Of course, there are examples like Facebook and Skype, which use PostgreSQL as their backbone, but these are still exceptional things, done by top class professionals and very hard to achieve by any standard. Recently, Skype has been acquired by Microsoft and they started using SQL Server instead of PostgreSQL. The lack of parallelism and unwillingness to support hints makes PostgreSQL rather awkward when it comes to porting applications from Oracle. Also, PostgreSQL optimizer is rather poorly documented, there no books like those written by Jonathan Lewis or Tom Kyte in the PostgreSQL world. There is no method for overriding PostgreSQL optimizer decisions. PostgreSQL may be smart, but it's still not smarter than us humans. Very few optimizers can adequately deal with correlated data. PostgreSQL optimizer cannot deal with that. What does that mean, precisely? Let's say that we have a table with the information about a group of people. The information includes the date of birth and zodiac sign. There is 1/12 probability that any given person is a Capricorn. There is also 1/12 probability that any given person was born in May. The optimizer would reach the conclusion that the probability that a person born in May has Capricorn as his zodiac sign is 1/12 * 1/12 = 1/144. That is still a hefty percent that might result in the full table scan. The reality is, however, that people with Capricorn zodiac sign are born in January, not in May. Such query would return no results and PostgreSQL cannot know anything about it. Correlated and interdependent data columns are a very frequent occurrence, especially with financial data. This is far more frequent situation than it appears. And the conclusion that the optimizer can reach is always incorrect. Having no means to override optimizer decision means that we would have to remove all interdependent columns from our data model, just as the 3rd normal form would require. That is a lot of work and time, especially if the application that is being ported is modestly complex and contains foreign keys, interdependent columns, triggers, views and procedural code.

MVCC, Vacuum and IO architecture


To start this part, it is worth noticing that PostgreSQL tables are files. Each table is comprised of one or more files. When the table size exceeds 1GB, a new file is started. PostgreSQL also relies on the file system for block prefetch. PostgreSQL doesn't do asynchronous I/O or direct I/O. It also doesn't have any built in parallelism. Now, let's discuss Postgres variant of multi-versioning concurrency control, MVCC for short. PostgreSQL, unlike Oracle RDBMS, leaves the old values for the modified row in the table itself. It doesn't have anything like UNDO segments or UNDO tablespace, PostgreSQL just creates a new version of the row in the same table. Removal of the old rows from the table is done by process named vacuuming. Vacuuming also analyzes the table and gathers statistics. PostgreSQL can vacuum automatically and fairly aggressively, depending on the parameter settings. Despite that, old row versions can bloat the table significantly and slow down the I/O operations. On the plus side, there is nothing like ORA-01555: Snapshot too old in the PostgreSQL world.

Generally speaking, Oracle I/O is much more sophisticated and noticeably faster than PostgreSQL. Here is the result from the same machine and the table that was copied from Oracle to PostgreSQL using a Perl script.
Oracle 11.2: Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 -Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> set timing on SQL> select count(*) from t1; COUNT(*) 3000000 Elapsed: 00:00:00.62 Postgres 9.0: Timing is on. mgogala=# select count(*) from t1; count 3000000 (1 row) Time: 856.847 ms mgogala=#

The table is extremely simple and was created just for this comparison:
mgogala=# \d t1 Table "public.t1" Column | Type | Modifiers --------+------------------------------+----------c1 | bigint | not null c2 | character varying(30) | Indexes: "t1_pk" PRIMARY KEY, btree (c1)

In other words, oracle reads 3M very simple records in 620 milliseconds while PostgreSQL spends 856 milliseconds on the same table. The difference is rather significant. Both databases are running on the same machine, so the comparison is valid. PostgreSQL performance mailing list is full of complaints about the slowness of the full table scan but the complaints are not being addressed. The usual advice is to tune the IO subsystem. The bigger the table gets, the bigger the difference in full table scan speed is. There is no difference whatsoever when reading tables like EMP and DEPT. If the table is frequently updated, PostgreSQL full table scan gets even slower. The tune IO subsystem advice is often impractical and requires significant investments in hardware. Frequently, the advice includes solid state disks, better controllers, more CPU power or something of that nature, that would make most of the CIOs very unhappy.

Miscellaneous
Every relational database has a data dictionary. Oracle exposes its own data dictionary through the series of views, which are rather simple and well documented. Postgres dictionary is much more cryptic and much less documented than Oracle's. For instance, the pg_constraints table does not contain the id of a referenced primary key or unique key constraint for foreign key constraints. It contains a column list, which makes navigation through the dependency graph much harder. Monitoring tools are also significantly less usable than the tools available for Oracle. First and foremost, the fact that PostgreSQL has no shared pool means that it is not possible to see the execution plan of a SQL statement being executed by some application. It is possible to run explain plan, but one never knows what knobs were turned in the application itself and whether the execution environment is the same for the person monitoring as for the executing application being monitored. A positive thing is that it is possible to send a cancel query signal to a session, without killing the session. There is no such thing as performance trace, there is no wait event interface. PostgreSQL tuning is still a form of black magic, not an exact methodology. For Oracle, the methodology was laid out in Cary Millsap's groundbreaking book Optimizing Oracle Performance. Basically, Cary Millsap and Anjo Kolk introduced method and measurement to performance tuning. One could see where the time is spent, what kind of event does the application wait for and what to do about it. PostgreSQL has not developed any such methodology, PostgreSQL tuning is still a black art, of the kind that would get people burnt at stake in some less happy times. There are some extensions which provide execution plans for all SQL statements which execute longer than an preset time, but that is very messy indeed. PostgreSQL log file parsers like PgFouine leave a lot to be desired. PostgreSQL also has a problem with many connections. One should not try having 300 or so direct simultaneous connections to a PostgreSQL cluster (a PostgreSQL term for what Oracle calls instance). In such cases, a pooling tool like PgPool-II or PgBounce is a must. To tell the truth, these two tools do a good job and allow PostgreSQL to scale to very large number of connections. PostgreSQL can maintain a l standby database, which is extremely useful for high availability and performance, because reporting can be offloaded to the standby. PostgreSQL standby database is open for reading, which is similar to the Oracle Active Data Guard, a separately licensed option, available only with the version 11G.

Conclusion
PostgreSQL is a nice little open source database which can be used for offloading internal application systems from Oracle. It is very far from being on the same level with Oracle and it cannot replace Oracle in the enterprise server room, not by any stretch of the imagination. The main problem of PostgreSQL is the attitude of its developers who are resisting providing features like hints, demanded by the corporate DBA personnel. That results in very minor usage. Recent MongoDB user group had many more users and sponsors than the 2011 PgEast in New York City, which was a major flop. It was that attitude that made me cancel the ambitious porting projects that the company I used to work for has had. Simply said, providing the necessary level of service and performance wasn't possible in the allotted time and with the resources I had at my disposal. In order to become really useful PostgreSQL would need to have the following:

1. 2. 3. 4.

Optimizer hints Wait event interface Much improved partitioning. Some form of parallelism, at least for the partitioned tables.

While it is true that there are commercial versions of PostgreSQL (EnterpriseDB, VoltDB, Vertica, Greenplum, Netezza) providing the solution to some of the open source version problems, it is true that the open source version is rather limited in its present state. If we go commercial, we have to include a whole lot of other products, like Microsoft SQL Server and IBM DB2 into consideration, which is a completely different story. I have no experience with the commercial versions of PostgreSQL and I cannot make any claims about those versions.

You might also like