Pgpool II Tutorial
pgpool-II Tutorial
Welcome to the Tutorial for pgpool-II. From here, you can learn how to install, set up, and run parallel queries or do replication using pgpool-II. We assume that you already know basic PostgreSQL operations, so please refer to the PostgreSQL documentation if needed.

Table of Contents

1. Let's Begin!
    1.1. Installing pgpool-II
    1.2. Configuration Files
    1.3. Configuring PCP commands
    1.4. Preparing Database Nodes
    1.5. Starting/Stopping pgpool-II
2. Your First Replication
    2.1. Configuring Replication
    2.2. Checking Replication
3. Your First Parallel Query
    3.1. Configuring Parallel Query
    3.2. Configuring the System Database
    3.3. Partitioning Rule Definition
    3.4. Replication Rule Definition
    3.5. Checking Parallel Query
1. Let's Begin!
First, we must learn how to install and configure pgpool-II and the database nodes before using replication or parallel query.
To encrypt your password into md5 hash format, use the pg_md5 command, which is installed as part of the pgpool-II executables. pg_md5 takes text as a command-line argument and displays its md5-hashed text. For example, give "postgres" as the command-line argument, and pg_md5 displays the md5-hashed text on the standard output.

    $ /usr/bin/pg_md5 postgres
    e8a48653851e28c69d0506508fb27fc5

PCP commands are executed via the network, so the port number must be configured with the pcp_port parameter in the pgpool.conf file. We will use the default 9898 for pcp_port in this tutorial.

    pcp_port = 9898
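Returning to the pg_md5 step above: if pg_md5 is not at hand, the same kind of hash can probably be produced with standard tools. The following is a minimal sketch, assuming that pg_md5 prints the plain MD5 digest of its argument and that the GNU md5sum utility is available. The helper names md5_hex and pcp_conf_line are hypothetical, and the "username:hash" line format is an assumption about what the PCP password file expects.

```shell
# Hypothetical helper: print the plain MD5 digest of the given text in hex.
# Assumes GNU coreutils' md5sum is installed.
md5_hex() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Hypothetical helper: build a "username:md5hash" line.
# The line format is an assumption, not taken from the tutorial text above.
pcp_conf_line() {
    printf '%s:%s\n' "$1" "$(md5_hex "$2")"
}

md5_hex postgres
pcp_conf_line postgres postgres
```

If the assumption about pg_md5 holds, the first command should print the same hash as the pg_md5 example above.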
Let's use a simple shell script to check the above on all the nodes. The following script will display the number of rows in the branches, tellers, accounts, and history tables on all the nodes (5432, 5433, 5434).

    $ for port in 5432 5433 5434; do
    >     echo $port
    >     for table_name in branches tellers accounts history; do
    >         echo $table_name
    >         psql -c "SELECT count(*) FROM $table_name" -p $port bench_replication
    >     done
    > done
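The script above leaves the comparison of the counts to the reader. As a small sketch of how that comparison might be automated, the hypothetical helper counts_match below succeeds only when all the values passed to it are identical. The counts in the demonstration are made-up numbers; in practice they could be collected with psql's -A and -t options, which print bare values without alignment or headers.

```shell
# Hypothetical helper: succeed only if every argument equals the first one.
counts_match() {
    first=$1
    for c in "$@"; do
        [ "$c" = "$first" ] || return 1
    done
    return 0
}

# Demonstration with made-up counts (one value per node).
if counts_match 10 10 10; then
    echo "row counts agree on all nodes"
fi
if ! counts_match 10 10 9; then
    echo "row counts differ"
fi

# Possible way to collect a real count (requires running nodes):
# psql -At -c "SELECT count(*) FROM accounts" -p 5432 bench_replication
```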
    listen_addresses = '*'

Attention: Parallel query and replication can be enabled at the same time, but a table that is partitioned is not replicated.

Attention: You can have both partitioned tables and replicated tables. However, a table cannot be partitioned and replicated at the same time. Because the data structures of partitioned tables and replicated tables are different, the "bench_replication" database created in section "2. Your First Replication" cannot be reused in parallel query mode.

    replication_mode = true
    load_balance_mode = false

OR

    replication_mode = false
    load_balance_mode = true

In this section, we will set parallel_mode and load_balance_mode to true, listen_addresses to '*', and replication_mode to false.
Next, we must install dblink into the "pgpool" database. dblink is one of the tools included in the contrib directory of the PostgreSQL source code. To install dblink on your system, execute the following commands.

    $ USE_PGXS=1 make -C contrib/dblink
    $ USE_PGXS=1 make -C contrib/dblink install

After dblink has been installed on your system, we will define the dblink functions in the "pgpool" database. If PostgreSQL is installed in /usr/local/pgsql, dblink.sql (a file with function definitions) should have been installed in /usr/local/pgsql/share/contrib. Now, execute the following command to define the dblink functions.

    $ psql -f /usr/local/pgsql/share/contrib/dblink.sql -p 5432 pgpool
Define a table called "dist_def", which holds the partitioning rules, in the database called "pgpool". After installing pgpool-II, you will have system_db.sql, which is the psql script that generates the system database.

    $ psql -f /usr/local/share/system_db.sql -p 5432 -U pgpool pgpool

The dist_def table is created in the pgpool_catalog schema. If you have configured system_db_schema to use another schema, you need to edit system_db.sql accordingly.

The definition of "dist_def" is shown here. The table name cannot be changed.

    CREATE TABLE pgpool_catalog.dist_def (
        dbname text,                 -- database name
        schema_name text,            -- schema name
        table_name text,             -- table name
        col_name text NOT NULL CHECK (col_name = ANY (col_list)),
                                     -- distribution key-column
        col_list text[] NOT NULL,    -- list of column names
        type_list text[] NOT NULL,   -- list of column types
        dist_def_func text NOT NULL, -- distribution function name
        PRIMARY KEY (dbname, schema_name, table_name)
    );

The information in a tuple stored in "dist_def" can be classified into two types.

    Distribution Rule (col_name, dist_def_func)
    Table's meta-information (dbname, schema_name, table_name, col_list, type_list)
A distribution rule decides how to distribute data to a particular node. Data will be distributed depending on the value of "col_name" column. "dist_def_func" is a function that takes the value of "col_name" as its argument, and returns an integer which points to the appropriate database node ID where the data should be stored.
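To make the idea of a distribution function concrete, here is a minimal sketch written as a shell function rather than as the database function that "dist_def_func" would actually name. It only models the mapping from a key value to a node ID. The function name dist_def_node, the three-node setup, and the ranges of 100000 keys per node are all assumptions chosen for illustration.

```shell
# Hypothetical model of a distribution function: key value in, node ID out.
# Assumed layout: keys 1..100000 -> node 0, 100001..200000 -> node 1,
# everything else -> node 2.
dist_def_node() {
    key=$1
    if [ "$key" -ge 1 ] && [ "$key" -le 100000 ]; then
        echo 0
    elif [ "$key" -gt 100000 ] && [ "$key" -le 200000 ]; then
        echo 1
    else
        echo 2
    fi
}

# Show where a few sample keys would be stored under the assumed ranges.
for key in 1 100000 100001 250000; do
    echo "key=$key -> node $(dist_def_node "$key")"
done
```

A real distribution function would express the same kind of range test inside the database, so that every value of the key column maps to exactly one node ID.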
The meta-information is used to rewrite queries. Parallel query must rewrite queries so that the results sent back from the backend nodes can be merged into one result.
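As a rough illustration of what "merging into one result" means, consider a query that asks for the minimum and maximum of a key column. Each node can only answer for its own partition, so the partial answers have to be combined. The sketch below is not how pgpool-II implements this; it is a hypothetical shell helper, merge_min_max, that combines per-node "min max" lines, and the sample values are made up.

```shell
# Hypothetical helper: read one "min max" pair per line (one line per node)
# and print the overall minimum and maximum.
merge_min_max() {
    awk 'NR == 1 { lo = $1 + 0; hi = $2 + 0 }
         $1 + 0 < lo { lo = $1 + 0 }
         $2 + 0 > hi { hi = $2 + 0 }
         END { if (NR > 0) print lo, hi }'
}

# Made-up partial results from three nodes.
printf '%s\n' '100001 200000' '1 100000' '200001 300000' | merge_min_max
```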
If you want to use replicated tables in SELECT in parallel mode, you need to register information about such tables (the replication rule) in a table called replicate_def. The replicate_def table was already created from the system_db.sql file when dist_def was defined. The replicate_def table is defined as follows.

    CREATE TABLE pgpool_catalog.replicate_def (
        dbname text,               -- database name
        schema_name text,          -- schema name
        table_name text,           -- table name
        col_list text[] NOT NULL,  -- list of column names
        type_list text[] NOT NULL, -- list of column types
        PRIMARY KEY (dbname, schema_name, table_name)
    );

replicate_def holds the table's meta-information (dbname, schema_name, table_name, col_list, type_list).

All query analysis and query rewriting depends on the information (table, column, and type) stored in the dist_def and/or replicate_def tables. If the information is not correct, the analysis and query rewriting process will produce wrong results.
        ARRAY['tid', 'bid', 'tbalance', 'filler'],
        ARRAY['integer', 'integer', 'integer', 'character(84)']
    );

replicate_def_pgbench.sql is provided in the sample directory. To define the replication rules with it, execute the psql command as follows from the directory where the source code was extracted.

    $ psql -f sample/replicate_def_pgbench.sql -p 5432 pgpool
Let's use a simple shell script to check the above on all the nodes and via pgpool-II. The following script will display the minimum and maximum values in the accounts table using ports 5432, 5433, 5434, and 9999.

    $ for port in 5432 5433 5434 9999; do
    >     echo $port
    >     psql -c "SELECT min(aid), max(aid) FROM accounts" -p $port bench_parallel
    > done