Citus Installation and Configuration


PostgreSQL 15.8
Citus 12.1-1

Prepared by: Ramesh S Raj ([email protected])


Contents

Citus Configuration
    Configure the Cluster
        Worker Nodes
    Create and Distribute Data
    Adding a New Node to the Cluster
        Rebalancing the Data
        Un-distributing the Data
    Test Case
    Configuring Citus on an AWS RDS PostgreSQL instance



Citus Configuration
Citus is an extension that enables PostgreSQL to scale horizontally across multiple machines by
sharding tables and creating redundant copies.
Key features of a Citus cluster include:
- Creating distributed tables that are sharded across a cluster of PostgreSQL nodes, effectively
combining their CPU, memory, storage, and I/O resources.
- Replicating reference tables across all nodes to facilitate joins, foreign keys, and maximize read
performance from distributed tables.
- Routing and parallelizing SELECT, DML, and other operations on distributed tables across the
cluster using the distributed query engine.
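Before configuring the cluster, the Citus extension has to be enabled on every node (coordinator and workers). The following is only a minimal sketch, assuming the Citus 12.1 packages are already installed on each PostgreSQL 15 instance and that a restart is acceptable:

-- Minimal sketch (assumes the Citus packages are already installed on the node).
-- 1. Preload the Citus library; this requires a PostgreSQL restart.
ALTER SYSTEM SET shared_preload_libraries = 'citus';
-- 2. After the restart, create the extension in the database to be sharded.
CREATE EXTENSION citus;
-- 3. Verify the installed version.
SELECT citus_version();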

Configure the Cluster


A cluster was simulated with the configuration below. Here is the cluster's node list:
 nodeid |  nodename  | groupid | isactive
--------+------------+---------+----------
      1 | 10.88.11.8 |       0 | t
      2 | 10.88.11.8 |       1 | t
      3 | 10.88.11.8 |       2 | t
      4 | 10.88.11.8 |       3 | t
      5 | 10.88.11.8 |       5 | t

Set the coordinator host:

SELECT citus_set_coordinator_host('10.88.11.8', 5432);

Then add the rest of the nodes to the cluster. Since the same server is used for this POC, the command below is used:

SELECT citus_add_node('10.88.11.8', 5433);

If the nodes are spread across remote servers, pass each server's hostname or IP address instead; the registered nodes can be inspected in the pg_dist_node catalog. Repeat the above for all the nodes in the cluster (see the sketch below).

Since several separate PostgreSQL instances were created on the same server for this POC, the same IP address appears for every node; internally, the instances are distinguished by their port numbers.
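For completeness, a sketch of the remaining citus_add_node calls for this POC (port 5433 was added above; the other ports match the worker list shown later, so adjust hostnames and ports for a real multi-server cluster):

SELECT citus_add_node('10.88.11.8', 5434);
SELECT citus_add_node('10.88.11.8', 5435);
SELECT citus_add_node('10.88.11.8', 5436);
SELECT citus_add_node('10.88.11.8', 5439);
-- Verify the registered nodes.
SELECT * FROM pg_dist_node;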



Node 0 serves as the primary (coordinator) node, acting as the central controller in the Citus cluster to manage query execution, data distribution, and transaction consistency. The remaining nodes are worker nodes that hold the redundant shard copies.
A 2x shard redundancy was created across the cluster of workers:
shardeg=# show citus.shard_replication_factor;
citus.shard_replication_factor
--------------------------------
2
(1 row)
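The replication factor is a coordinator-side setting that must be in place before the tables are distributed; a minimal sketch of setting it:

-- Keep two copies of every shard (the 2x redundancy described above).
SET citus.shard_replication_factor = 2;
SHOW citus.shard_replication_factor;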

Worker Nodes:

shardeg=# select * from master_get_active_worker_nodes();


node_name | node_port
------------+-----------
10.88.11.8 | 5435
10.88.11.8 | 5439
10.88.11.8 | 5434
10.88.11.8 | 5436
10.88.11.8 | 5433
(5 rows)



Create and Distribute Data

Log in to the coordinator and execute the following commands:


shardeg=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+------------------+-------+----------+-------------+---------------+---------+-------------
public | pgbench_accounts | table | postgres | permanent | heap | 0 bytes |
public | pgbench_branches | table | postgres | permanent | heap | 0 bytes |
public | pgbench_history | table | postgres | permanent | heap | 0 bytes |
public | pgbench_tellers | table | postgres | permanent | heap | 0 bytes |
(4 rows)
Distribute the pgbench tables across nodes:
shardeg=# select create_distributed_table('pgbench_history', 'aid');
create_distributed_table
--------------------------
(1 row)
shardeg=# select create_distributed_table('pgbench_accounts', 'aid');
create_distributed_table
--------------------------
(1 row)
shardeg=# select create_distributed_table('pgbench_branches', 'bid');
create_distributed_table
--------------------------
(1 row)

shardeg=# select create_distributed_table('pgbench_tellers', 'tid');


create_distributed_table
--------------------------
(1 row)



The default number of shards created for a table is 32; they are distributed across the available worker nodes.
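The shard count is controlled by the citus.shard_count setting and can be changed before a table is distributed. A hypothetical sketch (the value 64 and the table some_table are illustrative only, not taken from this POC):

-- Set a custom shard count for tables distributed afterwards.
SET citus.shard_count = 64;
-- some_table is a hypothetical table with an "id" distribution column.
SELECT create_distributed_table('some_table', 'id');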

shardeg=# select * from citus_shards;


    table_name    | shardid |       shard_name        | citus_table_type | colocation_id |  nodename  | nodeport | shard_size
------------------+---------+-------------------------+------------------+---------------+------------+----------+------------
 pgbench_accounts |  102328 | pgbench_accounts_102328 | distributed      |            12 | 10.88.11.8 |     5433 |  125911040
 pgbench_accounts |  102328 | pgbench_accounts_102328 | distributed      |            12 | 10.88.11.8 |     5434 |  125911040
 pgbench_branches |  102360 | pgbench_branches_102360 | distributed      |            12 | 10.88.11.8 |     5433 |       8192
 pgbench_branches |  102360 | pgbench_branches_102360 | distributed      |            12 | 10.88.11.8 |     5434 |       8192
 pgbench_history  |  102309 | pgbench_history_102309  | distributed      |            12 | 10.88.11.8 |     5436 |          0
 pgbench_history  |  102309 | pgbench_history_102309  | distributed      |            12 | 10.88.11.8 |     5439 |          0
 pgbench_tellers  |  102414 | pgbench_tellers_102414  | distributed      |            12 | 10.88.11.8 |     5435 |       8192
 pgbench_tellers  |  102414 | pgbench_tellers_102414  | distributed      |            12 | 10.88.11.8 |     5436 |       8192



Adding a New Node to the Cluster

shardeg=# SELECT * from citus_add_node('10.88.11.8',5411);


citus_add_node
----------------
7
(1 row)
shardeg=# select * from master_get_active_worker_nodes();
node_name | node_port
------------+-----------
10.88.11.8 | 5435
10.88.11.8 | 5439
10.88.11.8 | 5434
10.88.11.8 | 5436
10.88.11.8 | 5411
10.88.11.8 | 5433
(6 rows)
shardeg=#

Rebalancing the Data

Create the indexes on the tables.


shardeg=# create unique index pgbench_accounts_pk on pgbench_accounts(aid);
CREATE INDEX
shardeg=# create unique index pgbench_branches_pk on pgbench_branches(bid);
CREATE INDEX
shardeg=# create unique index pgbench_tellers_pk on pgbench_tellers(tid);
CREATE INDEX
shardeg=#



shardeg=# select * from rebalance_table_shards();
NOTICE: Moving shard 102333 from 10.88.11.8:5434 to 10.88.11.8:5411 ...
ERROR: ERROR: logical decoding requires wal_level >= logical
CONTEXT: while executing command on 10.88.11.8:5434
while executing command on localhost:5432
shardeg=#
The above error occurs because wal_level was not set to logical. wal_level must be set to logical on all nodes, since the rebalancer moves shards using logical replication (one way to apply this is sketched below).
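A sketch of applying the fix on each node (it assumes superuser access; a PostgreSQL restart is required for wal_level to take effect):

-- Run on every worker (and the coordinator), then restart PostgreSQL.
ALTER SYSTEM SET wal_level = 'logical';
SHOW wal_level;  -- should report "logical" after the restart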
Once wal_level has been corrected and rebalancing is executed again, the data is redistributed across the cluster, including the new node:
shardeg=# select * from rebalance_table_shards(); -- with factor 2.
NOTICE: Moving shard 102333 from 10.88.11.8:5434 to 10.88.11.8:5411 ...
NOTICE: Moving shard 102340 from 10.88.11.8:5435 to 10.88.11.8:5411 ...
NOTICE: Moving shard 102329 from 10.88.11.8:5434 to 10.88.11.8:5411 ...
NOTICE: Moving shard 102358 from 10.88.11.8:5433 to 10.88.11.8:5411 ...
NOTICE: Moving shard 102354 from 10.88.11.8:5435 to 10.88.11.8:5411 ...
NOTICE: Moving shard 102335 from 10.88.11.8:5436 to 10.88.11.8:5411 ...
NOTICE: Moving shard 102349 from 10.88.11.8:5434 to 10.88.11.8:5411 ...
NOTICE: Moving shard 102337 from 10.88.11.8:5433 to 10.88.11.8:5411 ...

NOTICE: Moving shard 102331 from 10.88.11.8:5439 to 10.88.11.8:5411 ...


NOTICE: Moving shard 102345 from 10.88.11.8:5435 to 10.88.11.8:5411 ...

You'll notice that data rebalancing occurs from the sharded nodes, not from the coordinator
(primary) node.
Execute: SELECT * FROM citus_shards;
    table_name    | shardid |       shard_name        | citus_table_type | colocation_id |  nodename  | nodeport | shard_size
------------------+---------+-------------------------+------------------+---------------+------------+----------+------------
 pgbench_accounts |  102328 | pgbench_accounts_102328 | distributed      |            12 | 10.88.11.8 |     5433 |  146989056
 pgbench_accounts |  102328 | pgbench_accounts_102328 | distributed      |            12 | 10.88.11.8 |     5434 |  146989056
 pgbench_accounts |  102329 | pgbench_accounts_102329 | distributed      |            12 | 10.88.11.8 |     5435 |  147357696
 pgbench_accounts |  102329 | pgbench_accounts_102329 | distributed      |            12 | 10.88.11.8 |     5411 |  147357696
 pgbench_branches |  102361 | pgbench_branches_102361 | distributed      |            12 | 10.88.11.8 |     5411 |      57344
 pgbench_branches |  102361 | pgbench_branches_102361 | distributed      |            12 | 10.88.11.8 |     5435 |      57344



Altering the Replication Factor
Recreated the pgbench_history table, rebalanced it, and replicated it onto the new node with a replication factor of 3.

citus.shard_replication_factor
--------------------------------
 3
(1 row)
    table_name    | shardid |       shard_name        | citus_table_type | colocation_id |  nodename  | nodeport | shard_size
------------------+---------+-------------------------+------------------+---------------+------------+----------+------------
 pgbench_branches |  102391 | pgbench_branches_102391 | distributed      |            12 | 10.88.11.8 |     5434 |      57344
 pgbench_branches |  102391 | pgbench_branches_102391 | distributed      |            12 | 10.88.11.8 |     5435 |      57344
 pgbench_history  |  102424 | pgbench_history_102424  | distributed      |            13 | 10.88.11.8 |     5411 |      40960
 pgbench_history  |  102424 | pgbench_history_102424  | distributed      |            13 | 10.88.11.8 |     5433 |      40960
 pgbench_history  |  102424 | pgbench_history_102424  | distributed      |            13 | 10.88.11.8 |     5434 |      40960

Un-distributing the Data

The original distribution was deleted and the tables were then re-distributed (a sketch of one way to do this follows).
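A minimal sketch using undistribute_table; the tables may equally have been dropped and recreated:

-- Convert the distributed table back into a regular local table on the coordinator
-- (shard data is pulled back to the coordinator), then distribute it again.
SELECT undistribute_table('pgbench_accounts');
SELECT create_distributed_table('pgbench_accounts', 'aid');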


    table_name    | shardid |       shard_name        | citus_table_type | colocation_id |  nodename  | nodeport | shard_size
------------------+---------+-------------------------+------------------+---------------+------------+----------+------------
 pgbench_accounts |  102488 | pgbench_accounts_102488 | distributed      |            14 | 10.88.11.8 |     5411 |  146907136
 pgbench_accounts |  102488 | pgbench_accounts_102488 | distributed      |            14 | 10.88.11.8 |     5433 |  146907136
 pgbench_accounts |  102488 | pgbench_accounts_102488 | distributed      |            14 | 10.88.11.8 |     5434 |  146907136



Test Case
shardeg=# explain (analyze ) select * from pgbench_accounts where aid=30 limit 50;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=0.56..8.58 rows=1 width=352) (actual time=0.107..0.111 rows=1 loops=1)

-> Index Scan using pk_accounts on pgbench_accounts (cost=0.56..8.58 rows=1 width=352) (actual time=0.104..0.107 rows=1 loops=1)

Index Cond: (aid = 30)

Planning Time: 0.384 ms

Execution Time: 0.161 ms

shardeg=#

Data was being retrieved from the primary (coordinator) node.


Deleted the local tables from the primary (coordinator).
Executed the same query:

shardeg=# explain (analyze) select * from pgbench_accounts where aid=30 limit 50;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) (actual time=12.721..12.724 rows=1 loops=1)

Task Count: 1

Tuple data received from nodes: 96 bytes

Tasks Shown: All

-> Task

Tuple data received from node: 96 bytes

Node: host=10.88.11.8 port=5411 dbname=shardeg

-> Limit (cost=0.42..8.44 rows=1 width=97) (actual time=0.168..0.171 rows=1 loops=1)

-> Index Scan using pk_accounts_102516 on pgbench_accounts_102516 pgbench_accounts (cost=0.42..8.44 rows=1 width=97) (actual time=0.162..0.164 rows=1 loops=1)

Index Cond: (aid = 30)

Planning Time: 0.671 ms

Execution Time: 0.256 ms

Planning Time: 2.036 ms

Execution Time: 12.831 ms



shardeg=#

Data is now being retrieved from the shard (worker) node, while in both cases the query itself is executed from the primary (coordinator) node.

Configuring Citus on an AWS RDS PostgreSQL instance

Check Compatibility and Prerequisites


- RDS PostgreSQL Version:
  PostgreSQL 11 or later.

- Citus Extension Availability:
  Citus cannot be installed or configured directly on an AWS RDS PostgreSQL instance. As a fully managed database service, AWS RDS restricts the installation of custom extensions that are not natively supported by Amazon. While AWS RDS does offer support for a range of PostgreSQL extensions, Citus is not among them.

- Amazon EC2 to Deploy Citus:
  Deploy PostgreSQL with the Citus extension on Amazon EC2 instances. This approach grants complete control over the operating system and PostgreSQL configuration, enabling Citus to be installed and customized as needed.

- Native Partitioning:
  Consider leveraging PostgreSQL's native partitioning features within AWS RDS. Although not as robust as Citus for handling distributed workloads, these features can still improve performance and manageability for large tables (see the sketch after this list).
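A minimal sketch of declarative range partitioning on RDS; the table and column names below are hypothetical examples, not taken from this document:

-- Parent table partitioned by a date column.
CREATE TABLE accounts (
    aid        bigint NOT NULL,
    abalance   integer,
    created_at date   NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per year; queries filtering on created_at prune to the matching partition.
CREATE TABLE accounts_2024 PARTITION OF accounts
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE accounts_2025 PARTITION OF accounts
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');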

