In a setup of 3 datanodes and 1 coordinator on the same physical machine (for functional tests), the datanodes and the coordinator connect to the GTM through gtm_proxy, as suggested by Koichi Suzuki, to lessen the load on the GTM. There are cases when the first query after connecting works, but subsequent execution crashes gtm_proxy (gdb reports SIGABRT):
Starting program: /d00/pgxc/bin/gtm_proxy -D data/gtm_proxy/
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7ddf700 (LWP 23158)]
[New Thread 0x7ffff73de700 (LWP 23159)]
[New Thread 0x7ffff69dd700 (LWP 23160)]
[New Thread 0x7ffff5fbb700 (LWP 23163)]
[New Thread 0x7ffff54d7700 (LWP 23164)]
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7ddf700 (LWP 23158)]
0x00000036a7a328a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.6.x86_64
(gdb) bt
#0 0x00000036a7a328a5 in raise () from /lib64/libc.so.6
#1 0x00000036a7a34085 in abort () from /lib64/libc.so.6
#2 0x000000000040c517 in errfinish (dummy=<value optimized out>) at elog.c:368
#3 0x000000000040c9c3 in elog_finish (elevel=<value optimized out>, fmt=<value optimized out>) at elog.c:629
#4 0x0000000000419c1f in ProcessResponse (thrinfo=<value optimized out>, cmdinfo=0x7ffff0026fb0, res=0x7ffff0028ef0) at proxy_main.c:1899
#5 0x000000000041abef in GTMProxy_ThreadMain (argp=0x64b290) at proxy_main.c:1500
#6 0x000000000041d91b in GTMProxy_ThreadMainWrapper (argp=0x64b290) at proxy_thread.c:316
#7 0x00000036a7e07851 in start_thread () from /lib64/libpthread.so.0
#8 0x00000036a7ae811d in clone () from /lib64/libc.so.6
(gdb)
I bumped worker_threads to 5, but it did not help. I am not sure whether this is a misconfiguration or a genuine bug.
The gtm_proxy log looks like this:
1:140737308882688:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737341417216:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737308882688:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737330927360:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737308882688:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737341417216:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737351907072:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737330927360:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737330927360:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737330927360:2013-05-07 17:30:51.157 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737308882688:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737351907072:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737351907072:2013-05-07 17:30:51.158 CEST -ERROR: Failed to Register node
LOCATION: ProcessPGXCNodeCommand, proxy_main.c:2138
1:140737341417216:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737330927360:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737341417216:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737308882688:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737341417216:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737341417216:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737308882688:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737341417216:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737330927360:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737341417216:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737330927360:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737341417216:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737330927360:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737351907072:2013-05-07 17:30:51.158 CEST -PANIC: Invalid response or synchronization loss
LOCATION: ProcessResponse, proxy_main.c:1899
1:140737341417216:2013-05-07 17:30:51.159 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737308882688:2013-05-07 17:30:51.159 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
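When the log is this repetitive, it can help to filter it per thread. In the excerpt above, the thread that raises the PANIC (140737351907072) is the same one that logged "Failed to Register node" shortly before. The following is a rough Python sketch for pulling out such events. The line pattern is inferred from the excerpt only (the meaning of the leading "1" is not known here), and the function names `parse_line` and `severe_by_thread` are made up for illustration; they are not part of Postgres-XC.

```python
import re
from collections import defaultdict

# Pattern inferred from the excerpt (an assumption, not a documented format):
#   <n>:<thread id>:<date> <time> <tz> -<LEVEL>: <message>
LINE_RE = re.compile(
    r"^\d+:(?P<thread>\d+):"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ "
    r"-(?P<level>[A-Z0-9]+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict with thread, ts, level and msg, or None for other lines
    (for example the LOCATION: continuation lines)."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def severe_by_thread(lines, levels=("ERROR", "FATAL", "PANIC")):
    """Group ERROR/FATAL/PANIC events by thread id, in log order."""
    events = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] in levels:
            events[rec["thread"]].append((rec["ts"], rec["level"], rec["msg"]))
    return dict(events)


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "1:140737351907072:2013-05-07 17:30:51.158 CEST -ERROR: Failed to Register node",
        "LOCATION: ProcessPGXCNodeCommand, proxy_main.c:2138",
        "1:140737341417216:2013-05-07 17:30:51.158 CEST -LOG: Node with the given ID number already exists",
        "LOCATION: pgxcnode_add_info, register_common.c:249",
        "1:140737351907072:2013-05-07 17:30:51.158 CEST -PANIC: Invalid response or synchronization loss",
        "LOCATION: ProcessResponse, proxy_main.c:1899",
    ]
    for thread, items in severe_by_thread(sample).items():
        for ts, level, msg in items:
            print(thread, ts, level, msg)
```

This only reorganises what is already in the log; it does not show why the registration failed.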
And the config file:
# GENERAL PARAMETERS
#------------------------------------------------------------------------------
nodename = 'gtm_proxy1' # Specifies the node name.
# (changes requires restart)
listen_addresses = '*' # Listen addresses of this GTM.
# (changes requires restart)
port = 6666 # Port number of this GTM.
# (changes requires restart)
#------------------------------------------------------------------------------
# GTM PROXY PARAMETERS
#------------------------------------------------------------------------------
worker_threads = 5 # Number of the worker thread of this
# GTM proxy
# (changes requires restart)
#------------------------------------------------------------------------------
# GTM CONNECTION PARAMETERS
#------------------------------------------------------------------------------
# Those parameters are used to connect to a GTM server
gtm_host = 'localhost' # Listen address of the active GTM.
# (changes requires restart)
gtm_port = 20001 # Port number of the active GTM.
# (changes requires restart)
#------------------------------------------------------------------------------
# Behavior at GTM communication error
#------------------------------------------------------------------------------
gtm_connect_retry_interval = 2 # How long (in secs) to wait until the next
# retry to connect to GTM.
#
#
#------------------------------------------------------------------------------
# Other options
#------------------------------------------------------------------------------
#keepalives_idle = 0 # Keepalives_idle parameter.
#keepalives_interval = 0 # Keepalives_interval parameter.
#keepalives_count = 0 # Keepalives_count internal parameter.
#log_file = 'gtm_proxy.log' # Log file name
#log_min_messages = WARNING # log_min_messages. Default WARNING.
# Valid value: DEBUG, DEBUG5, DEBUG4, DEBUG3,
# DEBUG2, DEBUG1, INFO, NOTICE, WARNING,
# ERROR, LOG, FATAL, PANIC.
Cheers,
Chris
With logging at the debug level, the query that executes properly (no crash yet) gives:
1:140737309546240:2013-05-08 11:45:12.614 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737309546240:2013-05-08 11:45:12.615 CEST -DEBUG: Recovery_PGXCNodeRegister Request info: type=3, nodename=co1, port=5435,datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:397
1:140737309546240:2013-05-08 11:45:12.615 CEST -DEBUG: Recovery_PGXCNodeRegister Node info: type=3, nodename=co1, port=5435, datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:400
1:140737309546240:2013-05-08 11:45:12.615 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737309546240:2013-05-08 11:45:12.615 CEST -DEBUG: Recovery_PGXCNodeRegister Request info: type=3, nodename=co1, port=5435,datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:397
1:140737309546240:2013-05-08 11:45:12.615 CEST -DEBUG: Recovery_PGXCNodeRegister Node info: type=3, nodename=co1, port=5435, datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:400
1:140737309546240:2013-05-08 11:45:12.615 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737309546240:2013-05-08 11:45:12.616 CEST -DEBUG: Recovery_PGXCNodeRegister Request info: type=3, nodename=co1, port=5435,datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:397
1:140737309546240:2013-05-08 11:45:12.616 CEST -DEBUG: Recovery_PGXCNodeRegister Node info: type=3, nodename=co1, port=5435, datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:400
1:140737309546240:2013-05-08 11:45:12.616 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737309546240:2013-05-08 11:45:12.616 CEST -DEBUG: Recovery_PGXCNodeRegister Request info: type=3, nodename=co1, port=5435,datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:397
1:140737309546240:2013-05-08 11:45:12.616 CEST -DEBUG: Recovery_PGXCNodeRegister Node info: type=3, nodename=co1, port=5435, datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:400
1:140737309546240:2013-05-08 11:45:12.616 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737309546240:2013-05-08 11:45:12.617 CEST -DEBUG: Recovery_PGXCNodeRegister Request info: type=3, nodename=co1, port=5435,datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:397
1:140737309546240:2013-05-08 11:45:12.617 CEST -DEBUG: Recovery_PGXCNodeRegister Node info: type=3, nodename=co1, port=5435, datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:400
1:140737309546240:2013-05-08 11:45:12.617 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737309546240:2013-05-08 11:45:12.617 CEST -DEBUG: Recovery_PGXCNodeRegister Request info: type=3, nodename=co1, port=5435,datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:397
1:140737309546240:2013-05-08 11:45:12.617 CEST -DEBUG: Recovery_PGXCNodeRegister Node info: type=3, nodename=co1, port=5435, datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:400
1:140737309546240:2013-05-08 11:45:12.617 CEST -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:140737309546240:2013-05-08 11:45:12.618 CEST -DEBUG: Recovery_PGXCNodeRegister Request info: type=3, nodename=co1, port=5435,datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:397
1:140737309546240:2013-05-08 11:45:12.618 CEST -DEBUG: Recovery_PGXCNodeRegister Node info: type=3, nodename=co1, port=5435, datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:400
1:140737330927360:2013-05-08 11:45:12.619 CEST -DEBUG: Recovery_PGXCNodeRegister Request info: type=3, nodename=co1, port=5435,datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:397
1:140737330927360:2013-05-08 11:45:12.619 CEST -DEBUG: Recovery_PGXCNodeRegister Node info: type=3, nodename=co1, port=5435, datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0
LOCATION: Recovery_PGXCNodeRegister, register_common.c:400
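In this debug excerpt the same node (co1, port 5435) sends a registration request eight times within a few milliseconds. To check whether other nodes show the same pattern in the full log, the requests can be counted per node. The following is a minimal Python sketch under the assumption that every request line carries the `key=value` fields seen above; `parse_request` and `count_registrations` are hypothetical helper names, not Postgres-XC tooling.

```python
import re
from collections import Counter

# Marker text and key=value layout are taken from the debug excerpt above.
MARKER = "Recovery_PGXCNodeRegister Request info:"
FIELD_RE = re.compile(r"(\w+)=([^,\s]+)")


def parse_request(line):
    """Return the key=value fields of a 'Request info' line as a dict,
    or None if the line is not a registration request."""
    _, found, tail = line.partition(MARKER)
    if not found:
        return None
    return dict(FIELD_RE.findall(tail))


def count_registrations(lines):
    """Count registration requests per (nodename, port)."""
    counts = Counter()
    for line in lines:
        fields = parse_request(line)
        if fields:
            counts[(fields.get("nodename"), fields.get("port"))] += 1
    return counts


if __name__ == "__main__":
    # One request line and one non-request line copied from the log above.
    sample = [
        "1:140737309546240:2013-05-08 11:45:12.615 CEST -DEBUG: "
        "Recovery_PGXCNodeRegister Request info: type=3, nodename=co1, "
        "port=5435,datafolder=/d00/pgxc/data/coord1, ipaddress=backupp, status=0",
        "1:140737309546240:2013-05-08 11:45:12.615 CEST -LOG: "
        "Node with the given ID number already exists",
    ]
    for (node, port), n in count_registrations(sample).items():
        print(node, port, n)
```

Only the "Request info" lines are counted, so the matching "Node info" lines do not inflate the totals.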
Hello,
This is quite critical for us. Do you think you could have a look in situ? I could give you credentials to log in and debug. Unfortunately, I could not reproduce this behaviour with a simple test case.
Best regards,
Chris