From: Michael P. <mic...@gm...> - 2012-07-03 23:38:15
On Wed, Jul 4, 2012 at 8:02 AM, Nikhil Sontakke <ni...@st...> wrote:

> > Are there people with a similar opinion to mine???
>
> +1
>
> IMO, we too should not be making overly invasive internal changes to
> support monitoring. It would be better to allow commands which can be
> scripted and which can work against each of the components.

This could be managed more easily by creating new system functions for
monitoring in C, done as an EXTENSION, pluggable as a contrib module.

> For example, for the coordinator/datanode, periodic "SELECT 1" commands
> should be good enough. Even doing an EXECUTE DIRECT via a coordinator
> to the datanodes will help.
>
> For the GTM/GTM_Standby/GTM_Proxy components we should introduce "gtm_ctl
> ping" kinds of commands which will basically connect to them and check
> that they are responding.

That is an interesting idea. By the way, we should definitely avoid any
additional GUC parameters inside the GTM core code. This avoids complicating
cluster settings, and users may use a different monitoring solution than the
one proposed.

> Such interfaces make it really easy for monitoring solutions like
> nagios, zabbix etc. to monitor them. These tools have been used for a
> while now to monitor Postgres, and it should be a natural, logical
> evolution for users to see them being used for PG XC.

Completely agreed; we do not need to reinvent solutions that already exist
and have proven to be sufficient.
--
Michael Paquier
http://michael.otacoo.com
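To illustrate the contrib-module idea above, a minimal sketch of such a
pluggable monitoring function might look as follows; the function name
pgxc_node_ping and its behaviour are hypothetical, not taken from any
existing patch:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(pgxc_node_ping);

/*
 * Trivial liveness check: if the backend can plan and execute this
 * function, the node is reachable and accepting queries.
 */
Datum
pgxc_node_ping(PG_FUNCTION_ARGS)
{
    PG_RETURN_BOOL(true);
}

A monitoring script would then simply run "SELECT pgxc_node_ping();" against
a coordinator or datanode, or route it to the datanodes through the
coordinator with EXECUTE DIRECT as suggested above, without any change to
the core code.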
From: Nikhil S. <ni...@st...> - 2012-07-03 23:02:43
> > Are there people with a similar opinion to mine???

+1

IMO, we too should not be making overly invasive internal changes to support
monitoring. It would be better to allow commands which can be scripted and
which can work against each of the components.

For example, for the coordinator/datanode, periodic "SELECT 1" commands
should be good enough. Even doing an EXECUTE DIRECT via a coordinator to the
datanodes will help.

For the GTM/GTM_Standby/GTM_Proxy components we should introduce "gtm_ctl
ping" kinds of commands which will basically connect to them and check that
they are responding.

Such interfaces make it really easy for monitoring solutions like nagios,
zabbix etc. to monitor them. These tools have been used for a while now to
monitor Postgres, and it should be a natural, logical evolution for users to
see them being used for PG XC.

Regards,
Nikhils
--
StormDB - http://www.stormdb.com
The Database Cloud
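For reference, the kind of external probe nagios or zabbix would run against
a coordinator or datanode is roughly the following libpq sketch; the program
and its exit-code convention are only an illustration, not part of any
proposed patch:

#include <stdio.h>
#include "libpq-fe.h"

/*
 * Minimal "SELECT 1" probe: exit code 0 means the node answered,
 * non-zero means it did not.
 */
int
main(int argc, char **argv)
{
    PGconn     *conn;
    PGresult   *res;
    int         ok = 0;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s \"conninfo\"\n", argv[0]);
        return 2;
    }

    conn = PQconnectdb(argv[1]);
    if (PQstatus(conn) == CONNECTION_OK)
    {
        res = PQexec(conn, "SELECT 1");
        ok = (PQresultStatus(res) == PGRES_TUPLES_OK);
        PQclear(res);
    }
    PQfinish(conn);
    return ok ? 0 : 1;
}

The suggested "gtm_ctl ping" would play the same role for GTM, GTM standbys
and GTM proxies, using their own protocol instead of libpq.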
From: Michael P. <mic...@gm...> - 2012-07-03 05:04:45
Hmm, I am honestly not a fan of this approach. I cannot see the point of
touching the core code for a feature which only does monitoring. We should
discuss this with the Postgres community and get feedback about the possible
solutions we could use here, and think hard before touching code paths we
haven't touched yet, which might impact the PostgreSQL code itself if there
are side effects.

What I cannot understand is why we would add an internal chronometer when
there are already options available:
- using a simple "SELECT 1" on the database
- using pg_ctl status

This implementation makes the core code dependent on monitoring features
when it should definitely be the opposite: database server monitoring
shouldn't touch the core, but only use its functionality. And even if
PostgreSQL needed such a feature, XC should extend in a cluster-wide way
what already exists in Postgres. So why reinvent the wheel?

Also, this patch adds a total of 6 GUC parameters: 2 for GTM, 2 for
GTM-Proxy and 2 for Coordinator/Datanode. That complicates the feature too
much.

Instead of creating so many dependencies on the Postgres code, why not
create a simple system function that sends a confirmation message back to
the client at a given time interval? Let's imagine the system function
pgxc_watchdog(interval, cycles). It could look like this:

Datum
pgxc_watchdog(interval time, int cycles)
{
    int i;

    for (i = 0; i < cycles; i++)
    {
        sleep(interval);
        /* Send a result, or something like it, back to the client */
        send_back('SELECT 1 result');
    }
}

There are a lot of benefits to doing that:
- it does not touch the core for monitoring purposes (really, really
  important to my mind);
- it removes the 6 GUC parameters;
- the implementation is portable and easy to maintain, and you can create a
  similar function to check GTM status from an XC node;
- you do not need an additional external module to read the monitoring
  pulse: you connect to a PostgreSQL server with a given client and launch
  it through a driver, whatever it is, so it can easily be adapted to all
  kinds of implementations and applications.

Are there people with a similar opinion to mine???

On Mon, Jul 2, 2012 at 2:25 PM, Koichi Suzuki <koi...@gm...> wrote:

> Hi,
>
> Enclosed is a WIP patch for xc_watchdog, for coordinator/datanode/gtm and
> gtm_proxy. It is against the current master as of June 2nd, 2:00PM JST.
> I've tested it with gdb and found that the watchdog timer is incremented
> as expected. I will write the time detector and continue with more
> testing.
>
> Regards;
> ----------
> Koichi Suzuki

--
Michael Paquier
http://michael.otacoo.com
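Purely as an illustration of the sketch above (not the patch under
discussion), such a function could be written against the regular fmgr
interface roughly as follows; the argument types and the NOTICE-based
"pulse" are assumptions:

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"

PG_FUNCTION_INFO_V1(pgxc_watchdog);

/*
 * Emit a NOTICE every interval_ms milliseconds, 'cycles' times, so that a
 * connected client can check that the node keeps responding.
 */
Datum
pgxc_watchdog(PG_FUNCTION_ARGS)
{
    int32       interval_ms = PG_GETARG_INT32(0);
    int32       cycles = PG_GETARG_INT32(1);
    int         i;

    for (i = 0; i < cycles; i++)
    {
        pg_usleep((long) interval_ms * 1000L);
        CHECK_FOR_INTERRUPTS();
        elog(NOTICE, "pgxc_watchdog: alive, cycle %d of %d", i + 1, cycles);
    }

    PG_RETURN_VOID();
}

The client then reads the pulse through whatever driver it uses to receive
NOTICE messages, and no GUC parameter is involved.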
From: Michael P. <mic...@gm...> - 2012-07-03 04:46:40
On Fri, Jun 29, 2012 at 8:34 PM, Amit Khandekar <ami...@en...> wrote:

> For utility statements in general, the coordinator propagates SQL
> statements to all the required nodes, and most of these statements get run
> on the datanodes inside a transaction block. So, when a statement fails on
> at least one of the nodes, it gets rolled back on all the nodes thanks to
> the two-phase commit taking place, and the cluster therefore rolls back to
> a consistent state. But there are some statements which cannot be run
> inside a transaction block. Here are some important ones:
> CREATE/DROP DATABASE
> CREATE/DROP TABLESPACE
> ALTER DATABASE SET TABLESPACE
> ALTER TYPE ADD ... (for enum types)
> CREATE INDEX CONCURRENTLY
> REINDEX DATABASE
> DISCARD ALL
>
> Such statements run on the datanodes in auto-commit mode, and so create
> problems if they succeed on some nodes and abort on others. Take CREATE
> DATABASE, for example. If a datanode d1 returns an error while another
> datanode d2 has already returned success to the coordinator, the
> coordinator cannot undo the commit on d2 because it is already committed.
> Or, if the coordinator itself crashes after the datanodes commit but
> before the coordinator commits, we have the same problem: the database
> cannot be recreated from the coordinator, since it already exists on some
> of the other nodes. In such a cluster state, the administrator needs to
> connect to the datanodes and do the needed cleanup.
>
> The committed statements can be followed by statements that undo the
> operation, e.g. DROP DATABASE for a CREATE DATABASE. But here again this
> statement can fail for some reason. Also, typically the UNDO counterparts
> of such statements cannot be run inside a transaction block either. So
> this is not a guaranteed way to bring the cluster back to a consistent
> state.
>
> To find out how we can get around this issue, let's see why these
> statements need to be run outside a transaction block in the first place.
> There are two reasons:
>
> 1. Typically such statements modify OS files and directories, which cannot
> be rolled back.
>
> For DML, a rollback does not have to be explicitly undone; MVCC takes care
> of it. But for OS file operations there is no automatic way, so such
> operations cannot be rolled back. So in a transaction block, if a
> create-database is followed by 10 other SQL statements before commit, and
> one of those statements throws an error, the database ultimately won't be
> created but there will be database files taking up disk space, and this
> happens just because the user wrote the script wrongly.
>
> So by restricting such statements to run outside a transaction block, an
> unrelated error won't cause garbage files to be created.
>
> The statement itself does get committed eventually as usual, and it can
> also get rolled back in the end. But maximum care has been taken in the
> statement function (e.g. createdb) so that the chance of an error
> occurring *after* the files are created is minimal. For this, such a code
> segment is placed inside PG_ENSURE_ERROR_CLEANUP() with an error callback
> function (createdb_failure_callback) which tries to clean up the files
> created.
>
> So the end result is that the window between files-created and
> error-occurred is minimized, not that such statements will never create
> such cleanup issues when run outside a transaction block.
> Possible solution:
>
> So regarding Postgres-XC, if we let such statements run inside a
> transaction block, but only on the remote nodes, what are the
> consequences? This will of course prevent the issue of the statement
> being committed on one node and not on another. Also, the end user will
> still be prevented from running the statement inside a transaction.
> Moreover, for such a statement, say create-database, the database will be
> created on all nodes or none, even if one of the nodes returns an error.
> The only issue is that, if the create-database is aborted, it will leave
> disk space wasted on the nodes where it succeeded. But this will be caused
> by configuration issues like disk space, the network being down, etc. The
> issue of other unrelated operations in the same transaction causing a
> rollback of create-database will not occur anyway, because we still don't
> allow it in a transaction block for the end user.
>
> So the end result is that we have solved the inconsistent-cluster issue,
> leaving some chance of a disk cleanup issue, although not one caused by
> user queries getting aborted. So maybe when such statements error out we
> should display a notice that files need to be cleaned up.

Could it be possible to store, somewhere in the PGDATA folder of the node
involved, the files that need to be cleaned up? We could use some binary
encoding or something for this purpose. Ultimately this would just end up
being a list of files inside PGDATA to be cleaned up. We could then create a
system function that unlinks all the files whose names have been stored on
the local node. As such a system function does not interact with other
databases, it could be immutable in order to allow a cleanup from the
coordinator with EXECUTE DIRECT.

> We can go further to reduce this window. We split the create-database
> operation: we begin a transaction block and first let the datanodes do the
> non-file operations, like inserting the pg_database row, etc., by running
> them through a new function call. Don't commit it yet. Then fire the last
> part, the file system operations, through another function call, and then
> finally commit. The file operations will be under
> PG_ENSURE_ERROR_CLEANUP(). By synchronizing these individual tasks, we
> reduce the window further.

We need to be careful here about the impact of our code on PostgreSQL code.
It would be a pain to have a complicated implementation here for future
merges.

> 2. Some statements do internal commits.
>
> For example, movedb() calls TransactionCommit() after copying the files,
> and then removes the original files, so that if it crashes while removing
> the files, the database with the new tablespace is already committed and
> intact, and we just leave some old files behind.
>
> Statements doing internal commits cannot be rolled back if run inside a
> transaction block, because they have already committed something. For such
> statements the above solution does not work; we need to find a separate
> way for them. A few such statements are:
> ALTER DATABASE SET TABLESPACE
> CLUSTER
> CREATE INDEX CONCURRENTLY
>
> One similar solution is to split the individual tasks that get internally
> committed into different functions, one per task, and run the individual
> functions on all the nodes synchronously, so that the second task does not
> start until the first one is committed on all the nodes. Whether it is
> feasible to split the task is a question, and it depends on the particular
> command.
We would need a locking system for each task and each task step, like what
is done for barriers. Or a new communication protocol, once again like
barriers. Those are, once again, just ideas off the top of my mind.

> As of now, I am not sure whether we can do some common changes in the way
> transactions are implemented to find a common solution which does not
> require changes for individual commands. But I will investigate more.

Thanks.
--
Michael Paquier
http://michael.otacoo.com
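As a very rough sketch of the "list of files inside PGDATA" idea mentioned
above (the file name, its format and the function name are all assumptions,
not an agreed design), the local cleanup function could look like this:

#include "postgres.h"

#include <string.h>
#include <unistd.h>

#include "fmgr.h"
#include "miscadmin.h"

PG_FUNCTION_INFO_V1(pgxc_cleanup_orphan_files);

/*
 * Read a (hypothetical) list of orphaned paths recorded under $PGDATA by a
 * failed utility statement, unlink each of them on the local node, and
 * return the number of files removed.
 */
Datum
pgxc_cleanup_orphan_files(PG_FUNCTION_ARGS)
{
    char        listpath[MAXPGPATH];
    char        target[MAXPGPATH];
    FILE       *fp;
    int32       removed = 0;

    snprintf(listpath, sizeof(listpath), "%s/pgxc_orphan_files", DataDir);

    fp = fopen(listpath, "r");
    if (fp == NULL)
        PG_RETURN_INT32(0);     /* nothing was recorded, nothing to do */

    while (fgets(target, sizeof(target), fp) != NULL)
    {
        target[strcspn(target, "\n")] = '\0';   /* strip trailing newline */
        if (target[0] != '\0' && unlink(target) == 0)
            removed++;
    }
    fclose(fp);

    PG_RETURN_INT32(removed);
}

A real implementation would also need to handle directories (database or
tablespace directories would probably need something like rmtree() rather
than unlink()), and, as suggested above, it could be invoked on each node
from the coordinator through EXECUTE DIRECT.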