Re: TABLESAMPLE patch
От | Petr Jelinek |
---|---|
Тема | Re: TABLESAMPLE patch |
Дата | |
Msg-id | [email protected] обсуждение исходный текст |
Ответ на | TABLESAMPLE patch (Petr Jelinek <[email protected]>) |
Список | pgsql-hackers |
On 11/12/14 00:24, Petr Jelinek wrote: > Hello, > > Attached is a basic implementation of TABLESAMPLE clause. It's SQL > standard clause and couple of people tried to submit it before so I > think I don't need to explain in length what it does - basically returns > "random" sample of a table using a specified sampling method. > > I implemented both SYSTEM and BERNOULLI sampling as specified by SQL > standard. The SYSTEM sampling does block level sampling using same > algorithm as ANALYZE, BERNOULLI scans whole table and picks tuple randomly. > > There is API for sampling methods which consists of 4 functions at the > moment - init, end, nextblock and nexttuple. I added catalog which maps > the sampling method to the functions implementing this API. The grammar > creates new TableSampleRange struct that I added for sampling. Parser > then uses the catalog to load information about the sampling method into > TableSampleClause which is then attached to RTE. Planner checks for if > this parameter is present in the RTE and if it finds it it will create > plan with just one path - SampleScan. SampleScan implements standard > executor API and calls the sampling method API as needed. > > It is possible to write custom sampling methods. The sampling method > parameters are not limited to just percent number as in standard but > dynamic list of expressions which is checked against the definition of > the init function in a similar fashion (although much simplified) as > function calls are. > > Notable lacking parts are: > - proper costing and returned row count estimation - given the dynamic > nature of parameters I think for we'll need to let the sampling method > do this, so there will have to be fifth function in the API. > - ruleutils support (it needs a bit of code in get_from_clause_item > function) > - docs are sparse at the moment > Forgot the obligatory: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления: