Re: The science of optimization in practical terms?
From | Robert Haas |
---|---|
Subject | Re: The science of optimization in practical terms? |
Date | |
Msg-id | [email protected] |
In reply to | Re: The science of optimization in practical terms? (decibel <[email protected]>) |
Responses | Re: The science of optimization in practical terms?; Re: The science of optimization in practical terms? |
List | pgsql-hackers |
> Actually, a simple algorithm that might work really well would be to
> calculate relation cache odds as ( number of page accesses for relation /
> number of page accesses for all relations ) * ( sum(relpages)*BLKSZ /
> eff_cache_size ), where number of page accesses would be both from relcache
> and not.

I don't think that formula makes any sense. If effective_cache_size is in the denominator, then increasing it will make the odds of finding the page in cache go down.

> One thing this doesn't address though is the report from a few
> months ago that accessing small tables is still faster with an index scan,
> even if we know the whole thing is in cache (I don't remember if that was
> ever resolved...)

I'm not sure if this is what you're referring to, but there was a relatively recent post on, I believe, -performance, where a bitmap index scan that hit almost the entire table beat out a seqscan. I don't think there was any further discussion, and I'm still mystified as to how it's possible.

> Another idea would be to look at an efficient way to measure how long it
> actually takes to pull data from the OS. This has been suggested in the
> past, but the idea there was to measure every block access, and the concern
> was the overhead of the timing calls. But what if we sampled instead? Or
> what if we read multiple blocks at one time in the cases where we knew we'd
> need to (seqscan and an index scan needing more than one tuple)? Another
> option would be an async IO process that is responsible for measuring this
> stuff; I believe some people have played with async IO and gotten good
> results.
>
> Of course, on dtrace platforms we could just plug into dtrace...
>
>> You might also run into
>> problems with relations that have "hot" segments that are accessed
>> frequently and stay cached, and "cold" segments that are never
>> touched: if 20% of the relation is in cache, but that's the only 20%
>> of the relation we ever access, then our hit rate will be 100% rather
>> than 20%.
>
> Yes, but that would be accurate :)

No, we'd predict the hit rate to be 20%, but the real hit rate would be 100%.

> In reality, I think we need to revisit the idea of evaluating how closely a
> chosen query plan matches reality as we're running. If we thought we'd
> be seeing a 100% hit rate but in reality it's much lower, we could re-plan
> (of course this probably only makes sense for queries that take many
> seconds).

I don't think it's going to be very practical to re-plan the query in its entirety, because then you'd have to somehow undo all of the work you'd done thus far (including side effects, if any), which might not be possible and certainly isn't easy. What might be practical is to bail out of a nested loop that turns out to iterate more times than expected and hash the inner rel, or even sort the remaining portion of the outer rel and the entire inner rel and then merge-join them. The problem is that these sorts of changes can alter the order in which results are generated, and if the parent node is something like a merge-join that needs the results to be ordered in a particular way, then you've got a problem. Furthermore, it's not easy to decide when to switch strategies: if you decide to switch to a hash join just prior to what would have been the last iteration of the nested loop, you lose.
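As a purely illustrative aside on the formula quoted at the top of this message, it can be transcribed more or less literally against the statistics views. This is only a rough sketch, not part of the original proposal: mapping "page accesses" to heap_blks_read + heap_blks_hit in pg_statio_user_tables is an assumption, sum(relpages) is simplified to each table's own relpages, and pg_size_bytes() exists only in later PostgreSQL releases.

-- Rough transcription of the proposed "relation cache odds" formula
-- (illustration only; column choices are assumptions, see above).
SELECT s.relname,
       -- share of all tracked page accesses that touched this relation
       ((s.heap_blks_read + s.heap_blks_hit)::numeric
         / NULLIF(sum(s.heap_blks_read + s.heap_blks_hit) OVER (), 0))
       -- relation size in bytes divided by effective_cache_size
     * ((c.relpages::numeric * current_setting('block_size')::numeric)
         / pg_size_bytes(current_setting('effective_cache_size')))
       AS proposed_cache_odds
  FROM pg_statio_user_tables s
  JOIN pg_class c ON c.oid = s.relid
 ORDER BY proposed_cache_odds DESC;

Written out this way, effective_cache_size sits in the denominator of the second factor, so raising it shrinks the computed odds, which is exactly the inversion objected to above.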
I'm interested to know whether anyone else shares my belief that nested loops are the cause of most really bad plans. What usually happens to me is that the planner develops some unwarranted optimism about the number of rows likely to be generated by the outer side of the join and decides that it's not worth sorting the inner side, building a hash table, or using an index, and that the right thing to do is just rescan the inner node on every pass. When the outer side returns three or four orders of magnitude more results than expected, ka-pow!

Another approach to this problem might be to try to make the planner a little more cautious about choosing nested loops in the first place. Given a column a with the values 1 .. 10^6, the planner estimates the number of rows for a = X as 1, a in (X1..Xn) as n, a not in (X1..Xn) as 10^6 - n, and a < X for all X < 100 as 100. These are all pretty reasonable estimates (though one could hope to get a different result for a < 5 than for a < 100). But as soon as you use some operation that the planner knows nothing about, the guesses get really bad:

CREATE TABLE big (id serial, x text);
INSERT INTO big (x) SELECT random() FROM generate_series(1,1000000);
ANALYZE;
EXPLAIN SELECT * FROM big
    WHERE id % 2 = 0 AND (id + 0) % 2 = 0 AND (id - 0) % 2 = 0;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on big  (cost=0.00..36375.00 rows=1 width=22)
   Filter: (((id % 2) = 0) AND (((id + 0) % 2) = 0) AND (((id - 0) % 2) = 0))

The fact that the selectivity of an unknown expression is arbitrarily set to 0.005 is a defensible choice that doesn't happen to match my experience, but the idea that with three unknown expressions the selectivity is now (0.005)^3 = 0.000000125 seems hard to justify. That's a sufficiently small selectivity that it will often result in nestloop plans, and they tend not to be good when the real selectivity turns out to be, say, 0.1.

>> But even a primitive algorithm would probably be a lot better than
>> what we have now. I'm guessing that there are a lot of databases where
>> either the whole database fits in cache, or a decent chunk of
>> relatively small core relations fit in cache and then there are some
>> big or infrequently-used ones that don't.

...Robert
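To put numbers on the estimate collapse described above, one can compare one unknown expression against three with EXPLAIN ANALYZE on the same "big" table. This is only a sketch; the row counts in the comments are what the 0.005 default and simple arithmetic predict, not captured output, so treat them as approximate.

-- One unknown expression: the default selectivity of 0.005 should give an
-- estimate of roughly 5,000 rows out of 1,000,000, while the actual count
-- is about 500,000 (every other id is even).
EXPLAIN ANALYZE SELECT * FROM big WHERE id % 2 = 0;

-- Three logically identical unknown expressions: the selectivities are
-- multiplied (0.005^3 = 0.000000125), so the estimate collapses to 1 row,
-- while the actual count is still about 500,000.
EXPLAIN ANALYZE SELECT * FROM big
    WHERE id % 2 = 0 AND (id + 0) % 2 = 0 AND (id - 0) % 2 = 0;

A misestimate of that size on either input of a join is the kind of thing that, as described above, pushes the planner toward a nestloop it later regrets.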