|  | 
| 345 | 345 | Received: from renoir.op.net ([email protected]  [209.152.193.4]) | 
| 346 | 346 | 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087 | 
| 347 | 347 | 	for <[email protected] >; Tue, 19 Oct 1999 10:31:08 -0400 (EDT) | 
| 348 |  | -Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14  $) with ESMTP id KAA27535 for <[email protected] >; Tue, 19 Oct 1999 10:19:47 -0400 (EDT) | 
|  | 348 | +Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15  $) with ESMTP id KAA27535 for <[email protected] >; Tue, 19 Oct 1999 10:19:47 -0400 (EDT) | 
| 349 | 349 | Received: from localhost (majordom@localhost) | 
| 350 | 350 | 	by hub.org (8.9.3/8.9.3) with SMTP id KAA30328; | 
| 351 | 351 | 	Tue, 19 Oct 1999 10:12:10 -0400 (EDT) | 
|  | 
| 454 | 454 | Received: from renoir.op.net ([email protected]  [209.152.193.4]) | 
| 455 | 455 | 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130 | 
| 456 | 456 | 	for <[email protected] >; Tue, 19 Oct 1999 21:25:26 -0400 (EDT) | 
| 457 |  | -Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14  $) with ESMTP id VAA10512 for <[email protected] >; Tue, 19 Oct 1999 21:15:28 -0400 (EDT) | 
|  | 457 | +Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15  $) with ESMTP id VAA10512 for <[email protected] >; Tue, 19 Oct 1999 21:15:28 -0400 (EDT) | 
| 458 | 458 | Received: from localhost (majordom@localhost) | 
| 459 | 459 | 	by hub.org (8.9.3/8.9.3) with SMTP id VAA50745; | 
| 460 | 460 | 	Tue, 19 Oct 1999 21:07:23 -0400 (EDT) | 
|  | 
| 1006 | 1006 | Received: from renoir.op.net ([email protected]  [207.29.195.4]) | 
| 1007 | 1007 | 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04165 | 
| 1008 | 1008 | 	for <[email protected] >; Fri, 16 Jun 2000 17:31:01 -0400 (EDT) | 
| 1009 |  | -Received: from hub.org ([email protected]  [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14  $) with ESMTP id RAA13110 for <[email protected] >; Fri, 16 Jun 2000 17:20:12 -0400 (EDT) | 
|  | 1009 | +Received: from hub.org ([email protected]  [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15  $) with ESMTP id RAA13110 for <[email protected] >; Fri, 16 Jun 2000 17:20:12 -0400 (EDT) | 
| 1010 | 1010 | Received: from hub.org (majordom@localhost [127.0.0.1]) | 
| 1011 | 1011 | 	by hub.org (8.10.1/8.10.1) with SMTP id e5GLDaM14477; | 
| 1012 | 1012 | 	Fri, 16 Jun 2000 17:13:36 -0400 (EDT) | 
| @@ -3032,3 +3032,133 @@ Curt Sampson  <[email protected] >   +81 90 7737 2974   http://www.netbsd.org | 
| 3032 | 3032 |     Don't you know, in this new Dark Age, we're all light.  --XTC | 
| 3033 | 3033 |  | 
| 3034 | 3034 |  | 
|  | 3035 | +From [email protected]  Wed Apr 24 23:19:23 2002 | 
|  | 3036 | + | 
|  | 3037 | +Received: from angelic.cynic.net ([202.232.117.21]) | 
|  | 3038 | +	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g3P3JM414917 | 
|  | 3039 | +	for <[email protected] >; Wed, 24 Apr 2002 23:19:22 -0400 (EDT) | 
|  | 3040 | +Received: from localhost (localhost [127.0.0.1]) | 
|  | 3041 | +	by angelic.cynic.net (Postfix) with ESMTP | 
|  | 3042 | +	id 1F36F870E; Thu, 25 Apr 2002 12:19:14 +0900 (JST) | 
|  | 3043 | +Date: Thu, 25 Apr 2002 12:19:14 +0900 (JST) | 
|  | 3044 | +From: Curt Sampson <[email protected] > | 
|  | 3045 | +To: Bruce Momjian <[email protected] > | 
|  | 3046 | +cc: PostgreSQL-development <[email protected] > | 
|  | 3047 | +Subject: Re: Sequential Scan Read-Ahead | 
|  | 3048 | + | 
|  | 3049 | + | 
|  | 3050 | +MIME-Version: 1.0 | 
|  | 3051 | +Content-Type: TEXT/PLAIN; charset=US-ASCII | 
|  | 3052 | +Status: OR | 
|  | 3053 | + | 
|  | 3054 | +On Wed, 24 Apr 2002, Bruce Momjian wrote: | 
|  | 3055 | + | 
|  | 3056 | +> >     1. Not all systems do readahead. | 
|  | 3057 | +> | 
|  | 3058 | +> If they don't, that isn't our problem.  We expect it to be there, and if | 
|  | 3059 | +> it isn't, the vendor/kernel is at fault. | 
|  | 3060 | + | 
|  | 3061 | +It is your problem when another database kicks Postgres' ass | 
|  | 3062 | +performance-wise. | 
|  | 3063 | + | 
|  | 3064 | +And at that point, *you're* at fault. You're the one who's knowingly | 
|  | 3065 | +decided to do things inefficiently. | 
|  | 3066 | + | 
|  | 3067 | +Sorry if this sounds harsh, but this, "Oh, someone else is to blame" | 
|  | 3068 | +attitude gets me steamed. It's one thing to say, "We don't support | 
|  | 3069 | +this." That's fine; there are often good reasons for that. It's a | 
|  | 3070 | +completely different thing to say, "It's an unrelated entity's fault we | 
|  | 3071 | +don't support this." | 
|  | 3072 | + | 
|  | 3073 | +At any rate, relying on the kernel to guess how to optimise for | 
|  | 3074 | +the workload will never work as well as the software that | 
|  | 3075 | +knows the workload doing the optimization. | 
|  | 3076 | + | 
|  | 3077 | +The lack of support thing is no joke. Sure, lots of systems nowadays | 
|  | 3078 | +support unified buffer cache and read-ahead. But how many, besides | 
|  | 3079 | +Solaris, support free-behind, which is also very important to avoid | 
|  | 3080 | +blowing out your buffer cache when doing sequential reads? And who | 
|  | 3081 | +at all supports read-ahead for reverse scans? (Or does Postgres | 
|  | 3082 | +not do those, anyway? I can see the support is there.) | 
|  | 3083 | + | 
|  | 3084 | +And even when the facilities are there, you create problems by | 
|  | 3085 | +using them.  Look at the OS buffer cache, for example. Not only do | 
|  | 3086 | +we lose efficiency by using two layers of caching, but (as people | 
|  | 3087 | +have pointed out recently on the lists), the optimizer can't even | 
|  | 3088 | +know how much or what is being cached, and thus can't make decisions | 
|  | 3089 | +based on that. | 
|  | 3090 | + | 
|  | 3091 | +> Yes, seek() in file will turn off read-ahead.  Grabbing bigger chunks | 
|  | 3092 | +> would help here, but if you have two people already reading from the | 
|  | 3093 | +> same file, grabbing bigger chunks of the file may not be optimal. | 
|  | 3094 | + | 
|  | 3095 | +Grabbing bigger chunks is always optimal, AFAICT, if they're not | 
|  | 3096 | +*too* big and you use the data. A single 64K read takes very little | 
|  | 3097 | +longer than a single 8K read. | 
|  | 3098 | + | 
|  | 3099 | +> >     3. Even when the read-ahead does occur, you're still doing more | 
|  | 3100 | +> >     syscalls, and thus more expensive kernel/userland transitions, than | 
|  | 3101 | +> >     you have to. | 
|  | 3102 | +> | 
|  | 3103 | +> I would guess the performance impact is minimal. | 
|  | 3104 | + | 
|  | 3105 | +If it were minimal, people wouldn't work so hard to build multi-level | 
|  | 3106 | +thread systems, where multiple userland threads are scheduled on | 
|  | 3107 | +top of kernel threads. | 
|  | 3108 | + | 
|  | 3109 | +However, it does depend on how much CPU your particular application | 
|  | 3110 | +is using. You may have it to spare. | 
|  | 3111 | + | 
|  | 3112 | +> 	http://candle.pha.pa.us/mhonarc/todo.detail/performance/msg00009.html | 
|  | 3113 | + | 
|  | 3114 | +Well, this message has some points in it that I feel are just incorrect. | 
|  | 3115 | + | 
|  | 3116 | +    1. It is *not* true that you have no idea where data is when | 
|  | 3117 | +    using a storage array or other similar system. While you | 
|  | 3118 | +    certainly ought not worry about things such as head positions | 
|  | 3119 | +    and so on, it's been a given for a long, long time that two | 
|  | 3120 | +    blocks that have close index numbers are going to be close | 
|  | 3121 | +    together in physical storage. | 
|  | 3122 | + | 
|  | 3123 | +    2. Raw devices are quite standard across Unix systems (except | 
|  | 3124 | +    in the unfortunate case of Linux, which I think has been | 
|  | 3125 | +    remedied, hasn't it?). They're very portable, and have just as | 
|  | 3126 | +    well--if not better--defined write semantics as a filesystem. | 
|  | 3127 | + | 
|  | 3128 | +    3. My observations of OS performance tuning over the past six | 
|  | 3129 | +    or eight years contradict the statement, "There's a considerable | 
|  | 3130 | +    cost in complexity and code in using "raw" storage too, and | 
|  | 3131 | +    it's not a one off cost: as the technologies change, the "fast" | 
|  | 3132 | +    way to do things will change and the code will have to be | 
|  | 3133 | +    updated to match." While optimizations have been removed over | 
|  | 3134 | +    the years, the basic optimizations (order reads by block number, | 
|  | 3135 | +    do larger reads rather than smaller, cache the data) have | 
|  | 3136 | +    remained unchanged for a long, long time. | 
|  | 3137 | + | 
|  | 3138 | +    4. "Better to leave this to the OS vendor where possible, and | 
|  | 3139 | +    take advantage of the tuning they do." Well, sorry guys, but | 
|  | 3140 | +    have a look at the tuning they do. It hasn't changed in years, | 
|  | 3141 | +    except to remove now-unnecessary complexity related to really, | 
|  | 3142 | +    really old and slow disk devices, and to add a few things that | 
|  | 3143 | +    guess workload but still do a worse job than if the workload | 
|  | 3144 | +    generator just did its own optimisations in the first place. | 
|  | 3145 | + | 
|  | 3146 | +> 	http://candle.pha.pa.us/mhonarc/todo.detail/optimizer/msg00011.html | 
|  | 3147 | + | 
|  | 3148 | +Well, this one, with statements like "Postgres does have control | 
|  | 3149 | +over its buffer cache," I don't know what to say. You can interpret | 
|  | 3150 | +the statement however you like, but in the end Postgres has very little | 
|  | 3151 | +control at all over how data is moved between memory and disk. | 
|  | 3152 | + | 
|  | 3153 | +BTW, please don't take me as saying that all control over physical | 
|  | 3154 | +IO should be done by Postgres. I just think that Postgres could do | 
|  | 3155 | +a better job of managing data transfer between disk and memory than | 
|  | 3156 | +the OS can. The rest of the things (using raw partitions, read-ahead, | 
|  | 3157 | +free-behind, etc.) just drop out of that one idea. | 
|  | 3158 | + | 
|  | 3159 | +cjs | 
|  | 3160 | +--  | 
|  | 3161 | +Curt Sampson  <[email protected] >   +81 90 7737 2974   http://www.netbsd.org | 
|  | 3162 | +    Don't you know, in this new Dark Age, we're all light.  --XTC | 
|  | 3163 | + | 
|  | 3164 | + | 
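
The syscall argument in the message above (point 3, and the 8K versus 64K comparison) can be illustrated with a rough sketch. This is not PostgreSQL code: the helper names `make_file` and `count_reads` and the 1 MiB file size are assumptions made up for the example. It only counts read calls for the same amount of data; it does not measure latency, so it says nothing by itself about how much longer a 64K read takes than an 8K one.

```python
import os
import tempfile


def make_file(size):
    """Create a temporary file of `size` zero bytes and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


def count_reads(path, chunk_size):
    """Read the whole file sequentially with os.read; return (calls, bytes_read)."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    total = 0
    try:
        while True:
            data = os.read(fd, chunk_size)  # one kernel/userland transition per call
            calls += 1
            if not data:  # empty result means end of file
                break
            total += len(data)
    finally:
        os.close(fd)
    return calls, total


if __name__ == "__main__":
    path = make_file(1024 * 1024)  # 1 MiB, an arbitrary size for the example
    try:
        for chunk in (8 * 1024, 64 * 1024):
            calls, total = count_reads(path, chunk)
            print(f"chunk={chunk:6d} bytes  read calls={calls}  bytes read={total}")
    finally:
        os.remove(path)
```

Whether the saved transitions matter in practice depends, as the message says, on how much CPU the application has to spare; timing the two loops on a real workload would be the next step.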