Harry Read Me
/cru/dpe1a/f014
/cru/tyn1/f014
Nearly 11,000 files! And about a dozen assorted 'read me' files addressing
fromdpe1a/data/stnmon/doc/oldmethod/f90_READ_ME.txt
fromdpe1a/code/linux/cruts/_READ_ME.txt
fromdpe1a/code/idl/pro/README_GRIDDING.txt
(yes, they all have different name formats, and yes, one does begin '_'!)
tmean:
fromdpe1a/data/cruts/database/+norm/tmp.0311051552.dtb
fromdpe1a/data/cruts/database/+norm/tmp.0311051552.dts
(yes.. that is a directory beginning with '+'!)
in the '_READ_ME.txt' file). Had to make some changes to allow for the
move back to alphas (different field length from the 'wc -l' command).
file system (all path widths were 80ch, have been globally changed to 160ch)
seem to like files being in any directory other than the current one!!
straight to GRIM. This will include non-land cells but for comparison
purposes that shouldn't be a big problem... [edit] noo, that's not gonna
work either, it asks for a 'template grim filepath', no idea what it wants
(as usual) and a search for files with 'grim' or 'template' in them does
not bear useful fruit. As per usual. Giving up on this approach altogether.
7. Removed 4-line header from a couple of .glo files and loaded them into
Matlab. Reshaped to 360r x 720c and plotted; looks OK for global temp
(anomalies) data. Deduce that .glo files, after the header, contain data
This should allow us to deduce the meaning of the co-ordinate pairs used to
describe each cell in a .grim file (we know the first number is the lon or
column, the second the lat or row - but which way up are the latitudes? And
There is another problem: the values are anomalies, whereas the 'public'
.grim files are actual values. So Tim's explanations (in _READ_ME.txt) are
incorrect..
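The Matlab check in item 7 above (skip a 4-line header, reshape to 360 rows x 720 columns) can be mirrored in a short Python sketch. Only the header length and the grid shape come from the notes; the whitespace-separated body and the row ordering are assumptions:

```python
import numpy as np

def read_glo(path, header_lines=4, nrows=360, ncols=720):
    """Skip the .glo header, then reshape the body into a lat x lon grid.
    Separator and row order are guesses; the 4-line header and the
    360x720 shape are from the notes above."""
    with open(path) as f:
        for _ in range(header_lines):
            next(f)
        vals = np.array(f.read().split(), dtype=float)
    if vals.size != nrows * ncols:
        raise ValueError(f"expected {nrows * ncols} values, got {vals.size}")
    return vals.reshape(nrows, ncols)
```

Plotting the result (e.g. with matplotlib's imshow) reproduces the quick visual sanity check described above.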
did include normals lines at the start of every station. How handy - naming
two different files with exactly the same name and relying on their location
crua6[/cru/cruts/rerun1/data/cruts/rerun1work] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
> Enter the suffix of the variable required:
.tmp
tmp.0311051552.dtb
1961,1990
25
rr2.txt
1901,2002
> Operating...
crua6[/cru/cruts/rerun1/data/cruts/rerun1work]
IDL>
quick_interp_tdm2,1901,2002,'rr2glofiles/rr2grid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2txtfiles/rr2.'
Defaults set
1901
% Compiled module: MAP_SET.
1902
(etc)
2002
IDL>
only.
palpable sense of relief pervades the office :-) It's also the
earlier??
uealogin1[/cru/cruts/rerun1/data/cruts/rerun1work] ./grimcmp
Files compared:
1. cru_ts_2_10.1961-1970.tmp
2. glo2grim1.out
theory.
only reading as much of each glo file as was needed was really
We are now on-beam and initial results are very very promising:
uealogin1[/cru/cruts/rerun1/data/cruts/rerun1work] ./grimcmp3x
Files compared:
1. cru_ts_2_10.1961-1970.tmp
2. glo2grim3.out
..so all correlations are >= 0.9 and all but one are >=0.96!
say we are producing the data Tim produced. The variations can
16. So, it seemed like a good time to start a Precip run. With
ho, ho, ho. The first problem was that anomdtb kept crashing:
crua6[/cru/cruts/rerun1/data/cruts/rerun2work] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.pre
> Will calculate percentage anomalies.
pre.0312031600.dtb
1961,1990
25
rr2pre.txt
1901,2002
> Operating...
crua6[/cru/cruts/rerun1/data/cruts/rerun2work]
code is not 'good' enough for bloody Sun!! Pages of warnings and
..so the data value is unfeasibly large, but why does the
-400002 3513 3672 309 HAMA SYRIA 1985 2002 -999 -999
6190 842 479 3485 339 170 135 106 0 9 243 387 737
1988 1044 769 797 399 11 903 218 0 0 163 517 1181
1989 269 62 293 3 13 0 0 0 0 101 292 342
18. Ran the IDL gridding routine for the precip files:
quick_interp_tdm2,1901,2002,'rr2preglofiles/rr2pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2pretxtfiles/rr2pre.'
IDL>
quick_interp_tdm2,1901,1910,'rr2glofiles2/rr2grid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2txtfiles/rr2.'
% Syntax error.
lim=glimit(/all)
% Syntax error.
r=area_grid(pts2(n,1),pts2(n,0),pts2(n,2),gs*2.0,bounds,dist,angular=angular)
^
% Syntax error.
IDL>
.. WHAT?! Now it's not precompiling its functions for some reason!
Eventually (the following day) I found glimit and area_grid, they are
have no idea why they're not compiling! I manually compiled them with
IDL>
quick_interp_tdm2,1901,1910,'rr2glofiles2/rr2grid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2txtfiles/rr2.'
1901
% $MAIN$
IDL>
IDL>
quick_interp_tdm2,1901,1910,'rr2glofiles2/rr2grid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2txtfiles/rr2.'
Defaults set
1901
% QUICK_INTERP_TDM2 215
/cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro
% $MAIN$
IDL>
..so it looks like a path problem. I wondered if the NFS errors that have
been plaguing crua6 work for some time now might have prevented IDL from
adding the correct directories to the path? After all the help file does
mention that IDL discards any path entries that are inaccessible.. so if
the timeout is a few seconds that would explain it. So I restarted IDL,
and PRESTO! It worked. I then tried the precip version - and it worked
too!
IDL>
quick_interp_tdm2,1901,2002,'rr2preglofiles/rr2pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2pretxtfiles/rr2pre.'
Defaults set
1901
1902
(etc)
2001
2002
IDL>
However..
contained a fatal data error (see 17. above), then surely it has been
altered since Tim last used it to produce the precipitation grids? But
if that's the case, why is it dated so early? Here are the dates:
/cru/dpe1a/f014/data/cruts/database/+norm/pre.0312031600.dtb
/cru/tyn1/f014/ftpfudge/data/cru_ts_2.10/data_dec/cru_ts_2_10.1961-1970.pre.Z
- directory date is 22 Jan 2004 (original date not preserved in zipped file)
So what's going on? I don't see how the 'final' precip file can have been
produced from the 'final' precipitation database, even though the dates
imply that. The obvious conclusion is that the precip file must have been
produced before 23 Dec 2003, and then redated (to match others?) in Jan 04.
20. Secondary Variables - Eeeeeek!! Yes the time has come to attack what even
Tim seems to have been unhappy about (reading between the lines). To assist
me I have 12 lines in the gridding ReadMe file.. so par for the course.
frs_gts_tdm.pro
rd0_gts_tdm.pro
vap_gts_anom.pro
In other words, the *anom.pro scripts are much more recent than the *tdm
scripts. There is no way of knowing which Tim used to produce the current
public files. The scripts differ internally but - you guessed it! - the
descriptions at the start are identical. WHAT IS GOING ON? Given that the
again for tmp and pre, and (for the first time) for dtr. This time, the
IDL>
quick_interp_tdm2,1901,2002,'idlbinout/idlbin',1200,gs=2.5,dumpbin='dumpbin',pts_prefix='tmp_txt_4idl/tmp.'
1991
And produces output files (in, in this case, 'idlbinout/'), like this:
These changes rolled back to the quoted command lines, to avoid confusion.
1991
Finally for the primaries, the first stab at dtr. Ran anomdtb with the
Screen output:
IDL>
quick_interp_tdm2,1901,2002,'idlbin_dtr/idlbin_dtr',750,gs=2.5,dumpbin='dumpbin',pts_prefix='dtr_txt_4idl/dtr.'
1991
And.. at this point, I read the ReadMe file properly. I should be gridding at
2.5 degrees not 0.5 degrees! For some reason, secondary variables are not
derived from the 0.5 degree grids. Re-did all three generations (the sample
command lines and outputs above have been altered to reflect this, to avoid
confusion).
Tried running frs_gts_tdm but it complained it couldn't find the normals file:
IDL>
frs_gts_tdm,dtr_prefix='idlbin_dtr/idlbin_dtr',tmp_prefix='idlbin_tmp/idlbin_tmp',1901,2002,outprefix='syngrid_frs/syngrid_frs'
IDL>
frs_gts,dtr_prefix='idlbin_dtr/idlbin_dtr',tmp_prefix='idlbin_tmp/idlbin_tmp',1901,2002,outprefix='syngrid_frs/syngrid_frs'
% FRS_GTS 18 /cru/cruts/fromdpe1a/code/idl/pro/frs_gts_tdm.pro
% $MAIN$
IDL>
/cru/cruts/fromdpe1a/data/grid/twohalf/glo25.frs.6190
..and altered the IDL prog to read it.. same error! Turns out it's preferring
IDL>
frs_gts,dtr_prefix='idlbin_dtr/idlbin_dtr',tmp_prefix='idlbin_tmp/idlbin_tmp',1901,2002,outprefix='syngrid_frs/syngrid_frs'
yes
% FRS_GTS 21 /cru/cruts/fromdpe1a/code/idl/pro/frs_gts_tdm.pro
% $MAIN$
IDL>
So what is this mysterious variable 'nf' that isn't being set? Well strangely,
it's in Mark N's 'rdbin.pro'. I say strangely because this is a generic prog
that's used all over the place! Nonetheless it does have what certainly looks
like a bug:
39 info=fstat(lun)
42 nlat=sqrt(info.size/48.0)
43 gridsize=180.0/nlat
55 defxyz,lon,lat,gridsize,grid=grid,nf=nf,had=had,echam=echam,gfdl=gfdl,ccm=ccm,csiro=csiro
57 grid=fix(grid)
58 ;read data
59 readu,lun,grid
60 close,lun
61 spawn,string('rm -f ',fff)
63 openr,lun,fname
64 ; check file size and work out grid spacing if gridsize isn't set
66 info=fstat(lun)
69 gridsize=180.0/nlat
72 endif
In other words, 'nf' is set in the first conditional set of statements, but in
(set #73,#74; used #68). So I shifted #73 and #74 to between #64 and #65, and..
Er, perhaps.
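As a sanity check on rdbin's grid-spacing inference (lines 42-43 of the listing above), the arithmetic can be replayed in Python. The 48-byte divisor is taken straight from the listing, and the filesize of 248832 is the one reported later in the rd0 run:

```python
import math

def gridsize_from_filesize(size_bytes):
    # Mirrors rdbin.pro lines 42-43: nlat = sqrt(size/48.0); gridsize = 180/nlat
    nlat = math.sqrt(size_bytes / 48.0)
    return 180.0 / nlat

# filesize= 248832 (as printed by the rd0 run) recovers the 2.5-degree grid
print(gridsize_from_filesize(248832))  # prints 2.5
```

So the "filesize= 248832 / gridsize= 2.50000" pairs in the rd0 output below are at least internally consistent with rdbin's formula.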
Lots of screen output, and lots of files. A set of synthetic grids in 'syngrid_frs/' as
requested, typically:
..but also a set of binary files in the working directory! They look like this:
Having read the program it looks as though the latter files are absolutes,
whereas the former are anomalies. With this in mind, they are renamed:
Then - a real setback. Looked for a database file for frost.. nothing. Is
this a real secondary parameter? Answer: yes. Further digging revealed that
as usual, but it does seem to avoid the use of the 'pts_prefix' option.. so
I set it, and it at least *ran* for the full term (though very slow compared
to primary variables)!
IDL>
quick_interp_tdm2,1901,2002,'glo_frs_grids/frs.grid.',750,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='syngrid_frs/syngrid_frs'
It does produce output grids. Without converting to absolutes with the normals file,
Then, I moved on to rd0 (wet-day frequency). This time, when I searched for the
normals files required ('glo.pre.norm' and 'glo.rd0.norm'), I could not (as before)
find exact matches. The difference this time is that the program checks that the
./gts/cld/glo/glo.cld.norm.Z
./gts/dtr/glo_old/glo.dtr.norm.Z
./gts/frs/glo.frs.norm.Z
./gts/frs/glo/glo.frs.norm.Z
./gts/pre/glo_quick_abs/glo.pre.norm.Z
./gts/pre/glo_quick_log/glo.pre.norm.Z
./gts/pre/glo_spl/glo.pre.norm.Z
./gts/rad/glo/glo.rad.norm.Z
./gts/rd0/glo/glo.rd0.norm.Z
./gts/rd0/glo_old/glo.rd0.norm.Z
./gts/sunp/glo/glo.sunp.norm
./gts/sunp/means/glo.sunp.norm.Z
./gts/tmp/glo/glo.tmp.norm.Z
./gts/tmp/glo_old/glo.tmp.norm.Z
find: cannot open < ./gts/tmp/station_list >
./gts/vap/glo/glo.vap.norm.Z
./gts/wnd/glo/glo.wnd.norm.Z
cp /cru/mark1/f080/gts/frs/glo.frs.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/cld/glo/glo.cld.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/dtr/glo_old/glo.dtr.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/pre/glo_quick_log/glo.pre.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/rad/glo/glo.rad.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/rd0/glo/glo.rd0.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
There were two 'sunp' norm files, but one was 0 bytes in length.
cp /cru/mark1/f080/gts/sunp/means/glo.sunp.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/tmp/glo/glo.tmp.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/vap/glo/glo.vap.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/wnd/glo/glo.wnd.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
The synthetics generation was then re-run for frs (records above have
been modified to reflect this).
IDL>
rd0_gts,1901,2002,1961,1990,outprefix="syngrid_rd0/syngrid_rd0",pre_prefix="idlbin_pre/idlbin_pre"
2001
yes
filesize= 248832
gridsize= 2.50000
2002
yes
filesize= 248832
gridsize= 2.50000
IDL>
However, all synthetic grids appear to have been written OK, including 2002.
Grid generation proceeded without error:
IDL>
quick_interp_tdm2,1901,2002,'glo_rd0_grids/rd0.grid.',450,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='syngrid_rd0/syngrid_rd0'
Onto vapour pressure, and the crunch. For here, the recommended program for
; ** temp and dtr monthly anomalies on 2.5deg grid, including normal period
So, we face a situation where some synthetics are built with 0.5-degree
normals, and others are built with 2.5-degree normals. I can find no
documentation of this. There are '*_anom.pro' versions of the frs and rd0
programs, both of which use 2.5-degree normals, however they are dated
Jan 2004, and Tim's Read_Me (which refers to the '*_tdm.pro' 0.5-degree
versions) is dated end March 2004, so we have to assume these are his
best suggestions.
The 2.5 normals are found here:
> ls -l /cru/cruts/fromdpe1a/data/grid/twohalf/
total 1248
readme.txt:
(end)
IDL>
vap_gts_anom,dtr_prefix='idlbin_dtr/idlbin_dtr',tmp_prefix='idlbin_tmp/idlbin_tmp',1901,2002,outprefix='syngrid_vap/syngrid_vap',dumpbin=1
Producing screen output like this:
On, without further ado, to the gridding. For this secondary, there *are* database
files, so the 'nostn' option is not used, and anomdtb.f is wheeled out again
crua6[/cru/cruts/rerun1/data/cruts/rerun_vap] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.vap
vap.0311181410.dtb
1961,1990
25
vap.txt
1901,2002
> Operating...
IDL>
quick_interp_tdm2,1901,2002,'glo_vap_grids/vap.grid.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='syngrid_vap/syngrid_vap',pts_prefix='../rerun_vap/vap_txt_4idl/vap.'
Defaults set
1901
1902
% QUICK_INTERP_TDM2 88
/cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro
% $MAIN$
IDL>
This turns out to be because of the sparsity of VAP station measurements in the
early years. The program cannot handle anom files of 0 length, even though it
checks the length! Bizarre. The culprit is 'vap.1902.03.txt', the only month to
have no station reading at all (45 months have only 1 however). I decided to mod
the program to use the 'nostn' option if the length is 0. Hope that's right - the
synthetics are read in first and the station data is added to that grid so this
Defaults set
1901
1902
1903
(..etc..)
mentions frs, rd0 (which I'm assuming == wet) and vap. How, then, do I
also:
/cru/cruts/fromdpe1a/code/idl/pro/cloudcorrspc.pro
/cru/cruts/fromdpe1a/code/idl/pro/cloudcorrspcann.pro
/cru/cruts/fromdpe1a/code/idl/pro/cloudcorrspcann9196.pro
Loading just the first program opens up another huge can o' worms. The
pro cal_cld_gts_tdm,dtr_prefix,outprefix,year1,year2,info=info
; reads DTR data from binary output files from quick_interp_tdm2.pro (binfac=1000)
; output can then be used as dummy input to splining program that also
So, to me this identifies it as the program we cannot use any more because
lost the coefficients file and never found it again (despite searching on tape
archives at UEA) and never recreated it. This hasn't mattered too much, because
the synthetic cloud grids had not been discarded for 1901-95, and after 1995
But, (Lord how many times have I used 'however' or 'but' in this file?!!), when
you look in the program you find that the coefficient files are called:
rdbin,a,'/cru/tyn1/f709762/cru_ts_2.0/_constants/_7190/a.25.7190',gridsize=2.5
rdbin,b,'/cru/tyn1/f709762/cru_ts_2.0/_constants/_7190/b.25.7190',gridsize=2.5
crua6[/cru/cruts] ls fromdpe1a/data/grid/cru_ts_2.0/_makecld/_constants/_7190/spc2cld/_ann/
crua6[/cru/cruts] ls fromdpe1a/data/grid/cru_ts_2.0/_makecld/_constants/_7190/spc2cld/_mon/
So.. we don't have the coefficients files (just .eps plots of something). But
what are all those monthly files? DON'T KNOW, UNDOCUMENTED. Wherever I look,
there are data files, no info about what they are other than their names. And
that's useless.. take the above example, the filenames in the _mon and _ann
directories are identical, but the contents are not. And the only difference
is that one directory is apparently 'monthly' and the other 'annual' - yet
pro cal_cld_gts_tdm,dtr_prefix,outprefix,year1,year2,info=info
; reads DTR data from binary output files from quick_interp_tdm2.pro (binfac=1000)
; creates cld anomaly grids at dtr grid resolution
; output can then be used as dummy input to splining program that also
; mean_gts,'~/m1/gts/dtr/glo25/glo25.dtr.',nor1,nor2
; mean_gts_tdm,'/cru/mark1/f080/gts/dtr/glo25/glo25.dtr.',nor1,nor2
;; rdbin,dtrnor,'~/m1/gts/dtr/glo25/glo25.dtr.'+string(nor1-1900,nor2-1900,form='(2i2.2)')
;dtrnorstr='/cru/mark1/f080/gts/dtr/glo25/glo25.dtr.'+string(nor1-1900,nor2-1900,form='(2i2.2)')
;rdbin,dtrnor,dtrnorstr
rdbin,a,'/cru/tyn1/f709762/cru_ts_2.0/_constants/_7190/a.25.7190',gridsize=2.5
rdbin,b,'/cru/tyn1/f709762/cru_ts_2.0/_constants/_7190/b.25.7190',gridsize=2.5
These are the files that have been lost according to the gridding read_me
(see above).
for 1901-1995 have now been discarded. This means that the cloud data prior
to 1996 are static.
Edit: have just located a 'cld' directory in Mark New's disk, containing
For 1901 to 1995 - stay with published data. No clear way to replicate
process as undocumented.
to cloud percentage! So what the hell did Tim do?!! As I keep asking.
result is in oktas*10 and ranges from 0 to 80, so the new result will
Next problem - which database to use? The one with the normals included
is not appropriate (the conversion progs do not look for that line so
spc.0312221624.dtb
spc.94-00.0312221624.dtb
I find that they are broadly similar, except the normals lines (which
both start with '6190') are very different. I was expecting that maybe
the latter contained 94-00 normals, what I wasn't expecting was that
they are in % x10 not %! Unbelievable - even here the conventions have
not been followed. It's botch after botch after botch. Modified the
hopefully has some of the 94-00 normals in. I just wish I knew more.
Conversion was hampered by the discovery that some stations have a mix
crua6[/cru/cruts/rerun1/data/cruts/rerun_cld] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.cld
cldfromspc.94000312221624.dtb
1994,2000
25
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
cldfromspc.txt
1994,2002
> Operating...
crua6[/cru/cruts/rerun1/data/cruts/rerun_cld]
IDL>
quick_interp_tdm2,1994,2002,'glo_from_idl/cld.',600,gs=0.5,pts_prefix='txt_4_idl/cldfromspc.',dumpglo='dumpglo'
Defaults set
1994
1995
1996
1997
1998
1999
2000
2001
2002
IDL>
IDL>
quick_interp_tdm2,1901,2002,'glo_dtr_grids/dtr.',750,gs=0.5,pts_prefix='dtr_txt_4idl/dtr.',dumpglo='dumpglo'
That went off without any apparent hitches, so I wrote a fortran prog,
'maxminmaker.for', to produce tmn and tmx grids from tmp and dtr. It ran.
However - yup, more problems - when I checked the inputs and outputs I found
that in numerous instances there was a value for mean temperature in the grid,
with no corresponding dtr value. This led to tmn = tmx = tmp for those cells.
NOT GOOD.
Actually, what was NOT GOOD was my grasp of context. Oh curse this poor
memory! For the IDL gridding program produces ANOMALIES not ACTUALS.
gridded and with headers). After some experiments realised that the .glo
anomalies are in degrees, but the normals are in 10ths of a degree :-)
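Given that mismatch, converting a .glo anomaly cell back to an actual is a one-liner once the tenths-of-a-degree normals are scaled first. The units are as stated in the note above; the function name is mine:

```python
def glo_to_actual(anom_degc, normal_tenths):
    """Anomaly (whole degrees C, per the .glo files) plus the 1961-90
    normal, which the normals file stores in tenths of a degree
    (hence the /10) -- the gotcha noted above."""
    return anom_degc + normal_tenths / 10.0
```

Forgetting the /10 inflates every reconstructed actual by roughly a factor of ten, which is exactly the kind of discrepancy the old-vs-new comparison would flag.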
comparison simply measures absolute differences between old and new, and
categorises as either (1) identical, (2) within 0.5 degs, (3) within 1 deg,
These are very promising. The vast majority in both cases are within 0.5
degrees of the published data. However, there are still plenty of values
TMP:
DTR:
DTR fares perhaps even better, over half are spot-on, though about
I tried using the 'stn' option of anomdtb.for. Not completely sure what
crua6[/cru/cruts/rerun1/data/cruts/rerun_pre] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
> Enter the suffix of the variable required:
.pre
pre.0312031600H.dtb
1961,1990
25
pre.fromanomdtb.stn
450
cru_ts_2_10.1961-1970.pre
1901,2002
> Operating...
crua6[/cru/cruts/rerun1/data/cruts/rerun_pre]
22. Right, time to stop pussyfooting around the niceties of Tim's labyrinthine software
suites - let's have a go at producing CRU TS 3.0! since failing to do that will be the
Firstly, we need to identify the updated data files. I acquired the following:
iran_asean_GHCN_WWR-CD_save50_CLIMAT_MCDW_updat_merged renamed to
pre.0611301502.dat
Next step, convert the various db formats to the CRU TS one. Made a visual
losing the 'extra' fields that have been tacked onto the headers willy-nilly
as they are undocumented. Furthermore the two extra fields in the CRU TS
mandatory blank spaces, and for variations in the two extra fields. Sample
Produced by headgetter.for
position missed
8 0
14 0
21 0
26 0
47 0
61 0
66 0
71 0
78 2
Unidentifiable 0
Unidentifiable 0
ENDS
Produced by headgetter.for
position missed
8 0
14 0
21 0
26 0
47 0
61 0
66 0
71 0
78 154
Unidentifiable 0
Unidentifiable 0
ENDS
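My reading of the 'position missed' counts above is that headgetter.for checks a set of expected blank separator columns in each header line and counts the records where a column is not blank. On that (unconfirmed) interpretation, the check looks roughly like this:

```python
# 1-based columns, taken from the headgetter.for output above;
# whether the program tests for blanks or field boundaries is a guess.
EXPECTED_BLANKS = (8, 14, 21, 26, 47, 61, 66, 71, 78)

def missed_positions(header_line):
    """Columns from the expected set that are not blank in this header."""
    return [p for p in EXPECTED_BLANKS
            if p <= len(header_line) and header_line[p - 1] != ' ']
```

Under this reading, the '78 154' row would mean 154 records had a non-blank character at column 78, i.e. spillover between the two extra fields.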
a few violations of the boundary between the two extra fields, particularly
641080 -330 1735 324 BANDUNDU DEM REP CONGO 1961 1990 -99908
642200 -436 1525 445 KINSHASA/BINZA DEM REP CONGO 1960 1990 -99920
So the first extra field is apparently unused! It would be a handy place for
On to a more detailed look at the cru precip format; not sure whether there
are two extra fields or one, and what the sizes are. A quick hack through
the headers is not pleasing. There appears to be only one field, but it can
have up to nine (9) digits in it, and at least three missing value codes:
*unimpressed*
This is irritating as it means precip has only 9 fields and I can't do a
like two extra fields (i6,i7) with mvcs of 999999 and 8888888 respectively.
Isn't that marvellous? These can't even be read with a consistent header format!
So, the approach will be to read exactly ONE extra field. For cru tmp that
will be the i2+i4 anders/best-start codes as one. For cru pre it will be
the amazing multipurpose, multilength field. For cru tmnx it will be the
Conversions/corrections performed:
Temperature
Converted tmp.0611301507.dat to tmp.0612081033.dat
BEFORE
911900 209 1564 20 HI*KAHULUI WSO (PUU NENE) 1954 1990 101954 -999.00
AFTER
Precipitation
BEFORE
AFTER
(DL later reported that the name was intended to signify that the data had been
corrected by a factor of 0.9 when data from another station was incorporated
Started work on mergedb.for, which should merge a primary database with an incoming
database of the same (CRU TS) format. Quite complicated. No operator interventions,
just a log file of failed attempts - but hooks left in for op sections in case this
23. Interrupted work on mergedb.for in order to trial a precip gridding for 3.0. This
required another new proglet, addnormline.for, which adds a normals line below each
header. It fills in the normals values if the conditions are met (75% of values, or
Initial results promising.. ran it for precip, it added normals lines OK, a total of
15942 with 6003 missing value lines. No errors, and no ops interventions because the
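The normals rule addnormline.for applies can be sketched as follows. The 75% threshold is quoted above; the missing-value code and the exact handling of the cutoff are assumptions:

```python
def monthly_normal(yearly_values, missing=-9999, min_frac=0.75):
    """Mean of the non-missing values for one month across the normals
    period, or the missing code if fewer than 75% of years are present
    (threshold per the addnormline.for note; rounding is a guess)."""
    present = [v for v in yearly_values if v != missing]
    if len(present) < min_frac * len(yearly_values):
        return missing
    return sum(present) / len(present)
```

Run per month over the 1961-90 block of a station record, twelve such calls build one normals line; stations failing the threshold get the missing-value line instead, matching the 6003 missing-value lines reported above.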
Tried running anomdtb.f90.. failed because it couldn't find the .dts file! No matter
the .dtb file, all missing values are retained, all other values are replaced with
Wrote 'falsedts.for' to produce dummy .dts files with all zeros in place of real
Tried running anomdtb.f90 again. This time it crashed at record #1096. Wrote a proglet
'findstn.for' to find the n-th station in a dtb file, pulled out 1096:
6190 2094 2015 2874 3800 4619 3032 5604 3718 4626 5820 5035 3049
1951 3330 2530 2790 5660 4420 4030 1700 2640 8000 5950 6250 2020
1979 110 1920 1150 5490 3140 308067100 2500 4860 4280 4960 1600
Uh-oh! That's 6.7m of rain in July 1979? Looks like a factor-of-10 problem. Confirmed
high values are realistic. However I did notice that the missing value code was -10
instead of -9999! So modified db2dtb.for to fix that and re-produced the precip database
as pre.0612181214.dat. This then had to have normals recalculated for it (after fixing
#1096).
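A crude range check would have caught record #1096 before anomdtb crashed on it. The cap below is my own illustrative threshold, not CRU's; values are assumed to be in tenths of mm, so 67100 is the 6.7 m July 1979 entry:

```python
def suspect_values(monthly, missing=-9999, cap=20000):
    """Indices and values that are neither the missing code nor within a
    plausible range. cap=20000 (~2000 mm/month) is an arbitrary choice."""
    return [(i, v) for i, v in enumerate(monthly)
            if v != missing and (v < 0 or v > cap)]

# The 1979 row quoted above (fields separated per the fixed-width format)
row_1979 = [110, 1920, 1150, 5490, 3140, 3080, 67100, 2500, 4860, 4280, 4960, 1600]
print(suspect_values(row_1979))  # prints [(6, 67100)]
```

As a bonus, the negative-value test also flags the rogue -10 missing codes mentioned above, since they are negative but not equal to -9999.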
Finally got it through anomdtb.for AND quick_interp_tdm2 - without crashing! IDL was
even
IDL>
quick_interp_tdm2,1901,2006,'preglo/pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='preanoms/pre.'
Defaults set
1901
1903
(etc)
2005
2006
All good. Wrote mergegrids.for to create the more-familiar decadal and full-series
Firstly, wrote mmeangrid.for and cmpmgrids.m to get a visual comparison of old and
new precip grids (old being CRU TS 2.10). This showed variations in 'expected' areas
Next, Phil requested some statistical plots of percentage change in annual totals,
and long-term trends. Wrote 'anntots.for' to convert monthly gridded files into
yearly totals files. Then tried to write precipchecker.m to do the rest in Matlab..
it wasn't having it, OUT OF MEMORY! Bah. So wrote 'prestats.for' to calculate the
final stats, for printing with an emasculated precipchecker.m. BUT.. it wouldn't
work, and on investigating I found 200-odd stations with zero precipitation for
the entire 1901-2006 period! Modified anntots.for to dump a single grid with those
Zero cells in North Africa and the Western coast of South America. None in the
Next step, produce a list of cell centres of the offending cells. Wrote a quick
file and a list of lat/lon values, extracts all stations lying inside the cells
listed.
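The cell-centre extraction can be sketched like this. The row/column origin of the 0.5-degree grid was an open question earlier in these notes, so the layout below is one plausible choice, not the confirmed one:

```python
def cell_centre(row, col, gs=0.5):
    """Centre of cell (row, col), assuming row 0 is the southernmost band
    (starting at -90) and col 0 starts at -180 -- an assumption, per the
    which-way-up question in the notes above."""
    return (-90.0 + gs * (row + 0.5), -180.0 + gs * (col + 0.5))

def station_in_cell(stn_lat, stn_lon, cell_lat, cell_lon, gs=0.5):
    """True if a station falls inside the cell with the given centre."""
    return (abs(stn_lat - cell_lat) <= gs / 2 and
            abs(stn_lon - cell_lon) <= gs / 2)
```

With the 257 zero-cell centres in hand, one pass over the database headers with station_in_cell reproduces the 15-station extraction below.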
Uh-oh. Looked in the new pre db and found 15 stations for 257 zero cells! They are:
6064000 2650 840 559 FORT POLIGNAC ALGERIA 1925 2006 -999 -999.00
6262000 2080 3260 470 STATION NO. 6 SUDAN 1950 1988 -999 -999.00
8541800 -2053 -7018 52 IQUIQUE DIEGO ARACEN CHILE 1989 2006 -999 -999.00
8700494 -707 -7957 150 CAYALTI PERU 1934 1959 -999 -999.00
8700562 -1203 -7703 137 LIMA PERU 1929 1963 -999 -999.00
8700581 -1207 -7717 13 LA PUNTA (NA PERU 1939 1963 -999 -999.00
9932040 2810 670 381 FT FLATTER ALGERIA 1925 1965 -999 -999.00
Looked for the same zero cell stations in the old pre db (pre.0312031600.dtb) and only
found 10:
-603550 2810 670 381 FT FLATTER ALGERIA 1925 1965 -999 -999.00
606400 2650 841 558 ILLIZI/ILLIRANE ALGERIA 1925 2002 -999 -999
626200 2075 3255 468 STATION NO. 6 SUDAN 1950 1988 -999 -999.00
846910 -1375 -7628 7 PISCO (CIV/MIL) PERU 1942 2002 -999 -999
So why does the old db result in no 'zero' cells, and the new db give us over 250? I
wondered if normals might be the answer, but none of the 10 stations from the old db
6190 19 59 36 18 5 0 3 0 0 1 10 5
6190 3 0 3 0 0 1 1 3 1 4 0 0
6190 1 3 0 0 0 2 2 2 2 0 0 0
So these alone ought to guarantee three of the cells being nonzero - they should have
the bloody normals in! So the next check has to be the climatology, that which provides
A check of the gridded climatology revealed that all 257 'zero cells' have their
climatologies set to zero, too. This was partially checked in the GRIM-format
climatology
just in case!
Next, a focus: on CHIMBOTE (see header line above). This has real data (not just zeros).
full timeseries for that cell from the published 2.10 (1901-2002) GRIM file:
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 0 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 7 0 3
2 0 0 0 2 0 0 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 0 0 0 5 0 3
2 0 0 0 2 0 2 0 0 5 0 3
2 0 0 0 2 0 0 0 0 5 0 3
0 0 0 0 2 0 0 0 0 0 0 0
0 0 0 0 2 0 0 0 0 0 0 2
0 0 0 0 2 5 6 0 0 0 0 3
2 3 0 0 0 0 17 0 0 4 0 3
2 0 0 0 3 0 2 0 0 2 0 3
0 0 0 0 0 0 14 0 0 9 0 0
0 0 0 0 0 0 0 0 0 2 0 2
0 0 0 0 0 0 12 0 0 0 0 5
0 0 0 0 0 0 0 0 0 3 0 2
0 0 0 0 0 0 10 0 0 0 0 2
0 0 0 0 3 0 11 0 0 2 0 3
0 0 0 0 2 0 0 0 0 0 0 2
0 0 0 0 0 0 0 0 0 4 0 0
3 0 0 0 0 0 15 0 0 0 0 2
0 0 0 0 0 0 0 0 0 4 3 2
5 0 0 0 0 0 0 0 0 12 0 3
0 0 2 2 4 2 0 0 2 3 0 3
0 0 0 0 3 0 0 2 0 2 2 3
0 0 0 0 0 0 0 3 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 7 0 3 0 0 0 0 0 0
0 0 2 3 0 0 0 4 0 0 12 0
0 0 9 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 6 2 0 0 0 6 0 0 0 0 0
0 0 0 0 0 0 0 2 2 0 0 0
0 0 0 0 0 3 0 0 0 0 0 0
0 0 0 0 2 2 0 0 0 0 0 0
0 2 0 0 0 0 0 2 0 0 0 0
0 0 0 7 0 0 0 2 0 0 0 3
2 0 7 0 0 2 0 0 2 0 0 0
0 0 0 7 0 0 2 2 2 0 0 0
8 0 2 0 0 0 0 2 0 0 0 0
0 7 0 0 0 0 0 0 0 0 0 0
0 0 2 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 2 0 0
0 0 0 0 2 0 0 0 0 0 0 0
2 0 0 0 0 2 0 0 0 0 10 0
3 0 0 0 0 0 9 0 0 0 0 3
0 0 0 0 0 0 0 0 0 3 0 5
4 0 0 2 10 2 0 0 0 0 0 4
0 0 0 0 0 0 0 0 2 5 0 0
0 0 0 0 0 0 9 0 0 0 0 0
0 0 0 0 0 0 0 3 0 0 0 0
0 0 0 0 0 0 0 0 0 5 0 0
3 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 3 0 0 0
0 0 0 0 8 2 0 0 0 0 0 3
0 0 2 0 2 0 0 0 0 0 2 3
0 0 0 0 2 0 2 0 0 5 0 2
0 0 0 0 2 0 2 0 0 0 0 0
0 0 0 0 0 0 0 0 0 5 0 0
0 0 0 0 2 0 0 2 0 0 0 0
2 0 2 0 0 0 0 0 0 5 0 0
0 0 0 0 11 0 2 0 0 4 0 3
2 3 2 0 13 0 0 0 0 0 0 0
2 6 0 3 0 0 0 0 2 3 0 7
2 0 0 0 2 0 0 0 0 0 0 3
0 0 0 0 0 0 2 0 0 0 0 2
0 0 0 0 0 0 0 0 0 0 0 3
Only one thing for it.. examine the attempt at regenerating 2.10.
Update: aha! Phil pointed out that for precip the climatology
crua6[/cru/cruts/version_3_0/primaries/precip] ./glo2abs
Enter the path (if any) for the output files: pregrid/
pregrid.01.1901.glo
pregrid.02.1901.glo
(etc)
Decided to read Mitchell & Jones 2005 again. Noticed that the
crua6[/cru/cruts/version_3_0/primaries/precip] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.pre
pre.0612181221.dtb
pre.0612181221.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dtb
25
pre4sd.txt
1901,2006
> Operating...
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dts
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dts
made it to here
But the actual number of accepted values is more than TWICE 2.10!
Of course, the same 257 gridcells are zeros, because the multiplicative
For reference, these are the results for the 3 SD limit of 3.00:
made it to here
Read_Me says.
The process now is to read in the header lines AND line numbers from
the main database, and to then process the incoming database one record
at a time. It's more logical and having the line numbers will speed
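The header-plus-line-number indexing pass described above might look like this in outline. The real header test and key field in mergedb.for are not reproduced here, so the `is_header` predicate and the choice of key are placeholders:

```python
def index_headers(path, is_header):
    """One pass over the main database, recording the line number of each
    station header so the merge can jump straight to a record instead of
    rescanning -- the speed-up mentioned above."""
    index = {}
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if is_header(line):
                index[line.split()[0]] = lineno  # key on the first field (assumed id)
    return index
```

The incoming database is then processed one record at a time, with each lookup hitting the index rather than the file.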
The biggest immediate problem was the loss of an hour's edits to the
well, it compiles OK, and even runs enthusiastically. However there are
(even later)
25. Wahey! It's halfway through April and I'm still working on it. This
surely is the worst project I've ever attempted. Eeeek. I think the main
hadn't had to write it to add the 1991-2006 temperature file to the 'main'
one, it would probably have been a lot simpler. But that one operation has
proved so costly in terms of time, etc that the program has had to bend
brilliant idea to try and kill two birds with one stone - I should have
realised that one of the birds was actually a pterodactyl with a temper
problem.
Success!
crua6[/cru/cruts/version_3_0/db/testmergedb] ./mergedb
**************************************************
* MERGEDB *
* *
* *
* *
* *
* *
* 1. mergedb.0704201343.f098xxxx.act *
* *
* mergedb.0704201210.f098xxxx.ops *
* Actions Completed! *
**************************************************
..well, 'success' in the sense that it ran and apparently all the data's
in the right place, in tmp.0704251819.dtb.
26. OK, now to merge in the US stations. First, wrote 'us2cru' to convert
worked OK. Then used 'addnormline' to, well - add a normals line. Only 17
out of 1035 stations ended up with missing normals, which is pretty good!
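For illustration, a normals line of this kind is just a per-month 1961-90 mean with a minimum-years rule; the 23-of-30 threshold below is an assumption, not necessarily the value addnormline uses:

```python
# Sketch: build a '6190' normals line - the per-month 1961-1990 mean,
# set to missing where fewer than (assumed) 23 of the 30 years have data.
MISSING = -9999

def add_normal_line(years_to_values, min_years=23):
    normals = []
    for month in range(12):
        vals = [vv[month] for y, vv in years_to_values.items()
                if 1961 <= y <= 1990 and vv[month] != MISSING]
        if len(vals) >= min_years:
            normals.append(round(sum(vals) / len(vals)))
        else:
            normals.append(MISSING)
    return normals

# 30 years of synthetic data: values 100, 101 or 102 depending on year
data = {y: [100 + (y % 3)] * 12 for y in range(1961, 1991)}
norm = add_normal_line(data)
```

A station with too few contributing years simply ends up with an all-missing normals line, which is what "missing normals" means above.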
Now, I knew that using mergedb as it stands would not work. It expects to
be updating the existing records, and actions like 'addnew' require OPS
confirm additions where there's no WMO match and the data density is OK,
say 50% or higher. Unfortunately, that didn't work either, and rather than
fight mergedb any further I wrote 'simpleaddnew', which adds two non-overlapping databases. The resultant file, with all
27. Well, enough excuses - time to remember how to do the anomalising and
gridding things! Firstly, ran 'addnormline' just to ensure all normals are
up to date. The result was 8 new sets of normals, so well worth doing. The
database is now:
tmp.0704292158.dtb
Ran 'anomdtb' - got caught out by the requirement for a companion '.dts'
file again, ran 'falsedts.for' and carried on.. would still be nice to be
Output:
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/temp] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.tmp
tmp.0704292158.dtb
tmp.0704292158.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb
25
tmp.txt
1901,2006
> Operating...
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dts
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dts
tmp.0704292158.dts
tmp.0704292158.dts
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dts
made it to here
crua6[/cru/cruts/version_3_0/primaries/temp]
<END QUOTE>
.. which is a trifle worrying! And looking at the .txt files, they look
Now, do those first two columns look like lat & lon to you? Me neither,
here's what the old version of the same file looks like:
In fact, the first two columns never get outside of +/- 30. Oh bugger.
The function 'sort' was used to sort the database so that any duplicate
lines would be together - then 'uniq' was used to pull out duplicates.
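The sort/uniq step amounts to this (an illustrative Python equivalent of `sort | uniq -d`):

```python
# Sketch of the sort | uniq duplicate hunt: find header lines that
# occur more than once in a list of station headers.
from collections import Counter

def find_dupes(headers):
    counts = Counter(h.strip() for h in headers)
    return sorted(h for h, n in counts.items() if n > 1)

headers = [
    "725837 408 1158 1549 NV ELKO FAA AP 1930 1990",
    "725910 401 1223  103 RED BLUFF 1878 2006",
    "725837 408 1158 1549 NV ELKO FAA AP 1930 1990",
]
dupes = find_dupes(headers)
```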
There were quite a few dupes, and one or two triples too, like these:
725837 408 1158 1549 NV ELKO FAA AP 1930 1990 101930 -999.00
725910 401 1223 103 RED BLUFF USA 1878 2006 101878 -999.00
Looking at the last two.. it seems that 725910 has 725837's data!
1977 71 124 118 184 167 275 283 280 230 190 126 99
1978 107 114 149 144 208 248 289 282 232 220 118 72
1979 85 99 139 150 218 256 282 258 253 189 117 94
1980 99 121 119 156 192 216 275 262 241 196 128 102
725837! It then reverts to the original range for the rest of the run.
So.. did the merging program do this? Unfortunately, yes. Check dates:
crua6[/cru/cruts/version_3_0/db/testmergedb]
The first file is the 1991-2006 update file. The second is the original
It has *inherited* data from the previous station, where it had -9999
/goes off muttering to fix mergedb.for for the five hundredth time
did find the problem. I was clearing the data array but not close enough
to the action - when stations were being passed through (ie no data to
add to them) they were not being cleaned off the array afterwards. Meh.
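The bug class here - a reused work array not being cleared when a station is passed through - reproduces in a few lines (an illustrative sketch, not the mergedb Fortran):

```python
# Sketch of the stale-buffer bug: a work array reused across stations
# must be re-cleared for EVERY record, including ones passed through
# with nothing to merge - otherwise they inherit the previous
# station's values.
MISSING = -9999

def merge(stations, updates):
    work = [MISSING] * 12                 # reused work array
    out = {}
    for stn in stations:
        for m in range(12):               # the fix: clear on every pass
            work[m] = MISSING
        if stn in updates:
            for m, v in enumerate(updates[stn]):
                work[m] = v
        out[stn] = list(work)
    return out

merged = merge(["A", "B"], {"A": [100] * 12})
```

Without the inner clearing loop, station "B" would come out holding station "A"'s twelve values - exactly the inherited-data symptom in the diff above.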
Wrote a specific routine to clear halves of the data array, and back to
square one. Re-ran the ACT file to merge the x-1990 and 1991-2006 files.
Created an output file exactly the same size as the last time (phew!)
but with..
285516
crua6[/cru/cruts/version_3_0/db/testmergedb] wc -l tmp.0704292355.dtb
285829 tmp.0704292355.dtb
14881,14886c14881,14886
< 1965-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1966-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1967-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1968-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1969-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1970-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
---
> 1965 -221 -177 -234 -182 -5 6 24 36 -15 -91 -100 -221
> 1966 -272 -194 -248 -192 -66 10 27 45 -12 -75 -139 -228
> 1967 -201 -243 -196 -158 -26 1 40 30 -18 -89 -183 -172
> 1968 -253 -256 -253 -107 -42 10 46 33 -21 -64 -134 -195
> 1969 -177 -202 -248 -165 -33 8 42 50 -1 -89 -157 -204
> 1970 -237 -192 -217 -160 -87 6 30 25 -5 -55 -143 -222
ie, what should have been missing data is now missing data again:
200436,200445c200436,200445
< 1981-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1982-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1983-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1984-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1985-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1986-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1987-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1988-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1990-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
---
> 1982 -49 -14 32 57 114 164 206 214 148 74 11 -23
> 1983 -9 -1 54 59 114 167 204 223 170 104 25 -19
> 1984 -83 -46 22 55 126 154 222 215 159 63 32 -62
> 1985 -57 -29 17 89 122 181 244 188 121 79 -11 -50
> 1987 -59 -5 30 97 131 177 193 192 153 101 21 -35
> 1988 -65 -15 29 80 108 184 222 198 138 116 8 -57
> 1989 -113 -54 53 94 113 164 215 186 143 78 8 -24
> 1990 -24 -30 49 100 100 166 214 194 177 77 9 -97
Hurrah!
crua6[/cru/cruts/version_3_0/db/testmergedb] ./simpleaddnew
crua6[/cru/cruts/version_3_0/db/testmergedb]
1835 92 73 141 187 260 279 281 288 241 195 183 106
6190 84 100 142 180 224 257 274 270 245 191 145 104
has more missing data and so forth. By 1870 they have diverged, so
in this case it's probably OK.. but what about the others? I just do
not have the time to follow up everything. We'll have to take 210
that gave the 210 figure excluded any lines with two or more
identical but for one or more missing values in one of the stations.
between the original database and the US database, with just a couple
in the 1991-2006 update file. One surprise was that stations I'm sure
unsettling!
Rather foolishly, perhaps, I decided to have a go at interactively
<BEGIN QUOTE>
made it to here
h.ann
crua6[/cru/cruts/version_3_0/primaries/temp]
<END QUOTE>
could start this project again and actually argue the case for
OK.. the .ann file problem was simply that it refuses to overwrite any
Here the two WMO codes look OK (though others are -999 which
seems unlikely) but the two lat/lon pairs? Ooops. Here are the
actual headers:
712600 465 845 187 Sault Ste Marie A CANADA 1945 2006 361945 -999.00
Not sure why the lats & lons are a factor of 10 too low - may
<BEGIN QUOTE>
made it to here
<END QUOTE>
The lats & lons look the same.. but far fewer duplicates!
why not compare the two bespoke log files (as excerpted above)?
but the log file from the run of the original database does not!
200
2572
1809
So 200 duplication events are unique to the older database,
and 2572 are unique to the new database - with 1809 common
of those with the first WMO as -999: this is the key. The
28. With huge reluctance, I have dived into 'anomdtb' - and already I have
I have found that the WMO Code gets set to -999 if *both* lon and lat are
* LoadCTS multiplies non-missing lons by 0.1, so they range from -18 to +18
with missing value codes passing through AS LONG AS THEY ARE -9999. If they
are -999 they will be processed and become -99.9. It is not clear why lats
* The subroutine 'Anomalise' in anomdtb checks lon and lat against a simple
'MissVal', which is defined as -999. This will catch lats of -999 but not
lons of -9999.
* This still does not explain how we get so many -999 codes.. unless we don't
* If the code is -999 because lat and lon are both missing - how the bloody
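The -999/-9999 arithmetic above is worth spelling out: once a raw -999 is scaled by 0.1 it becomes -99.9 and sails past a MissVal test. A minimal reproduction (the names mirror the description, not the actual Fortran):

```python
# Sketch: why a -999 missing-value code survives the MissVal check once
# longitudes have been scaled by 0.1. Only -9999 is trapped.
MISS_VAL = -999.0

def load_lon(raw):
    """Mimic the LoadCTS behaviour described: trap -9999, scale the rest."""
    if raw == -9999:
        return MISS_VAL          # correctly flagged missing
    return raw * 0.1             # a raw -999 becomes -99.9 here

def is_missing(lon):
    return lon == MISS_VAL       # the Anomalise-style check

good = load_lon(1054)            # 105.4 E
trapped = load_lon(-9999)        # flagged missing
escaped = load_lon(-999)         # -99.9: looks like a real longitude!
```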
.. ah, OK. Well, for a start, the last point above does not apply - not one
case of the code being set to -999 because of lat/lon missing. In fact, I
hate to admit it, but it is *sort of* clever - the code is set to -999 to
not make a distance comparison if either code is -999. So HOW COME loads of
The plot thickens.. I changed the exclusion tests in the duplication loops
from:
if (AStn(XAStn).NE.MissVal) then
to:
if (int(AStn(XAStn)).NE.-999) then
This made NO DIFFERENCE. So having tested to ensure that the first of the
pair hasn't already been used - we then use it! What's more I've noticed
that it's usually the one 'incorporated' in the previous iteration!
Consider:
Here we can see (check the first set of lat/lons) that, after being
160707, 160800 and 160811! So the same data could end up in three
station may hop all over the place in <8km steps, collecting data as
no surprise seeing as their lats & lons are rubbish!!! Oh Tim what
have you done, man? [actually - what he's done is to let missing
lats & lons through. Missing lon code is -1999 not -9999 so these
All that said, the biggest worry is still the lats & lons themselves.
They just don't look realistic. Lats appear to have been reduced by
a factor of 10 too, even though I can't find the code for that. And
Of course not! It's just over 50km. I do not understand why the lats
& lons have been scaled, when the stated distance threshold has not.
subroutine LoadCTS (StnInfo,StnLocal,StnName,StnCty,Code,Lat,Lon,Elv,OldCode,Data,YearAD, &
  NmlData,DtbNormals,CallFile,Hulme,Legacy,HeadOnly,HeadForm,LongType,Silent,Extra,PhilJ, &
  YearADMin,YearADMax,Source,SrcCode,SrcSuffix,SrcDate, &
  LatMV,LonMV,ElvMV,DataMV,LatF,LonF,ElvF,NmlYr0,NmlYr1,NmlSrc,NmlInc)
call LoadCTS (StnInfoA,StnLocalA,StnNameA,StnCtyA,Code=AStn,OldCode=AStnOld, &
  Lat=ALat,Lon=ALon,Elv=AElv,DtbNormals=DtbNormalsA, &
  Data=DataA,YearAD=AYearAD,CallFile=LoadFileA,silent=1)    ! get .dtb file
.. we see that Legacy is not passed. This means that.. (from LoadCTS):
if (present(Legacy)) then
end if
made it to here
Hurrah! Looking at the log it is still ignoring the -999 Code and re-integrating stations..
but not to any extent worth worrying about. Not when duplications are down to 1.3% :-)))
Then got a mail from PJ to say we shouldn't be excluding stations inside 8km anyway - yet that's in IJC - Mitchell & Jones 2005! So there you go. Ran again with 0km as the distance:
made it to here
a very dangerous thing - I decided to see what difference it made, turning off the proximity check:
crua6[/cru/cruts/version_3_0/primaries/temp] wc -l */*1962.12.txt
2773 oldtxt/old.1962.12.txt
3269 tmptxt0km/tmp.1962.12.txt
3308 tmptxt8km/tmp.1962.12.txt
So.. 'oldtxt' is before I fixed the lat/lon scaling problem. But look at the last two - I
/gets out of huff and goes into house, checks things and thinks hard
Okay, I guess if we don't do the roll-duplicates-together thing, then we could lose data
because the 'rolled' station (ie the one subsumed into its neighbour) might have useful
29. I suddenly thought - what about the Australian data? But luckily that's just tmax/tmin
generation for v3.0 and re-do the anomalies with the new anomdtb. At 8km, we got the
made it to here
made it to here
made it to here
Happy? well.. no. Because something is happening for precip that does not happen for
temp! But of course. Here are the first few lines from various 1962.12 text files..
tmptxt8km/tmp.1962.12.txt
tmptxt0km/tmp.1962.12.txt
pretxt8km/pre.1962.12.txt
pretxt0km/pre.1962.12.txt
..As a result of fixing the lats and lons for temperature, and indeed
correction factor is expecting 100 not 10, but why isn't this a problem
for temperature?! Went back and ran exactly the same version of anomdtb
on temperature - exactly the same as last time (2nd from top above). So
stored values for the first station (-511900, BIRI) should be 61 and 10.6,
sounds about right for Norway. The bit in anomdtb (actually the subroutine
'Dumping', LOL) that writes the .txt files just writes directly from the
lat & lon at key stages - they were too high throughout, so LoadCTS assumed
holding them at x100 from their true values, ie 61.0 -> 6100. It was about
now that I spotted something I'd not thought to examine before: precip
Temperature header:
Precipitation header:
100100 7093 -867 10 JAN MAYEN NORWAY 1921 2006 -999 -999.00
So.. this begs the question, how does the software suite know which it's got?
By rights it should look at the most extreme values for each.. something tells
me that's not the case. Decided to look at the ranges of values for different
-176000 3520 3330 220 NICOSIA CYPRUS 1932 1974 -999 nocode
-176000 3520 3330 220 NICOSIA CYPRUS 1932 1974 -999 nocode
Without going any further, it's obvious that LoadCTS is going to have to auto-
sense the lat and lon ranges. Missing value codes can then be derived - if it
always returns actual (unscaled) degrees (to one or two decimal places) then
any value lower than -998 will suffice for both parameters. However, this does
make me wonder why it wasn't done like that. Is there a likelihood of the
programs being used on a spatial subset of stations? Say, English? Then lon
would never get into double figures, though lat would.. well let's just hope
Okay.. so I wrote extra code into LoadCTS to detect Lat & Lon ranges. It excludes any values for which the modulus of 100 is -99, so hopefully missing value codes do not contribute. The factors are set accordingly (to 10 or 100). I had to default to 1, which is a pity. Once you've got the factors, detection of missing values can be a simple out-of-range test.
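That auto-sensing can be sketched as follows; the Fortran-style MOD test excludes candidate missing codes, and the range thresholds here are illustrative assumptions:

```python
# Sketch: detect whether stored latitudes are in degrees, degrees*10 or
# degrees*100, ignoring candidate missing codes - values where Fortran
# MOD(v,100) == -99, i.e. -999, -9999, etc. Thresholds are assumptions.

def fortran_mod(a, b):
    # Fortran MOD truncates the quotient toward zero
    return a - int(a / b) * b

def detect_lat_factor(values):
    real = [v for v in values if fortran_mod(v, 100) != -99]
    peak = max(abs(v) for v in real)
    if peak <= 90:
        return 1          # already in degrees
    if peak <= 900:
        return 10         # degrees * 10
    return 100            # degrees * 100

factor = detect_lat_factor([7093, -4230, -999, 3520])
```

Once the factor is known, anything that divides down to a value outside +/-90 (or +/-180 for longitude) can be flagged missing by a plain out-of-range test.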
small section of code that converts PJ-style reversed longitudes, or 0-360 ones, to
regular -180 (W) to +180 (E). This code is switched on by the presence of the
'LongType' flag in the LoadCTS call - the trouble is, THAT FLAG IS NEVER SET BY
again. Just another thing I cannot understand, and another reason why this should all
crua6[/cru/cruts/version_3_0/db/testmergedb] ./revlons
<END QUOTE>
Thus the 'final' temperature database is now tmp.0705101334.dtb.
Re-ran anomdtb - with working lat/lon detection and missing lat/lon value
detection - for both precip and temperature. This should ensure that all
WMO codes are present and all lats and lons are correct.
Temp:
<BEGIN QUOTE>
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.tmp
tmp.0705101334.dtb
1961,1990
25
tmp.txt
1901,2006
> Operating...
<END QUOTE>
Precip:
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/precip] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.pre
pre.0612181221.dtb
1961,1990
25
pre.txt
> Operating...
<END QUOTE>
IDL>
quick_interp_tdm2,1901,2006,'tmpglo/tmpgrid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='tmp0km0705101334txt/tmp.'
then glo2abs, then mergegrids, to produce monthly output grids. It apparently worked:
-rw------- 1 f098 cru 138964083 May 13 20:42 cru_ts_3_00.1901.2006.tmp.dat.gz
As a reminder, these output grids are based on the tmp.0705101334.dtb database, with no
Decided to (re-) process precip all the way, in the hope that I was in the zone or
IDL>
quick_interp_tdm2,1901,2006,'preglo/pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='pre0km0612181221txt/pre.'
Then glo2abs, then mergegrids.. all went fine, apparently.
Wrote 'makedtr.for' to tackle the thorny problem of the tmin and tmax databases not being kept in step. Sounds familiar, if worrying. Am I the first person to attempt to get the CRU databases in working order?!! The program pulls no punches. I had
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/dtr] ./makedtr
<END QUOTE>
Yes, the difference is a lot more than seven! And the program helpfully dumps a listing
Unfortunately, it hadn't worked either. It turns out that there are 3518 stations in each database with a WMO Code of ' 0'. So, as the makedtr program indexes on the WMO code, those stations cannot be told apart. Rewrote as makedtr2, which uses the first 20 characters of the header to match:
<BEGIN QUOTE>
<END QUOTE>
The big jump in the number of 'surplus' stations is because we are no longer automatically matching stations with WMO=0.
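Prefix matching of this sort is simple enough in outline (an illustrative sketch; the station headers below are made up):

```python
# Sketch: pair tmin/tmax stations by the first 20 characters of their
# fixed-width headers (code + lat + lon + alt), since thousands of
# stations share the useless WMO code '      0'.

def pair_by_prefix(tmin_headers, tmax_headers, width=20):
    tmax_index = {h[:width]: h for h in tmax_headers}
    pairs, surplus = [], []
    for h in tmin_headers:
        (pairs if h[:width] in tmax_index else surplus).append(h)
    return pairs, surplus

tmin = ["      0 -3000 12000  100 STATION ALPHA",
        "9400000 -3100 12100  200 STATION BRAVO"]
tmax = ["      0 -3000 12000  100 STATION ALPHA"]
pairs, surplus = pair_by_prefix(tmin, tmax)
```

Because the 20-character key includes lat, lon and altitude, zero-WMO stations only pair when their positions agree - hence the surge in surplus stations.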
Here's what happened to the tmin and tmax databases, and the new dtr database:
position missed
8 1
14 1
21 0
26 0
47 1
61 0
66 0
71 0
78 0
Why?!! Well the sad answer is.. because we've got a date wrong. All three 'header' problems ..and as we know, this is not a conventional header. Oh bum. But, but.. how? I know we do muck around with the header and start/end years, but still..
Wrote filtertmm.for, which simply steps through one database (usually tmin) and looks for a 'perfect' match in another database (usually tmax). 'Perfect' here means a match of WMO Code, Lat, Lon, Start-Year and End-Year. If a match is found, both stations are retained.
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/dtr] ./filtertmm
working..
<END QUOTE>
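filtertmm's 'perfect match' rule is effectively a composite-key join; a sketch (the sample records are illustrative):

```python
# Sketch of filtertmm's matching rule: a station in one database is kept
# only if the other database holds a record with identical WMO code,
# lat, lon, start year and end year.

def perfect_key(stn):
    return (stn["wmo"], stn["lat"], stn["lon"], stn["year0"], stn["year1"])

def filter_matched(db_a, db_b):
    keys_b = {perfect_key(s) for s in db_b}
    return [s for s in db_a if perfect_key(s) in keys_b]

tmin = [{"wmo": 712600, "lat": 465, "lon": 845, "year0": 1945, "year1": 2006},
        {"wmo": 0, "lat": -4230, "lon": 14650, "year0": 2000, "year1": 2006}]
tmax = [{"wmo": 712600, "lat": 465, "lon": 845, "year0": 1945, "year1": 2006}]
kept = filter_matched(tmin, tmax)
```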
converter that takes one or more of the packs and creates CRU-format databases
of them. Edit: nope, thought some more and the *best* strategy is a program
that takes *pairs* of Aus packs and updates the actual databases. Bearing in
mind that these are trusted updates and won't be used in any other context.
From Dave L - who incorporated the initial Australian dump - for the tmin/tmax bulletins,
Actually.. although I was going to assume that filtertmm had done the synching job OK, a brief look at the Australian stations in the databases showed me otherwise. For instance, I pulled all the headers with 'AUSTRALIA' out of the two 0705182204 databases. Now because these were produced by filtertmm, we know that the codes (if present), lats, lons and dates will all match. Any differences will be in altitude and/or name. And so they were:
336
..so roughly 100 don't match. They are mostly altitude discrepancies, though there are a number of name mismatches as well.
74c74
---
---
> 0 -4230 14650 595 TARRALEAH CHALET AUSTRALIA 2000 2006 -999 -999.00
Examples of the second kind (name mismatch) are most concerning as they may well be
and tmax, early and late (before and after filtertmm.for). We see there are two TARRALEAH entries in each of the four files. We see that 'TARRALEAH VILLAGE' only appears in the tmin file. We see, most importantly perhaps, that they are temporally contiguous - that is, each pair could join with minimal overlap, as one is 1991-2000 and the other 2000-2006. Also, we note that the 'early' one of each pair has a slightly different longitude and altitude (the former being the thing that
95018, 051201051231, -42.30, 146.45, 18.0, 00, 31, 31, 585, TARRALEAH VILLAGE
So we can resolve this case - a single station called TARRALEAH VILLAGE, running from 1991 to 2006.
But what about the others?! There are close to 1000 incoming stations in the bulletins, must every one be identified in this way?!! Oh God. There's nothing for it - I'll have to write a prog to find matches for the incoming Australian bulletin stations in the main databases. I'll have to use the databases from before the filtertmm application, so *0705182204.dtb. And it will only need the Australian headers, so I used grep to create *0705182204.dtb.auhead files. The other input is the list of stations taken from the monthly bulletins. Now these have a different number of stations each month, so the prog will build an array of all possible stations based on the
crua6[/cru/cruts/version_3_0/db] wc -l *auhead
1518 glseries_tmn_final_merged.auhead
1518 tmn.0611301516.dat.auhead
1518 tmn.0612081255.dat.auhead
1518 tmn.0702091139.dtb.auhead
1518 tmn.0705152339.dtb.auhead
1426 tmn.0705182204.dtb.auhead
Actually, stopped work on that. Trying to match over 800 'bulletin' stations against over 3,000 database stations *in two unsynchronised files* was just hurting my brain. The files have to be properly synchronised first, with a more lenient and interactive version of filtertmm. Or... could I use mergedb?! Pretend to merge tmin into tmax and see what pairings it managed? No
..unfortunately, not. Because when I tried, I got a lot of odd errors followed by a crash. The reason, I eventually deduced, was that I didn't build mergedb with the idea that WMO codes might be zero (many of the Australian stations have wmo=0). This means that primary matching on WMO code is impossible. This just gets worse and worse: now it looks as though I'll have to find WMO Codes (or pseudo-codes) for the *3521* stations in the tmin file that don't have one!!!
OK.. let's break the problem down. Firstly, a lot of stations are going to need WMO codes, if available. It shouldn't be too hard to find any matches with the existing WMO coded stations in the other databases (precip, temperature). Secondly, we need to exclude stations that aren't
of 0 as 'missing'? Had a look, and it does check that the code isn't -999 OR 0.. but not when preallocating flags in subroutine 'countscnd'. Fixed that and tried running it again.. exactly the same result (crash). I can't see anything odd about the station it crashes on:
0 -2810 11790 407 MOUNT MAGNET AERO AUSTRALIA 2000 2006 -999 -999.00
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2000 339 344 280 252 214 202 189 196 262 291 316 377
2001 371 311 310 300 235 212 201 217 249 262 314 333
2002-9999-9999 339 297 258 209 205 212 246 299 341 358
2003 365 367 336 296 249 195 193 200 238 287 325 368
2004 395 374 321 284 219 214 173 188 239 309 305 370
2005 389 396 358 315 251 182 189 201 233 267 332 341
.. it's very similar to preceding (and following) stations, and the station before has even less real data (the one before that has none at all and is auto-deleted). The nature of the crash is 'forrtl: error (65): floating invalid' - so a type mismatch possibly. The station has
tmn.0702091139.dtb:
0 -2810 11780 407 MOUNT MAGNET AERO AUSTRALIA 2000 2006 -999 -999.00
tmx.0702091313.dtb:
0 -2810 11790 407 MOUNT MAGNET AERO AUSTRALIA 2000 2006 -999 -999.00
7600, 070401070430, -28.12, 117.84, 16.0, 00, 30, 30, 407, MOUNT MAGNET AERO
Note that the altitude matches (as distinct from the station below).
Naturally, there is a further 'MOUNT MAGNET' station, but it's probably distinct:
tmn.0702091139.dtb:
9442800 -2807 11785 427 MOUNT MAGNET (MOUNT AUSTRALIA 1956 1992 -999 -999.00
tmx.0702091313.dtb:
9442800 -2807 11785 427 MOUNT MAGNET (MOUNT AUSTRALIA 1957 1992 -999 -999.00
I am at a bit of a loss. It will take a very long time to resolve each of these 'rogue' stations. Time I do not have. The only pragmatic thing to do is to dump any stations that are too recent to have normals. They will not, after all, be contributing to the output. So I knocked out 'goodnorm.for', which simply uses the presence of a valid normals line to sort.
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/dtr] ./goodnorm
FINISHED.
crua6[/cru/cruts/version_3_0/db/dtr] ./goodnorm
FINISHED.
<END QUOTE>
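goodnorm's test - keep a station only if its 6190 normals line carries at least one real value - can be sketched like so (record layout simplified):

```python
# Sketch: drop stations whose '6190' normals line is all missing
# (-9999); keep the rest.
MISSING = -9999

def has_valid_normals(normals):
    return any(v != MISSING for v in normals)

def goodnorm(stations):
    saved = [s for s in stations if has_valid_normals(s["normals"])]
    deleted = [s for s in stations if not has_valid_normals(s["normals"])]
    return saved, deleted

stations = [
    {"name": "A", "normals": [84, 100, 142, 180, 224, 257,
                              274, 270, 245, 191, 145, 104]},
    {"name": "B", "normals": [MISSING] * 12},
]
saved, deleted = goodnorm(stations)
```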
Essentially, two thirds of the stations have no normals! Of course, this still leaves us with a lot more stations than we had for tmean (goodnorm reported 3316 saved, 1749 deleted) though still far behind precipitation (goodnorm reported 7910 saved, 8027 deleted). I suspect the high percentage lost reflects the influx of modern Australian data. Indeed, nearly 3,000 of the 3,500-odd stations with missing WMO codes were excluded by this operation. This means that, for tmn.0702091139.dtb, 1240 Australian stations were lost, leaving only 278.
This is just silly. I can't dump these stations, they are needed to potentially match with the
1. Attempt to pair bulletin stations with existing ones in the tmin database. Mark pairings in the
2. Run an enhanced filtertmm to synchronise the tmin and tmax databases, but prioritising the 'paired' stations from step 1 (so they are not lost). Mark the same pairings in the tmax
problem: what to do with a positive match between a bulletin station and a zero-wmo database station? The station must have a real WMO code or it'll be rather hard to describe the match!
Got a list of around 12,000 wmo codes and stations from Dave L; unfortunately there was a problem
So.. current thinking is that, if I find a pairing between a bulletin station and a zero-coded Australian station in the CRU database, I'll give the CRU database station the Australian local (bulletin) code twice: once at the end of the header, and once as the WMO code *multiplied by -1* to avoid implying that it's legitimate. Then if a 'proper' code is found or allocated later, the mapping to the bulletin code will still be there at the end of the header. Of course, an initial check will ensure that a match can't be found, within the CRU database, between the zero-coded
though really it's (2x,i6,a8) as I remember the Anders code being i2 and the real start year being i4 (both from the tmean database). This will mean post-processing existing databases of course,
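The negative pseudo-code scheme sketches as follows (KURI BAY and its bulletin code 1009 are used purely as an example):

```python
# Sketch of the negative pseudo-code scheme: a zero-WMO CRU station
# matched to an Australian bulletin station gets the bulletin's local
# code stored twice - negated in the WMO field (so it can't be mistaken
# for a real WMO code) and verbatim at the end of the header.

def assign_pseudo_code(station, bulletin_code):
    station = dict(station)
    station["wmo"] = -bulletin_code          # flagged as non-WMO
    station["local_code"] = bulletin_code    # kept for later re-mapping
    return station

stn = assign_pseudo_code({"wmo": 0, "name": "KURI BAY"}, 1009)
```

If a genuine WMO code is allocated later, the negated field can be overwritten while the trailing local code preserves the mapping to the bulletins.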
A brief (hopefully) diversion to get station counts sorted. David needs them so might as well sort the procedure. In the upside-down world of Mark and Tim, the numbers of stations contributing to each cell during the gridding operation are calculated not in the IDL gridding program - oh, no! - but in anomdtb! Yes, the program which reads station data and writes station data has a second,
crua6[/cru/cruts/version_3_0/primaries/precip] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.pre
> Will calculate percentage anomalies.
pre.0612181221.dtb
1961,1990
25
But then, we choose a different output, and it all shifts focus and has to ask all the IDL questions!!
pre.stn
450
> Submit a grim that contains the appropriate grid.
clim.6190.lan.pre
1901,2006
> Operating...
outputs from the regular anomdtb runs - ie, the monthly files of valid stations. After all we need to know the station counts on a per month basis. We can use the lat and lon, along with the correlation decay distance.. shouldn't be too awful. Just even more programming and work. So before I commit to that, a quick look at the IDL gridding prog to see if it can dump the figures instead: after all, this is where the actual 'station count' information is assembled and used!!
after all, this is where the actual 'station count' information is assembled and used!!
..well that was, erhhh.. 'interesting'. The IDL gridding program calculates whether or not a station contributes to a cell, using.. graphics. Yes, it plots the station sphere of influence then checks for the colour white in the output. So there is no guarantee that the station number files, which are produced *independently* by anomdtb, will reflect what actually happened!!
Well I've just spent 24 hours trying to get Great Circle Distance calculations working in Fortran, with precisely no success. I've tried the simple method (as used in Tim O's geodist.pro) and the more complex and accurate method found elsewhere (wiki and other places). Neither gives me results
with that. Also decided that the approach I was taking (pick a gridline of latitude and reverse-engineer the GCD algorithm so the unknown is the second lon) was overcomplicated, when we don't need to know where it hits, just that it does. Since for any cell the nearest point to the station will be a vertex, we can test candidate cells for the distance from the appropriate vertex to the station.
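For reference, the 'more complex and accurate method' usually meant here is the haversine formula; a sketch of it plus the vertex test just described (the Earth radius of 6371 km is an assumed constant):

```python
# Sketch: haversine great-circle distance, plus the vertex test - a cell
# is within range if its nearest vertex lies within the correlation
# decay distance (cdd) of the station. R = 6371 km is an assumption.
from math import radians, sin, cos, asin, sqrt

R_KM = 6371.0

def gcd_km(lat1, lon1, lat2, lon2):
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * R_KM * asin(sqrt(a))

def cell_in_range(stn_lat, stn_lon, vertices, cdd_km):
    return any(gcd_km(stn_lat, stn_lon, vlat, vlon) <= cdd_km
               for vlat, vlon in vertices)

# half a degree of longitude at 61N - the ~27 km cell width noted below
d = gcd_km(61.0, 10.6, 61.0, 11.1)
```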
The problem is, really, the huge numbers of cells potentially involved in one station, particularly at high latitudes. Working out the possible bounding box is awkward when you're within cdd of a pole (ie, for tmean with a cdd of 1200, the N-S extent is over 20 cells (10 degs) in each direction). Maybe not a serious problem for the current datasets but an example of the complexity. Also, deciding on the potential bounding box is nontrivial, because of cell 'width' changes at high latitudes (at 61 degs North, the half-degree cells are only 27km wide!). With a precip cdd of 450 km this means the bounding box is dozens of cells wide - and will be wider at the Northern edge!
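The arithmetic behind those numbers checks out: a half-degree cell is roughly 111.2 x cos(lat) x 0.5 km wide East-West (111.2 km/degree is an approximation), hence about 27 km at 61 degrees North:

```python
# Sketch: half-degree cell width vs latitude, and how many cells a
# correlation decay distance spans. 111.2 km/degree is approximate.
from math import cos, radians, ceil

KM_PER_DEG = 111.2

def cell_width_km(lat_deg, cell_deg=0.5):
    return KM_PER_DEG * cos(radians(lat_deg)) * cell_deg

def cells_spanned(cdd_km, lat_deg, cell_deg=0.5):
    return ceil(cdd_km / cell_width_km(lat_deg, cell_deg))

w61 = cell_width_km(61.0)              # ~27 km, as noted above
n_precip = cells_spanned(450, 61.0)    # ~17 cells each side for precip
n_temp_ns = ceil(1200 / (KM_PER_DEG * 0.5))   # N-S: >20 cells each way
```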
Clearly a large number of cells are being marked as covered by each station. So in densely-stationed areas there will be considerable smoothing, and in sparsely-stationed (or empty) areas, there will be possibly untypical data. I might suggest two station counts - one of actual stations contributing from within the cell, one for stations contributing from within the cdd. The former being a subset of the latter, so the latter could be used as the previous release was used.
Well, got stncounts.for working, finally. And, out of malicious interest, I dumped the first station's coverage to a text file and counted up how many cells it 'influenced'. The station was at 10.6E, 61.0N. The total number of cells covered was a staggering 476! Or, if you prefer, 475 indirect and one direct.
Ran for the first month (01/1901). Compared the resulting grid with that from CRU TS
2.1. Seems to
Wrote 'makelsmask.for' to, well, make a land-sea mask. It'll work with any gridded data file that uses -999 for sea. The mask is called 'lsmask.halfdeg.dat'. Adapted
seems to use the inbuilt 'TRIGRID' function to interpolate the grid, so there's no way of getting the station count for a particular cell that way anyway. Not that it would mean much, since there is bound to be some kind of weighting (it's not clear what that weighting is, though, from the IDL website). So the figures in the station count files are really rather loose. What might be useful as a companion dataset would be the ACTUAL station counts. Counts for cells with stations actually
Managed a full run of stncounts. It took over five and a half hours, which is a bit much!
Back to the gridding. I am seriously worried that our flagship gridded data product is produced by Delaunay triangulation - apparently linear as well. As far as I can see, this renders the station counts totally meaningless. It also means that we cannot say exactly how the gridded data is arrived at from a statistical perspective - since we're using an off-the-shelf product that isn't documented sufficiently to say that. Why this wasn't coded up in Fortran I don't know - time pressures perhaps? Was too much effort expended on homogenisation, that there wasn't enough time to write a gridding procedure? Of course, it's too late for me to fix it too. Meh.
Well, it's been a real day of revelations, never mind the week. This morning I
discovered that proper angular weighted interpolation was coded into the IDL
routine, but that its use was discouraged because it was slow! Aaarrrgghh.
There is even an option to tri-grid at 0.1 degree resolution and then 'rebin'
to 720x360 - also deprecated! And now, just before midnight (so it counts!),
having gone back to the tmin/tmax work, I've found that most if not all of the
Australian bulletin stations have been unceremoniously dumped into the files
without the briefest check for existing stations. A classic example would be
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2000 245 243 243 232 184 143 138 155 193 231 249 249
2001 245 247 241 216 156 167 163 129 201 238 246 247
2002 244 246 230 208 167 122 92 119 202 217 248 259
2003 253 249 222 220 169 151 144 158 203 216 248 250
2004 252 247 244 209 202 135 129 140 176 230 248 257
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1971 254 249 239 218 166 147 142 169 214 246 253 241
1972 246 244 226 198 175 158 126 182 200 222 244 259
1973 255 259 252 232 215 186 171 189 216 240 256 246
1974 247 243 240 217 183 144 134 171 216 247 248 246
1975 239 239 237 216 180 157 168 171 223 233 243 246
1976 235 244 227 190 148 142 142 144 177 236 252 250
1977 253 249 245 218 177 135 130 137 187 226 250 248
1978 247 244 239 199 218 174 162 186 195 233 245 253
1979 247 246 238 217 205 166 147 178 216 234 248 254
1980 249 245 240 221 186 161 141 171 192 241 249 252
1981-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1982-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1983-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1984-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1985-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1986-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1987-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1988-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1990-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1991 248 244 234 224 169 160 160 140 210 225 252 260
1992 253 251 247 239 206-9999 141 173 218 237 246 260
1993 247-9999 242 225 207 172 149 170 204 237 249 258
1994 253-9999 214 196 171 140 130 141 171 222 248 247
1995 245 249 234 205 186 155 148 151 198 217 245 244
1996 245 238 220 208 159 166 136 161 179 225 233 247
1997 245 243 217 195 186 149 138 156 195 230 242 247
1998 248 250 245 229 188 167 177 158 200 247 253 250
1999 250 245 242 216 144 150 123-9999 188 239 240 251
2000 245 243 243 232 184 143 138 154 194 231 249 249
Now, I admit the lats and lons aren't spot on. But c'mon, what are the chances
of them being different? The two year 2000s are almost identical. What about:
Or:
I'd be content to leave it - but I have to match the bulletins! And I can match to the long, stable series or to the loose, flapping ones put in for the
So.. in the end I matched to the 2000-2006 stations, where they actually did match. Unfortunately the huge bulk of the bulletins still had to have new entries created for them, which is a shame, and begs the question of why the Australian update bulletins
For some reason, the auminmatch program is causing no end of grief. I thought I'd
managed a complete run, and it did produce a good-looking tmin database with lots of
-1009 -6628 11054 12 KURI BAY AUSTRALIA 2007 2007 -999 1009
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
-1020 -6628 11054 51 TRUSCOTT AUSTRALIA 2007 2007 -999 1020
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
However, it doesn't seem to have put the bulletin codes on the (a8) header field, for
Not sure why this is yet.. but have found also that there are cases of duplicated lat/lon pairs, so multiple matches are being made.. argh.. will have to further augment auminmatch. Not happy.
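Duplicated lat/lon pairs are easy to screen for before matching. A minimal sketch of the idea (Python for illustration only - the real program is Fortran and reads fixed-width .dtb headers; the codes and coordinates below are made up):

```python
from collections import defaultdict

def duplicate_coords(stations):
    """Group stations by (lat, lon) and return only the clashing groups -
    the cases that make auminmatch produce multiple matches.
    stations: (code, lat, lon, name) tuples, coords in hundredths of a
    degree; the values below are made up for illustration."""
    by_coord = defaultdict(list)
    for code, lat, lon, name in stations:
        by_coord[(lat, lon)].append((code, name))
    return {k: v for k, v in by_coord.items() if len(v) > 1}

demo = [
    (1009, -1550, 12450, "KURI BAY"),
    (1020, -1408, 12659, "TRUSCOTT"),
    (9001, -1550, 12450, "KURI BAY 2"),   # exact coordinate clash
]
print(duplicate_coords(demo))
# {(-1550, 12450): [(1009, 'KURI BAY'), (9001, 'KURI BAY 2')]}
```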
An interesting aside.. David was looking at the v3.00 precip to help National Geographic with an enquiry. I produced a second 'station' file with the 'honest' counts (see above) and he used that to mask out cells with a 0 count (ie that only had indirect data from 'nearby' stations). There were some odd results.. with certain months having data, and others being missing. After considerable debate and investigation, it was understood that anomdtb calculates normals on a monthly basis. So, where there are 7 or 8 missing values in each month (1961-1990), a station may end up contributing only in certain months of the year, throughout its entire run! This was noticed in the Seychelles, where only October has real data (the remaining months being relaxed to the climatology but excluded by David using the 'tight' station mask).
There is no easy solution, because essentially it's an honest result: only October has sufficient values to form a normal, so only October gets anomalised. It's an unfortunate coincidence that it's the only station in the cell, but it's not the only one. A 'solution' could be for anomdtb to get a bit more involved in the gridding, to check that if a cell only has one station (for one or more years) then it's all-or-nothing. Maybe if only one month has a normal then it's dumped and the whole reverts to climatology. Maybe if 4 or more months have normals.. maybe if >0 months have normals and the rest can be brought in with a minor relaxation of the '75% rule'.. who knows.
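The per-month normals behaviour is easy to demonstrate. A sketch of the rule as described above (Python, illustrative only; it assumes the '75% rule' means at least 75% of the 30 base years must be present in a given month):

```python
MISSING = -9999

def monthly_normals(station, min_frac=0.75):
    """Per-month 1961-1990 normals from {year: [12 monthly values]}.
    A month only gets a normal if at least min_frac of the 30 base years
    have real (non-missing) values; otherwise it stays None and the month
    can never be anomalised - the behaviour described above."""
    normals = []
    for m in range(12):
        vals = [station[y][m] for y in range(1961, 1991)
                if y in station and station[y][m] != MISSING]
        normals.append(sum(vals) / len(vals) if len(vals) >= min_frac * 30 else None)
    return normals

# A Seychelles-like station: October always present, January patchy.
data = {y: [MISSING] * 12 for y in range(1961, 1991)}
for y in range(1961, 1991):
    data[y][9] = 250              # October: 30/30 years -> gets a normal
for y in range(1961, 1981):
    data[y][0] = 100              # January: 20/30 years -> below 75%, no normal
n = monthly_normals(data)
print(n[9], n[0])  # 250.0 None
```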
matches and group them together. The user then processes one group at a time, pairing up
matches until the potential for further matches is zero (or the user decides it is). Uses a
FSM to work out each chain (all db matches for a bulletin, then all bulletins that match
each of those db stations, then.. etc). To understand it, either read the code (especially
the comments) or just look at this mind-boggling example from the first run of it:
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Bulletin stations: 8
Database stations: 18
Bulletin stations: 7
Database stations: 17
Bulletin stations: 6
Database stations: 16
Bulletin stations: 5
Database stations: 15
Bulletin stations: 4
Database stations: 14
Bulletin stations: 3
Database stations: 13
Bulletin stations: 2
Database stations: 12
Bulletin stations: 1
Database stations: 11
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Bulletin stations: 1
Database stations: 2
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
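The chain idea can be sketched without the FSM machinery: any bulletin and database stations connected through shared candidate matches belong in one group, which the user then resolves. An illustrative Python version (a small union-find standing in for the actual state machine; only the two BYRON BAY bulletin codes are taken from this log, the rest are made up):

```python
def build_chains(matches):
    """Group bulletin and database stations into connected 'chains': any
    stations linked through shared candidate matches end up together.
    A union-find stands in for the program's FSM."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for bul, db in matches:
        union(("bul", bul), ("db", db))

    chains = {}
    for node in parent:
        chains.setdefault(find(node), []).append(node)
    return list(chains.values())

# Two bulletins sharing a db candidate collapse into one chain.
pairs = [(58009, 1), (58216, 1), (58216, 2), (90186, 7)]
print(sorted(len(c) for c in build_chains(pairs)))  # [2, 4]
```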
for a second chance sometimes! So more debugging.. fixed. Also added a test
before the user gets a chain, to anticipate what the user (er, I) would do. For
match, as they're the ones David L put in from the Aus update files. I, er, the
user then gets ambiguities and nearby but unconnected stations. Fine, until you
Bulletin stations: 2
Database stations: 3
Looking in the files I see that Bulletin 58009 is 'BYRON BAY (CAPE BYRON LIGHTHOUSE)', and 58216 is 'BYRON BAY (CAPE BYRON AWS)'. But the database stubs that have been entered have not been intelligently named, just truncated - so I have no way of knowing which is which! CRU NEEDS A DATA MANAGER. In this case I had to assume that the updates were processed in .au code order, so 1-1 and 2-2. Argh. A few doubles found, too:
Bulletin stations: 1
Database stations: 3
Bulletin stations: 1
1. 90186 -3829 14245 71 WARRNAMBOOL AIRPORT
Database stations: 4
And the results? Strictly average, I thought.. but I'd forgotten to count the extra
232
514
12
In other words, all that sweat was worth it - 746 stations matched automatically, and a further 12 manually! Only (797-758=) 39 bulletins unmatched! Wheeee! And here they are:
-6072 -2303 11504 111 EMU CREEK STATION AUSTRALIA 2007 2007 -999 6072
-12044 -3355 12070 220 MUNGLINUP WEST AUSTRALIA 2007 2007 -999 12044
-12241 -2888 12132 370 LEONORA AERO AUSTRALIA 2007 2007 -999 12241
-22801 -3575 13659 143 CAPE BORDA COMPARISO AUSTRALIA 2007 2007 -999 22801
-48243 -2943 14797 154 LIGHTNING RIDGE VISI AUSTRALIA 2007 2007 -999 48243
-56037 -3053 15167 987 ARMIDALE (TREE GROUP AUSTRALIA 2007 2007 -999 56037
-70263 -3475 14970 670 GOULBURN TAFE AUSTRALIA 2007 2007 -999 70263
-82170 -3655 14600 171 BENALLA AIRPORT AUSTRALIA 2007 2007 -999 82170
-88023 -3723 14591 230 LAKE EILDON AUSTRALIA 2007 2007 -999 88023
-200001 -2166 15027 209 MIDDLE PERCY ISLAND AUSTRALIA 2007 2007 -999 200001
-200288 -2904 16794 112 NORFOLK ISLAND AERO AUSTRALIA 2007 2007 -999 200288
-200790 -1045 10569 261 CHRISTMAS ISLAND AER AUSTRALIA 2007 2007 -999 200790
-200838 -3922 14698 116 HOGAN ISLAND AUSTRALIA 2007 2007 -999 200838
[edit: found another fault, had to re-run. Headers weren't being modded if the WMO code was already there]
32. The next stage *heart falls* will be to synchronise tmax *against* tmin, sweeping up duplicates in the process. How long's THIS gonna take? Well actually, it might be fairly easy, if we use a similar approach. We can base it all around the user being given a 'cloud' of related stations to pick pairs from, only they will be uniquely numbered so that two from the same database can be selected. The user can in this way 'pair up' stations in groups. Of course, this comes with the downside of complexity (and therefore bugs). And both databases will almost certainly have to be preloaded in their entirety because of the need for the user to be able to confirm header and data precedence info when stations within a database are merged.
Well.. it's written, and debugging. Around 1500 lines of code, or 1000 without all the comments ;-)
It does indeed read in all the data, so has to be compiled on uealogin1 (as crua6 doesn't have enough memory!). Reusing code from auminmatch.for did speed things up a bit, though two new subroutines had to be written to carry out checking for merges (within a database) and for matches (between the databases). Also introduced a user decision at the start to allow the TMin database to take precedence in terms of station metadata. Here's the current state of play:
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/dtr] ./auminmaxsync
Before we get started, an important question: Should TMin header info take precedence over TMax?
This will significantly reduce user decisions later, but is a big step as TMax settings may be silently overridden!
To let TMin header values take precedence over those of TMax, enter 'YES': YES
unmatchable: 63 (tmin)
unmatchable: 48 (tmax)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations: 2
TMax stations: 2
<END QUOTE>
the databases aren't synchronised, and as there are hundreds of 'duplicate' entries.. only around 50% match straight away. The situation isn't as bleak as it looks, though - there is further automatching at the beginning of each cloud, so the user can still be spared the obvious. If the merging gets too onerous, though, I might have to automate that - with associated risks. And of course - if you look closely - things are still a little offbeam :-/
Found another database bug by chance.. a <tab> instead of a space after 'CRANWELL':
-324320 5303 -50 62 CRANWELL UK 1961 1995 -999 -999.00
Doesn't show up in reads as it's a white space character. Argh. Fixed in tmin & tmax. Now to find out why some matched stations STILL don't have the backref in the last header field!! ..found it, not my problem, it's the ones that *pre-existed* in the databases, there's 84 in total I think. So I can write a proglet to check that any with negative WMO codes have the positive version in that
tmn.0707021605.dtb (651 'fixed' - includes all with negative WMOs regardless of end field)
So why, when we matched 758 bulletins in the first place, did this program only 'fix' 651, of which 84 were preexisting? Because, of course, the matches only get a negative WMO code if the original WMO code is missing (zero). The 'missing' stations would be ones that already had a WMO code.
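The proglet's job reduces to one rule: any negative WMO code must carry its positive twin in the last header field. A sketch (Python, with a simplified dict in place of the fixed-width .dtb header; the three example headers reuse codes from this log):

```python
def fix_backrefs(headers):
    """For every header with a negative WMO code, make sure the last field
    carries the positive version; return the number fixed. The dict form
    is a simplified stand-in for the fixed-width .dtb header."""
    fixed = 0
    for h in headers:
        if h["wmo"] < 0 and h["ref"] != -h["wmo"]:
            h["ref"] = -h["wmo"]
            fixed += 1
    return fixed

hdrs = [
    {"wmo": -6072,   "ref": 0},       # matched bulletin, backref missing
    {"wmo": -12044,  "ref": 12044},   # already consistent
    {"wmo": 9471100, "ref": 48027},   # positive WMO: untouched
]
print(fix_backrefs(hdrs), hdrs[0]["ref"])  # 1 6072
```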
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/dtr] ./auminmaxsync
WELCOME TO THE TMIN/TMAX SYNCHRONISER
Before we get started, an important question: Should TMin header info take precedence over TMax?
This will significantly reduce user decisions later, but is a big step as TMax settings may be silently overridden!
To let TMin header values take precedence over those of TMax, enter 'YES': YES
unmatchable: 48 (tmax)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
TMin stations: 2
TMax stations: 2
Stn 1: -401000 3178 3522 783 JERUSALEM ISRAEL 1863 2000 -999 401000
Stn 2: -401000 3178 3522 783 JERUSALEM ISRAEL 1863 2000 -999 401000
<END QUOTE>
Well.. it's kinda working. I found some idiotic bugs, though it is a fearsomely complicated program with lots of indirect pointers (though I do try and resolve them at the first opportunity). One thing that's
it before: the program doesn't actually flush the output channels whenever you write! For example, as I
unmatchable: 63 (tmin)
unmatchable: 48 (tmax)
match reports on channel 31 BUT THEY ARE NOT IN THE FILE YET. Here is the tail of channel 31:
TMax: 9929470 4330 1340 342 MACERATA ITALY 1953 1975 -999 -999.00
TMin: 9929480 4030 880 585 MACOMER ITALY 1952 1978 -999 -999.00
TMax: 9929480 4030 880 585 MACOMER ITALY 1952 1978 -999 -999.00
TMin: 9929500 4010 1850 86 PALASCIA AERO ITALY 1952 1978 -999 -999.00
TMax: 9929500 4010 1850 86 PALASCIA AERO ITALY 1952 1978 -999 -999.00
In addition, the log file is EMPTY, yet at least 416 bytes have been written to it. How the hell can I debug if I can't monitor what's being written to the log files?!! Of course, once I force-quit the program, and wait a bit.. the missing info appears. Similarly if I carry on using the program, the files get more info. It's as if there's a write buffer that runs FIFO. Must look at the 'help'.. why is it that whenever I crack the programming, the systems themselves step in to screw it up? And computer support is away of course.
Looked at f77 -help.. nothing. well nothing obvious. Anyway, more debugging and..
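This is ordinary runtime output buffering rather than anything f77-specific; the 'empty log' symptom is easy to reproduce (Python here, purely to illustrate the mechanism - the file path is a throwaway temp file):

```python
import os
import tempfile

# Writes go to an in-process buffer; the file on disk stays empty until a
# flush (or close) pushes the buffer out - exactly the 'empty log' symptom.
path = os.path.join(tempfile.mkdtemp(), "run.log")
log = open(path, "w")
log.write("match report line\n")
print(os.path.getsize(path))  # 0: written, but still sitting in the buffer
log.flush()                   # force the buffer to disk
print(os.path.getsize(path))  # 18: now visible to a monitoring process
log.close()
```

Most Fortran runtimes offer an equivalent flush facility, though its spelling is compiler-dependent.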
Seems to be working. But it's going to take ages. Here is an example of the problem:
<BEGIN QUOTE>
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
TMin stations: 2
TMax stations: 2
Not only do both databases have unnecessary duplicates, introduced for external mapping purposes by the look of it, but the 'main' stations (2 and 4) have different station name & country. In fact one of the country names is illegal! Dealing with things like this cannot be automated as they're
Something new - a listing of 147 Australian 'bulletin' stations, most of which have mappings to WMO codes. Decided to xref against the (mapped) TMin database, for a laugh. Then decided to take it
Decided to be vaguely sensible and let the program, auwmoxref.for, evolve. So to begin with it just did a scan between the mappings file (au_mapping_to_wmo.dat) and the tmin database with my mappings in (tmn.0707021605.dtb). Results:
crua6[/cru/cruts/version_3_0/db/dtr] ./auwmoxref
<BEGIN QUOTE>
AUWMOXREF: Check Australian cross-references
RESULTS:
WMO Matches: 92
(multiples) ( 0)
(multiples) ( 0)
<END QUOTE>
So first the good news - no duplicates. Well there shouldn't have been any anyway of course, but the way things are going I'm taking nothing for granted. See, I count something turning out as expected as 'good news'. So anyway.. I also extracted the statistic that 26 mappings matched both Ref and WMO, but to separate database entries. Thus the 115 mappings are allocated as follows:
26 WMO found elsewhere (one of which has an unmapped ref attached to it)
For the purposes of actions to take, the 13 'WMO Wrong' refs can simply be unmapped from their incorrect
So:
31 WMO found elsewhere (one of which has an unmapped ref attached to it)
23 WMO not in database but pairing made (can add wmo codes for these)
8 WMO not in database and no pairing (can add new stations for these)
2. For the 13 with incorrectly-assigned WMOs, disengage and roll into the rest below
3. For the 1 WMO with an unmapped ref attached, disengage and roll into the rest below
4. For the 31 with dislocated WMOs, print a list and ref when doing the tmin/tmax syncing
5. For the 8 with no WMO found and no pairing found, create new stations.
For the disengagements, decided to work directly with an editor rather than craft another program! So
The following assignments were disengaged (and replaced with -999.00). Where a WMO code follows in
3. 9432200 -2020 13000 340 RABBIT FLAT AUSTRALIA 1969 2006 -999 15666 (no)
7. 9454100 -2980 15110 582 INVERELL (RAGLAN ST) AUSTRALIA 1907 2006 -999 56242 (no)
9. 9475800 -3210 15090 216 SCONE SCS AUSTRALIA 2000 2006 -999 61089 (9473800)
10. 9494000 -3510 15080 85 JERVIS BAY (POINT PE AUSTRALIA 1907 2006 -999 68151 (no)
11. 9491600 -3590 14840 1482 CABRAMURRA SMHEA AWS AUSTRALIA 1962 2006 -999 72161 (no)
12. 9482700 -3630 14160 133 NHILL AUSTRALIA 1897 2006 -999 78031 (9582900)
The 'mismatched WMO code' station was disengaged from its reference and given 48027 instead:
1. 9471100 -3150 14580 218 COBAR AIRPORT AWS AUSTRALIA 1962 2006 -999 48237 -> 48027
I mailed BOM as we have 94711 = COBAR AWS but they have *94710* for AWS and 94711 for COBAR MO. The
<BEGIN QUOTE>
Hi Ian,
The blank in the Closed column means that the site is still open
When Cobar Comparison site closed it transferred its WMO number to Cobar MO
A blank in the WMO No. column means that the site never had a WMO number.
I am not sure of the overlap between the assignment of 94711 between 48244 and 48027. I will find out and get back to you.
<END QUOTE>
0 -3150 14580 251 COBAR POST OFFICE AUSTRALIA 1902 1960 -999 -999.00
9471100 -3150 14580 218 COBAR AIRPORT AWS AUSTRALIA 1962 2006 -999 48027
Now looking at the dates.. something bad has happened, hasn't it. COBAR AIRPORT AWS cannot start in 1962, it didn't open until 1993! Looking at the data - the COBAR station 1962-2004 seems to be an exact copy of the COBAR AIRPORT AWS station 1962-2004, except that the latter has more missing values. Now, COBAR AIRPORT AWS has 15 months of missing value codes beginning Oct 1993.. coincidence? No. I think that that series should start there. Furthermore, the overlap between COBAR and COBAR MO
2000 178 209 184 136 80 52 45 55 105 122 166 186 (7/12)
2001 223 214 159 126 72 61 43 52 105 110 148 181 (12/12)
2002 195 185 168 148 88 58 49 63 101 128 187 192 (11/12)
All BOM codes will be appended for completeness. So the new headers (with lat/lon from BOM too) are:
0 -3150 14583 251 COBAR POST OFFICE AUSTRALIA 1902 1960 -999 48030 (closed)
9471000 -3154 14580 218 COBAR AIRPORT AWS AUSTRALIA 1995 2006 -999 48237
Deleted:
The remaining 26 dislocated references were reassigned as for the 13 above. Legitimate mappings:
1. 3003 9420300
2. 4032 9431200
3. 5007 9430200
4. 7176 9431700
5. 9021 9461000
6. 14508 9415000
7. 14932 9413100
8. 17031 9448000
9. 22801 9480500
10. 9571900 -3220 14860 284 DUBBO AIRPORT AWS AUSTRALIA 2000 2006 -999 65070
12. 9495400 -4070 14470 94 CAPE GRIM BAPS AUSTRALIA 2000 2006 -999 91245
13. 9596400 -4110 14680 3 LOW HEAD AUSTRALIA 2000 2006 -999 91293
14. 9595900 -4190 14670 1055 LIAWENEE AUSTRALIA 2000 2006 -999 96033
In other words, there are (115-106=) 9 mappings unfulfilled. The ref hasn't been matched and WMO code isn't in the database. However, that didn't mean they weren't in the database with a missing WMO code, did it? The following were found and augmented with both WMO code and ref.
9594000 -3509 15080 85 JERVIS BAY (PT PERP AWS) AUSTRALIA 2000 2006 -999 68151
9532200 -2018 13001 340 RABBIT FLAT AUSTRALIA 2007 2007 -999 15666
9554100 -2978 15111 582 INVERELL (RAGLAN ST) AUSTRALIA 2007 2007 -999 56242
However, the current 'live' FORREST station (11052) started in 1993, according to bom.au records. And wouldn't you know it, the data for this station has missing data between 12/92 and 12/99 inclusive. So I reckon it's the old FORREST AERO station (WMO 9464600, .au ID 11004), with the new Australian bulletin updates tacked on (hence starting in 2000). Especially as the
The trouble is that the bom.au mappings all agree that FORREST is now WMO=9564600. So.. do I split off the 2000-present data to a new station with the new number, or accept that whoever joined them (Dave?) looked into it and decided it would be OK? The BOM website says they're 800m apart. Decided to be brave and split the data back into two stations, with both codes attached (in case we ever get replacement data for the closed station, the site says it went to
9464600 -3085 12811 159 FORREST AERO AUSTRALIA 1946 1992 -999 11004
The following mapping was added, though the station does not currently feature in the bulletins.
9495900 -4228 14628 -999 BUTLERS GORGE AUSTRALIA 2007 2007 -999 96003
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2007-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
Also ran a risky search&replace to left-justify the 'AUSTRALIA' in its field, provided the field wasn't touched by an extended station name. Seems to have been 100% successful.
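That search&replace can be expressed as a guarded fixed-width edit; a sketch (Python for illustration - the field start and width below are placeholders, not the real .dtb column layout):

```python
def left_justify_country(header, start=30, width=13):
    """Left-justify the country within its fixed-width field, but only when
    the character before the field is a space - i.e. an extended station
    name has NOT spilled into it. The start/width values are placeholders,
    not the real .dtb column layout."""
    if header[start - 1] != " ":            # name overran the field: leave it
        return header
    shifted = header[start:start + width].strip().ljust(width)
    return header[:start] + shifted + header[start + width:]

h = "LAKE EILDON".ljust(30) + "  AUSTRALIA  " + " 2007"
print(repr(left_justify_country(h)[30:43]))  # 'AUSTRALIA    '
```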
All 115 refs now matched in the TMin database. Confidence in the fidelity of the Australian
Well OK, made some final 'improvements' to the syncing program. Now, after it forms a cloud, it should automatically merge stations provided the criteria are met and no others are possibles. It also records, in a separate 'action' file (act.*), every relevant action performed during the run, so that if interrupted I should be able to hack in something to enable a 'resume'. It's been done a bit hastily so no guarantees that enough information's been saved!
Debugging is still a big issue, unfortunately. It's a complicated program to sort out and the possibilities for indexing errors are many. In fact, for the first time ever, it's just locked up! That's a first (it was due to getmos not defaulting to months 1 & 12 if the data was all missing).
Another problem solved - spent ages wondering how the start & end years for a particular station (WARATAH) were being corrupted. Turns out they weren't - I'd written 'getmos' to trim empty years,
So.. perhaps a debugged run through? I'm quickly realising that the Australian stations are in such a state that I'm having to constantly refer to the station descriptions on the BOM website,
http://www.bom.gov.au/climate/cdo/metadata/pdf/metadata088110.pdf
It takes time.. time I don't have! Though I'm pleased to see that the second FSM is helpfully
introduced, so many false references.. so many changes that aren't documented. Every time a cloud forms I'm presented with a bewildering selection of similar-sounding sites, some with references, some with WMO codes, and some with both. And if I look up the station metadata with one of the local references, chances are the WMO code will be wrong (another station will have it) and the lat/lon will be wrong too. I've been at it for well over an hour, and I've reached the 294th station in the tmin database. Out of over 14,000. Now even accepting that it will get easier (as clouds can only be formed of what's ahead of you), it is still very daunting. I go on leave for 10 days after tomorrow, and if I leave it running it isn't likely to be there when I return! As to whether my 'action dump' will work (to save repetition).. who knows?
Pfft.. and back to Australia almost immediately :-( .. and then Chile. Getting there.
Unfortunately, after around 160 minutes of uninterrupted decision making, my screen has started to black out for half a second at a time. More video cable problems - but why now?!! The count is up to 1007 though.
I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO and one with, usually overlapping and with the same station name and very similar coordinates. I know it could be old and new stations, but why such large overlaps if that's the case? Aarrggghhh!
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
TMin stations: 4
TMax stations: 4
Enter ANY pair to match or merge, 'a' to auto-match (no merges), or 'x' to end:
I honestly have no idea what to do here. and there are countless others of equal bafflingness. I'll have to go home soon, leaving it running and hoping none of the systems die overnight :-(((
.. it survived, thank $deity. And a long run of duplicate stations, each requiring multiple decisions concerning spatial info, exact names, and data precedence for overlaps. If for any reason this has to be re-run, it can certainly be speeded up! Some large clouds, too - this one started
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
TMin stations: 7
24. 7163434 4380 -7955 194 TORONTO MET RES STN CANADA 1965 1988 -999 0
39. 7163408 4388 -7945 233 RICHMOND HILL CANADA 1959 1990 -999 0
40. 7163409 4387 -7943 218 RICHMOND HILL WPCP 1960 1981 -999 0
TMax stations: 8
82. 7101987 4380 -7955 194 TORONTO MET RES STN 1965 1988 -999 0
83. 7163434 4380 -7955 194 TORONTO MET RES STN CANADA 1965 1988 -999 0
98. 7163408 4388 -7945 233 RICHMOND HILL CANADA 1959 1990 -999 0
99. 7163409 4387 -7943 218 RICHMOND HILL WPCP 1960 1981 -999 0
not return any hits with a web search. Usually the country's met office, or at least the Weather Underground, show up - but for these stations, nothing at all. Makes me wonder if these are long-discontinued, or were even invented somewhere other than Canada! Examples:
7162040 brockville
7163231 brockville
7163229 brockville
7187742 forestburg
7100165 forestburg
<BEGIN QUOTE>
From TMax: 0 5170 -12140 994 108 MILE HOUSE ABEL 1987 2002 -999 -999.00
DBG: AUTOPAIRED: 1 1
From TMin: 7194273 5165 -12130 1059 100 MILE HOUSE CANADA 1970 1999 -999 -999.00
From TMax: 7194273 5165 -12130 1059 100 MILE HOUSE CANADA 1970 1999 -999 -999.00
DBG: AUTOPAIRED: 2 2
From TMin: 7103611 5160 -12120 994 HORSE LAKE 1983 1994 -999 -999.00
From TMax: 7103611 5160 -12120 994 HORSE LAKE 1983 1994 -999 -999.00
DBG: AUTOPAIRED: 3 3
From TMin: 7103629 5155 -12120 1145 LONE BUTTE 2 1981 1991 -999 -999.00
From TMax: 7103629 5155 -12120 1145 LONE BUTTE 2 1981 1991 -999 -999.00
DBG: AUTOPAIRED: 4 4
From TMin: 7103637 5168 -12122 928 100 MILE HOUSE 6NE 1987 2002 -999 -999.00
From TMax: 7103637 5168 -12122 928 100 MILE HOUSE 6NE 1987 2002 -999 -999.00
DBG: AUTOPAIRED: 5 5
From TMin: 7103660 5147 -12112 1069 WATCH LAKE NORTH 1987 1996 -999 -999.00
From TMax: 7103660 5147 -12112 1069 WATCH LAKE NORTH 1987 1996 -999 -999.00
DBG: AUTOPAIRED: 6 6
<END QUOTE>
Now arguably, the MILE HOUSE ABEL stations should have rolled into one of the other MILE HOUSE ones with a WMO code.. but the lat/lon/alt aren't close enough. Which is as intended.
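The auto-pairing gate can be sketched as a simple tolerance test (Python; the coordinates come from the MILE HOUSE example above, but the actual thresholds in the sync program aren't recorded, so the ones below are placeholders):

```python
def close_enough(a, b, dll=8, dalt=30):
    """Gate for auto-pairing two stations: lat/lon within dll hundredths of
    a degree and altitude within dalt metres. The real tolerances in the
    sync program aren't recorded; these are placeholders."""
    (lat1, lon1, alt1), (lat2, lon2, alt2) = a, b
    return (abs(lat1 - lat2) <= dll and abs(lon1 - lon2) <= dll
            and abs(alt1 - alt2) <= dalt)

mile_house_abel = (5170, -12140, 994)    # 108 MILE HOUSE ABEL
mile_house      = (5165, -12130, 1059)   # 100 MILE HOUSE (WMO 7194273)
print(close_enough(mile_house_abel, mile_house))  # False: not rolled in
```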
Well, it *kind of* worked. Though the resultant files aren't exactly what I'd expected:
tmx.0707241721.dat: too-small (but hey, the same size as the twin) output database
ANALYSIS
Well, LOL, the reason the output databases are so small is that every station looks like this:
9999810 -748 10932 114 SEMPOR INDONESIA 1971 2000 -999 -999.00
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1971 229 225 225 229 229-9999 223 221 222 225 224-9999
Yes - just one line of data. The write loops went from start year to start year. Ho hum :-/
Not as easy to fix as you might think, seeing as the data may well be the result of a merge and
As for the 'unbalanced' 'lost' files: well for a start, the same error as above (just one line of data), then on top of that, both sets written to the same file. what time did I write that bit, 3am?!! Ecch.
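The 'one line of data' symptom is the classic loop-bound slip; in sketch form (Python, not the actual Fortran - the data values are illustrative):

```python
def write_station(data, start_year, end_year, buggy=False):
    """Emit one text line per year of a station's record. With buggy=True
    the loop runs start-to-start instead of start-to-end, reproducing the
    one-line-per-station output databases."""
    last = start_year if buggy else end_year
    return [f"{yr} " + " ".join(str(v) for v in data[yr])
            for yr in range(start_year, last + 1)]

data = {1971: [229, 225], 1972: [246, 244], 1973: [255, 259]}
print(len(write_station(data, 1971, 1973, buggy=True)),
      len(write_station(data, 1971, 1973, buggy=False)))  # 1 3
```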
33. So, as expected.. I'm gonna have to write in clauses to make use of the log, act and mat files. I so do not want to do this.. but not as much as I don't want to do a day's interacting again!!
Got it to work.. sort of. Turns out I had included enough information in the ACT file, and so was able to write auminmaxresync.for. A few teething troubles, but two new databases ('tm[n|x].0707301343.dtb') created with 13654 stations in each. And yes - the headers are identical :-)
Here are the header counts, demonstrating that something's still not quite right..
Original:
14355 tmn.0707021605.dtb.heads
New:
13654 tmn.0707301343.dtb.heads
Lost/merged:
Original:
14315 tmx.0702091313.dtb.heads
New:
13654 tmx.0707301343.dtb.heads
Lost/merged:
258
crua6[/cru/cruts/version_3_0/db/dtr] grep 'automerg' act.0707241721.dat | wc -l
889
..so will have to look at how the db1/2xref arrays are prepped and set in the program. Nonetheless the construction of the new databases looks pretty good. There's a minor problem where the external reference field is sometimes -999.00 and sometimes 0. Not sure which is best, probably 0, as the field will usually be used for reference numbers/characters rather than real data values. Used an inline perl command to fix.
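The perl one-liner itself isn't recorded here; an equivalent normalisation, sketched in Python (whitespace splitting is a simplification of the fixed-width header format):

```python
def normalise_ref(line):
    """If the trailing reference field of a header line is '-999.00',
    rewrite it to '0' (the field holds reference codes, not data values).
    Whitespace splitting is a simplification of the fixed-width format."""
    if line.split() and line.split()[-1] == "-999.00":
        return line[:line.rfind("-999.00")] + "0"
    return line

print(normalise_ref("7103611 5160 -12120 994 HORSE LAKE 1983 1994 -999 -999.00"))
# 7103611 5160 -12120 994 HORSE LAKE 1983 1994 -999 0
```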
uealogin1[/cru/cruts/version_3_0/db/dtr] wc -l *.heads
14355 tmn.0707021605.dtb.heads
122 tmn.0707021605.dtb.lost.heads
579 tmn.0707021605.dtb.merg.heads
13654 tmn.0708062250.dtb.heads
14315 tmx.0702091313.dtb.heads
93 tmx.0702091313.dtb.lost.heads
570 tmx.0702091313.dtb.merg.heads
13654 tmx.0708062250.dtb.heads
Almost perfect! But unfortunately, there is a slight discrepancy, and they have a habit of being tips of icebergs. If you add up the header/station counts of the new tmin database, merg and lost files, you get 13654 + 579 + 122 = 14355, the original station count. If you try the same check for tmax, however, you get 13654 + 570 + 93 = 14317, two more than the original count! I suspected a couple of stations were being counted twice, so using 'comm' I looked for identical headers. Unfortunately there weren't any!! So I have invented two stations, hmm. Got the program to investigate, and found two stations in the cross-reference
14010> 9596900 -4170 14710 150 CRESSY RESEARCH STAT AUSTRALIA 1971 2006 -999 91306
and
226> 0 -3570 14560 110 FINLEY (CSIRO) AUSTRALIA 2000 2001 -999 0
So in the first case, LOW HEAD has been merged with another station (#14010) AND paired with #127. Similarly, NARRANDERA AIRPORT has been merged with #226 and paired with #227. However, these apparent merges are false! As we see in the first case, 14010 is not LOW HEAD. Similarly for the second case. Looking in the relevant match file from the process (mat.0707241721.dat) we find:
and
crua6[/cru/cruts/version_3_0/db/dtr] wc -l *.heads
14355 tmn.0707021605.dtb.heads
122 tmn.0707021605.dtb.lost.heads
579 tmn.0707021605.dtb.merg.heads
13654 tmn.0708071548.dtb.heads
14315 tmx.0702091313.dtb.heads
93 tmx.0702091313.dtb.lost.heads
568 tmx.0702091313.dtb.merg.heads
13654 tmx.0708071548.dtb.heads
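The bookkeeping that finally balanced here is worth stating as a one-function check, using the counts from the runs above (Python, illustrative):

```python
def count_discrepancy(original, kept, merged, lost):
    """Every original station should be accounted for exactly once: kept in
    the new database, merged away, or lost. Zero means the books balance."""
    return (kept + merged + lost) - original

# tmin balances; the first tmax run invented two stations:
print(count_discrepancy(14355, 13654, 579, 122))  # 0
print(count_discrepancy(14315, 13654, 570, 93))   # 2
```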
Phew! Well the headers are identical for the two new databases:
34. So to the real test - converting to DTR! Wrote tmnx2dtr.for, which does exactly that. It reported 233 instances where tmin > tmax (all set to missing values) and a handful where tmin == tmax (no prob). Looking at the 233 illogicals, most of the stations look as though considerable work is needed on them. This highlights the fact that all I've done is to synchronise the tmin and tmax databases with each other, and with the Australian stations - there is still a lot of data cleansing to perform at some
Input Files
TMin: tmn.0708071548.dtb
TMax: tmx.0708071548.dtb
Output file
DTR: dtr.0708071924.dtb
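The conversion rule as described is simple to state; a sketch (Python, with illustrative values in tenths of a degree - the real program is tmnx2dtr.for):

```python
MISSING = -9999

def dtr_series(tmin, tmax):
    """Monthly diurnal temperature range: tmax - tmin, with illogical pairs
    (tmin > tmax) set to missing - the rule tmnx2dtr.for is described as
    applying. tmin == tmax legitimately gives 0."""
    out = []
    for lo, hi in zip(tmin, tmax):
        if lo == MISSING or hi == MISSING or lo > hi:
            out.append(MISSING)
        else:
            out.append(hi - lo)
    return out

print(dtr_series([120, 150, MISSING, 200], [250, 150, 240, 180]))
# [130, 0, -9999, -9999]
```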
Normals added:
crua6[/cru/cruts/version_3_0/db/dtr] ./addnormline
ACCEPT/REJECT (A/R): A
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/dtr] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
> Enter the suffix of the variable required:
.dtr
dtr.0708081052.dtb
1961,1990
25
dtr.txt
1901,2006
> Operating...
> NORMALS MEAN percent STDEV percent
<END QUOTE>
So a lower percentage than last time (69.0 vs. 78.9), but then, more data overall so a better
Gridding:
IDL> quick_interp_tdm2,1901,2006,'dtrglo/dtr.',750,gs=0.5,pts_prefix='dtrtxt/dtr.',dumpglo='dumpglo'
crua6[/cru/cruts/version_3_0/primaries/dtr] ./glo2abs
dtr.01.1901.glo
(etc)
dtr.12.2006.glo
Finally, gridding:
Writing cru_ts_3_00.1901.1910.dtr.dat
Writing cru_ts_3_00.1911.1920.dtr.dat
Writing cru_ts_3_00.1921.1930.dtr.dat
Writing cru_ts_3_00.1931.1940.dtr.dat
Writing cru_ts_3_00.1941.1950.dtr.dat
Writing cru_ts_3_00.1951.1960.dtr.dat
Writing cru_ts_3_00.1961.1970.dtr.dat
Writing cru_ts_3_00.1971.1980.dtr.dat
Writing cru_ts_3_00.1981.1990.dtr.dat
Writing cru_ts_3_00.1991.2000.dtr.dat
Writing cru_ts_3_00.2001.2006.dtr.dat
Writing cru_ts_3_00.1901.2006.dtr.dat
35. Onto the secondaries, working from the rerun methodology (see section 20 above).
Began with temperature, using the anomaly txt files from the half-degree generation:
IDL> quick_interp_tdm2,1901,2006,'tmpbin/tmpbin',1200,gs=2.5,dumpbin='dumpbin',pts_prefix='tmp0km0705101334txt/tmp.'
Then precipitation:
IDL> quick_interp_tdm2,1901,2006,'prebin/prebin',450,gs=2.5,dumpbin='dumpbin',pts_prefix='pre0km0612181221txt/pre.'
Finally, dtr:
IDL> quick_interp_tdm2,1901,2006,'dtrbin/dtrbin',50,gs=2.5,dumpbin='dumpbin',pts_prefix='dtrtxt/dtr.'
*** EEEK! Is that '50' a mistype? Meaning that anything using binary DTR will need re-doing? (RAL, Dec 07) ***
FRS:
IDL> frs_gts,dtr_prefix='dtrbin/dtrbin',tmp_prefix='tmpbin/tmpbin',1901,2006,outprefix='frssyn/frssyn'
IDL> quick_interp_tdm2,1901,2006,'frsgrid/frsgrid',750,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='frssyn/frssyn'
crua6[/cru/cruts/version_3_0/secondaries/frs] ../glo2abs
frsgrid.01.1901.glo
(etc)
frsgrid.12.2006.glo
crua6[/cru/cruts/version_3_0/secondaries/frs] ../mergegrids
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.frs.dat
Writing cru_ts_3_00.1901.1910.frs.dat
Writing cru_ts_3_00.1911.1920.frs.dat
Writing cru_ts_3_00.1921.1930.frs.dat
Writing cru_ts_3_00.1931.1940.frs.dat
Writing cru_ts_3_00.1941.1950.frs.dat
Writing cru_ts_3_00.1951.1960.frs.dat
Writing cru_ts_3_00.1961.1970.frs.dat
Writing cru_ts_3_00.1971.1980.frs.dat
Writing cru_ts_3_00.1981.1990.frs.dat
Writing cru_ts_3_00.1991.2000.frs.dat
Writing cru_ts_3_00.2001.2006.frs.dat
RD0:
IDL> rd0_gts,1901,2006,1961,1990,outprefix='rd0syn/rd0syn',pre_prefix='prebin/prebin'
filesize= 6220800
gridsize= 0.500000
yes
filesize= 6220800
gridsize= 0.500000
1961
yes
filesize= 248832
gridsize= 2.50000
1962
yes
(etc)
2006
yes
filesize= 248832
gridsize= 2.50000
% Program caused arithmetic error: Floating divide by 0
IDL>
IDL>
quick_interp_tdm2,1901,2006,'rd0grid/rd0grid',450,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='rd0syn/rd0syn'
crua6[/cru/cruts/version_3_0/secondaries/rd0] ../glo2abs
Enter the path and name of the normals file: forrtl: error (69): process interrupted
(SIGINT)
crua6[/cru/cruts/version_3_0/secondaries/rd0] ../glo2abs
Enter the path (if any) for the output files: rd0gridabs/
rd0grid.01.1901.glo
(etc)
rd0grid.12.2006.glo
crua6[/cru/cruts/version_3_0/secondaries/rd0] ../mergegrids
Writing cru_ts_3_00.1901.1910.rd0.dat
(etc)
I have to admit, I still don't understand secondary parameter generation. I've read the papers, and
the minuscule amount of 'Read Me' documentation, and it just doesn't make sense. In particular,
why use 2.5-degree grids of the primaries instead of 0.5? Why deliberately lose spatial resolution,
only to have to reinterpolate later?
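To make the objection concrete: coarsening and then re-expanding is a one-way trip. A toy Python sketch (the real gridding is IDL; the numbers here are invented) of one row of 0.5-degree cells averaged into 2.5-degree blocks and expanded back:

```python
# Ten invented 0.5-degree cell values along one row
fine = [1.0, 3.0, 2.0, 8.0, 4.0, 6.0, 5.0, 7.0, 9.0, 2.0]
block = 5  # 0.5 deg -> 2.5 deg is a factor of 5

# Coarsen: average each block of five cells
coarse = [sum(fine[i:i + block]) / block
          for i in range(0, len(fine), block)]

# "Reinterpolate" back to 0.5 degree (nearest-neighbour replication)
back = [c for c in coarse for _ in range(block)]

assert len(back) == len(fine)
assert back != fine   # the sub-grid detail is averaged away for good
```

Whatever interpolation scheme replaces the replication step, the sub-block variation in `fine` cannot be reconstructed from `coarse`.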
No matter; on to Vapour Pressure. Here's the complete output from the initial binary
gridding, using dtr and tmp:
IDL>
vap_gts_anom,dtr_prefix='dtrbin/dtrbin',tmp_prefix='tmpbin/tmpbin',1901,2006,outprefix='vapsyn/vapsyn',dumpbin=1
How very useful! No idea what any of that means, although it's heartwarming to see that it's
nothing like the results of the 2.10 rerun, where 1991 looked like this:
1991 vap (x,s2,<<,>>): 0.000493031 0.000742087 -0.0595093 1.86497
Anyway, now I need to use whatever VAP station data we have. And here I'm a little flaky
(again): the vap database hasn't been updated; is it going to be? Asked Dave L and he supplied
summaries he'd produced of CLIMAT bulletins from 2000-2006. Slightly odd format but very useful
all the same.
And now, a brief interlude. As we've reached the stage of thinking about secondary
variables, I
wondered about the CLIMAT updates, as one of the outstanding work items is to write
routines to
convert CLIMAT and MCDW bulletins to CRU format (so that mergedb.for can read
them). So I look at
a CLIMAT bulletin, and what's the first thing I notice? It's that there is absolutely no
station
identification information apart from the WMO code. None. No lat/lon, no name, no
country. Which
means that all the bells and whistles I built into mergedb, (though they were needed for
the db
merging of course) are surplus to requirements. The data must simply be added to
whichever station
has the same number at the start, and there's no way to check it's right. I don't appear to
have a
copy of a MCDW bulletin yet, only a PDF.. I wonder if that's the same? Anyway, back to
the main job.
As I was examining the vap database, I noticed there was a 'wet' database. Could I not use
that to
assist with rd0 generation? well.. it's not documented, but then, none of the process is so I
might
WMO BLK WMO STN STNLP MSLP TEMP VAP P DAYS RN RAIN R
QUINT SUN HRS SUN % MIN_T MAX_T
10010 7093 -866 9 JAN MAYEN(NOR NAVY) NORWAY 1990 2003 -999
-999
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
Tyndall Centre grim file created on 13.01.2004 at 15:22 by Dr. Tim Mitchell
Grid-ref= 1, 148
1760 1580 1790 1270 890 510 470 290 430 400 590 1160
So I guess we go with days x100. Dave's files will have to be reformatted anyway so it's a
Wrote dave2cru.for to convert Dave L's CLIMAT composites to CRU-format files in the
appropriate
units. One problem is the significant number of stations without names or countries: they
are
simply 'xxxxxxxxxx' and I'm not sure how mergedb is going to take to that! Well only
one way to
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db] ./dave2cru
<END QUOTE>
Then tried to merge that into wet.0311061611.dtb, and immediately hit formatting issues - that
pesky last
-999.00
0.00
Had a quick review of mergedb; it won't be trivial to update it to treat that field as a8. So
reluctantly,
Unfortunately, that didn't solve the problems.. as there are alphanumerics in that field
later on:
-712356 5492 -11782 665 SPRING CRK WOLVERINE CANADA 1969 1988
-999 307F0P9
So.. ***sigh***.. will have to alter mergedb.for to treat that field as alpha. Aaarrgghhh.
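The fix described here, reading the field as alpha (a8) and only then deciding whether it is numeric, can be sketched in Python (mergedb itself is Fortran; the field values come from the records above):

```python
def read_field(raw: str):
    """Read the trailing 8-character field as text (Fortran a8 style),
    then interpret: numeric if it parses, otherwise keep the string."""
    text = raw.strip()
    try:
        return float(text)
    except ValueError:
        return text  # e.g. '307F0P9' survives as-is

assert read_field(" -999.00") == -999.0
assert read_field(" 307F0P9") == "307F0P9"
```

Reading it numerically first is what blows up: the alphanumeric value simply isn't a number.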
* *
* *
* *
* *
* Incoming: *
* Potential match: *
10010 7093 -866 9 JAN MAYEN(NOR NAVY) NORWAY 1990 2003 -999
-999
Yes, the 'wet' database features old-style 5-digit WMO codes. The best approach is
probably to alter
mergedb again, to multiply any 5-digit codes by 10. Not sure if there is a similar problem
with 7-digit
Oh, more bloody delays. Modified mergedb to 'adjust' the WMO codes, fine. But then a
proper run of it
just demonstrated that it's far too picky. Even a 0.01-degree difference in coordinates
required ops
intervention. What we need for updates is an absolute priority for WMO codes, and only
a shout if the
name or the spatial coordinates are waaay off. I am seriously considering scrapping
mergedb in favour of
than mergedb's brute-force attack, as you'd expect from a program built on top of that
knowledge. And it
does save all its actions. But I don't know that I have the wherewithal.. okay, I do.
and whistles as mergedb.for, but should be faster and more helpful all the same.
Well.. it works.. but the data doesn't. It's that old devil called WMO numbering again:
..with Master: 718000 4665 -5306 28 CAPE RACE (MARS) CANADA 1920
1969 -999 -999
Now what's happened here? Well the CLIMAT numbering only gives five digits (71800) and so an
extra zero has been added to bring it up to six. Unfortunately, that's the wrong thing to do,
because that's the code of CAPE RACE. The six-digit code for NANCY/ESSEY is 071800. Mailed Phil
and DL as this could be a big
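The ambiguity is worth spelling out, because both paddings produce a plausible six-digit code. A Python sketch of the two choices (station attributions per the text above):

```python
def pad_trailing(code: int, width: int = 6) -> str:
    """Append zeros on the right - what the converter did."""
    return str(code).ljust(width, "0")

def pad_leading(code: int, width: int = 6) -> str:
    """Zero-fill on the left - numerically faithful."""
    return str(code).rjust(width, "0")

climat_code = 71800  # five digits, as received
assert pad_trailing(climat_code) == "718000"  # collides with CAPE RACE
assert pad_leading(climat_code) == "071800"   # NANCY/ESSEY's real code

# Only leading zeros preserve the numeric value of the code:
assert int(pad_leading(climat_code)) == climat_code
assert int(pad_trailing(climat_code)) != climat_code
```

Trailing zeros silently change which station the code identifies; leading zeros don't.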
Also noticed that some of the CLIMAT data seemed to be missing, eg for
NANCY/ESSEY:
2000-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2003-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2004-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2005-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
I have the CLIMAT bulletin for 10/2006, which gives data for Rain Days (12 in this
case). It doesn't seem
I am now wondering whether it would be best to go back to the MCDW and CLIMAT
bulletins themselves and work
--
Well, information is always useful. And I probably did know this once.. long ago. All
official WMO codes
official code is available we improvise with two extra digits. Now I can't see why we
didn't leave the rest
at five digits, that would have been clear. I also can't see why, if we had to make them all
seven digits,
we extended the 'legitimate' five-digit codes by multiplying by 100, instead of adding two
numerically-
meaningless zeros at the most significant (left) end. But, that's what happened, and like
everything else
So - incoming stations with WMO codes can only match stations with codes ending '00'.
Put another way, for
comparison purposes any 7-digit codes ending '00' should be truncated to five digits.
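That truncation rule fits in one helper. This is a sketch of the comparison logic as described, not code from mergedb (which is Fortran); the second example code is invented:

```python
def wmo_key(code: str) -> str:
    """Reduce a 7-digit CRU station code to its comparable form.

    'Legitimate' 5-digit WMO codes were extended by multiplying by 100,
    so a 7-digit code ending '00' is really a 5-digit WMO code; anything
    else is a locally invented code and can only match itself.
    """
    if len(code) == 7 and code.endswith("00"):
        return code[:-2]
    return code

assert wmo_key("0100100") == "01001"    # JAN MAYEN
assert wmo_key("0716243") == "0716243"  # hypothetical made-up code: unchanged
```

Working on strings rather than integers sidesteps the leading-zero loss that caused the trouble in the first place.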
Also got the locations of the original CLIMAT and MCDW bulletins.
http://hadobs.metoffice.com/crutem3/data/station_updates/
ftp://ftp1.ncdc.noaa.gov/pub/data/mcdw
http://www1.ncdc.noaa.gov/pub/data/mcdw/
Downloaded all CLIMAT and MCDW bulletins (CLIMAT 01/2003 to 07/2007; MCDW
01/2003 to 06/2007 (with a
Wrote mcdw2cru.for and climat2cru.for, just guess what they do, go on..
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru
Enter the latest MCDW file (or <ret> for single files): ssm0706.fin
<END QUOTE>
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/CLIMAT] ./climat2cru
Enter the latest CLIMAT file (or <ret> for single file): climat_data_200707.txt
<END QUOTE>
Of course, it wasn't quite that simple. MCDW has an inexplicably complex format, which
I'm sure will vary
over time and eventually break the converter. For instance, most text is left-justified,
except the month
names for the overdue data, which are right-justified. Also, there is no missing value
code, just blank
space if a value is absent. This necessitates reading everything as strings and then testing
for content.
Oh, and a small amount of rain is marked 'T'.. as are small departures from the mean!!
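A sketch of the defensive parsing that format forces: read every field as a string, then test. Blank means missing, 'T' means trace. The choice to map a trace to 0.0 is my assumption for illustration, not necessarily what mcdw2cru does, and the column widths are not reproduced here:

```python
def parse_mcdw_value(field: str):
    """Interpret one fixed-width MCDW field.

    Blank -> missing (the format has no explicit missing-value code);
    'T'   -> trace (used for small rainfall amounts AND, confusingly,
             for small departures from the mean);
    else  -> numeric value.
    """
    field = field.strip()
    if not field:
        return None   # missing: blank space only
    if field == "T":
        return 0.0    # trace: treated as zero in this sketch
    return float(field)

assert parse_mcdw_value("      ") is None
assert parse_mcdw_value("    T ") == 0.0
assert parse_mcdw_value("  12.4") == 12.4
```

A straight numeric read would either crash on 'T' or silently misread blank space, which is exactly why everything has to come in as strings first.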
So moan over, now we have a set of updates for the secondary databases. And, indeed for
the primary ones -
except that I've already processed those, as updated by Dave L.. er.. ah well. So as I'm
running stupidly
late anyway - why not find out? It's that Imp of the Perverse on my shoulder again.
Actually as I examined all the databases in the tree to work out what was wheat and what
chaff, I had my
awful memory jogged quite nastily: WE NEED RAIN DAYS. So both conversion progs
will need adjusting and
re-running!! Waaaaah! And frankly at 18:45 on a Friday evening.. it's not gonna happen
right now.
..okay, another week, another razorblade to slide down. Modified mcdw2cru to include
rain days:
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru
Enter the latest MCDW file (or <ret> for single files): ssm0706.fin
<END QUOTE>
Checked, and the four preexisting databases match perfectly with their counterparts, so I
didn't break anything in the adjustments. And the rdy file looks good too (actually the above
is the *final* run;
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/CLIMAT] ./climat2cru
Enter the latest CLIMAT file (or <ret> for single file): climat_data_200707.txt
<END QUOTE>
Again, existing outputs are unchanged and the new rdy file looks OK (though see
bracketed note above for MCDW).
So.. to the incorporation of these updates into the secondary databases. Oh, my.
Beginning with Rain Days, known variously as rd0, rdy, pdy.. this allowed me to modify
newmergedb.for to cope
with various 'freedoms' enjoyed by the existing databases (such as six-digit WMO codes).
And then, when run,
Here is the first 'issue' encountered by newmergedb, taken from the top and with my
comments in <anglebrackets>:
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb
Should the incoming 'update' header info and data take precedence over the existing
database?
Or even vice-versa? This will significantly reduce user decisions later, but is a big step!
Enter 'U' to give Updates precedence, 'M' to give Masters precedence, 'X' for equality: U
In attempting to pair two stations, possible data incompatibilities have been found.
Master Data: Correlation with Update first year aligned to this year -v
1936 900 600 1000 800 1000 900 1300 1700 2100 1800 900 1000 0.27
1937 300 1400 1300 800 1400 1800 500 1200 1600 1000 1100 1500 0.15
1938 900 1000 1500 1800 1200 1500 1200 1700 500 700 1600 700 -0.13
1939 1500 1300 1100 1400 1200 1200 1000 1300 1800 1600 1100 1300 0.24
1940 1000 1500 1000 1200 1100 1700 2600 1500 1500 1400 1700 1100 0.15
1941 1800 1200 1000 1200 900 1100 900 1200 1900 1500 1000 1400 0.48
1942 900 900 1700 900 1600 1000 600 1100 1400 1300 700 700 0.51
1943 800 1000 1000 1300 900 800 1500 1600 1400 1500 1300 1200 0.44
1944 1000 400 900 800 1200 600 900 2000 900 1100 1000 900 0.32
1945 500 400 700 700 800 1800 900 1100 1200 1100 1300 700 0.19
1946 1200 1200 100 700 900 1200 400 900 800 1900 1300 1400 0.16
1947 900 1300 1300 1100 1600 1000 800 1400 1400 1700 2100 1900 0.09
1948 1100 1400 1400 1200 1300 1800 1200 1700 1500 2200 2100 1900 0.10
1949 1100 1100 500 1500 1600 1100 1500 1200 2200 2500 900 1600 0.04
1950 1300 800 1000 1100 1700 1200 1500 800 1100 1300 1500 1400 -0.04
1951 1100 600 1400 1400 1500 1600 2100 1300 1500 1700 2000 1700 -0.13
1952 2100 800 1100 1800 1300 1200 2400 2200 1600 1000 1000 2300 -0.23
1953 2100 1400 2100 1500 900 300 1300 1700 1500 800 1200 800 -0.24
1954 2100 600 1300 1000 1300 1700 1600 2000 1800 1300 1400 1200 -0.40
1955 2200 1300 900 1000 1600 2000 1100 1400 1000 2100 2300 1600 -0.20
1956 1300 1100 1300 400 1600 1300 900 1500 2000 1300 2000 1400 -0.30
1957 1700 1600 1100 1100 1900 1900 1400 1600 1400 1700 2300 2600 -0.27
1958 1300 2200 1900 700 1500 1200 2100 1000 1900 1700 1600 1000 -0.21
1959 2500 1800 1300 900 900 1600 1600 1500 2200 1700 1000 900 -0.33
1960 1800 1700 1500 400 1300 1500 400 1000 1300 1500 1000 1400 -0.21
1961 2100 1800 2200 1500 800 1400 1600 1100 1900 1200 1200 2100 -0.59
1962 2100 1100 1000 1500 1300 1100 1300 1700 1200 2000 1600 2300 -0.37
1963 2100 2100 2000 1000 700 2000 1400 1800 1400 1600 2000 2400 -0.56
1964 2400 1100 1000 1700 1100 1400 1400 1400 2000 1200 2100 1800 -0.42
1965 1400 2100 1300 1000 1700 1700 1400 2400 1300 2100 1900 2100 -0.41
1966 1600 1600 2000 2000 1700 1200 2000 2500 2500 2700 1600 600 -0.34
1967 2200 1700 1600 1200 1000 1400 1600 1300 1700 1500 1200 2100 -0.21
1968 1600 1800 1800 1800 1500 1800 1400 2100 1000 2000 2100 2000 -0.28
1969 1100 300 1900 1200 1000 1300 1500 1200 1200 2000 1700 800 -0.25
1970 1900 1400 1200 900 600 1200 1500 700 2300 1700 1700 2100 -0.23
1971 2000 1300 1600 1600 1200 1100 1400 1800 2000 1600 1700 1500 -0.39
1972 1300 1200 1300 1200 1700 800 1400 1800 1900 2000 1700 1600 -0.26
1973 1800 1100 1700 900 1200 1500 500 1800 1200 2000 2100 2100 -0.36
1974 1100 2400 700 1600 1300 1300 1800 2000 1900 1200 1400 2400 -0.29
1975 1500 2200 1400 1700 2500 2200 2300 1600 1700 2300 1800 2600 -0.47
1976 1900 800 1100 1500 1000 900 1300 1800 2200 1600 1400 1600 -0.33
1977 1800 1400 2200 1200 1600 1900 1300 1500 1500 1900 1500 2000 -0.40
1978 1500 1800 1400 2100 700 1000 1100 1900 1700 2300 1500 2200 -0.24
1979 1700 1700 1700 1200 1500 1800 900 1200 1800 1600 1500 2300 -0.39
1980 1900 1300 1300 1000 1400 900 700 1100 1300 1600 2200 1700 -0.36
1981 2600 500 1900 2000 800 1900 1500 2000 1400 1500 1800 1600 -0.46
1982 2200 1800 1100 1600 1500 2200 1800 1400 1700 1700 1900 1400 -0.60
1983 2400 1900 1700 1200 800 1500 1200 2000 1400 2100 2000 2500 -0.23
1984 1900 800 1500 2000 1100 1600 2000 1700 1100 1400 1000 1200
1985-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1986-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1987-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1988-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999 0.65
1991-9999 900 500 300 700 1000 1500 700 1700 1000 1300 1300 0.54
1992 800 1000 600 500 700 900-9999 1300-9999 700 900 1200 0.60
1993 600 900 400 500 900 1500 1000 800 800 1000 400 1000 0.55
1994 1300 1000 300 600 700 1000 900 600 1200 0 1400 600 0.43
1995 900 900 600 700 700 900 1100 1300 600 1800 1300 500 0.61
1996 500 1100 400 700 700 1200 1200 1100 1100 900 1000 1400 0.54
1997 1200 800 1300 600 600 100 500 1100 900-9999 1000 900 0.61
1998 1200 1300 800 1100 1100 1100 800 600 1200 1100 600 1200 0.52
1999 600 400 600 1000 700 700 1800 1400 700 1600 800 1200 0.62
2000 1100 600 1500 1700 900 1500 800 800 1000 1000 600 600 0.40
2001 600 500 700 700 600 500 1200 1200 700 1300 900 1000 0.63
2002 1000 800 1300 200 900 1100 1400 1200 1400 1800 1100 700
2003 1100-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
Update Data:
2003 1100 700 700 500 1000 400 700 1100 1200 2100 800 1900
2004 900 700 600 600 1300 1200 1000 1200 1400 900 1000 1000
2005 1000 400 800 1100 900 600 1200 1000 1600 1000 1300 1200
2006 700 500 1300 400 600 1200 1600 700 1000-9999 600 1500
2007 1400 400 400 1300 1200 1200-9999-9999-9999-9999-9999-9999
<DO YOU SEE? THERE'S THAT OH-SO FAMILIAR BLOCK OF MISSING CODES IN THE LATE 80S,
GOOD AFTER THE BREAK, DECIDEDLY DODGY BEFORE IT. THESE ARE TWO DIFFERENT
<END QUOTE>
So.. should I really go to town (again) and allow the Master database to be 'fixed' by this
program? Quite honestly I don't have time - but it just shows the state our data holdings
have drifted into. Who added those two series together? When? Why? Untraceable,
except
anecdotally.
It's the same story for many other Russian stations, unfortunately - meaning that
(probably)
there was a full Russian update that did no data integrity checking at all. I just hope it's
restricted to Russia!!
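The right-hand column in the listing above is a correlation between a master row and the update, computed with the -9999 missing-value code masked out. A Python sketch of that check (Pearson r over the 12 monthly values; assuming this is roughly what newmergedb computes, as its Fortran source isn't reproduced here):

```python
from math import sqrt

MISS = -9999

def monthly_corr(master, update):
    """Pearson correlation of two 12-month rows, skipping missing values."""
    pairs = [(m, u) for m, u in zip(master, update)
             if m != MISS and u != MISS]
    n = len(pairs)
    if n < 2:
        return None  # not enough overlap to correlate
    mx = sum(m for m, _ in pairs) / n
    my = sum(u for _, u in pairs) / n
    sxy = sum((m - mx) * (u - my) for m, u in pairs)
    sxx = sum((m - mx) ** 2 for m, _ in pairs)
    syy = sum((u - my) ** 2 for _, u in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant row has no defined correlation
    return sxy / sqrt(sxx * syy)

# 1936 master row from the listing above
row = [900, 600, 1000, 800, 1000, 900, 1300, 1700, 2100, 1800, 900, 1000]
r = monthly_corr(row, row)
assert r is not None and abs(r - 1.0) < 1e-9
assert monthly_corr(row, [MISS] * 12) is None
```

Sliding the update along the master's years, one correlation per alignment, is what produces the column of values shown.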
<BEGIN QUOTE>
<END QUOTE>
This is pretty obviously the same station (well OK.. apart from the duff early period, but
I've
got used to that now). But look at the longitude! That's probably 20km! Luckily I selected
'Update wins' and so the metadata aren't compared. This is still going to take ages,
because although
I can match WMO codes (or should be able to), I must check that the data correlate
adequately - and
for all these stations there will be questions. I don't think it would be a good idea to take
the
usual approach of coding to avoid the situation, because (a) it will be non-trivial to code
for, and
(b) not all of the situations are the same. But I am beginning to wish I could just blindly
merge
based on WMO code.. the trouble is that then I'm continuing the approach that created
these broken
<BEGIN QUOTE>
In attempting to pair two stations, possible data incompatibilities have been found.
Master Data: Correlation with Update first year aligned to this year -v
1936 1400 800 1700 900 1200 800 700 800 1800-9999-9999-9999 0.33
1937 1400 800 500 1700 1500 800 1200 1000 1700 1300 700 1200 0.32
1938 1000 1700 1200 1100 1100 800 800 1300 1400 1900 1800 1300 0.04
1939 1100 1700 1600 1800 1500 800 1500 1900 1700 1800 1300 1300 0.09
1940 1300 700 900 900 1800 1200 900 1300 1200 2200 1900 1800 0.08
1941 1400 1100 1800 1000 1400 1900 1400 700 1300 1200 1900 2000 0.02
1942 1700 900 1600 900 1200 1500 1300 1500 1200 1900 1500 1500 -0.06
1943 1400 1300 1300 800 1400 1600 1300 1500 1900 2000 700 1900 -0.17
1944 1900 1500 2000 1100 1200 1300 1500 1700 1800 1200 1500 1900 -0.32
1945 1300 1000 1400 2100 2000 1100 1700 700 1600 1800 2300 1700 -0.42
1946 2300 1900 1500 1100 1100 2000 1800 1000 1200 2100 2000 1800 -0.35
1947 1900 1400 1600 1000 2100 1900 2100 1000 1200 2000 2100 1500 -0.35
1948 1700 1500 1800 800 1300 1800 1700 1300 1800 2200 2000 2100 -0.15
1949 2300 2100 1000 700 1600 1400 1200 800 2100 2000 1100 1400 -0.07
1950 2100 2300 1000 1100 1500 1600 1600 2300 1900 1200 1100 1500 0.00
1951 1600 1000 1500 800 1500 1400 1200 600 1800 1800 1400 2400 -0.07
1952 1600 400 1100 1300 1100 1400 800 2000 1500 2300 1300 1600 -0.04
1953 2000 1200 1500 500 1300 1500 1100 1200 2300 2200 1600 2100 -0.02
1954 1700 1800 700 700 1000 1300 1200 1600 2000 1800 1800 600 0.01
1955 2400 1400 1000 1100 1700 1200 1000 1300 1500 1300 2300 1600 -0.08
1956 1300 800 1000 1100 1000 1000 1400 1800 1900 1900 2600 2000 -0.29
1957 1900 1200 1700 1000 1100 1100 1100 700 800 2300 1900 2200 -0.18
1958 1300 1600 1500 400 1500 1100 1300 1400 1900 2400 2000 1600 -0.28
1959 1700 1600 700 1300 1700 1100 1100 1600 2000 2100 1900 1600 -0.04
1962 1700 800 1200 600 400 1100 900 2000 1100 1900 1700 1500 0.25
1963 1200 1300 1700 700 1100 1600 900 1000 1100 1400 1800 2000 -0.04
1964 1900 500 1300 1300 1200 1200 1100 1100 1700 1500 2000 1800 0.13
1965 1200 1400 700 900 1200 1100 1300 1400 1800 2500 1000 1700 0.23
1966 1800 1600 2100 1300 1500 2100 900 1800 1500 2400 1900 800 0.11
1967 1600 1200 1100 600 800 1100 1100 700 1300 1200 1300 1900 0.39
1968 1600 1400 1600 1200 900 1300 1400 1000 1700 1300 1400 1200 0.24
1969 900 1000 1100 1500 1700 1700 1000 1800 1200 1400 1900 1300 0.04
1970 1500 1200 1600 1400 700 1600 700 1600 1000 1500 1900 1600 -0.02
1971 1700 400 1100 1700 1300 1700 700 2000 900 2100 2000 1900 -0.11
1972 1200 1500 1400 800 1700 1300 1700 2000 2100 1700 2500 1900 -0.08
1973 1200 1100 1100 700 800 1300 2100 1000 2400 1900 1800 2300 -0.11
1974 700 1200 1800 1800 1400 1200 1000 1300 1100 1600 1900 700 -0.14
1975 2200 1800 1400 1300 1500 1500 1400 1500 1400 2300 1900 2100 -0.15
1976 2000 1500 600 700 1100 1600 1300 1100 1500 1800 1600 1200 -0.11
1977 1900 1700 1800 1400 1000 1100 1000 1300 1500 1800 1700 2100 -0.15
1978 1600 1000 800 1400 1400 800 1600 1600 2300 2200 2200 1800 0.03
1979 1600 1600 1600 900 900 1900 1200 1700 1200 2100 1600 2000 0.00
1980 1600 1200 500 800 1500 1100 800 1700 1200 600 2200 2200 -0.05
1981 2000 1000 1700 1300 1500 1100 800 400 1500 800 1500 1900 0.06
1982 2400 1800 1100 1200 1200 1100 1000 1700 1200 2100 1800 2000 0.03
1983 2500 2100 1800 1300 1400 1200 1200 1300 1300 1900 2300 1900 0.10
1984 1200 700 500 1300 900 800 1100 1000 1700 1600 1600 1300
Update Data:
2003 1500 900 600 400 900 1200 500 700 1100 600 700 1500
2004 700 600 700 400 600 1100 500 900 900 1400 1500 600
2005 700 400 800 1400 300 900 800 800 900 500 1200 600
2006 800 700 900 1000 800 500 1000 500 1300 1100 700 1600
<END QUOTE>
Here, the expected 1990-2003 period is MISSING - so the correlations aren't so hot! Yet
the WMO codes and station names/locations are identical (or close). What the hell is
supposed to happen here? Oh yeah - there is no 'supposed', I can make it up. So I have :-)
If an update station matches a 'master' station by WMO code, but the data is unpalatably
<BEGIN QUOTE>
You have failed a match despite the WMO codes matching.
3. Give existing station a false code, and make the update the new WMO station.
Enter 1,2 or 3:
<END QUOTE>
You can't imagine what this has cost me - to actually allow the operator to assign false
WMO codes!! But what else is there in such situations? Especially when dealing with a
'Master'
database of dubious provenance (which, er, they all are and always will be).
False codes will be obtained by multiplying the legitimate code (5 digits) by 100, then
adding
1 at a time until a number is found with no matches in the database. THIS IS NOT
PERFECT but as
there is no central repository for WMO codes - especially made-up ones - we'll have to
chance
duplicating one that's present in one of the other databases. In any case, anyone
comparing WMO
codes between databases - something I've studiously avoided doing except for tmin/tmax
where I
had to - will be treating the false codes with suspicion anyway. Hopefully.
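The false-code recipe just described is simple enough to sketch directly (in Python rather than the actual Fortran; the example codes and the 'taken' set are invented):

```python
def false_code(code5: int, taken: set) -> int:
    """Generate a false code for a station that failed its WMO match.

    Multiply the legitimate 5-digit code by 100, then add 1 at a time
    until the candidate collides with nothing already in the database.
    """
    candidate = code5 * 100 + 1
    while candidate in taken:
        candidate += 1
    return candidate

existing = {100100, 100101}  # hypothetical codes already in the database
assert false_code(1001, existing) == 100102  # 100101 taken, step past it
assert false_code(7180, existing) == 718001  # first candidate is free
```

As the text says, with no central registry of made-up codes, nothing stops the same false code appearing in a different database; this only guarantees uniqueness within the one being edited.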
Of course, option 3 cannot be offered for CLIMAT bulletins, there being no metadata
with which
This still meant an awful lot of encounters with naughty Master stations, when really I
suspect
nobody else gives a hoot about. So with a somewhat cynical shrug, I added the nuclear
option -
to match every WMO possible, and turn the rest into new stations (er, CLIMAT
excepted). In other
words, what CRU usually do. It will allow bad databases to pass unnoticed, and good
databases to
become bad, but I really don't think people care enough to fix 'em, and it's the main
reason the
And there are STILL WMO code problems!!! Let's try again with the issue. Let's look at
the first
station in most of the databases, JAN MAYEN. Here it is in various recent databases:
As we can see, even I'm cocking it up! Though recoverably. DTR, TMN and TMX need
to be written as (i7.7).
You see? The leading zero's been lost (presumably through writing as i7) and then a zero has
been added at the trailing end. So it's a 5-digit WMO code BUT NOT THE RIGHT ONE.
Aaaarrrgghhhhhh!!!!!!
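The failure mode is the Fortran edit descriptor: i7 writes the integer blank-padded, so any leading zero vanishes, while i7.7 zero-fills to seven digits. Python's format mini-language can mimic both, using the JAN MAYEN code from the text:

```python
def write_i7(code: int) -> str:
    """Fortran 'i7' analogue: blank-padded, leading zeros lost."""
    return f"{code:7d}"

def write_i77(code: int) -> str:
    """Fortran 'i7.7' analogue: zero-filled to seven digits."""
    return f"{code:07d}"

jan_mayen = 100100  # really the 7-digit code 0100100
assert write_i7(jan_mayen) == " 100100"   # leading zero gone
assert write_i77(jan_mayen) == "0100100"  # preserved
```

Hence the instruction above that DTR, TMN and TMX need to be written as (i7.7).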
Actually, a brief interlude to churn out the tmin & tmax primaries, which got sort-of
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.tmn
tmn.0708071548.dtb
1961,1990
25
tmn.txt
1901,2006
> Operating...
#####
IDL>
quick_interp_tdm2,1901,2006,'tmnglo/tmn.',750,gs=0.5,pts_prefix='tmntxt/tmn.',dumpglo='dumpglo'
#####
Enter the path and name of the normals file: gunzip clim.6190.lan.tmn
Enter the path (if any) for the output files: tmnabs
tmn.01.1901.glo
(etc)
tmn.12.2006.glo
#####
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.tmn.dat
Writing cru_ts_3_00.1901.1910.tmn.dat
(etc)
#####
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.tmx
tmx.0708071548.dtb
1961,1990
25
1901,2006
> Operating...
#####
IDL>
quick_interp_tdm2,1901,2006,'tmxglo/tmx.',750,gs=0.5,pts_prefix='tmxtxt/tmx.',dumpglo='dumpglo'
#####
Enter the path (if any) for the output files: tmxabs
tmx.01.1901.glo
(etc)
tmx.12.2006.glo
#####
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.tmx.dat
Writing cru_ts_3_00.1901.1910.tmx.dat
(etc)
the intermediate products - which would have made my detective work easier. The
ridiculous process
stage, none of which are automatically zipped/unzipped. Crazy. I've filled a 100gb disk!
So, anyway, back on Earth I wrote wmocmp.for, a program to - you guessed it - compare
WMO codes from
<BEGIN QUOTE>
REPORT:
Database Title   Exact Match   Close Match   Vague Match   Awful Match   Codes Added   WMO = 0
<END QUOTE>
So the largest database, precip, contained 14397 stations with usable WMO codes (and
1540 without).
The TMin, (and TMax and DTR, which were tested then excluded as they matched TMin
100%) database only agreed
perfectly with precip for 1865 stations, nearby 3389, believable 57, worrying 77. TMean
fared worse, with NO
exact matches (WMO misformatting again) and over 100 worrying ones.
The big story is the need to fix the tmean WMO codes. For instance:
10010 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
01001 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
0001001 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
0100100 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
I favour the first as it's technically accurate. Alternatively we seem to have widely
adopted the third, which
at least has the virtue of being consistent. Of course it's the only one that will match the
precip:
100100 7093 -867 10 JAN MAYEN NORWAY 1921 2006 -999 -999.00
01001 7093 -867 10 JAN MAYEN NORWAY 1921 2006 -999 -999.00
Aaaaarrrggghhhh!!!!
And the reason this is so important is that the incoming updates will rely PRIMARILY
on matching the WMO codes!
of 'genuine WMO codes'.. and wouldn't you know it, I've found four!
The strategy is to use Dave Lister's list, grabbing country names from the Dresden list.
Wrote
Wrote 'fixwmos.for' - probably not for the first time, but it's the first prog of that name in
my repository so I'll
have to hope for the best. After an unreasonable amount of teething troubles (due to my
forgetting that the tmp
database stores lats & lons in degs*100 not degs*10, and also to the presence of a '-
99999' as the lon for GUATEMALA
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/tmp] ./fixwmos
2263 WMO Codes were 'fixed' and all were rewritten as (i7.7)
crua6[/cru/cruts/version_3_0/db/tmp]
<END QUOTE>
The first records have changed as follows:
1c1
< 10010 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
---
> 0100100 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
So far so good.. but records that weren't matched with the reference set didn't fare so
well:
89c89
< 10050 780 142 9 ISFJORD RADIO NORWAY 1912 1979 101912
-999.00
---
> 0010050 780 142 9 ISFJORD RADIO NORWAY 1912 1979 101912
-999.00
This is misleading because, although there probably won't BE any incoming updates for
ISFJORD RADIO, we can't say for
certain that there will never be updates for any station outside the current reference set. In
fact, we can say with
careful use of verb) to use the country codes database to work out the most significant
'real' digits of these codes?
Well, I fancy the first one. We'll make two passes through the data, the first pass changes
nothing but saves counts of
the successful factors in bins: *0.01, *0.1, *1, *10, *100 should do it. I sure hope all the
results are in one bin!
It worked. An initial 'verbose' run showed a consistent choice of factor, though it'll exit
with an error code if multiple
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/tmp] ./fixwmos
crua6[/cru/cruts/version_3_0/db/tmp]
<END QUOTE>
Example results:
<BEGIN QUOTE>
1c1
< 10010 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
---
> 0100100 7090 -870 10 Jan Mayen NORWAY 1921 2006 341921
-999.00
89c89
< 10050 780 142 9 ISFJORD RADIO NORWAY 1912 1979 101912
-999.00
---
> 0100500 7800 1420 9 ISFJORD RADIO NORWAY 1912 1979 101912
-999.00
159c159
< 10080 783 155 28 Svalbard Lufthavn NORWAY 1911 2006 341911
-999.00
---
> 0100800 7830 1550 28 Svalbard Lufthavn NORWAY 1911 2006 341911
-999.00
<END QUOTE>
Then.. attacked the wet database! And immediately found this beauty:
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1994 500 800 600 400 600 100 0 100 200 400 1000 1300
1995 400 100 1100 900 1200 800 200 100 200 400 800 500
1996 500 1100 1500 600 900-9999 0 300 400 700 0 1100
1997 800 1000 700 1000 1000 1000 200 200 400 700 200 1000
1998 700 700 1000 1000-9999 800 100 100 0 200 400 700
1999 300 1000 800-9999 700 800 0 200-9999 600 400 200
2000 1100 600 900 900 1000 400-9999 100 200 300 0 400
2002 800 300 600 1300 800 500 400 100 300 400 400 600
2003 300-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
Gotta love the system! Like this is ever going to be a blind bit of use. Modified the code
to
leave such stations unmolested, but identified in a separate file so they can be 'cleansed',
it
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/rd0] ./fixwmos
crua6[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>
I then removed the sole illegal (see above) from wet.0710021341.dtb, which becomes the
'new old'
wet/rd0 database.
So.. to incorporate the updates! Finally. First, the MCDW, metadata-rich ones:
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb
ian - do
you want the quick and dirty approach? This will blindly match
* new header 0100100 7056 -840 9 JAN MAYEN NORWAY 1990 2007
-999 -999 *
Writing wet.0710041559.dtb
OUTPUT(S) WRITTEN
(automatically: 1556)
(by operator: 0)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>
latitude band for a given database - needs a Matlab prog to drive really)
[a bit of debugging here as the last records weren't being written properly,
*WARNING: ignore this, the CLIMAT bulletins were later improved with metadata and
newmergedb rerun*
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb
you want the quick and dirty approach? This will blindly match
Writing wet.0710081508.dtb
OUTPUT(S) WRITTEN
(automatically: 2498)
(by operator: 0)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>
Now of course, we can't add any of the CLIMAT bulletin stations as 'new' stations
because we don't have any metadata! so.. is it worth using the lookup table? Because
although I'm thrilled at the high match rate (87%!), it does seem worse when you
At this stage I knocked up rrstats.for and the visualisation companion tool, cmprr.m. A
simple process
to show station counts against time for each 10-degree latitude band (with 20-degree
bands at the
North and South extremities). A bit basic and needs more work - but good for a quick &
dirty check.
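The banding just described, 10-degree latitude bands with 20-degree bands at the extremities, reduces to a small binning function. rrstats itself is Fortran and cmprr is Matlab; this Python sketch assumes the polar bands cover 70..90 and -90..-70, which the text doesn't state explicitly:

```python
import math

def lat_band(lat: float) -> tuple:
    """Return the (south, north) edges of the band containing lat.

    Bands are 10 degrees wide, except the assumed 20-degree bands at the
    North and South extremities.
    """
    if lat >= 70.0:
        return (70, 90)
    if lat < -70.0:
        return (-90, -70)
    south = math.floor(lat / 10.0) * 10
    return (south, south + 10)

assert lat_band(70.9) == (70, 90)     # e.g. JAN MAYEN's latitude
assert lat_band(-75.0) == (-90, -70)
assert lat_band(52.6) == (50, 60)
assert lat_band(-0.1) == (-10, 0)
```

Counting stations per band per year is then just a dictionary keyed on (band, year).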
Wrote dllist2headers.for to convert the 'Dave Lister' WMO list to CRU header format -
the main difficulty
being the accurate conversion of the two-character 'country codes' - especially since
many are actually
state codes for the US! Ended up with wmo.0710151633.dat as our reference WMO set.
Incorporated the reference WMO set into climat2cru.for. Successfully reprocessed the
CLIMAT bulletins
pre.0710151817.dtb
rdy.0710151817.dtb
sun.0710151817.dtb
tmn.0710151817.dtb
tmp.0710151817.dtb
tmx.0710151817.dtb
vap.0710151817.dtb
In fact, it was far more successful than I expected - only 11 stations out of 2878 without
metadata!
Re-ran newmergedb:
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb
you want the quick and dirty approach? This will blindly match
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
OUTPUT(S) WRITTEN
(automatically: 2498)
(by operator: 0)
> Rejected: 71
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>
307 stations rescued! and they'll be there in future of course, for metadata-free CLIMAT
bulletins
to match with.
wet.0311061611.dtb
wet.0710161148.dtb
Now it gets tough. The current model for a secondary is that it is derived from one or
more primaries,
The IDL secondary generators do not allow 'genuine' secondary data to be incorporated.
This would have
been ideal, as the gradual increase in observations would have gradually taken
precedence over the
primary-derived synthetics.
The current stats for the wet database were derived from the new proglet, dtbstats.for:
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./dtbstat
Total: 6143
crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>
So, without further ado, I treated RD0 as a Primary and derived gridded output from the
database:
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.rd0
wet.0710161148.dtb
1961,1990
25
1901,2007
> Operating...
crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>
Not particularly good - the bulk of the data being recent, less than half had valid normals (anomdtb
calculates normals on the fly, on a per-month basis). However, this isn't so much of a problem as the
IDL>
quick_interp_tdm2,1901,2007,'rd0glo/rd0.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='rd0txt/rd0.'
Defaults set
1901
1902
(etc)
2007
IDL>
<END QUOTE>
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Enter the path (if any) for the output files: rd0abs/
rd0.01.1901.glo
(etc)
<END QUOTE>
Then.. wait a minute! I checked back, and sure enough, quick_interp_tdm.pro DOES allow both synthetic and 'real' data:
<BEGIN QUOTE>
; TDM: the dummy grid points default to zero, but if the synth_prefix files are present in call,
; the synthetic data from these grids are read in and used instead
<END QUOTE>
And so.. (after some confusion, and renaming so that anomdtb selects percentage anomalies)..
IDL>
quick_interp_tdm2,1901,2006,'rd0pcglo/rd0pc',450,gs=0.5,dumpglo='dumpglo',synth_prefix='rd0syn/rd0syn',pts_prefix='rd0pctxt/rd0pc.'
The trouble is, we won't be able to produce reliable station count files this way. Or can we use the
same strategy, producing station counts from the wet database route, and filling in 'gaps' with the
precip station counts? Err.
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Enter the path (if any) for the output files: rd0pcgloabs/
rd0pc.01.1901.glo
(etc)
<END QUOTE>
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./mergegrids
Writing cru_ts_3_00.1901.1910.rd0.dat
Writing cru_ts_3_00.1911.1920.rd0.dat
Writing cru_ts_3_00.1921.1930.rd0.dat
Writing cru_ts_3_00.1931.1940.rd0.dat
Writing cru_ts_3_00.1941.1950.rd0.dat
Writing cru_ts_3_00.1951.1960.rd0.dat
Writing cru_ts_3_00.1961.1970.rd0.dat
Writing cru_ts_3_00.1971.1980.rd0.dat
Writing cru_ts_3_00.1981.1990.rd0.dat
Writing cru_ts_3_00.1991.2000.rd0.dat
Writing cru_ts_3_00.2001.2006.rd0.dat
crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>
Minimum = 0
Maximum = 32630
Vals >31000 = 1
For the whole of 2001:
Minimum = 0
Maximum = 56763
Vals >31000 = 5
Not good. We're out by a factor of at least 10, though the extremes are few enough to just cap at DiM. So where has
Minimum = 0
Maximum = 3050
Vals >3100 = 0
That all seems fine for a percentage normals set. Not entirely sure about 0 though.
Minimum = -48.046
Maximum = 0.0129
This leads to a show-stopper, I'm afraid. It looks as though the calculation I'm using for percentage anomalies is,
absgrid(ilon(i),ilat(i)) = nint(normals(i,imo) +
DataA(XAYear,XMonth,XAStn) =
nint(1000.0*((real(DataA(XAYear,XMonth,XAStn)) / &
real(NormMean(XMonth,XAStn)))-1.0))
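The suspect is the 1000.0 multiplier: a percentage anomaly wants a factor of 100, so values come out ten times too large. An illustrative Python rendering of the two versions (the names mirror the Fortran above; nothing else is taken from it):

```python
def anom_as_written(value, normal):
    """Percentage anomaly as the Fortran above computes it (x1000)."""
    return round(1000.0 * ((value / normal) - 1.0))

def anom_expected(value, normal):
    """Conventional percentage anomaly (x100)."""
    return round(100.0 * ((value / normal) - 1.0))
```

A value 50% above normal gives 500 from the first version and 50 from the second: out by a factor of 10, which matches the inflated maxima in the output grids.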
This could well explain things. It could also mean that I have to reproduce v3.00 precip AFTER it's been used (against
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Enter the path (if any) for the output files: rd0pcgloabs
rd0pc.01.1901.glo
(etc)
<END QUOTE>
Minimum = 0
Maximum = 5090 (a little high but not fatal)
Vals >4000 = 2 (so the bulk of the excessions are only a few days over)
So, good news - but only in the sense that I've found the error. Bad news in that it's a further confirmation that my
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/precip] ./glo2abs
Enter the path (if any) for the output files: pre0km0612181221abs/
pregrid.01.1901.glo
(etc)
<END QUOTE>
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/precip] ./mergegrids
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.pre.dat
Writing cru_ts_3_00.1901.1910.pre.dat
Writing cru_ts_3_00.1911.1920.pre.dat
Writing cru_ts_3_00.1921.1930.pre.dat
Writing cru_ts_3_00.1931.1940.pre.dat
Writing cru_ts_3_00.1941.1950.pre.dat
Writing cru_ts_3_00.1951.1960.pre.dat
Writing cru_ts_3_00.1961.1970.pre.dat
Writing cru_ts_3_00.1971.1980.pre.dat
Writing cru_ts_3_00.1981.1990.pre.dat
Writing cru_ts_3_00.1991.2000.pre.dat
Writing cru_ts_3_00.2001.2006.pre.dat
crua6[/cru/cruts/version_3_0/primaries/precip]
<END QUOTE>
Then back to finish off rd0. Modified glo2abs to allow the operator to set minima and maxima, with a
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Enter the path (if any) for the output files: rd0pcgloabs/
Choose: 3
rd0pc.01.1901.glo
(etc)
<END QUOTE>
Output was checked.. and as expected, January 2001 had 556 values of 3100 :-)
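That capping is just a clamp. A hedged Python sketch (the function and defaults here are mine for illustration; glo2abs.for's actual prompts differ):

```python
def clamp_grid(values, lo=0, hi=3100):
    """Clamp gridded values to operator-chosen minimum/maximum.
    3100 = 31.00 days x 100: the ceiling for a wet-day month,
    so any over-range January cell saturates at exactly 3100."""
    return [min(max(v, lo), hi) for v in values]
```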
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./mergegrids
Welcome! This is the MERGEGRIDS program.
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.rd0.dat
Writing cru_ts_3_00.1901.1910.rd0.dat
Writing cru_ts_3_00.1911.1920.rd0.dat
Writing cru_ts_3_00.1921.1930.rd0.dat
Writing cru_ts_3_00.1931.1940.rd0.dat
Writing cru_ts_3_00.1941.1950.rd0.dat
Writing cru_ts_3_00.1951.1960.rd0.dat
Writing cru_ts_3_00.1961.1970.rd0.dat
Writing cru_ts_3_00.1971.1980.rd0.dat
Writing cru_ts_3_00.1981.1990.rd0.dat
Writing cru_ts_3_00.1991.2000.rd0.dat
Writing cru_ts_3_00.2001.2006.rd0.dat
crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>
We have:
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/vap] ./newmergedb
you want the quick and dirty approach? This will blindly match
Writing vap.0710241541.dtb
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
OUTPUT(S) WRITTEN
(by operator: 0)
> Rejected: 2
uealogin1[/cru/cruts/version_3_0/db/vap]
<END QUOTE>
<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/vap] ./newmergedb
you want the quick and dirty approach? This will blindly match
Writing vap.0710241549.dtb
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
OUTPUT(S) WRITTEN
(automatically: 2599)
(by operator: 0)
uealogin1[/cru/cruts/version_3_0/db/vap]
<END QUOTE>
So, not as good as the MCDW update.. lost 68.. but then of course we are talking about station data that
So we will try the unaltered rd0 process on vap. It should be the same; a mix of synthetic
and observed.
*************************************************************************************
*************************************************************************************
Vapour Pressure data was observed, I discovered that some of the Wet Days and Vapour Pressure
datasets had been swapped!!
So I wrote crutsstats.for, which returns monthly and annual minima, maxima and means for any gridded output file.
So the monthly maxima are fine here. But for the decadal files?
Much confusion! The orders of magnitude have changed to reflect the expected ranges - but the data have clearly been swapped!
Another decade:
crua6[/cru/cruts/vap_wet_investigation]cat cru_ts_2_10.1921-1930.vap.grid.stats
It looks like a consistent problem: all the decadal VAP and WET files should be discarded, and only
the 'full run' 1901-2002 files used. But my theory that the error occurred when the 1901-2002 files
were converted to decadal doesn't sound true now, because why would the precision levels change?
Surely, if the decadal files are derived from the 1901-2002 files, it's just
VAP:
WET:
VAP:
WET:
It's evident that the data have not only been swapped - they've been scaled too.
Aaaarrrgghhhhhh!!!!!
*******************************************************************************
* PRIORITY INTERRUPT ENDS * PRIORITY INTERRUPT ENDS * PRIORITY INTERRUPT ENDS *
*******************************************************************************
Original: vap.0311181410.dtb
MCDW: vap.0709111032.dtb
Intermediate: vap.0710241541.dtb
CLIMAT: vap.0710151817.dtb
Final: vap.0710241549.dtb
Produce anomalies:
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.vap
vap.0710241549.dtb
1961,1990
25
vap.txt
1901,2006
> Operating...
> NORMALS MEAN percent STDEV percent
crua6[/cru/cruts/version_3_0/secondaries/vap]
<END_QUOTE>
Well.. 47% accepted, 53% no normals.. pretty much as expected, and unlikely to improve no matter how
many new CLIMAT and MCDW updates there are. We need back data for 1961-1990.
Synthetic production:
<BEGIN_QUOTE>
IDL>
vap_gts_anom,dtr_prefix='../dtrbin/dtrbin',tmp_prefix='../tmpbin/tmpbin',1901,2006,outprefix='vapsyn/vapsyn',dumpbin=1
(etc)
<END_QUOTE>
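vap_gts_anom builds synthetic vapour pressure from TMP and DTR. Its actual relationship isn't documented here, so the following Python is purely an illustrative assumption: saturation vapour pressure (a Magnus-type formula with Bolton's constants) evaluated at an estimated daily minimum, Tmin = TMP - DTR/2. This is NOT necessarily what vap_gts_anom.pro does.

```python
import math

def svp_magnus(t_c):
    """Saturation vapour pressure (hPa) at temperature t_c (deg C),
    Magnus-type approximation (constants from Bolton, 1980)."""
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))

def synthetic_vap(tmp, dtr):
    """Illustrative synthetic VAP: SVP at the estimated daily minimum.
    An assumption for illustration, not the documented method."""
    tmin = tmp - dtr / 2.0
    return svp_magnus(tmin)
```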
IDL>
quick_interp_tdm2,1901,2006,'vapglo/vap.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='vapsyn/vapsyn',pts_prefix='vaptxt/vap.'
Create absolute grids from anomaly grids:
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Enter the path (if any) for the output files: vapabs/
Choose: 1
vap.01.1901.glo
vap.02.1901.glo
(etc)
<END_QUOTE>
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./mergegrids
Enter a gridfile with YYYY for year and MM for month: vapabs/vap.MM.YYYY.glo.abs
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.dat
Writing cru_ts_3_00.1901.1910.vap.dat
Writing cru_ts_3_00.1911.1920.vap.dat
Writing cru_ts_3_00.1921.1930.vap.dat
Writing cru_ts_3_00.1931.1940.vap.dat
Writing cru_ts_3_00.1941.1950.vap.dat
Writing cru_ts_3_00.1951.1960.vap.dat
Writing cru_ts_3_00.1961.1970.vap.dat
Writing cru_ts_3_00.1971.1980.vap.dat
Writing cru_ts_3_00.1981.1990.vap.dat
Writing cru_ts_3_00.1991.2000.vap.dat
Writing cru_ts_3_00.2001.2006.vap.dat
<END_QUOTE>
Ah - and I was really hoping this time that it would just WORK. But of course not - nothing works first
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./crutsstats
(etc)
<END_QUOTE>
What?! Every year has the same min (fine, VAP of 0 is probably impossible), max (I can just about
believe, if there's a cell with no stations inside the cdd and the normal for it happens to be the
highest value), and MEAN (oh no, NO WAY!). What's odder - the .glo files are different:
56
Admittedly, 56 lines different out of 360 isn't hugely different. And looking, they are only slight
and infrequent differences. But the monthly stats are all cloned as well:
without the 'zero minimum' flag (just in case I coded that badly, I was in a hurry):
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Enter the path (if any) for the output files: vapabs/
vap.01.1901.glo
vap.02.1901.glo
(etc)
<END_QUOTE>
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./mergegrids
Enter a gridfile with YYYY for year and MM for month: vapabs/vap.MM.YYYY.glo.abs
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.dat
Writing cru_ts_3_00.1901.1910.vap.dat
Writing cru_ts_3_00.1911.1920.vap.dat
Writing cru_ts_3_00.1921.1930.vap.dat
Writing cru_ts_3_00.1931.1940.vap.dat
Writing cru_ts_3_00.1941.1950.vap.dat
Writing cru_ts_3_00.1951.1960.vap.dat
Writing cru_ts_3_00.1961.1970.vap.dat
Writing cru_ts_3_00.1971.1980.vap.dat
Writing cru_ts_3_00.1981.1990.vap.dat
Writing cru_ts_3_00.1991.2000.vap.dat
Writing cru_ts_3_00.2001.2006.vap.dat
<END_QUOTE>
Sadly, that gave the same result. So what of the published (v2.10) VAP dataset? That looks ~ok:
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./crutsstats
(etc)
<END_QUOTE>
Not good at all. Or, rather, good that it must be a solvable problem. Except that it's 10 to 5 on a Sunday
Where to start? Well, retrace your steps, that's how you get out of a minefield. So first up, to
compare similar months in the anomaly files. Though I already know what I'm going to find, don't I?
Because glo2abs isn't going to do anything unusual, it just adds the normal and there you go. So if
the absolutes are very similar, the anomalies will be, too.. hmm. Well, I *suppose* I could try
producing two more copies of the output files - one with just synthetic data and one with just
observed data? It's only a couple of re-runs
IDL>
quick_interp_tdm2,1901,2006,'vapsynglo/vapsyn.',1000,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='vapsyn/vapsyn'
crua6[/cru/cruts/version_3_0/secondaries/vap/syn_only] ./glo2abs
Enter the path (if any) for the output files: vapsynabs/
vapsyn.01.1901.glo
vapsyn.02.1901.glo
(etc)
crua6[/cru/cruts/version_3_0/secondaries/vap/syn_only] ./mergegrids
Welcome! This is the MERGEGRIDS program.
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.syn.dat
Writing cru_ts_3_00.1901.1910.vap.syn.dat
Writing cru_ts_3_00.1911.1920.vap.syn.dat
Writing cru_ts_3_00.1921.1930.vap.syn.dat
Writing cru_ts_3_00.1931.1940.vap.syn.dat
Writing cru_ts_3_00.1941.1950.vap.syn.dat
Writing cru_ts_3_00.1951.1960.vap.syn.dat
Writing cru_ts_3_00.1961.1970.vap.syn.dat
Writing cru_ts_3_00.1971.1980.vap.syn.dat
Writing cru_ts_3_00.1981.1990.vap.syn.dat
Writing cru_ts_3_00.1991.2000.vap.syn.dat
Writing cru_ts_3_00.2001.2006.vap.syn.dat
<END_QUOTE>
<BEGIN_QUOTE>
IDL>
quick_interp_tdm2,1901,2006,'vapobsglo/vapobs.',1000,gs=0.5,dumpglo='dumpglo',pts_prefix='vaptxt/vap.'
crua6[/cru/cruts/version_3_0/secondaries/vap/obs_only] ./glo2abs
Enter the path (if any) for the output files: vapobsabs/
vapobs.01.1901.glo
vapobs.02.1901.glo
(etc)
crua6[/cru/cruts/version_3_0/secondaries/vap/obs_only] ./mergegrids
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.obs.dat
Writing cru_ts_3_00.1901.1910.vap.obs.dat
Writing cru_ts_3_00.1911.1920.vap.obs.dat
Writing cru_ts_3_00.1921.1930.vap.obs.dat
Writing cru_ts_3_00.1931.1940.vap.obs.dat
Writing cru_ts_3_00.1941.1950.vap.obs.dat
Writing cru_ts_3_00.1951.1960.vap.obs.dat
Writing cru_ts_3_00.1961.1970.vap.obs.dat
Writing cru_ts_3_00.1971.1980.vap.obs.dat
Writing cru_ts_3_00.1981.1990.vap.obs.dat
Writing cru_ts_3_00.1991.2000.vap.obs.dat
Writing cru_ts_3_00.2001.2006.vap.obs.dat
<END_QUOTE>
Synthetic-only:
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap/syn_only] ./crutsstats
(etc)
<END_QUOTE>
Observed-only:
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap/obs_only] ./crutsstats
(etc)
<END_QUOTE>
Oh, GOD. What is going on? Are we data sparse and just looking at the climatology? How can a
synthetic dataset derived from tmp and dtr produce the same statistics as a 'real' dataset derived
from observations?
IDL>
quick_interp_tdm2,1901,2006,'vapsynglo/vapsyn.',1000,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='vapsyn/vapsyn'
IDL>
quick_interp_tdm2,1901,2006,'vapobsglo/vapobs.',1000,gs=0.5,dumpglo='dumpglo',pts_prefix='vaptxt/vap.'
Well they look fine. The synthetic run has no other data inputs ('nostn=1'), and the observed run
has no references to the synthetic data. So.. either quick_interp_tdm2.pro is doing something
'unusual', or, or.. hang on, let's try the climatology for stats:
Ah, Bingo was his name-o! As I was hoping (well OK it's a bad kind of hope), the reason it's all the
same is that it is by and large defaulting to the climatology. Which means that not much (any?) data
is getting through, no matter if we use synthetic, observed, or both together. What's odd about that
conclusion is that the synthetic data is derived from TMP and DTR - two very well-populated
datasets! So synthetics alone should pretty much fill the.. hang on, just thought of something
horrendous.. oh, okay, probably not that. I was wondering if glo2abs.for was factoring the normals
so that
absgrid(ilon(i),ilat(i)) =
* nint(anoms(ilon(i),ilat(i))*10) + normals(i,imo)
..so the anomaly is getting the weight! But still - not a wise thing to leave to automatics. So
glo2abs should prompt the user.. but with what? Just one anomaly and normal? Several? The same one
from different timesteps? Eeek. Let's look at January 1961, lines 11103, 11104 in the glo file
(11099, 11100 without header, putting it on about 33.5 degs N).
Those anomalies are mighty tiny, given that the absolutes are three-digit integers! Hardly
surprising they're not really appearing on the radar when added to normals typically two orders of
magnitude higher! Even with the *10 in the glo2abs
Looked at the observed anomalies (output from anomdtb.f90) - here the anomalies are larger! Between -5 and +5, roughly,
So the binary file anomaly units - the ones we're using - are in hPa*10. Let's get one o' them synthetic glo files:
IDL>
vap_gts_anom,dtr_prefix='../dtrbin/dtrbin',tmp_prefix='../tmpbin/tmpbin',1961,1961,outprefix='vapsynglo/vapsyn.',dumpglo=1
For Jan 1961 (may as well stick with it), -999 is the missing value code. The range is -0.0149 to
+0.0222 (remember this is an anomaly in hPa according to the program comment). So if it's telling
the truth, the binary anomalies presented to quick_interp_tdm2.pro will range from roughly -0.3 to
+0.3. Still not going to impinge on normals between 1 and 358, is it?
Tyndall Centre grim file created on 12.01.2004 at 11:47 by Dr. Tim Mitchell
291 294 296 293 287 279 265 262 271 279 286 287
Grid-ref= 1, 311
14 11 13 21 44 69 92 90 65 37 22 14
Grid-ref= 1, 312
13 10 12 20 43 67 90 87 63 35 21 13
That's what I've been missing! D'oh. That '[Multi= 0.1000]'. That would still only give a range of 0.1 to 35.8 hPa, and
Two things, then. Firstly to get glo2abs to read the multiplicative factor from the climatology
header and impose it on the output. Secondly to work out why all the anomalies have different
magnitudes! Or is vapour pressure really so teeny?
Working on glo2abs. Well my theory for additive anomalies is this: I read in the normals, and apply
the multiplicative factor in the header (for VAP it's 0.1). I assume the anomalies are already in
the relevant units (ie require no factoring). This looks to be the case for .txt files anyway. So I
can add the anomaly to the adjusted normal. Then (because I need integer output) I can DIVIDE by the
factor (because that got us from integer to real before). Fine in theory but it all depends on the
anomalies being in regular 'units' (why wouldn't they be? They're reals!). OK, check from the
beginning, obs first:
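That theory as an illustrative Python sketch (glo2abs.for itself is Fortran; the names here are mine):

```python
def glo2abs_cell(anom_units, normal_int, factor=0.1):
    """Combine an anomaly (already in real units, e.g. hPa) with a
    stored integer normal (e.g. hPa*10, factor 0.1), returning an
    integer absolute in the same stored units as the normal."""
    normal_units = normal_int * factor      # integer normal -> real units
    abs_units = normal_units + anom_units   # additive anomaly
    return round(abs_units / factor)        # back to stored integer units
```

For example, a stored normal of 71 (7.1 hPa at factor 0.1) with a +0.5 hPa anomaly comes back as 76.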
Database: hPa*10 (typically 3-digit integers)
<BEGIN_QUOTE>
Factor = 0.1
<END_QUOTE>
And how does anomdtb.f90 use the Factor? well in the original version:
<BEGIN_QUOTE>
OpStDev = Factor*sqrt((OpEn/(OpEn-1))*((OpTotSq/OpEn)-((OpTot/OpEn)**2)))
OpMean = Factor*(OpTot/OpEn)
ALat(XAStn),ALon(XAStn),AElv(XAStn),real(DataA(XAYear,XMonth,XAStn))*Factor,AStn(XAStn)
<END_QUOTE>
I *think* the factor is being used multiplicatively. I don't understand why it's being used as a
divisor though.. I must have understood last December because I managed to rewrite the 'standard
deviation' section, also using it as a divisor!
One obvious thing to try is to use the revised glo2abs. That should now be working in 'units' (but
saving in whatever range the normals are in). After that I could try comparing the old and 'new' (ie
modded by me) versions of anomdtb.f90
So, I revised glo2abs. It now reads the 'Multi' factor from the climatology header, and applies it
to the normals before they're used.
So, re-ran quick_interp_tdm2.pro:
IDL>
quick_interp_tdm2,1901,2006,'vapglo/vap.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='vapsyn/vapsyn',pts_prefix='vaptxt/vap.'
A sample of the outputs, vap.12.1962.glo, had a range of values from -2.3006 to +1.8388, with the
majority being 0. A total of 56387 cells were nonzero, which given that there are 67420 land cells,
isn't too bad. It's a pretty gaussian distribution, too. It still seems like a small variation
(typically +/- 0.5). For the cell where I live (Norwich, 363,286), the normals are:
Or in hPa:
7.1 6.9 7.6 8.6 10.7 12.9 14.7 14.9 13.5 11.5 8.8 7.7
The nearest station (well based on a quick search) is LOWESTOFT. Taking 1962 and 1963 and scaling:
62 7.6 6.9 6.5 9.2 10.9 12.6 14.4 15.0 13.6 12.3 8.9 6.5
63 5.4 5.5 7.9 9.9 11.1 14.8 15.8 15.1 14.6 11.7 10.3 6.9
The ranges:
2.2 1.4 1.4 0.7 0.2 2.2 1.4 0.1 1.0 0.6 1.4 0.4
Well our sample December 1962 range of anomalies was -2.3006 to +1.8388, and the January range is
-3.3640 to +2.1250. So, I have to admit, that's the same order of magnitude for our particular cell,
year and month(s).
So, assuming these .glo files are OK, we'll try glo2abs again:
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Enter the path (if any) for the output files: vapabs/
Choose: 1
vap.01.1901.glo
vap.02.1901.glo
(etc)
<END_QUOTE>
Max 315
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Enter the path (if any) for the output files: vapabs/
Choose: 4
vap.01.1901.glo
vap.02.1901.glo
(etc)
<END_QUOTE>
Result for December 1962: Min 1, Max 315. A good spread of values, without a disproportionate number
of '1's, I'm pleased to say.
So, to generate the output files. Again.
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./mergegrids
Enter a gridfile with YYYY for year and MM for month: vapabs/vap.MM.YYYY.glo.abs
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.dat
Writing cru_ts_3_00.1901.1910.vap.dat
Writing cru_ts_3_00.1911.1920.vap.dat
Writing cru_ts_3_00.1921.1930.vap.dat
Writing cru_ts_3_00.1931.1940.vap.dat
Writing cru_ts_3_00.1941.1950.vap.dat
Writing cru_ts_3_00.1951.1960.vap.dat
Writing cru_ts_3_00.1961.1970.vap.dat
Writing cru_ts_3_00.1971.1980.vap.dat
Writing cru_ts_3_00.1981.1990.vap.dat
Writing cru_ts_3_00.1991.2000.vap.dat
Writing cru_ts_3_00.2001.2006.vap.dat
<END_QUOTE>
And what of the statistics? Well by now I've realised that we don't have complete coverage! So the
normals are bound to poke through quite a bit. In fact, the story is as it was in the beginning! *cries*
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./crutsstats
(etc)
<END_QUOTE>
Now admittedly, the 106 mean does vary.. it hits the dizzying heights of 107 on occasion! With a
couple of 105s thrown in to balance the books. Had a look at the stats in detail, compared to those
for CRU TS 2.10. And guess what? Yes.. the old stats are better! Here's the first decade:
CRU TS 2.10
CRU TS 3.00
CRU TS 2.10
CRU TS 3.00
I DON'T UNDERSTAND!!!!!
Well, OK - I see that a VAP of zero is acceptable. Though as it's a pressure, I don't believe it! I'll stick with 1.
The issue is that the earlier dataset has a variability (in the maximum) that we just don't have in
the new one. And I feel that I've been through every bloody phase of the process and checked we're
doing it right!!!
~~~
Right. Let's look at the distributions of values in each dataset. We'll take Jan 1910 and Jun 2000. And as this is
Offsets. Well each month has 360 lines, so each year has 4320 lines. So for Jan 1910 we need to skip
nine years, or 38880 lines, then take the next 360. For Jun 2000 we need to skip 99 years, or 427680
lines, then another five months (1800 lines), then take the next 360.
I loaded the resultant monthly files into Matlab, and played with them mercilessly.
Well to start with, they all look the same. Truly. I've got a 4-plot page with TS 2.10 in the
left-hand column, and TS 3.00 on the right. January 1910 on the top, June 2000 on the bottom. And
they look pretty much inseparable, though if I had to Spot The Difference, the TS 2.10 June 2000
distribution is a little flatter (that is, the massive spike at the low end is a little shorter, and
the rest of the entourage are a little taller).
What are particularly worthy of note are the maxima. Because they don't match those produced by
crutsstats.for. Not entirely sure why the latter ones would be wrong. But I suspect crutsstats -
because otherwise I miscounted the line numbers to extract June 2000 with! Actually, OK, that does
seem more likely.
Let's try it from the 1991-2000 files. The offset will be 9*4320 + 5*360 + 360 = 41040.
gunzip -c /cru/cruts/fromtyn1/data/cru_ts_2.10/newly_gridded/data_dec/cru_ts_2_10.1991-2000.vap.grid.gz | head -41040 | tail -360 > cru_ts_2_10.Jun.2000.vap.dat
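The offset arithmetic generalises to a small helper; an illustrative Python version (assumes 360 lines per month, as above):

```python
LINES_PER_MONTH = 360
LINES_PER_YEAR = 12 * LINES_PER_MONTH  # 4320

def month_line_range(start_year, year, month):
    """Return (skip, take) so that `head -(skip+take) | tail -take`
    extracts the given month from a grid file beginning at start_year."""
    skip = (year - start_year) * LINES_PER_YEAR + (month - 1) * LINES_PER_MONTH
    return skip, LINES_PER_MONTH
```

For Jun 2000 from a 1991-2000 file this gives skip = 40680, hence `head -41040 | tail -360`.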
Well - looks like I did miscount, because the new files are different! And so are the Maxima:
..so almost perfect. At least the stats for the file I'm creating match.
And now the June 2000 histograms are much more interesting! And of course (for this is THIS
project), much more worrying. The June 2000 plot for the new data (3.00) shows a fall at VAP ->0.
This is in contrast to the other three, which show a more exponential decline from a high near 0
(though admittedly the 2.10 version does have a second peak at around 120). In fact, the June 2000
3.00 series has peaks at ~90 and ~300! Oh, help.
The big question must be, why does it have so little representation in the low numbers? Especially given that I'm rounding
Oh, sod it. It'll do. I don't think I can justify spending any longer on a dataset, the previous version of which was
So.. one week to go before handover, and I'm just STARTING the Sun/Cloud parameter, the one I
thought would cause the most trouble! Oh, boy. Let's try and work out the scenario.
Historically, we've issued Cloud:
Tyndall Centre grim file created on 22.01.2004 at 13:52 by Dr. Tim Mitchell
CRU TS 2.1
Grid-ref= 1, 148
725 750 750 700 638 600 613 613 663 675 713 725
"Bear in mind that there is no working synthetic method for cloud, because Mark New
lost the coefficients file and never found it again (despite searching on tape
archives at UEA) and never recreated it. This hasn't mattered too much, because
the synthetic cloud grids had not been discarded for 1901-95, and after 1995
pro cal_cld_gts_tdm,dtr_prefix,outprefix,year1,year2,info=info
; reads DTR data from binary output files from quick_interp_tdm2.pro (binfac=1000)
; output can then be used as dummy input to splining program that also
As for converting sun hours to cloud cover.. we only appear to have interactive, file-by-file
IDL
(the 'ann' versions above include the assumption that the relationships remain constant
through the year)
F77
./f77/mnew/sh2sp_m.for
./f77/mnew/sh2sp_normal.for
./f77/mnew/sh2sp_tdm.for
program sunh2cld
Does NO SUCH THING!!! Instead it creates SUN percentages! This is clear from the variable names and
user interactions.
So.. if I add the sunh -> sun% process from sh2cld_tdm.for into Hsp2cldp_m.for, I should end up with
a sun hours to cloud percent convertor. Possibly. Except that the sun% to cld% engine looks like it's
do im=1,12
ratio = (real(sunp(im))/100)
if (ratio.ge.0.95) cldp(im) = 0
if (ratio.lt.0.95.and.ratio.ge.0.35)
* cldp(im) = (0.95-ratio)*100
if (ratio.lt.0.35.and.ratio.ge.0.15)
* cldp(im) = ((0.35-ratio)*50)+60
enddo
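That engine translates to a simple piecewise function. A Python rendering of the Fortran loop above; note that for ratios below 0.15 the Fortran assigns nothing, which is made explicit here with None:

```python
def sun_ratio_to_cloud_pct(ratio):
    """Sun-percent ratio (0..1) to cloud percent, per the Fortran above.
    Returns None below 0.15, where the original assigns no value."""
    if ratio >= 0.95:
        return 0.0
    if ratio >= 0.35:
        return (0.95 - ratio) * 100.0
    if ratio >= 0.15:
        return (0.35 - ratio) * 50.0 + 60.0
    return None  # uncovered branch in the original loop
```

The function is continuous at ratio = 0.35 (both branches give 60), but the slope halves there, and the uncovered branch below 0.15 means cldp silently keeps whatever value it last held.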
Looking back I see we found cloud and sunpercent databases (line counts shown):
228936 cld.0301081434.dtb
104448 cld.0312181428.dtb
111989 combo.cld.dtb
57395 spc.0301201628.dtb
51551 spc.0312221624.dtb
51551 spc.94-00.0312221624.dtb
<BEGIN_QUOTE>
For 1901 to 1995 - stay with published data. No clear way to replicate
process as undocumented.
This is confusing. I can only use one (observed) cloud database in the final gridding. The above
agreement seems to assume that all data after 1996 will come from sun. But dtbstat.for reports:
<BEGIN_QUOTE>
Report for: spc.0312221624.dtb (it's similar for the other spcs, except the earlier one goes to 2002)
Total: 2100
<END_QUOTE>
So the Sun Percent databases run for long periods. Similarly, for cloud:
<BEGIN_QUOTE>
Total: 3605
<END_QUOTE>
Not as long a run, and it sure ends at 1996! So 1901 to 1995 will, as agreed, remain untouched.
Well.. let's try converting the MCDW and CLIMAT Sun hours to Sun percents, then adding to the SPC
database (spc.0312221624.dtb). Modified Hsh2cld.for to save sun percent too. Lots of debugging..
and drainage paper no. 24. Food and Agriculture Organization of the United Nations,
Rome.
This was used to inform the Fortran conversion programs by indicating the latitude-potential_sun and
fact calculating Cloud Percent, despite calling it Sun Percent!! Just awful.
And so..
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/db/cld] ./Hsh2cld
crua6[/cru/cruts/version_3_0/db/cld] ./Hsh2cld
crua6[/cru/cruts/version_3_0/db/cld]
<END_QUOTE>
So, now the luxury of a little experiment.. I merged the MCDW and CLIMAT 'spc' databases into
MCDW:
<BEGIN_QUOTE>
uealogin1[/cru/cruts/version_3_0/db/cld] ./newmergedb
WELCOME TO THE DATABASE UPDATER
you want the quick and dirty approach? This will blindly match
(automatically: 867)
(by operator: 0)
> Added as new Master stations: 826
> Rejected: 0
<END_QUOTE>
CLIMAT:
<BEGIN_QUOTE>
(automatically: 917)
(by operator: 0)
<END_QUOTE>
So, as expected, a few of the CLIMAT stations couldn't be matched for metadata.. no worries. What's
interesting is that roughly the same ratio of stations were matched with existing in both.
Now, as our updates only start in 2003, that means we've just lost between 826 and 1005 sets of data
(added as new). We can't be exact as we don't know the overlap between the MCDW and the CLIMAT
bulletins.. but we will have a better idea when I try the anomdtb experiment on the combined update.
First, add the CLIMAT update again, this time to the MCDW-updated database:
First, add the CLIMAT update again, this time to the MCDW-updated database:
CLIMAT:
<BEGIN_QUOTE>
(automatically: 1736)
(by operator: 0)
> Rejected: 38
<END_QUOTE>
Note several bits of good news! Firstly, rejects are down to 38 (60 having matched with MCDW
stations). That's not *that* good of course - those will be new and so 2003 onwards only. Similarly,
(1005-246=) 759 CLIMAT bulletins matched MCDW ones, they will also be 2003 onwards only. In other
words, there were only (1736-759=) 977 updates to existing stations. So.. yes I'm being sidetracked
again.. I found and
uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru
Enter the latest MCDW file (or <ret> for single files): ssm0708.fin
<END_QUOTE>
Now I'm not planning to re-run all the previous parameters! Hell, they should have had the older
data in already! But for sun/cloud, this could help enormously. Here's the plan:
4. Use the new program 'normshift.for' to calculate 95-02 normals from TS 2.10 CLD.
6. Modify the in-database normals (step 3) with the difference (step 5).
7. Carry on as before?
No.. this won't work. anomdtb.for calculates normals on the fly - it would have to know too much.
The next opportunity comes at the output from anomdtb - the normalised values in the *.txt files
that the IDL gridder reads. These are just files - one per month - with lists of coordinates and
values, so ideal to add normalised values to. Decided that this will be the process:
..meanwhile, as before..
So we then just have to merge the two 6190 anomaly sets! Which could just be a concatenation.
Easy, then.. the only thing we need is the miraculous 'newprog.for'! With three days before delivery.
No, no, no - HANG ON. Let's not try and boil the ocean! How about:
1901-2002 Static, as published, leave well alone (or recalculate with better DTR).
2003-2006/7 Calc from modern SunH and use the suggested mods after gridding.
1. MCDW only goes back to 2006, so what's the data density for 2003-2005? Should this also use synthetic
2. No guarantee of continuity from 2002 to 2003. This could be the real stickler. Moving from one system
OKAY.. normshift.for now creates a gridded set of conversion data between whatever period you choose
and 1961-1990. Such that it can be added to the gridded output of the process run with the 'false'
normalisation period.
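The idea behind normshift.for, sketched in Python (names and data layout are my assumptions, not the program's): per cell, the difference between the two period means is an additive shift that converts anomalies computed against the 'false' period into anomalies against 1961-1990.

```python
def normshift_grid(mean_alt, mean_6190):
    """Additive conversion grid: an anomaly computed against the
    alternative ('false') period plus this shift is an anomaly
    against 1961-1990, because
    (v - mean_alt) + (mean_alt - mean_6190) = v - mean_6190."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mean_alt, mean_6190)]
```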
So.. first, merge your bulletins:
Well FIRSTLY, you realise that your databases don't have normals lines, so you modify mcdw2cru.for
and climat2cru.for to optionally add them, then you re-run them on the bulletins, ending up with:
<BEGIN_QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru
Enter the latest MCDW file (or <ret> for single files): ssm0708.fin
<END_QUOTE>
<BEGIN_QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/CLIMAT] ./climat2cru
Enter the latest CLIMAT file (or <ret> for single file): climat_data_200707.txt
<END_QUOTE>
So.. NOW can I merge CLIMAT into MCDW?!
<BEGIN_QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/merge_CLIMAT_into_MCDW] ./newmergedb
you want the quick and dirty approach? This will blindly match
Writing sun.0711272225.dtb
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
OUTPUT(S) WRITTEN
(automatically: 1775)
(by operator: 0)
> Rejected: 28
<END_QUOTE>
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/db/cld] ./Hsh2cld
<END_QUOTE>
..and yay!
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/cld] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.cld
cld.0711272230.dtb
1995,2002
12.5
cld.txt
1995,2007
> Operating...
/tmp_mnt/cru-auto/cruts/version_3_0/secondaries/cld/cld.0711272230.dts
<END_QUOTE>
Well.. a 'qualified' yay.. only half got normals! But I don't like to raise the 'missing
percentage' limit to 25% because we're only talking about 8 values to begin with!!
The output files look OK.. between 400 and 600 values in each, not a lot really but hey, better
than nowt. So onto the conversion data (must stop calling 'em factors, they're not multiplicative).
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/cld] ./normshift
<END_QUOTE>
So, erm.. now we need to create our synthetic cloud from DTR. Except that's the thing we CAN'T do,
because cal_cld_gts_tdm.pro needs those bloody coefficients (a.25.7190, etc) that went AWOL.
Frustratingly we do have some of the outputs from the program (ie, a.25.01.7190.glo), but that's
obviously no use.
So, erm. We need synthetic cloud for 2003-2007, or we won't have enough data to run with. And yes it's
<BEGIN_QUOTE>
crua6[/cru/mark1/markn/gts/cld/val] ls -l
total 7584
<END_QUOTE>
..which looks to me like the place where he calculated the coefficients. The *.j files are IDL 'Journal' files,
<BEGIN_QUOTE>
YEAR: 1981
% RD25_GTS 11 /cru/u2/f080/Idl/rd25_gts.pro
% $MAIN$ 1 /tmp_mnt/cru-auto/mark1/f080/gts/cld/val/cld_corr.j
IDL>
<END_QUOTE>
I then had to chase around to find three sets of missing files.. to fulfil these five conditions:
hgrid,'~/u1/hahn/hahn25.',1981,1991
rgrid,'../glo_reg_25/glo.cld.',1981,1991
hgrid2,'~/u1/hahn/hahn25.',1983,1991
igrid,'c1/isccp.',1983,1991
rgrid2,'../glo_reg_25/glo.cld.',1983,1991
I managed to find the hahn25 files (on Mark's disk), and some likely-looking isccp files (also on
Mark's disk). But although there were plenty of files with 'glo', 'cld' and '25' in them, there
were none matching the filename construction above. However, as some of those were in the same
directory - I'll take that chance!!
I did try, honestly. Very hard. I found all the files, and put them in directories. I made a local
copy of the job file, 'H_cld_corr.j', with the local directory refs in. Hell, I even precompiled
the correct version of rdbin! All for nothing, as usual. It runs quite happily, zipping through
things, until:
YEAR: 1983
c1/isccp.83.07.72
c1/isccp.83.08.72
c1/isccp.83.09.72
c1/isccp.83.10.72
c1/isccp.83.11.72
c1/isccp.83.11.72.Z: No such file or directory
c1/isccp.83.12.72
YEAR: 1984
c1/isccp.84.01.72
(etc)
It isn't seeing the isccp files EVEN THOUGH THEY ARE THERE. Odd. If I create .Z files it says
they aren't compressed.
It ends with:
YEAR: 1991
yes
filesize= 248832
gridsize= 2.50000
I have no idea what it's actually done though. It doesn't appear to have produced anything.. ah:
IDL> help
% At $MAIN$ 1 /tmp_mnt/cru-auto/cruts/version_3_0/cloud_synthetics/H_cld_corr.j
ILAT INT = 72
IM INT = 12
N LONG = Array[5225]
NN LONG = 5225
Compiled Procedures:
Compiled Functions:
IDL>
..so this is one of a set of tools *that you have to know how to use*. All the work's done in the
IDL data space. Well, as we don't have any instructions, that's a complete waste of two-and-a-half
days' time.
NETCDF
Well now, we have to make the data available in NetCDF and ASCII grid formats. At the moment, it
might be best to just post-process the final ASCII grids into NetCDF; though more elegant to have
mergegrids.for produce both, as it has the data there anyway.. so I modified mergegrids.for into
makegrids.for, with added NetCDF goodness. as
***********************************************************************
*
Finally got NetCDF & Fortran working on the chosen server here (damp.badc.rl.ac.uk). I am
definitely not a chamaeleonic life form when it comes to unfamiliar computer systems. Shame. The
elusive command line compile statement is:
Hunting for CDDs I found a potential problem with binary DTR (used in the construction of Frost
Days, Vapour Pressure, and (eventually) Cloud). It looks as though there was a mistyping when the
2.5-degree binaries were constructed:
IDL> quick_interp_tdm2,1901,2006,'dtrbin/dtrbin',50,gs=2.5,dumpbin='dumpbin',pts_prefix='dtrtxt/dtr.'
That '50' should have been.. 750! Oh bugger. Well, might as well see if generation does work here.
DTR/bin/2.5:
..er.. hang on while I try and get IDL to recognise a path.. meh. As usual I find this effectively
impossible, so have to issue manual .compile statements. The suite of progs required to compile
quick_interp_tdm2.pro is:
glimit.pro
area_grid.pro
strip.pro
wrbin.pro
IDL> @../../programs/idl/loads4idl.j
IDL>
was getting tedious. n00b. [this now fixed - ed] Anyway, here's the corrected run:
IDL> quick_interp_tdm2,1901,2006,'dtrbin/dtrbin',750,gs=2.5,dumpbin='dumpbin',pts_prefix='dtrtxt/dtr.'
Defaults set
1901
1902
(etc)
<BEGIN_QUOTE>
IDL> frs_gts,dtr_prefix='dtrbin/dtrbin',tmp_prefix='tmpbin/tmpbin',1901,2006,outprefix='frssyn/frssyn'
IDL> quick_interp_tdm2,1901,2006,'frsgrid/frsgrid',750,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='frssyn/frssyn'
-bash-3.00$ ./glo2abs
Enter the path (if any) for the output files: frsabs/
Choose: 3
frs.01.1901.glo
frs.02.1901.glo
(etc)
<END_QUOTE>
Now looking to get makegrids.for working.. managed to get the data to write by declaring
REALs as
instead! Ah well. Still tussling with the 'time' variable.. not clear how to handle
observations.
<BEGIN_QUOTE>
>I need to define the time parameter in the NetCDF version of the CRUTS
>I wondered if there was a convention (in CRU, or wider) for allocating
short time(time) ;
<END_QUOTE>
And that seems to now be working! Here's the run with the compile statement included:
<BEGIN_QUOTE>
Writing: cru_ts_3_00.1901.1910.frs.dat
cru_ts_3_00.1901.1910.frs.nc
Writing: cru_ts_3_00.1911.1920.frs.dat
cru_ts_3_00.1911.1920.frs.nc
Writing: cru_ts_3_00.1921.1930.frs.dat
cru_ts_3_00.1921.1930.frs.nc
Writing: cru_ts_3_00.1931.1940.frs.dat
cru_ts_3_00.1931.1940.frs.nc
Writing: cru_ts_3_00.1941.1950.frs.dat
cru_ts_3_00.1941.1950.frs.nc
Writing: cru_ts_3_00.1951.1960.frs.dat
cru_ts_3_00.1951.1960.frs.nc
Writing: cru_ts_3_00.1961.1970.frs.dat
cru_ts_3_00.1961.1970.frs.nc
Writing: cru_ts_3_00.1971.1980.frs.dat
cru_ts_3_00.1971.1980.frs.nc
Writing: cru_ts_3_00.1981.1990.frs.dat
cru_ts_3_00.1981.1990.frs.nc
Writing: cru_ts_3_00.1991.2000.frs.dat
cru_ts_3_00.1991.2000.frs.nc
Writing: cru_ts_3_00.2001.2006.frs.dat
cru_ts_3_00.2001.2006.frs.nc
-bash-3.00$
<END_QUOTE>
And here, for a combination of posterity and boredom, is a (curtailed) dump from ncdump:
<BEGIN_QUOTE>
netcdf cru_ts_3_00.1901.2006.frs {
dimensions:
lon = 720 ;
lat = 360 ;
time = UNLIMITED ; // (1272 currently)
variables:
double lon(lon) ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
double lat(lat) ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
int time(time) ;
time:long_name = "time" ;
time:calendar = "standard" ;
frs:units = "days" ;
frs:scale_factor = 0.00999999977648258 ;
frs:correlation_decay_distance = 750. ;
frs:_FillValue = -9999. ;
frs:missing_value = -9999. ;
// global attributes:
:institution = "BADC" ;
:contact = "BADC <[email protected]>" ;
data:
(etc)
(etc)
82.25, 82.75, 83.25, 83.75, 84.25, 84.75, 85.25, 85.75, 86.25, 86.75,
time = 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385,
(etc)
1626, 1627, 1628, 1629, 1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637,
-999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,
-bash-3.00$
<END_QUOTE>
And VAP:
<BEGIN_QUOTE>
IDL> vap_gts_anom,dtr_prefix='dtrbin/dtrbin',tmp_prefix='tmpbin/tmpbin',1901,2006,outprefix='vapsyn/vapsyn.',dumpbin=1
(etc)
<END_QUOTE>
These numbers are different from the original runs - so that was a genuine mistyping. Eek, that's not
<BEGIN_QUOTE>
IDL> quick_interp_tdm2,1901,2006,'vapglo/vap.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='vapsyn/vapsyn.',pts_prefix='vaptxt/vap.'
Defaults set
1901
1902
(etc)
-bash-3.00$ ./glo2abs
Enter the path (if any) for the output files: vapabs/
Choose: 4
vap.01.1901.glo
vap.02.1901.glo
(etc)
-bash-3.00$ ./makegrids
Enter a gridfile with YYYY for year and MM for month: vapabs/vap.MM.YYYY.glo.abs
start year with SSSS and end year with EEEE, and
Writing: cru_ts_3_00.1901.1910.vap.dat
cru_ts_3_00.1901.1910.vap.nc
Writing: cru_ts_3_00.1911.1920.vap.dat
cru_ts_3_00.1911.1920.vap.nc
Writing: cru_ts_3_00.1921.1930.vap.dat
cru_ts_3_00.1921.1930.vap.nc
Writing: cru_ts_3_00.1931.1940.vap.dat
cru_ts_3_00.1931.1940.vap.nc
Writing: cru_ts_3_00.1941.1950.vap.dat
cru_ts_3_00.1941.1950.vap.nc
Writing: cru_ts_3_00.1951.1960.vap.dat
cru_ts_3_00.1951.1960.vap.nc
Writing: cru_ts_3_00.1961.1970.vap.dat
cru_ts_3_00.1961.1970.vap.nc
Writing: cru_ts_3_00.1971.1980.vap.dat
cru_ts_3_00.1971.1980.vap.nc
Writing: cru_ts_3_00.1981.1990.vap.dat
cru_ts_3_00.1981.1990.vap.nc
Writing: cru_ts_3_00.1991.2000.vap.dat
cru_ts_3_00.1991.2000.vap.nc
Writing: cru_ts_3_00.2001.2006.vap.dat
cru_ts_3_00.2001.2006.vap.nc
-bash-3.00$
<END_QUOTE>
A quick look at the VAP NetCDF headers & data looked good. So - yay, that's the damage repaired;
pity it took over a day of the time at RAL. But I didn't have to fix it now - it was an opportunity
Next problem - station counts. I had this working fine in CRU - here it's insisting on stopping
indefinitely at January 1957. Discovered - after 36 hours of fretting and debugging - that it's
And what d'you know, when I debug it, it's as simple as being too close to the pole and not having
any loop restrictions in the East and West hunts for valid cells.. just looping forever! Added a
few simple conditionals and all seems to run.. but outputs don't look right, the Jan 1957 station
Managed to get anomdtb compiled with gfortran, after altering a few lines (in anomdtb and its
mods) where Tim had shrugged off the surly bounds of strict F90.. it must be compiled in
programs/fortran/ though, with the line (embedded in the anomdtb comments too):
As part of the modifications I removed the unused options - meaning that a .dts file is no longer
required (and, of course, neither is 'falsedts.for'). Ran it for the temperature database and got
glo2abs and makegrids, ended up with grids very similar (though sadly not identical) to the
Scripting. Now this was always going to be the challenge, for a large suite of highly-interactive
programs in F77, F90 and IDL which didn't follow universal file naming conventions. So to start
with,
mcdw2cru (interactive)
climat2cru (interactive)
tmnx2dtr (interactive)
frs_gts_tdm
quick_interp_tdm2
glo2abs (interactive)
makegrids (interactive)
vap_gts_anom
anomdtb (interactive)
quick_interp_tdm2
glo2abs (interactive)
makegrids (interactive)
rd0_gts_anom
anomdtb (interactive)
quick_interp_tdm2
glo2abs (interactive)
makegrids (interactive)
Tried to compile makegrids.for on uealogin1 (as crua6 is being retired). Got an odd error:
..which was cured with the addition of '-xarch=native64' to the compile statement:
Then had to play around to try and reduce the size of the NetCDF files - they were bigger than the
uncompressed ASCII ones! This was because the variable was declared as DOUBLE, which is 64 bits,
or 8 bytes, per datum. A waste, since we deal with the data as integers and use factors to restore
'real' values. So redeclared as INT. Considering re-redeclaring as SHORT, which is 16 bits to
INT's 32.. however, that only gives me signed -32,768 to 32,767 or unsigned 0 to 65,535. That's
enough for our datasets but only if precip has a positive missing value code, which I don't like
the sound of.
Reproduced all primaries and secondaries with INT typing for the NetCDF component.
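The packing trade-off, as a quick sketch (Python; the scale factor of 0.01 matches the frs
scale_factor in the ncdump above, everything else - the helper name, the sample values - is
illustrative):

```python
import numpy as np

# Pack real values as integers so that stored_int * scale_factor = value,
# with a missing-value code in place of NaN.
def pack(values, scale=0.01, missing=-9999):
    return np.where(np.isnan(values), missing,
                    np.rint(np.nan_to_num(values) / scale)).astype(np.int32)

packed = pack(np.array([12.34, 0.0, np.nan]))   # -> [1234, 0, -9999]
# SHORT (int16) would halve the size again, but only if every packed value
# (including the missing code) fits in -32768..32767:
fits_short = packed.min() >= -32768 and packed.max() <= 32767
```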
Simultaneously trying to work out why stncounts.for is apparently ignoring the South Pole station
(Amundsen-Scott) even though the rest of the output looks fine.. eventually realised that the
land/sea mask is blocking it!!
Station counts work continues.. should the NetCDF files be written as INT to match the data files,
or SHORT to save a lot of space? In fact, should the station counts be in the same NetCDF files as
their data?!!
Finished off the local regeneration of VAP and FRS from the corrected dtrbin files:
DTR fix:
IDL> quick_interp_tdm2,1901,2006,'dtrbin/dtrbin',750,gs=2.5,dumpbin='dumpbin',pts_prefix='dtrtxt/dtr.'
IDL> quick_interp_tdm2,1901,2006,'vapglo/vap.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='vapsyn/vapsyn.',pts_prefix='vaptxt/vap.'
that glo2abs.for now saves filenames with year.month, rather than replicating the loopy month.year
that quick_interp_tdm2.pro gives. This will allow file lists to be compiled with 'ls >tmpfile', so
that timespan and missing files can be detected at the start. Have also had to amend (and re-run
at great length!) stncounts.for to do the same.
Just realised I still haven't worked out how to do the station counts for the secondaries. I can't
work out how Tim did it either, since he worked out counts at the same stage I do. Did he let the
primary parameter's counts override? Or backfill? Well we could take the approach that the gridding
routine takes, namely to use observed data and only refer to synthetic when that fails (and only
So, stncounts will have to accept TWO sets of .txt files. At each timestep it will have to first
count the secondary parameter's stations, then the primary parameter(s) will be counted and fill
in any zeros in the grid. However, we will need different information for the paper - what use is
the effective station count when some is of higher 'quality' than the rest? So will probably need
the regular observed-only count as well.. which could be a separate run of stncounts but faaar
more sensible to be a side effect of this run.
I think the mods to stncounts will be the turning point for all the programs. So far, they have
all been generic, but this is not tenable if the system is to be automated - they need to be aware
of the parameters and what they mean. Otherwise stncounts will have to be told how many primaries
produced the synthetic grids, etc, etc - stupid. So I need to devise a directory structure and
file naming schema that will support the entire update process. Eeeeeeeek.
Back to precip, it seems the variability is too low. This points to a problem with the percentage
anomaly routines. See earlier
absgrid(ilon(i),ilat(i)) = nint(normals(i,imo) +
This was shown to be delivering unrealistic values, so I went back to anomdtb to see how the
anomalies were constructed in the
DataA(XAYear,XMonth,XAStn) = nint(1000.0*((real(DataA(XAYear,XMonth,XAStn)) / &
    real(NormMean(XMonth,XAStn)))-1.0))
Modified anomdtb to dump the precip anomaly calculation. It seems to be working with raw values, eg:
However, that -1000 is interesting for zero precip. It looks as though the anomalies are in mm*10,
like the precip database raw values.. but no! These are dumped from within the program; the output
.txt files have -100 instead of -1000. That's because of the CheckVariSuffix routine, which
returns a factor based
Variable="precipitation (mm)"
Factor = 0.1
do XAStn = 1, NAStn
  ALat(XAStn),ALon(XAStn),AElv(XAStn),real(DataA(XAYear,XMonth,XAStn))*Factor,AStn(XAStn)
end do
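The chain just described, as a quick numeric check (Python; the values are illustrative, the
function name is mine):

```python
# anomdtb stores per-mille precip anomalies internally, and the
# CheckVariSuffix factor of 0.1 turns them into percentages on output.
def anomaly_permille(raw, normal):
    # mirrors DataA = nint(1000*((raw/normal) - 1))
    return round(1000.0 * (raw / normal - 1.0))

factor = 0.1                                  # precipitation (mm)
zero_precip = anomaly_permille(0.0, 130.0)    # -1000 inside the program
written = zero_precip * factor                # -100 in the output .txt
```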
So.. we grid these values. The resulting anomalies (for Jan 1980) look like this:
max = 1612.4
min = -100
These should be applied to the climatology (normals). I think they can be applied to either
unscaled 'real' value normals or to normals which are mm*10. Results will be scaled accordingly.
So.. let's look at glo2abs. Again.
absgrid(ilon(i),ilat(i)) = nint((rnormals(i,imo)*(anoms(ilon(i),ilat(i))+1000)/1000)/rmult)
V = N(A+1000)/1000/rmult
Not happy with that anyway. The multiplicative factor.. should that be there at all?
Now, if these are 'genuine' percentage anomalies - ie, they represent the percentage change from
the mean - then the formula to convert them using unscaled normals would be:
V = N + N(A/100)
For instance, -100 would give V = 0, and 100 would give V = 2N. Now if the normals are *10, surely
the results will be *10 too? As each term has N as a multiplicative factor anyway. This makes me
wonder about the prevailing theory that the anomalies need to be *10. They are fractions of the
normal, so it shouldn't matter whether the normal is real or *10. The
So, a run (just for 1980) of glo2abs, then, using the following formula:
absgrid(ilon(i),ilat(i)) = nint(rnormals(i,imo) +
V = N + A*N/100
This should give integer values of mm*10 (because the normals are mm*10 and uncorrected).
So, working backwards, what *should* the original anomalising routine in anomdtb.f90 look
think we can call equivalence if V and N are in mm? Complicated. The '100' is not a scaling
factor, it's the number that determines a percentage calculation. If we use 1000, what are we
saying? The percentage anomalies we would have got are now 10x higher. Where V=0, A will be -1000,
a meaningless percentage anomaly for precip. That's where the CheckVariSuffix factor pops up,
multiplying by 0.1 and getting us back where we started.
So.. Tim's original looks right, once you understand the correction factor applied later.
A slightly Heath-Robinson attempt to verify.. extracted the cell for Wick from the 1980
106
69
91
22
16
74
79
80
129
156
177
151
The station says:
1980 763 582 718 153 95 557 587 658 987 1162 1346 1113
**sigh** Yes, they do follow a similar shape.. it's just that the original station data
Station DUDDUUUUUUD
Gridded DUDDUUUUUUD
So, once again I don't understand statistics. Quel surprise, given that I haven't had any
1060 740 870 600 610 670 700 910 990 1070 1200 1110
Now, these look like mm*10, the same units as the station data and what I expected. If I apply
the anomalies to these normals, it looks like I'll get what I'm after.. the trouble is that
glo2abs.for deliberately works with real values and so applies the factor in the clim header
before using the values. I just don't think that works with percentages.. hmm.
Actually, it does. I ran through the algorithms and because the normal is multiplicative, you can
do the scaling before or after. In other words, if V' is V produced with scaled normals (N*0.1)
then we do end up with V = 10V'. So I just need to include the factor in
V = (N + A*N/100)/F
Ran it, and the results were good. So - as it's the only change - I won't have to regrid precip
after all! Just re-run from glo onwards.. did so, then used the old (but working)
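The before-or-after scaling argument, checked numerically (Python; N, A and F are illustrative
values, not from the database):

```python
# Because N enters every term multiplicatively, scaling the normals by F
# just scales V by F - so the clim-header factor can be applied before or
# after, and V' (from scaled normals) satisfies V' = F*V, i.e. V = 10V'
# when F = 0.1.
def apply_anom(N, A):
    return N + A * N / 100.0          # V = N + A*N/100

N, A, F = 130.0, 50.0, 0.1
V  = apply_anom(N, A)                 # unscaled normals
Vp = apply_anom(N * F, A)             # normals pre-scaled by F
assert abs(V * F - Vp) < 1e-9
```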
So.. comparisons. Well I want to compare with both 2.0 and 2.1, because they do differ. So I will
need to convert 2.0 to regular-gridded, as I did with 2.1. If I could only remember
**
Another problem. Apparently I should have derived TMN and TMX from DTR and TMP, as that's what
v2.10 did and that's what people expect. I disagree with publishing datasets that are simple
arithmetic derivations of other datasets published at the same time, when the real
does not tell us what to do when either or both values (TMP, DTR) are missing. One thing to check
is the climatologies. Here are the first two cell normals for all four parameters:
Grid-ref= 1, 148
270 274 269 265 259 253 246 242 248 252 261 268
Grid-ref= 1, 311
Grid-ref= 1, 148
56 71 54 59 55 50 54 51 57 61 61 73
Grid-ref= 1, 311
74 71 72 77 59 65 64 56 49 51 63 71
Grid-ref= 1, 148
242 238 242 236 232 228 219 217 220 222 230 232
Grid-ref= 1, 311
Grid-ref= 1, 148
298 309 296 295 287 278 273 268 277 283 292 305
Grid-ref= 1, 311
Grid-ref= 1, 312
Well, making allowances for rounding errors, they do seem to hold to the relationship.
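Quick check of that relationship in Python, using the first-cell (1,148) normals quoted above and
assuming the four blocks are TMP, DTR, TMN and TMX in that order:

```python
# TMN = TMP - DTR/2 and TMX = TMP + DTR/2, to within rounding (+/-1),
# for the twelve monthly normals of the first cell.
tmp = [270, 274, 269, 265, 259, 253, 246, 242, 248, 252, 261, 268]
dtr = [56, 71, 54, 59, 55, 50, 54, 51, 57, 61, 61, 73]
tmn = [242, 238, 242, 236, 232, 228, 219, 217, 220, 222, 230, 232]
tmx = [298, 309, 296, 295, 287, 278, 273, 268, 277, 283, 292, 305]

ok = all(abs(t - d / 2 - n) <= 1 and abs(t + d / 2 - x) <= 1
         for t, d, n, x in zip(tmp, dtr, tmn, tmx))
```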
Wrote maketmnx.for to derive TMN and TMX from TMP and DTR grids. Works with the output files from
glo2abs.for. Ran makegrids to produce .dat and .nc files (still pre-station count inclusion).
On to precip problems. Tim O ran some comparisons between 2.10 and 3.00; in general things are
much improved but there are a few hair-raisers (asterisked for special concern):
stations having data a factor of 10 too low. This ties in with the WWR station data that DL added
for 1991-2000, which apparently was prone to scaling issues. Wrote stnx10.for to scale a file of
WWR Bangladesh records, then manually C&P'd the decade over the erroneous ones in
Then Laos/Vietnam. Here we have an anomalously high peak for 1991 DJF. Used getllstations.for to
extract all stations in a box around Laos & Vietnam (8 to 25N, 100 to 110E), a total of 96
stations from Thailand, Vietnam, Laos, Kampuchea, and China. Eeeek. Tim O's program only worked
with boxes though. Also, I'm not 100% sure which year DJF belongs to in Tim's world.. hopefully
it's the December year (as it was the fourth column in his plot table). However.. plotted *all*
the data as overlapping years, and there is no trace of a spike in DJF. Uh-oh.
I'm not actually convinced that the 'country box' approach is much cop. Better to examine each
land cell and automagically mark any with excessions? Say 5 SD to begin with. Could then be extra
clever and pull the relevant stations and find the source of the excession? Of course, this
shouldn't happen, since there is a 4SD limit imposed by anomdtb.f90 for precip (3SD for others).
Wrote vietlaos.for to run through the lists of Vietnam and Laos cells (provided by Tim O) and
extract the DJF precip values for each (from the 1901-2006 gridded file). It then calculates the
standard deviation of each series, normalises, and notes any values over 6.0 SDs (1991 onwards).
Result.. some very high values (up to 11.3 standard deviations!) in 1991/2. Worst cells:
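The screening step, as a minimal sketch (Python; the function name, the choice of population SD,
and ignoring missing values are all my assumptions, not vietlaos.for itself):

```python
import statistics

# Normalise a series by its own standard deviation and flag any values
# further than `threshold` SDs from the mean.
def flag_extremes(series, threshold=6.0):
    mu = statistics.mean(series)
    sd = statistics.pstdev(series)
    return [i for i, v in enumerate(series)
            if sd > 0 and abs(v - mu) / sd > threshold]
```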
Index 273 can be related to time as follows. The series begins in 1901 and we take three values
per year (J,F,D). So 1990 would be the 90th year and the 268th-270th values. Thus 273 = Dec 1991.
The cells are all contiguous, implying a single station's influence via the gridding process:
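That index arithmetic, as code (Python; a 1-based index and Jan/Feb/Dec ordering within each year
are the assumptions):

```python
# Map a 1-based index in the J,F,D series (starting 1901) to (year, month).
def djf_index_to_date(i, start_year=1901):
    year = start_year + (i - 1) // 3
    month = ("Jan", "Feb", "Dec")[(i - 1) % 3]
    return year, month
```

So index 268 is Jan 1990 and index 273 is Dec 1991, matching the reckoning above.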
The 'epicentre' of the anomaly looks to be cell (213,571), which is in the Laos file:
So we're looking for stations in the vicinity of 105.75E, 16.75N. Well the precip database
has a
4893000 1990 10210 304 LUANG PRABANG LAOS 1951 2006 -999 -999.00
4894000 1800 10260 170 VIENTIANE LAOS 1941 2006 -999 -999.00
4894600 1738 10465 152 THAKHEK LAOS 1989 2006 -999 -999.00
4894800 1670 10500 184 SENO LAOS 1951 2006 -999 -999.00
4895200 1568 10643 168 SARAVANE LAOS 1989 2006 -999 -999.00
Well, SENO has to be the prime candidate. Unfortunately, this is from SENO:
4894800 1670 10500 184 SENO LAOS 1951 2006 -999 -999.00
<snip>
<snip>
1992 324 338 93 691 1932 2344 2048 4464 756 607 0 197
..nope..
4894600 1738 10465 152 THAKHEK LAOS 1989 2006 -999 -999.00
<snip>
1991 0 0 905 119 861 6058 3578 7092 2417 373 0 324
1992 105 318 125 456 2140 2978 4623 4595 3376 425 0 854
each station:
4893000 1990 10210 304 LUANG PRABANG LAOS 1951 2006 -999 -999.00
1992 193 911 0 497 657 1246 2971 2837 929 584 95 1372
1994 0 54 1107 291 1702 2436 2025 3636 1516 316 185 816
1992 411 719 0 816 754 1252 2573 1671 1686 991 351 879
1994 0 208 1695 503 2262 1607 1743 2562 3205 118 193 454
4894000 1800 10260 170 VIENTIANE LAOS 1941 2006 -999 -999.00
1971 0 70 140 340 2940 2750 2890 2260 1630 1030 0 180
1992 381 273 11 424 2372 4878 4381 3676 3091 630 0 212
1994 0 300 921 322 2685 2725 4698 1932 4000 3031 1016 166 (inc for comparison with previous)
4894600 1738 10465 152 THAKHEK LAOS 1989 2006 -999 -999.00
1991 0 0 905 119 861 6058 3578 7092 2417 373 0 324
1992 105 318 125 456 2140 2978 4623 4595 3376 425 0 854
1994 0 612 952 558 1697 7092 5121 4276 2428 486 20 2 (inc for comparison with previous)
4894700 1660 10480 155 SAVANNAKHET LAOS 1970 2006 -999 -999.00
1992 324 338 93 691 1932 2344 2048 4464 756 607 0 197
1994 0 734 390 494 1381 3377 1525 5651 1881 600 0 0 (inc for comparison with previous)
4894800 1670 10500 184 SENO LAOS 1951 2006 -999 -999.00
1971 0 880 130 370 1270 4010 2200 2860 1930 410 0 140
1992 488 280 50 80 1883 2503 2644 2935 2039 131 0 89 (inc for comparison with previous)
1994 0 532 318 969 2065 1937 1197 4552 1934 197 0 0 (inc for comparison with previous)
4895200 1568 10643 168 SARAVANE LAOS 1989 2006 -999 -999.00
1992 287 33 52 222 1072 5444 2998 8899 2243 1070 0 0 (inc for comparison with previous)
1994 0 10 354 686 1743 3387 5829 3254 4219 408 41 4 (inc for comparison with previous)
1998 26 619 0 574 2386 2871 1530 2308 2680 913 463 73
2005 0 0 120 1230 1990 2860 4350 8060 3770 280 140 70
1992 166 101 0 210 665 1898 2574 6448 2942 648 10 31 (inc for comparison with previous)
1994 0 0 134 220 2537 3596 5161 5384 7693 1513 236 94
Summary: LUANG PRABANG shows a significant anomaly of 1372 for Dec 1992. Unfortunately, this
finds echoes both temporal (1994 has 816) and spatial (SAYABOURY's 1992 is 879). So, if these
values are causing the spike, it's genuine (if exaggerated in a way yet to be determined).
Wrote vietlaos2, to gather data from the cells AND stations. It also gets the climatology.
Initially it only gathered 13 stations with data in 1991/2, using 'VIETNAM' and 'LAOS' to select
on country name. However, taking the cell [214,574] in December 1991 as the peak incident, we can
use those coordinates (17.25N, 107.25E) to centre a bounding box for station selection. A box
10degs square yields only 17 stations, none of which have anything remotely spikey in Dec 1991.
A box 20degs square (some would say unfeasibly large) yields 98 stations, one of which does have
a bit of a spike in Dec 91.. not impressively so though, and it's a long way away:
Over 10.5 degrees South and over 7 degrees West of the target cell. Not very convincing, especially
One FINAL try with vietlaos3.for. Just looking at December now, and getting the original station
normals as well as the climatological ones. The whole chain. This proves to be surprisingly
complicated.
On a parallel track (this would really have been better as a blog), Tim O has found that the binary
'binfac' set to 10 for TMP and DTR. This may explain the poor performance and coverage of VAP in
particular.
Back to VietLaos.. the station output from vietlaos3.for had a couple of stations with missing
anomaly values:
I eventually worked out that I hadn't collapsed a universal probability; it was just the 4
standard deviation screen in anomdtb (4 for precip, 3 for temp). To confirm, I did a short anomdtb
run (just for 1991) with the sd limit set to 10, and sure enough:
They both look high enough to trigger the 4sd cap. However, since the spike we're investigating is
from a regular process run, where that limit was in place, we can't use those values. Program is
thus
Next issue is to make sense of the output. The first line from the station file is (headings added):
There are 63 stations and 204 cells (196 when missing values (sea) eliminated). I guess one
approach would be to grid the anomalies, to see if a peak is visible. I did. It is. The simple
interpolation in Matlab puts the peak at 17.25N, 105.25E - matches the grid peak for lat and a
little west for lon.
6190 34 127 290 907 1813 2900 2271 3353 2596 886 84 13
1940-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1949 0 200 170 470 3000 2720 3110 4920 360 690 90-9999
1951 0 340 770 1380 530 3380 1590 1950 3580 1430 20 0
1953 260 110 430 630 1010 2200 1480 2780 1180 310 10 0
1956 0 420 150 1000 3000 2930 3980 3840 2020 220 0 0
1959 0 430 730 290 2300 1540 2080 2030 3910 280 0 0
1960 0 190 550 650 1230 1750 3750 5090 2190 700 90 0
1963 0 0 610 600 1010 3480 2130 3410 1250 220 150 0
1965 0 250 660 780 2120 2700 2110 2810 2210 1350 0 0
1966 0 610 310 730 3340 1370 3100 4010 2020 510 40 110
1971 0 730 570 740 2130 3580 4060 2100 3240 510 30 170
1972 0 550 460 1280 1040 3470 3250 3640 2980 2340 20 0
1974 400 0 60 2160 1160 2520 3070 6110 1920 570 260 0
1975 20 350 360 410 2200 3340 3230 3560 940 520 20 40
1976 0 210 380 1700 1160 1460 2430 3720 3250 780 60 0
1978 0 100 920 650 1710 3690 2960 4420 2190 110 0 0
1980 0 50 190 1040 1440 3490 3310 930 6130 1830 170 20
1981 0 210 220 720 2630 4730 2490 1750 610 1260 90 0
1982 0 0 290 840 1330 2160 390 4390 3400 1720 370 0
1984 10 0 350 1270 2030 2290 2900 3880 2130 1380 650 0
1985 380 50 170 860 1100 4270 1580 3350 900 1170 0 0
1986 0 0 120 1650 2120 2210 1830 2980 1760 1700 240 10
1987 0 110 290 360 1090 4210 2670 3640 2140 1040 20 0
1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1990-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1991 0-9999 105 226 1370 2079 1452 4190 3799 1610-9999 321
1992 328 314 150 637 1968 1906 2366 4973 1287 238 0 216
1994 0 781 274 409 1837 2297 1625 5755 1709 216-9999-9999
1995 0 140 834 672 1556 1606 4439 2848 681 857 69 0
1996 12 2 660 2394 1566 1526 1960 3350 4843 724 476-9999
1997 35 321 458 642 1154 2832 1197 4071 1722 1800 0-9999
1998 0 346 154 241 2174 3348 813 2153 2231 276 85 37
1999 63 0 182 1025 3449 1207 3681 1570 2628 299 109-9999
Note that the Dec 1991 value is anomalous, but not as extreme as the 1945 datum,
which would get the same treatment with normals and climatologies, so should
produce an even bigger spike for 1945 DJF! Unless of course it's screened out by
the 4SD rule.. which it is! Well - no value in pre.1945.12.txt for this location.
Anyway.. this is the highest value in the Vietnam/Laos cells for Dec 1991:
With a normal of 130, that makes the anomaly -48.85. Now I'm confused. How can
an anomalously high value be well below the 61-90 mean? Aaarrgghhhh. Perhaps I
should look at the highest anomaly. That turns out to be 80, from here:
216 563 18.25 101.75 1.80 1.00
Not exactly a show stopper. Time to look at the .glo files, which glo2abs processes
>> glod3(210:216,567:573)
The spike is at [213,569]. Yes, I know, it's the n-th set of coordinates. You should see the
plots! But looking at the anomalies is the closest we'll get to what Tim's program was doing, ie,
calculating DJF standard deviations. Or something. Now, the coordinates are 16.75N, 104.75E.
And wouldn't you know it, our prime suspect (see above) is on top of it:
4838300 1653 10472 138 MUKDAHAN THAILAND 1934 2000 -999 -999.00
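Incidentally, the cell-index convention implied by the coordinate pairs in this section
((213,571) -> 16.75N, 105.75E; (214,574) -> 17.25N, 107.25E) is consistent with 0.5-degree cell
centres indexed from the south/west edge - a sketch (Python, hypothetical helper name):

```python
# Convert (lat_index, lon_index) to cell-centre coordinates on a
# 0.5-degree grid, indices counted from 90S / 180W.
def cell_centre(ilat, ilon, res=0.5):
    return (-90.0 + (ilat + 0.5) * res, -180.0 + (ilon + 0.5) * res)
```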
So OK, here we go with the full run-down for December 1991, in the 16.75N,105.75E region:
Raw data: 321 Highest unscreened December for this station (67 years)
Normal: 13 Looks right - of course, very low for the target data!
Ah well - had enough. It looks like it's an extreme but believable event in a Thai station; let's
leave it like that. Re-running precip, with the new updated database pre.0803271802.dtb:
<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/primaries/precip] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.pre
pre.0803271802.dtb
1961,1990
25
pre.txt
1901,2006
> Operating...
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0803271802.dtb
IDL> quick_interp_tdm2,1901,2006,'preglo/pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='pretxt/pre.'
Defaults set
1901
1902
(etc)
2006
IDL>
crua6[/cru/cruts/version_3_0/primaries/precip] ./glo2abs
Welcome! This is the GLO2ABS program.
Enter the path (if any) for the output files: preabs/
Choose: 1
pregrid.01.1901.glo
(etc)
pregrid.12.2006.glo
uealogin1[/cru/cruts/version_3_0/primaries/precip] ./makegrids
start year with SSSS and end year with EEEE, and
Writing: cru_ts_3_00.1901.1910.pre.dat
cru_ts_3_00.1901.1910.pre.nc
Writing: cru_ts_3_00.1911.1920.pre.dat
cru_ts_3_00.1911.1920.pre.nc
Writing: cru_ts_3_00.1921.1930.pre.dat
cru_ts_3_00.1921.1930.pre.nc
Writing: cru_ts_3_00.1931.1940.pre.dat
cru_ts_3_00.1931.1940.pre.nc
Writing: cru_ts_3_00.1941.1950.pre.dat
cru_ts_3_00.1941.1950.pre.nc
Writing: cru_ts_3_00.1951.1960.pre.dat
cru_ts_3_00.1951.1960.pre.nc
Writing: cru_ts_3_00.1961.1970.pre.dat
cru_ts_3_00.1961.1970.pre.nc
Writing: cru_ts_3_00.1971.1980.pre.dat
cru_ts_3_00.1971.1980.pre.nc
Writing: cru_ts_3_00.1981.1990.pre.dat
cru_ts_3_00.1981.1990.pre.nc
Writing: cru_ts_3_00.1991.2000.pre.dat
cru_ts_3_00.1991.2000.pre.nc
Writing: cru_ts_3_00.2001.2006.pre.dat
cru_ts_3_00.2001.2006.pre.nc
<END_QUOTE>
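For reference, the filename templating that makegrids prompts for (YYYY/MM in the input
gridfile names, SSSS/EEEE in the output decade names) amounts to simple placeholder
substitution. A minimal Python sketch - the function names and the decade-chunking rule
(ten-year blocks with a short final block) are my own inference from the output above,
not the actual makegrids code:

```python
# Hedged sketch of makegrids-style filename templating; names are illustrative.

def expand_input(template: str, year: int, month: int) -> str:
    """Fill a gridfile template like 'vapabs/vap.YYYY.MM.glo.abs'."""
    return template.replace("YYYY", f"{year:04d}").replace("MM", f"{month:02d}")

def expand_output(template: str, start: int, end: int) -> str:
    """Fill an output template like 'cru_ts_3_00.SSSS.EEEE.pre.dat'."""
    return template.replace("SSSS", f"{start:04d}").replace("EEEE", f"{end:04d}")

def decades(first: int, last: int):
    """Decade blocks as written above: 1901-1910, ..., 2001-2006."""
    start = first
    while start <= last:
        end = min(start + 9, last)
        yield start, end
        start = end + 1
```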
On to the reproduction of binaries for TMP and DTR, and subsequent regeneration of
VAP and FRS.
TMP Binaries:
IDL>
quick_interp_tdm2,1901,2006,'tmpbin/tmpbin',1200,gs=2.5,dumpbin='dumpbin',binfac=10,pts_prefix='tmp0km0705101334txt/tmp.'
Defaults set
1901
1902
1903
1904
1905
1906
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
grid 1918 non-zero -0.4379 1.0228 1.4534 cells= 52579
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
grid 1941 non-zero 0.0049 1.0253 1.4988 cells= 63950
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
grid 1964 non-zero -0.3518 0.9639 1.4553 cells= 82284
1965
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
grid 1987 non-zero 0.1116 0.9654 1.4412 cells= 84529
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
2000
2001
2002
2003
2004
2005
2006
IDL>
DTR Binaries:
IDL>
quick_interp_tdm2,1901,2006,'dtrbin/dtrbin',750,gs=2.5,dumpbin='dumpbin',binfac=10,pts_prefix='dtrtxt/dtr.'
Defaults set
1901
1902
1903
1904
1905
1906
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
grid 1918 non-zero 0.1303 0.9134 1.2533 cells= 26447
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
grid 1941 non-zero 0.1334 0.8502 1.1411 cells= 38486
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
grid 1964 non-zero 0.0955 0.6719 0.9029 cells= 54909
1965
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
grid 1987 non-zero -0.0579 0.6599 0.8919 cells= 55607
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
2000
2001
2002
2003
2004
2005
2006
IDL>
VAP synthetics:
IDL>
vap_gts_anom,dtr_prefix='../dtrbin/dtrbin',tmp_prefix='../tmpbin/tmpbin',1901,2006,outprefix='vapsyn/vapsyn.',dumpbin=1
IDL>
VAP Gridding:
IDL>
quick_interp_tdm2,1901,2006,'vapglo/vap.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='vapsyn/vapsyn.',pts_prefix='vaptxt/vap.'
Defaults set
1901
(etc)
2006
IDL>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Enter the path (if any) for the output files: vapabs/
Choose: 4
vap.01.1901.glo
(etc)
vap.12.2006.glo
uealogin1[/cru/cruts/version_3_0/secondaries/vap] ./makegrids
Enter a gridfile with YYYY for year and MM for month: vapabs/vap.YYYY.MM.glo.abs
Writing: cru_ts_3_00.1901.1910.vap.dat
cru_ts_3_00.1901.1910.vap.nc
Writing: cru_ts_3_00.1911.1920.vap.dat
cru_ts_3_00.1911.1920.vap.nc
Writing: cru_ts_3_00.1921.1930.vap.dat
cru_ts_3_00.1921.1930.vap.nc
Writing: cru_ts_3_00.1931.1940.vap.dat
cru_ts_3_00.1931.1940.vap.nc
Writing: cru_ts_3_00.1941.1950.vap.dat
cru_ts_3_00.1941.1950.vap.nc
Writing: cru_ts_3_00.1951.1960.vap.dat
cru_ts_3_00.1951.1960.vap.nc
Writing: cru_ts_3_00.1961.1970.vap.dat
cru_ts_3_00.1961.1970.vap.nc
Writing: cru_ts_3_00.1971.1980.vap.dat
cru_ts_3_00.1971.1980.vap.nc
Writing: cru_ts_3_00.1981.1990.vap.dat
cru_ts_3_00.1981.1990.vap.nc
Writing: cru_ts_3_00.1991.2000.vap.dat
cru_ts_3_00.1991.2000.vap.nc
Writing: cru_ts_3_00.2001.2006.vap.dat
cru_ts_3_00.2001.2006.vap.nc
FRS synthetics:
IDL>
frs_gts,dtr_prefix='../dtrbin/dtrbin',tmp_prefix='../tmpbin/tmpbin',1901,2006,outprefix='frssyn/frssyn'
filesize= 6220800
gridsize= 0.500000
1961
filesize= 248832
gridsize= 2.50000
(etc)
1990
filesize= 248832
gridsize= 2.50000
% Compiled module: DAYS.
1901
filesize= 248832
gridsize= 2.50000
(etc)
2006
filesize= 248832
gridsize= 2.50000
IDL>
FRS gridding:
IDL>
quick_interp_tdm2,1901,2006,'frsgrid/frsgrid',750,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='frssyn/frssyn'
Defaults set
1901
1902
found: frssyn/frssyn1902.gz
(etc)
2006
found: frssyn/frssyn2006.gz
IDL>
Tim suggested the 'synthfac' parameter in quick_interp_tdm2. The note for it says:
; multi factor to obtain synth file actual values
..so I reckoned it should be 0.1 - but I was wrong. The note is misleading at best, since
the actual code does:
dummygrid=dummygrid/synthfac
IDL>
quick_interp_tdm2,1901,2006,'vapglo/vap.',1000,gs=0.5,dumpglo='dumpglo',synthfac=10,synth_prefix='vapsyn/vapsyn.',pts_prefix='vaptxt/vap.'
Defaults set
1901
(etc)
2006
IDL>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Enter the path (if any) for the output files: vapabs/
vap.1901.01.glo
(etc)
vap.2006.12.glo
uealogin1[/cru/cruts/version_3_0/secondaries/vap] ./makegrids
Enter a gridfile with YYYY for year and MM for month: vapabs/vap.YYYY.MM.glo.abs
start year with SSSS and end year with EEEE, and
Writing: cru_ts_3_00.1901.1910.vap.dat
cru_ts_3_00.1901.1910.vap.nc
Writing: cru_ts_3_00.1911.1920.vap.dat
cru_ts_3_00.1911.1920.vap.nc
Writing: cru_ts_3_00.1921.1930.vap.dat
cru_ts_3_00.1921.1930.vap.nc
Writing: cru_ts_3_00.1931.1940.vap.dat
cru_ts_3_00.1931.1940.vap.nc
Writing: cru_ts_3_00.1941.1950.vap.dat
cru_ts_3_00.1941.1950.vap.nc
Writing: cru_ts_3_00.1951.1960.vap.dat
cru_ts_3_00.1951.1960.vap.nc
Writing: cru_ts_3_00.1961.1970.vap.dat
cru_ts_3_00.1961.1970.vap.nc
Writing: cru_ts_3_00.1971.1980.vap.dat
cru_ts_3_00.1971.1980.vap.nc
Writing: cru_ts_3_00.1981.1990.vap.dat
cru_ts_3_00.1981.1990.vap.nc
Writing: cru_ts_3_00.1991.2000.vap.dat
cru_ts_3_00.1991.2000.vap.nc
Writing: cru_ts_3_00.2001.2006.vap.dat
cru_ts_3_00.2001.2006.vap.nc
IDL>
quick_interp_tdm2,1901,2006,'frsgrid/frsgrid',750,gs=0.5,dumpglo='dumpglo',nostn=1,synthfac=10,synth_prefix='frssyn/frssyn'
Defaults set
1901
1902
(etc)
2006
IDL>
Also re-doing WET/RD0:
IDL>
quick_interp_tdm2,1901,2006,'prebin/prebin',450,gs=2.5,dumpbin='dumpbin',binfac=10,pts_prefix='pretxt/pre.'
Defaults set
1901
1902
(etc)
2006
no stations found in: pretxt/pre.2006.09.txt
IDL>
There then followed a production run for WET, resulting in unrealistic, banded output.
This was tracked down to the synthetic gridder, rd0_gts_tdm.pro, using half-degree
normals with a 2.5-degree output. So it was modified to read the 2.5-degree normals, and rerun:
IDL>
rd0_gts,1901,2006,1961,1990,outprefix='rd0syn/rd0syn',pre_prefix='../prebin/prebin'
yes
filesize= 248832
gridsize= 2.50000
yes
filesize= 248832
gridsize= 2.50000
1961
filesize= 248832
gridsize= 2.50000
(etc)
1990
filesize= 248832
gridsize= 2.50000
1901
filesize= 248832
gridsize= 2.50000
1902
(etc)
2006
filesize= 248832
gridsize= 2.50000
..which is what happened last time. And, again - all synthetics produced, apparently
OK. I think it's just the last few empty months of 2006..
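The half-degree-normals-versus-2.5-degree-output mismatch that caused the banding can
be illustrated with a toy regridder. This is a sketch only - simple block-averaging of
5x5 half-degree cells into one 2.5-degree cell, with -9999 as missing; it is not the
actual rd0_gts_tdm.pro logic, and the grid shapes are assumptions:

```python
# Sketch: block-average a 360-row x 720-col half-degree grid to 72x144 at 2.5 deg.
MISSING = -9999.0

def regrid_05_to_25(grid05):
    """Average each 5x5 block, skipping missing values; MISSING if block empty."""
    out = []
    for br in range(0, 360, 5):
        row = []
        for bc in range(0, 720, 5):
            vals = [grid05[r][c]
                    for r in range(br, br + 5)
                    for c in range(bc, bc + 5)
                    if grid05[r][c] != MISSING]
            row.append(sum(vals) / len(vals) if vals else MISSING)
        out.append(row)
    return out
```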
IDL>
quick_interp_tdm2,1901,2006,'rd0pcglo/rd0pc',450,gs=0.5,dumpglo='dumpglo',synth_prefix='rd0syn/rd0syn',synthfac=10,pts_prefix='rd0pctxt/rd0pc.'
Defaults set
1901
1902
(etc)
2006
IDL>
crua6[/cru/cruts/version_3_0/secondaries/wet] ./glo2abs
Enter the path (if any) for the output files: rd0pcabs
Choose: 3
rd0pcglo/rd0pc.01.1901.glo
rd0pcglo/rd0pc.1901.01.glo
rd0pc.1901.01.glo
(etc)
rd0pc.2006.12.glo
uealogin1[/cru/cruts/version_3_0/secondaries/wet] ./makegrids
start year with SSSS and end year with EEEE, and
cru_ts_3_00.1901.1910.wet.nc
Writing: cru_ts_3_00.1911.1920.wet.dat
cru_ts_3_00.1911.1920.wet.nc
Writing: cru_ts_3_00.1921.1930.wet.dat
cru_ts_3_00.1921.1930.wet.nc
Writing: cru_ts_3_00.1931.1940.wet.dat
cru_ts_3_00.1931.1940.wet.nc
Writing: cru_ts_3_00.1941.1950.wet.dat
cru_ts_3_00.1941.1950.wet.nc
Writing: cru_ts_3_00.1951.1960.wet.dat
cru_ts_3_00.1951.1960.wet.nc
Writing: cru_ts_3_00.1961.1970.wet.dat
cru_ts_3_00.1961.1970.wet.nc
Writing: cru_ts_3_00.1971.1980.wet.dat
cru_ts_3_00.1971.1980.wet.nc
Writing: cru_ts_3_00.1981.1990.wet.dat
cru_ts_3_00.1981.1990.wet.nc
Writing: cru_ts_3_00.1991.2000.wet.dat
cru_ts_3_00.1991.2000.wet.nc
Writing: cru_ts_3_00.2001.2006.wet.dat
cru_ts_3_00.2001.2006.wet.nc
VAP - three stations deleted with unbelievable data:
6451000 208 1148 599 BITAM GABON 1971 2007 -999 -999
6275000 1400 3233 378 ED DUEIM SUDAN 1971 2007 -999 -999
crua6[/cru/cruts/version_3_0/secondaries/vap] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.vap
vap.0804231150.dtb
1961,1990
25
vap.txt
1901,2006
> Operating...
IDL>
quick_interp_tdm2,1901,2006,'vapglo/vap.',1000,gs=0.5,dumpglo='dumpglo',synthfac=10,synth_prefix='vapsyn/vapsyn.',pts_prefix='vaptxt/vap.'
% Compiled module: QUICK_INTERP_TDM2.
Defaults set
1901
2006
IDL>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Enter the path (if any) for the output files: vapabs/
Choose: 1
vapglo/vap.01.1901.glo
vapglo/vap.1901.01.glo
vap.1901.01.glo
(etc)
vap.2006.12.glo
uealogin1[/cru/cruts/version_3_0/secondaries/vap] ./makegrids
Enter a gridfile with YYYY for year and MM for month: vapabs/vap.YYYY.MM.glo.abs
start year with SSSS and end year with EEEE, and
Writing: cru_ts_3_00.1901.1910.vap.dat
cru_ts_3_00.1901.1910.vap.nc
Writing: cru_ts_3_00.1911.1920.vap.dat
cru_ts_3_00.1911.1920.vap.nc
Writing: cru_ts_3_00.1921.1930.vap.dat
cru_ts_3_00.1921.1930.vap.nc
Writing: cru_ts_3_00.1931.1940.vap.dat
cru_ts_3_00.1931.1940.vap.nc
Writing: cru_ts_3_00.1941.1950.vap.dat
cru_ts_3_00.1941.1950.vap.nc
Writing: cru_ts_3_00.1951.1960.vap.dat
cru_ts_3_00.1951.1960.vap.nc
Writing: cru_ts_3_00.1961.1970.vap.dat
cru_ts_3_00.1961.1970.vap.nc
Writing: cru_ts_3_00.1971.1980.vap.dat
cru_ts_3_00.1971.1980.vap.nc
Writing: cru_ts_3_00.1981.1990.vap.dat
cru_ts_3_00.1981.1990.vap.nc
Writing: cru_ts_3_00.1991.2000.vap.dat
cru_ts_3_00.1991.2000.vap.nc
Writing: cru_ts_3_00.2001.2006.vap.dat
cru_ts_3_00.2001.2006.vap.nc
WET looks better, but variability is still too low. It's complicated by the synthetic
elements in
>> whos
>> c
c=
124416
>> min(d)
ans =
0
>> max(d)
ans =
303
>> hmean(d)
ans =
123.7939
So we can deduce that the rd0 2.5-degree normals are in days*10. Similarly for the others
of interest:
glo.pre.norm 0 1244 58 mm
My guess is that glo25.pre.6190 has a lower max because the wider coverage of each cell
is squashing the extremes.
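The units deduction above works because monthly rain-day counts must lie in 0..31: a
field with a max of 303 can't be plain days, but fits days*10. A throwaway plausibility
check along those lines - the candidate scale factors and bounds are my own choices:

```python
# Sketch: which scale factors make a stored (vmin, vmax) range physically plausible?
def plausible_units(vmin, vmax, lo, hi):
    """Candidate divisors for which the rescaled range fits [lo, hi]."""
    return [s for s in (1, 10, 100)
            if lo <= vmin / s and vmax / s <= hi]
```

For the rd0 normals above (min 0, max 303, bounds 0..31 days), a divisor of 1 is ruled
out, leaving *10 (or more) storage as the plausible reading.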
IDL>
quick_interp_tdm2,1901,2006,'prebin05/prebin05.',450,gs=0.5,dumpbin='dumpbin',binfac=10,pts_prefix='pretxt/pre.'
Defaults set
1901
(etc)
yes
filesize= 248832
gridsize= 2.50000
yes
filesize= 248832
gridsize= 2.50000
1961
yes
filesize= 6220800
gridsize= 0.500000
2006
filesize= 6220800
gridsize= 0.500000
IDL>
Defaults set
1901
1902
(etc)
RIGHT, stop all that.. Tim O has recalculated the 2.5-degree binary normals for PRE
(from half-degree) and WET (from TS 2.1). So.. time to try out the OTHER synthetic rd0
generator, the one that reads precip anomalies:
IDL>
rd0_gts_anom,1901,2006,1961,1990,outprefix='rd0syn/rd0syn.',pre_prefix='../prebin/prebin'
IDL>
IDL>
quick_interp_tdm2,1901,2006,'rd0glo/rd0.',450,gs=0.5,dumpglo='dumpglo',synth_prefix='rd0syn/rd0syn.',synthfac=10,pts_prefix='rd0pctxt/rd0pc.'
Defaults set
1901
% Compiled module: RDBIN.
1902
(etc)
2006
IDL>
crua6[/cru/cruts/version_3_0/secondaries/wet] ./glo2abs
Enter the path (if any) for the output files: rd0abs/
Now, CONCENTRATE. Addition or Percentage (A/P)? P
Choose: 3
rd0glo/rd0.01.1901.glo
rd0glo/rd0.1901.01.glo
rd0.01.1901.glo
(etc)
rd0.12.2006.glo
uealogin1[/cru/cruts/version_3_0/secondaries/wet] ./makegrids
Enter a gridfile with YYYY for year and MM for month: rd0abs/rd0.YYYY.MM.glo.abs
start year with SSSS and end year with EEEE, and
Writing: cru_ts_3_00.1901.1910.wet.dat
cru_ts_3_00.1901.1910.wet.nc
Writing: cru_ts_3_00.1911.1920.wet.dat
cru_ts_3_00.1911.1920.wet.nc
Writing: cru_ts_3_00.1921.1930.wet.dat
cru_ts_3_00.1921.1930.wet.nc
Writing: cru_ts_3_00.1931.1940.wet.dat
cru_ts_3_00.1931.1940.wet.nc
Writing: cru_ts_3_00.1941.1950.wet.dat
cru_ts_3_00.1941.1950.wet.nc
Writing: cru_ts_3_00.1951.1960.wet.dat
cru_ts_3_00.1951.1960.wet.nc
Writing: cru_ts_3_00.1961.1970.wet.dat
cru_ts_3_00.1961.1970.wet.nc
Writing: cru_ts_3_00.1971.1980.wet.dat
cru_ts_3_00.1971.1980.wet.nc
Writing: cru_ts_3_00.1981.1990.wet.dat
cru_ts_3_00.1981.1990.wet.nc
Writing: cru_ts_3_00.1991.2000.wet.dat
cru_ts_3_00.1991.2000.wet.nc
Writing: cru_ts_3_00.2001.2006.wet.dat
cru_ts_3_00.2001.2006.wet.nc
Wrong again! The saga continues.. actually I'm beginning to wonder if it'll still be going
when I JOIN SAGA.
This time, the 'real' areas have variability 10x too low, and the 'synthetic' areas have
variability sqrt(10) too low. The latter can be explained by the binary precip being in
%age anoms *10, so rd0_gts_anom.pro was modified to divide by 1000 (instead of 100)
when calculating. Example (from the normals calculation):
Before:
'Synthfac=10' will also not be needed in the final gridding, that should take care of the
'real' area variability.
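The scaling argument, as a sketch: if the binary precip holds percentage anomalies
multiplied by 10, then recovering a fractional anomaly needs a divisor of 1000, not 100.
The function names below are illustrative, not from the real rd0_gts_anom.pro:

```python
# Sketch of the *10 percentage-anomaly storage: stored +150 means +15.0%,
# i.e. a fractional anomaly of 0.15, so the divisor must be 1000.

def frac_anom_from_stored(stored: int) -> float:
    """Convert a stored (percentage * 10) anomaly to a fraction."""
    return stored / 1000.0

def apply_anom(normal: float, stored: int) -> float:
    """Reconstruct an absolute value from a normal and a stored anomaly."""
    return normal * (1.0 + frac_anom_from_stored(stored))
```

Dividing by 100 instead would inflate every anomaly tenfold, which is consistent with the
variability problems described above.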
So..
IDL>
rd0_gts_anom,1901,2006,1961,1990,outprefix='rd0syn/rd0syn.',pre_prefix='../prebin/prebin'
IDL>
IDL>
quick_interp_tdm2,1901,2006,'rd0glo/rd0.',450,gs=0.5,dumpglo='dumpglo',synth_prefix='rd0syn/rd0syn.',pts_prefix='rd0pctxt/rd0pc.'
<snip!>
That didn't work, real areas 10x too small (synth areas OK though). So..
IDL>
quick_interp_tdm2,1901,2006,'rd0glo/rd0.',anomfac=10,450,gs=0.5,dumpglo='dumpglo',synth_prefix='rd0syn/rd0syn.',pts_prefix='rd0pctxt/rd0pc.'
Defaults set
1901
1902
(etc)
2006
IDL>
crua6[/cru/cruts/version_3_0/secondaries/wet] ./glo2abs
Enter the path (if any) for the output files: rd0abs/
Choose: 3
rd0glo/rd0.1901.01.glo
rd0.1901.01.glo
(etc)
rd0.2006.12.glo
uealogin1[/cru/cruts/version_3_0/secondaries/wet] ./makegrids
Enter a gridfile with YYYY for year and MM for month: rd0abs/rd0.YYYY.MM.glo.abs
start year with SSSS and end year with EEEE, and
Writing: cru_ts_3_00.1901.1910.wet.dat
cru_ts_3_00.1901.1910.wet.nc
Writing: cru_ts_3_00.1911.1920.wet.dat
cru_ts_3_00.1911.1920.wet.nc
Writing: cru_ts_3_00.1921.1930.wet.dat
cru_ts_3_00.1921.1930.wet.nc
Writing: cru_ts_3_00.1931.1940.wet.dat
cru_ts_3_00.1931.1940.wet.nc
Writing: cru_ts_3_00.1941.1950.wet.dat
cru_ts_3_00.1941.1950.wet.nc
Writing: cru_ts_3_00.1951.1960.wet.dat
cru_ts_3_00.1951.1960.wet.nc
Writing: cru_ts_3_00.1961.1970.wet.dat
cru_ts_3_00.1961.1970.wet.nc
Writing: cru_ts_3_00.1971.1980.wet.dat
cru_ts_3_00.1971.1980.wet.nc
Writing: cru_ts_3_00.1981.1990.wet.dat
cru_ts_3_00.1981.1990.wet.nc
Writing: cru_ts_3_00.1991.2000.wet.dat
cru_ts_3_00.1991.2000.wet.nc
Writing: cru_ts_3_00.2001.2006.wet.dat
cru_ts_3_00.2001.2006.wet.nc
Hmmm.. still some problems. In several areas, including a swathe of Russia, the mean
values drop
uealogin1[/cru/cruts/version_3_0/db/rd0] ./getllstations
GETCELLSTATIONS
2388400 6160 9000 63 BOR RUSSIA (ASIA) 1936 2007 -999 -999
2389100 6167 9637 261 BAJKIT RUSSIA (ASIA) 1936 2007 -999 -999
2490800 6033 10227 260 VANAVARA RUSSIA (ASIA) 1936 2007 -999 -999
2926300 5845 9215 78 JENISEJSK RUSSIA (ASIA) 1936 2007 -999 -999
2928200 5842 9740 134 BOGUCANY RUSSIA (ASIA) 1936 2007 -999 -999
2398600 6047 9302 521 SEVERO-JENISEJSK RUSSIA (ASIA) 2004 2007 -999 0
2937900 5720 9488 147 TASEJEVA RIVER RUSSIA (ASIA) 2004 2007 -999 0
The last two are too short to have any meaning. The second and third have missing data
over the entire period of concern. That leaves BOR, JENISEJSK and BOGUCANY, the
latter of which we'll examine more closely. Here's the series, lifted directly from
wet.0710161148.dtb:
2928200 5842 9740 134 BOGUCANY RUSSIA (ASIA) 1936 2007 -999 -999
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1936 1200 500 800 500 1100 1400 900 1200 400 1300 1600 1700
1937 1400 1000 1300 400 800 1200 1100 1300 1800 2000 1300 1000
1938 1100 1100 500 1000 1000 800 600 1300 1400 2300 1800 1500
1939 1200 700 900 700 1500 1300 1100 1600 800 1900 1800 1900
1940 800 1500 500 1400 1400 1300 1400 1600 1500 2400 2000 1600
1941 2200 1500 1800 1100 1400 1300 1700 1600 900 1600 1900 1700
1942 1100 1600 1500 1100 1000 1900 600 1600 500 2000 1700 1900
1943 1000 1000 800 0 1500 1100 1000 1100 1200 2000 2100 2100
1944 1300 1100 1100 400 1400 1200-9999 1100 1500 1600 2000 2900
1945 2100 900 1000-9999 900 2100 1700 1200 900 1700 1900 1600
1946 2400 1400 1500 1000 1800 1700 1300 1800 700 1700 2200 2300
1947 1400 1300 1700 1500 900 600 1000 700 2000 600 1300 1200
1948 1700 1100 900 1100 1100 1100 1100 1900 1400 1300 1200 1500
1949 2100 1100 1000 700 1900 1600 800 1500 1600 1100 1600 1200
1950 1500 1100 800 800 1400 600 600 800 1600 1500 1900 2500
1951 1100 1200 1400 500 1000 1400 1200 2000 1100 1400 1100 1400
1952 1500 1200 1100 700 1600 1100 1300 1200 1200 2100 1200 1300
1953 1200 600 1300 700 800 800 1100 1100 1400 2100 1500 2100
1954 1900 1300 1100 1100 800 400 1700 1100 1300 1800 2000 1500
1955 1100 1600 1100 1000 1500 1400 1000 1100 1500 1400 1600 2000
1956 1800 1200 1000 1200 800 900 1900 800 1100 2100 800 1200
1957 1200 300 700 1200 1300 900 1300 1200 1700 1700 1900 2200
1958 2000 1000 1200 900 1400 1100 800 1000 1200 2000 2200 1900
1959 2100 1000 900 1400 1800 700 1600 1300 1300 1600 2300 1900
1960 1700 1800 1000 1600 1000 1500 1400 1500 2300 1100 1900 1200
1961 1700 1400 600 500 1000 1400 1400 1700 1800 1500 1600 1800
1962 1200 1300 700 700 800 1000 1300 1200 1200 900 2100 2000
1963 900 600 700 800 1300 1000 1300 1300 1400 1300 1100 2100
1964 1000 800 1200 500 800 1400 1000 800 700 1900 1000 2300
1965 1500 1100 800 500 1000 1100 800 1500 1900 1200 1600 1400
1966 2100 2100 1400 1000 1800 1200 1000 1000 1200 2100 2700 2400
1967 1800 1700 900 900 1200 900 1100 1100 800 1100 1300 1700
1968 1000 1500 800 900 800 1700 600 1200 1600 800 2200 1400
1969 1400 1500 1000 1500 1300 1300 1100 1100 1400 1200 1500 1700
1970 2000 1300 200 1000 1100 900 1100 1700 900 1500 1500 1800
1971 2100 900 1300 700 1200 500 1000 1600 2000 1400 1600 1600
1972 1600 1600 800 500 1400 1200 600 1700 1600 1500 2400 2300
1973 1400 1200 1600 1600 1200 900 1000 1400 600 800 1700 1800
1974 1800 1400 900 1100 1500 1400 1000 1500 1500 1700 2100 2200
1975 2600 1500 1100 1000 1200 1200 1100 1700 1300 1200 2200 1200
1976 1300 1400 1000 600 1300 600 700 1300 1800 1900 1900 1800
1977 1800 1100 2000 900 1400 1400 900 1700 1000 1600 2500 1600
1978 2500 1100 1000 1300 900 1400 1100 1300 1300 1500 1600 2000
1979 2100 1700 1900 1200 1200 600 600 1500 1000 2200 2400 1500
1980 1700 1100 600 700 900 1100 1300 800 1300 1500 2100 2200
1981 1600 1800 1100 1200 1300 700 1100 1200 1200 1300 1300 1800
1982 1300 1000 1500 800 1200 1300 1700 1400 1700 1800 2800 2500
1983 2200 1200 1400 1700 1200 1000 700 1600 1200 1200 1900 2400
1984 1400 1900 1000 800 1300 700 700 200 400 1500 2400 2100
1985-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1986-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1987-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1988-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1991-9999 400 400 600 800 900 1200 600 800 700 300 700
1992 600 100 200 800 500-9999-9999 700 500 200 1000 600
1993 100 200 200 200 700 1300 100 1100 900 800 300 900
1994 700 200 300 500 1000 700 300 600 1400 900 300 900
1995 900 900 600 900 500 1100 800 0 1000 100 1100 400
1996 600 100 300 1000 600 300 200 1100 1100 600 1000 1500
1997 700 500 600 400 600 1200 500 1500 700 1100 900 1000
1998 700 500 0 1000 1100 1000-9999 900 1300 1500 900 1600
1999 900 400 200 700 200 900 700 900 600 1000 800 700
2000 400 700 600 500 1400 1000 700 600 900 1000 900 1200
2001 400 700 700 300 1100 1000 1300 400 1000 900 900 1000
2002 800 1100 600 600 400 1500-9999 1100 900 700 1100 500
2003 800 800 400 600 500 200 400 900 1200 900 1100 500
2004 500 900 1000 600 800 900 800 1000 1500 1200 900 400
2005 700 300 100 900 600 1500 900 600 900 1600 800 500
2006 600 600 900 400 700 500 500 1000 900 1100 800 1300
2007 500 1000 900 200 1200 900 700-9999-9999-9999-9999-9999
You can see that the data after 1990 are, for some months, significantly lower than in the
period before.. which would be the period the normals would be based on! I used Matlab
to check the normals:
6190 1667 1342 1063 933 1172 1080 980 1288 1272 1412 1852 1840
0 1288 -10.00
They aren't percentage anomalies! They are percentage anomalies /10. This could
explain why the real-data areas had variability 10x too low. BUT it shouldn't be - they
should be regular percentage anomalies! This whole process is too convoluted and has
created myriad problems.
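The arithmetic being checked here - 1961-90 monthly normals from .dtb-style rows, then
a percentage anomaly against them - can be sketched as follows. The row layout (year
plus 12 stored values, -9999 missing) and the helper names are assumptions based on the
listing above:

```python
# Sketch: monthly normals over a base period, then a plain percentage anomaly.
MISSING = -9999

def monthly_normals(rows, start=1961, end=1990):
    """rows: dict year -> list of 12 stored values; returns 12 normals."""
    normals = []
    for m in range(12):
        vals = [rows[y][m] for y in rows
                if start <= y <= end and rows[y][m] != MISSING]
        normals.append(sum(vals) / len(vals) if vals else MISSING)
    return normals

def pct_anomaly(value, normal):
    """Regular percentage anomaly: 100 * (value - normal) / normal."""
    return 100.0 * (value - normal) / normal
```

A value stored as a percentage anomaly /10 would come out of pct_anomaly ten times too
small, matching the variability deficit described above.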
Back on the case. I need to find where the post-1990 data came from for these three
stations. I already know
wet.0311061611.dtb
wet.0710161148.dtb
I was going to do further backtracing, but it's been revealed that the same issues were in
2.1 - meaning that I didn't add the duff data. The suggested way forward is to not use any
observations after 1989, but to allow synthetics to take over. I'm not keen on this
approach, as it's likely (imo) to introduce visible jumps at 1990, since we're effectively
introducing a change of data source just after calculating the normals.
My compromise: first, we need synthetic-only from 1990 onwards, which can be married
with the existing glos from pre-1990. Actually, we might as well produce a full series of
gridded syn-only rd0. Hell, we can do both options in one go! No point in using the final
gridding routine; rd0_gts_anom can produce glo files itself, so let's give it a go.
Well - not straightforward. rd0_gts_anom.pro is quite resistant to the idea that it might
produce half-degree synthetics, to the point where I was really not sure what was left to
modify! Eventually found it.. the .glo saving routine takes a second argument which is a
code for the grid size. Because just giving it the grid size would be too easy:
SaveGlo,23,rd0month,CallFile=Savefile,CallTitle=SaveTitle
Now, that 23 is the key, but you have to look in quick_interp_tdm2.pro to decode it:
if (gs[0] eq 0.5) then SaveGrid=12
So actually, this was saving with a gridsize of 5 degrees! Disquietingly, that isn't borne
out by the file sizes, but we'll gloss over that. So, with '23' changed to '12', we have
rd0_gts_anom_05.pro.
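As far as the log reveals, the SaveGlo grid-size codes map 12 to 0.5 degrees (from the
quick_interp_tdm2.pro line above) and 23, apparently, to 5 degrees. A tiny decoder
capturing just that partial knowledge - any other codes are unknown here:

```python
# Sketch: partial SaveGrid code table, inferred from the two cases above only.
SAVEGRID_CODES = {12: 0.5, 23: 5.0}

def gridsize_for_code(code: int) -> float:
    """Grid size in degrees for a SaveGrid code; raises on unknown codes."""
    try:
        return SAVEGRID_CODES[code]
    except KeyError:
        raise ValueError(f"unknown SaveGrid code {code}")
```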
IDL>
rd0_gts_anom,1901,2006,1961,1990,outprefix='rd0syn05glo/rd0syn05.',pre_prefix='../prebin05/prebin05.'
IDL>
crua6[/cru/cruts/version_3_0/secondaries/wet] ./glo2abs
Enter the path (if any) for the output files: rd0syn05abs/
rd0syn05glo/rd0syn05.01.1901.glo
rd0syn05glo/rd0syn05.1901.01.glo
rd0syn05.1901.01.glo
(etc)
rd0syn05.2006.12.glo
There was then some copying around of decade-sized chunks of .abs files, to make a set
with obs+syn to 1989 and synthetic-only from 1990 onwards.
uealogin1[/cru/cruts/version_3_0/secondaries/wet] ./makegrids
start year with SSSS and end year with EEEE, and
CRU TS 3.00 Mean Temperature : CRU TS 3.00 Rain Days synth 1990 on
Writing: cru_ts_3_00.1901.1910.wet.dat
cru_ts_3_00.1901.1910.wet.nc
Writing: cru_ts_3_00.1911.1920.wet.dat
cru_ts_3_00.1911.1920.wet.nc
Writing: cru_ts_3_00.1921.1930.wet.dat
cru_ts_3_00.1921.1930.wet.nc
Writing: cru_ts_3_00.1931.1940.wet.dat
cru_ts_3_00.1931.1940.wet.nc
Writing: cru_ts_3_00.1941.1950.wet.dat
cru_ts_3_00.1941.1950.wet.nc
Writing: cru_ts_3_00.1951.1960.wet.dat
cru_ts_3_00.1951.1960.wet.nc
Writing: cru_ts_3_00.1961.1970.wet.dat
cru_ts_3_00.1961.1970.wet.nc
Writing: cru_ts_3_00.1971.1980.wet.dat
cru_ts_3_00.1971.1980.wet.nc
Writing: cru_ts_3_00.1981.1990.wet.dat
cru_ts_3_00.1981.1990.wet.nc
Writing: cru_ts_3_00.1991.2000.wet.dat
cru_ts_3_00.1991.2000.wet.nc
Writing: cru_ts_3_00.2001.2006.wet.dat
cru_ts_3_00.2001.2006.wet.nc
Now, we're on to STATISTICS. Specifically, the ones needed for the paper. They
include:
Cell coverage - the percentage of cells in each region that have stations, are near stations,
or are not covered;
All statistics will be required at a yearly timestep and broken down by region (ie, Europe,
Africa, Asia..).
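The cell-coverage statistic can be sketched as a straightforward set partition per region:
cells containing a station, cells merely within the correlation decay distance of one, and
the rest. All names here are illustrative, not from the actual tooling:

```python
# Sketch: per-region coverage percentages for one year.
def coverage(cells, station_cells, near_cells):
    """cells: set of (row, col) in a region; station_cells/near_cells: sets
    of cells with a station, and within cdd of a station, respectively."""
    n = len(cells)
    with_stn = len(cells & station_cells)
    near = len((cells & near_cells) - station_cells)
    uncovered = n - with_stn - near
    return {"station": 100.0 * with_stn / n,
            "near": 100.0 * near / n,
            "uncovered": 100.0 * uncovered / n}
```

Run once per region per year, this yields exactly the three percentages listed above.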
Cell coverage would ideally come from the IDL gridder - however, as we know, the
approach of quick_interp_tdm2.pro
does not lend itself to revealing such information! The best solution will be to use the
paired cell/cdd station
Station counts should be straightforward to derive from the anomaly files (.txt), as
output by anomdtb.f90. This, however, will only work for Primary parameters, since
Secondaries are driven from synthetic data as well. Further, the synthetic element is
usually at 2.5 degrees, so a direct relationship with half-degree coverage will be hard to
establish.
Data sources will not be easy (see Station counts above). One approach could be to
analyse the anomaly files for the Primary parameter(s), and assume that their half-degree
coverage will carry through (via the 2.5-degree synthetic stage and the gridding) to the
final gridded data.
Actually, I think the most logical approach is to produce secondary station files that
record only the observed contributions (as opposed to the derived ones). Users will be
free to use these in tandem with the appropriate primary counts, which they can assume
will have 'contributed' to the unfilled cells, but to a less reliable extent.
/cru/cruts/final_structure
/cru/cruts/final_structure/incoming
/cru/cruts/final_structure/incoming/BOM
/cru/cruts/final_structure/incoming/CLIMAT
/cru/cruts/final_structure/incoming/MCDW
/cru/cruts/final_structure/incoming/other
/cru/cruts/final_structure/primary
/cru/cruts/final_structure/primary/tmp
/cru/cruts/final_structure/primary/tmp/txt
/cru/cruts/final_structure/primary/tmp/glo
/cru/cruts/final_structure/primary/tmp/abs
/cru/cruts/final_structure/primary/tmp/stn
/cru/cruts/final_structure/primary/tmp/stn/cdd0
/cru/cruts/final_structure/primary/tmp/stn/cddn
/cru/cruts/final_structure/primary/pre
/cru/cruts/final_structure/primary/pre/txt
/cru/cruts/final_structure/primary/pre/glo
/cru/cruts/final_structure/primary/pre/abs
/cru/cruts/final_structure/primary/pre/stn
/cru/cruts/final_structure/primary/pre/stn/cdd0
/cru/cruts/final_structure/primary/pre/stn/cddn
/cru/cruts/final_structure/primary/tmn
/cru/cruts/final_structure/primary/tmn/txt
/cru/cruts/final_structure/primary/tmn/glo
/cru/cruts/final_structure/primary/tmn/abs
/cru/cruts/final_structure/primary/tmn/stn
/cru/cruts/final_structure/primary/tmn/stn/cdd0
/cru/cruts/final_structure/primary/tmn/stn/cddn
/cru/cruts/final_structure/primary/tmx
/cru/cruts/final_structure/primary/tmx/txt
/cru/cruts/final_structure/primary/tmx/glo
/cru/cruts/final_structure/primary/tmx/abs
/cru/cruts/final_structure/primary/tmx/stn
/cru/cruts/final_structure/primary/tmx/stn/cdd0
/cru/cruts/final_structure/primary/tmx/stn/cddn
/cru/cruts/final_structure/primary/dtr
/cru/cruts/final_structure/primary/dtr/txt
/cru/cruts/final_structure/primary/dtr/glo
/cru/cruts/final_structure/primary/dtr/abs
/cru/cruts/final_structure/primary/dtr/stn
/cru/cruts/final_structure/primary/dtr/stn/cdd0
/cru/cruts/final_structure/primary/dtr/stn/cddn
/cru/cruts/final_structure/secondary
/cru/cruts/final_structure/secondary/vap
/cru/cruts/final_structure/secondary/vap/syn
/cru/cruts/final_structure/secondary/vap/txt
/cru/cruts/final_structure/secondary/vap/glo
/cru/cruts/final_structure/secondary/vap/abs
/cru/cruts/final_structure/secondary/vap/stn
/cru/cruts/final_structure/secondary/vap/stn/observed_only
/cru/cruts/final_structure/secondary/vap/stn/observed_only/cdd0
/cru/cruts/final_structure/secondary/vap/stn/observed_only/cddn
/cru/cruts/final_structure/secondary/vap/stn/all/cdd0
/cru/cruts/final_structure/secondary/vap/stn/all/cddn
/cru/cruts/final_structure/secondary/wet
/cru/cruts/final_structure/secondary/wet/syn
/cru/cruts/final_structure/secondary/wet/txt
/cru/cruts/final_structure/secondary/wet/glo
/cru/cruts/final_structure/secondary/wet/abs
/cru/cruts/final_structure/secondary/wet/stn
/cru/cruts/final_structure/secondary/wet/stn/observed_only
/cru/cruts/final_structure/secondary/wet/stn/observed_only/cdd0
/cru/cruts/final_structure/secondary/wet/stn/observed_only/cddn
/cru/cruts/final_structure/secondary/wet/stn/all/cddn
/cru/cruts/final_structure/secondary/frs
/cru/cruts/final_structure/secondary/frs/syn
/cru/cruts/final_structure/secondary/frs/txt
/cru/cruts/final_structure/secondary/frs/glo
/cru/cruts/final_structure/secondary/frs/abs
/cru/cruts/final_structure/secondary/frs/stn
/cru/cruts/final_structure/secondary/frs/stn/observed_only
/cru/cruts/final_structure/secondary/frs/stn/observed_only/cdd0
/cru/cruts/final_structure/secondary/frs/stn/observed_only/cddn
/cru/cruts/final_structure/secondary/frs/stn/all/cdd0
/cru/cruts/final_structure/secondary/frs/stn/all/cddn
/cru/cruts/final_structure/secondary/cld
/cru/cruts/final_structure/secondary/cld/syn
/cru/cruts/final_structure/secondary/cld/txt
/cru/cruts/final_structure/secondary/cld/glo
/cru/cruts/final_structure/secondary/cld/abs
/cru/cruts/final_structure/secondary/cld/stn
/cru/cruts/final_structure/secondary/cld/stn/observed_only
/cru/cruts/final_structure/secondary/cld/stn/observed_only/cdd0
/cru/cruts/final_structure/secondary/cld/stn/observed_only/cddn
/cru/cruts/final_structure/secondary/cld/stn/all * might not do these
/cru/cruts/final_structure/secondary/cld/stn/all/cdd0
/cru/cruts/final_structure/secondary/cld/stn/all/cddn
/cru/cruts/final_structure/static
/cru/cruts/final_structure/static/climatology
/cru/cruts/final_structure/static/mask
mcdw2cru (interactive)
climat2cru (interactive)
tmnx2dtr (interactive)
frs_gts_tdm
quick_interp_tdm2
glo2abs (interactive)
makegrids (interactive)
vap_gts_anom
anomdtb (interactive)
quick_interp_tdm2
glo2abs (interactive)
makegrids (interactive)
rd0_gts_anom
anomdtb (interactive)
quick_interp_tdm2
glo2abs (interactive)
makegrids (interactive)
platforms) a text file might be the way forward. At a pinch, all it would need to contain
would be the root to
users to compile into.. and a file of compile lines? I wonder how far off I am from a
makefile? That would help with
the frightening anomdtb.f linkages. Tried 'make' with anomdtb and it doesn't
automatically find the includes, even
I guess I need to finish the Fortran gridder program. That would allow streamlining.
Notes on that
work are mainly in the file 'gridder.sandpit'. Suffice to say, it works :-) Needs tweaking,
and
So, you release a dataset that people have been clamouring for, and the buggers only start
finding problems with it:
<QUOTE>
I realise you are likely to be very busy at the moment, but we have come across
something in the CRU TS 3.0 data set which I hope you can help out with.
We have been looking at the monthly precipitation totals over southern Africa (Angola,
to be precise), and have found some rather large differences between precipitation as
specified in the TS 2.1 data set and the new TS 3.0 version. Specifically, for April 1967
in the cell at 12.75 south, 16.25 east, the monthly total in the TS 2.1 data set is 251mm,
whereas in TS 3.0 it is 476mm. The anomaly does not only appear in this cell, but also in
a number of neighbouring cells. This is quite a large difference, and the new TS 3.0 value
doesn't entirely tie in with what we might have expected from the station-based precip
data we have for this area.
Would it be possible for you to have a quick look into this issue?
Many thanks,
Daniel.
--------------------------------------------------------
Dr Daniel Kingston
Department of Geography
Gower Street
London
WC1E 6BT
UK
Email [email protected]
<END>
Well, it's a good question! And it took over two weeks to answer. I wrote angola.m,
which pretty much established that three local stations had been augmented for 3.0, and
that April 1967 was anomalously wet. Lots of non-reporting stations (ie, too few years to
form normals) also had high values. As part of this, I also wrote angola3.m, which added
two rather interesting plots: the climatology, and the output from the Fortran gridder I'd
just written. Two things stood out:
1. The 2.10 output doesn't look like the climatology, despite there being no stations in
the area. It ought to have simply relaxed to the clim; instead it's wetter.
2. The gridder output is lower than 3.0, and much lower than the stations!
I asked Tim and Phil about 1.; they couldn't give a definitive opinion. As for 2., their
guesses were correct - I needed to mod the distance weighting. As usual, see
gridder.sandpit.
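For context, the distance weighting in this kind of gridding gives each station an
influence that decays with distance out to the correlation decay distance - the
per-variable numbers passed to quick_interp_tdm2 above (450 for precip, 1200 for
temperature, and so on). A sketch only: the exact decay form used by the IDL code is not
shown in this log, so the exponential here is an assumption:

```python
# Sketch: cdd-limited inverse-distance-style weighting for station anomalies.
import math

def weight(dist_km, cdd_km):
    """Zero beyond the correlation decay distance; exponential decay inside."""
    if dist_km >= cdd_km:
        return 0.0
    return math.exp(-dist_km / cdd_km)

def weighted_anom(stations, cdd_km):
    """stations: list of (dist_km, anomaly); weighted mean, None if no
    station is within range (the cell then relaxes to the climatology)."""
    ws = [(weight(d, cdd_km), a) for d, a in stations]
    total = sum(w for w, _ in ws)
    if total == 0.0:
        return None
    return sum(w * a for w, a in ws) / total
```

Changing the decay form or cdd shifts how hard remote stations pull on a cell, which is
exactly the kind of modification described above.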
to pay somebody for a month to recreate Mark New's coefficients. But it never quite
The idea is to derive the coefficients (for regressing cloud against DTR) using the
published 2.10 data. We'll use 5-degree blocks and years 1951-2002, then produce
coefficients for each 5-degree latitude band and month. Finally, we'll interpolate to
half-degree resolution.
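The coefficient derivation just described - regress cloud against DTR within each
(latitude band, month) - reduces to many small least-squares fits. A minimal stand-in
with plain OLS and illustrative names, not the actual procedure:

```python
# Sketch: per-(band, month) ordinary least squares of cloud on DTR.
def ols(xs, ys):
    """Slope and intercept of y = a*x + b by least squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def band_month_coeffs(samples):
    """samples: dict (band, month) -> list of (dtr, cloud) pairs."""
    return {key: ols([d for d, _ in pts], [c for _, c in pts])
            for key, pts in samples.items()}
```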
Lots of 'issues'. We need to exclude 'background' stations - those that were relaxed to
the climatology. This is hard to detect because the climatology consists of valid values,
so testing for equivalence isn't enough. It might have to be the station files *shudder*.
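To pin down what the regression stage actually is: one ordinary-least-squares fit of cloud
anomaly against DTR anomaly per (5-degree band, month). A Python sketch with an invented
cell layout (the real cloudreg works on gridded files and is rather more involved; this is
just the core logic):

```python
def ols_fit(xs, ys):
    """Ordinary least squares y = a*x + b for one (band, month) pair."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b

def band_coefficients(cells):
    """cells: iterable of (lat, month, dtr_anom, cld_anom) tuples.
    Returns {(band, month): (slope, intercept)}, with 5-degree
    latitude bands counted up from the South Pole."""
    grouped = {}
    for lat, month, dtr, cld in cells:
        band = int((lat + 90.0) // 5)
        xs, ys = grouped.setdefault((band, month), ([], []))
        xs.append(dtr)
        ys.append(cld)
    return {k: ols_fit(xs, ys) for k, (xs, ys) in grouped.items()}
```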
Using station files was OK, actually. A bigger problem was the inclusion of strings of
consecutive, identical values (for cloud and/or dtr). Not sure what the source is, as they
are not == to the climatology (ie the anoms are not 0). Discussed with Phil - decided to
try excluding any cell with a string like that of >10 values. Cloud only for now. The
result:
3.00 -38.00
3.00 -38.00
3.00 -36.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -37.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -41.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -43.00
3.00 -38.00
3.00 -38.00
3.00 -41.00
3.00 -38.00
3.00 -39.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -43.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -44.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
3.00 -38.00
As can be seen, neither the dtr (left) nor the cloud (right) look 'sensible', even as
anomalies. Several other months in lat band #19 are either nan or -999 (count=0).
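The '>10 identical values' screen is at least easy to state precisely; a Python sketch of
the run-length test (the production check is on the Fortran/Matlab side, this is just the
criterion):

```python
def has_long_run(values, max_run=10):
    """True if the series contains a run of more than max_run consecutive
    identical values - the exclusion criterion for a cell."""
    run = 1
    for prev, cur in zip(values, values[1:]):
        if cur == prev:
            run += 1
            if run > max_run:
                return True
        else:
            run = 1
    return False

# The dtr/cld dump above is exactly this pathology: long strings of
# -38.00 broken only occasionally, so the cell would be excluded.
```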
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -53.50
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
13.50 -50.00
17.00 -65.00
8.50 -42.00
11.50 -49.00
18.00 -71.00
////////////////////
1.00 33.50
1.00 40.00
1.00 32.00
1.00 42.50
1.00 38.00
1.00 38.00
1.00 32.50
1.00 52.50
1.00 44.00
1.00 36.50
1.00 41.00
1.00 30.50
1.00 38.00
1.00 36.00
1.00 38.00
1.00 38.50
1.00 39.00
1.00 31.50
1.00 40.00
1.00 38.00
1.00 31.00
1.00 44.00
1.00 43.00
1.00 37.00
1.00 31.00
1.00 31.00
1.00 30.50
So, we can have a proper result, but only by including a load of garbage! In fact,
10 52 nan nan
Hmm.. also tried just removing duplicate strings (rather than whole cells):
This 'looks' better - not so steep, and the intercept is a shade closer to 0. The
Matlab script plotcld.m allows comparison of scatter diagrams; these are fed from
example data files manually extracted from the cloudreg.log file after varying the
Showed Phil - and now sidetracked into producing global mean series from the 3.0
OK, got cloud working, have to generate it now.. but distracted by starting on the
mythical 'Update' program. As usual, it's much more complicated than it seems. So,
first, some ground rules: should this be 'dumb'? Should the operator say what
they want to happen, and walk away, coming back later to check it worked? Or should
forth? At the moment, the introduction of new data (MCDW, CLIMAT, BOM) is highly
interactive, and, though BOM should be fully automatic in the future, the same
cannot be said for MCDW and CLIMAT. Hmmm, well, I guess there are two possibilities:
as necessary, some of which may ask the operator to decide on matches. This could
take hours, or even days, depending on the quality of the incoming metadata.
of the merge programs. These have a fixed threshold of confidence for adding new
data to existing databases. When the threshold is crossed, the data is not added
but stored in a new database, which might of course be later added under option 1.
Note that the threshold would be higher than the one in 1. that initiates operator
involvement.
Is this sufficient? It certainly means more coding, but not a huge amount. In a worst
case scenario (where the operator always chooses '2.'), we still have the unused data
updates that can be interactively merged in at any time (even years in the future).
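Both options boil down to where a station-match score falls relative to two thresholds; a
Python sketch of the decision logic (the score scale and threshold values here are made
up, not the merge programs' actual numbers):

```python
def merge_decision(score, auto_threshold=0.9, ask_threshold=0.5, interactive=True):
    """Classify an incoming station against a candidate master station.
    Above auto_threshold: merge silently. Between the thresholds: ask
    the operator (option 1) or park in a holding database for later
    interactive merging (option 2). Below ask_threshold: treat as new."""
    if score >= auto_threshold:
        return "merge"
    if score >= ask_threshold:
        return "ask-operator" if interactive else "defer"
    return "new-station"
```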
This all avoids the big questions, of course. When do updates happen, and how far back
do they go? For instance, let's say there are six-month published updates. So say the
full 1901-present files are published yearly, with six-month update files as interims.
B. The data used in the January-to-June update is further updated after publication and
is present in the next 'full' release (so that the early Jan-Jun grids differ from
(in both A. and B., it would usually be MCDW updates that carried retrospective data,
to update, it ought to warn if it finds earlier updates in those files. So further mods
to mcdw2cruauto are required.. its results file must list extras. Or - ooh! How about a
SECOND output database for the MCDW updates, containing just the OVERDUE stuff?
Back.. think.. even more complicated. My head hurts. No, it actually does. And I ought
to be on my way home. But look, we create a new master database (for each parameter)
every time we update, don't we? What we ought to do is provide a log file for each
new database, identifying which data have been added. Oh, God. OK, let's go..
2. Ops selects MCDW, CLIMAT, and/or BOM data and gives update dates
4. Update checks source files are present and initiates conversion to CRU format.
5. Update runs the merging program to join the new data to the existing databases,
creating new databases. If data for previous periods is included in the update
files, it will be included.
5a. If Ops selected 'automatic', merging program asks for decisions on 'difficult'
6. Merge program creates log of changes between old databases and new ones, inc.
UPDATE PROCESS
5. Update runs the anomaly and gridding programs for the specified period.
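Strung together, the numbered steps are just a pipeline driver; a skeleton in Python (the
step names are placeholders standing in for the real conversion, merging, and gridding
programs, not calls that exist anywhere):

```python
def run_update(sources, dates, auto=True):
    """Skeleton of the update pipeline: convert each selected source,
    merge into the current databases, then anomalise and grid.
    Returns the list of actions taken, for the run log."""
    log = []
    for src in sources:                       # e.g. MCDW, CLIMAT, BOM
        log.append(f"convert {src} {dates}")  # step 4: to CRU format
        log.append(f"merge {src}")            # step 5: into new databases
        if not auto:
            log.append(f"review {src}")       # step 5a: operator decisions
    log.append("log-changes")                 # step 6: old vs new db diffs
    log.append(f"anomalise+grid {dates}")     # gridding for the period
    return log
```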
Note. The following system command will find the number of stations reporting in a
Discovered ('remembered' would be better; sadly I didn't) that I never got round to
writing a BOM-to-CRU converter. It got overtaken by the drastic need to get the tmin
and tmax databases synchronised (see above, somewhere). There was a barely-started
was a good entry into the fraught world of automatic, script-fed programs.
succession (the latter two having their output databases compared successfully with
Next, I suppose it's the next in the sequence: mergedb. This is where I'm anxious: I
want it all to be plain sailing and automatic, but I don't think there's any practical
way to spare the operator the need to make judgements on the possible mapping
of stations.
---
Back to get CLD sorted out. Need a break from the updater! Though much the same
difficulties, trying to work out the process (it's anything but straightforward for
cloud, seeing as the incoming updates are in Sun Hours, and we have to apply our
conversions). One of the problems is that you need a latitude value to perform the
conversion - so the CLIMAT bulletins lose the value if they can't be matched in the
WMO list! Not much I can do about that, and let's face it those stations are going
to end up as 'new' stations anyway.
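The latitude dependence is day length: observed sun hours only become a sun percentage
(and thence cloud) once you know the maximum possible hours for that latitude and date. A
rough Python sketch using the standard solar-declination approximation (the operational
conversion may well use different tables; this just shows why latitude is indispensable):

```python
import math

def day_length_hours(lat_deg, day_of_year):
    """Approximate daylight hours from latitude and day of year,
    via the usual cosine approximation to solar declination."""
    decl = math.radians(-23.44) * math.cos(2 * math.pi * (day_of_year + 10) / 365.0)
    x = -math.tan(math.radians(lat_deg)) * math.tan(decl)
    x = max(-1.0, min(1.0, x))          # clamp for polar day/night
    return 24.0 / math.pi * math.acos(x)

def sun_percent(sun_hours, lat_deg, day_of_year):
    """Observed sun hours as a percentage of possible hours. This is why
    an unmatched CLIMAT station (no latitude) cannot be converted."""
    possible = day_length_hours(lat_deg, day_of_year)
    if possible <= 0.0:
        return None                     # polar night: undefined
    return 100.0 * min(sun_hours, possible) / possible
```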
So.. using the new converters (which are built to be driven by the update program), I
uealogin1[/cru/cruts/version_3_0/update_top] ./mcdw2cruauto
uealogin1[/cru/cruts/version_3_0/update_top] cat results/results.0901101032/mcdw.0901101032.res
OK
uealogin1[/cru/cruts/version_3_0/update_top]
uealogin1[/cru/cruts/version_3_0/update_top] ./climat2cruauto
uealogin1[/cru/cruts/version_3_0/update_top] cat results/results.0901101032/climat.0901101032.res
OK
uealogin1[/cru/cruts/version_3_0/update_top]
The output cld databases both look OK, and pretty much equivalent except that MCDW
goes back further (to 1994). CLIMAT is 2000 onwards because that's what's on Phil
Brohan's website.
the notes, this was the product of processing the MCDW and CLIMAT bulletins into
giving us cld.0711272230.dtb. So the new cloud databases I've just produced should
be, if not identical, very similar? Oh, dear. There is a passing similarity, though
this seems to break down in Winter. I don't have time to do detailed comparisons, of
course, so we'll just run with the new one. After all, I have tested the conversion
for the latest programs, I'm not sure how much testing was done last time.
The procedure last time - that is, when I was trying to reproduce TS 2.10; we have
no idea what the procedure was for its initial production! - was to incorporate the
sun percent data from the bulletins into the existing sun percent db (spc.0312221624.dtb).
The trouble is, the existing cloud dbs are bigger. They stop at 1996, but that's no
228936 cld.0301081434.dtb
104448 cld.0312181428.dtb
111989 combo.cld.dtb
57395 spc.0301201628.dtb
51551 spc.0312221624.dtb
51551 spc.94-00.0312221624.dtb
So, how about merging our new MCDW cloud database into cld.0312181428.dtb, then
merging the CLIMAT one into that? The logic here is that the cloud must be from
observations, because the sun databases are much smaller. Well, the ones we know
about! It might be worth checking the station numbers for each year though.
Unfortunately, we don't have a lot of luck merging MCDW updates into the Dec 2003
CLD database:
uealogin1[/cru/cruts/version_3_0/db/cld] ./newmergedb
you want the quick and dirty approach? This will blindly match
Writing cld.0902101404.dtb
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
OUTPUT(S) WRITTEN
New master database: cld.0902101404.dtb
(automatically: 858)
(by operator: 0)
> Rejected: 5
uealogin1[/cru/cruts/version_3_0/db/cld]
Of course, as we are only generating from 1996 onwards, this probably isn't
Next, merge CLIMAT into the new database. Well, of course, this is much more
uealogin1[/cru/cruts/version_3_0/db/cld] ./newmergedb
Writing cld.0902101409.dtb
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
OUTPUT(S) WRITTEN
(automatically: 1858)
(by operator: 0)
> Rejected: 2
uealogin1[/cru/cruts/version_3_0/db/cld]
updated first with derived-cloud data from MCDW (1994-2008), then with
Well, we have the program. And we've played with it, but forgot to c&p those runs
into here (well they were only a few days ago!) so here they are now:
crua6[/cru/cruts/version_3_0/secondaries/cld/cldfromdtrtxt] ./dtr2cld
Then an experimental IDL gridding using half degree and glo output. It was late at night,
crua6[/cru/cruts/version_3_0/secondaries/cld] idl
IDL Version 5.4 (OSF alpha). (c) 2000, Research Systems, Inc.
IDL>
quick_interp_tdm2,1995,2006,'cld.',750,gs=0.5,dumpglo='dumpglo',pts_prefix='cldfromdtrtxt/cld.'
Defaults set
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
IDL>
So now we need to try that last step again - this time going for 2.5-degree binary
outputs, suitable for feeding back into it for the full cloud gridding. Oh, my.
IDL>
quick_interp_tdm2,1996,2006,'cldfromdtr25bin/cld.',750,gs=2.5,dumpbin='dumpbin',pts_prefix='cldfromdtrtxt/cld.'
crua6[/cru/cruts/version_3_0/secondaries/cld] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.cld
cld.0902101409.dtb
cld.0902101409.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/secondaries/cld/cld.0902101409.dtb
1961,1990
25
cldnew.txt
1996,2006
> Operating...
/tmp_mnt/cru-auto/cruts/version_3_0/secondaries/cld/cld.0902101409.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/secondaries/cld/cld.0902101409.dtb
crua6[/cru/cruts/version_3_0/secondaries/cld]
Unfortunately, that isn't working. Too many stations outside the usual normals
period (1961-1990). My notes from the last attempt are less than inspiring.. it
looks as though we need the program 'normshift.for', and to normalise over 95-02. So:
crua6[/cru/cruts/version_3_0/secondaries/cld] ./anomdtb
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
.cld
cld.0902101409.dtb
cld.0902101409.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/secondaries/cld/cld.0902101409.dtb
25
cldupdate.txt
1996,2006
> Operating...
/tmp_mnt/cru-auto/cruts/version_3_0/secondaries/cld/cld.0902101409.dtb
/tmp_mnt/cru-auto/cruts/version_3_0/secondaries/cld/cld.0902101409.dtb
crua6[/cru/cruts/version_3_0/secondaries/cld]
Hmm.. that's giving us between 670 and 790 stations per month.. not too bad
I suppose, seeing as it's a secondary parameter. Now for normshift, which has
to bridge the 1995-2002 normals and the 1961-1990 normals. So after gridding we
could add these.. except that after gridding we'll have incorporated the
DTR-derived synthetic cloud, which is of course based on the 1961-1990 normals
as it's derived from DTR!! Arrrrggghh.
So.. {sigh}.. another problem. Well we can't change the updates side, that has to
use 1995-2002 normals. But maybe we'll have to adjust the station anomalies, prior
to gridding.
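Re-basing an anomaly is at least trivial arithmetic: since actual = anomaly + normal under
either base period, moving a station value from 1995-2002 normals to 1961-1990 normals
just adds the difference of the two normals. In Python:

```python
def shift_anomaly(anom_9502, norm_9502, norm_6190):
    """Re-express an anomaly computed against 1995-2002 normals as an
    anomaly against 1961-1990 normals:
        actual = anom_9502 + norm_9502 = anom_6190 + norm_6190
    so  anom_6190 = anom_9502 + (norm_9502 - norm_6190)."""
    return anom_9502 + (norm_9502 - norm_6190)
```

The catch, of course, is knowing both normals for each station, which is where the
station-to-cell mapping comes in.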
Wrote movenorms.for, using the engine of dtr2cld (as it's processing the same kind
of files and also needs to map stations to cells). However we quickly hit a problem:
crua6[/cru/cruts/version_3_0/secondaries/cld] ./movenorms
Please enter a generic source file with MM for month and YYYY for year:
cldupdatetxt/cldupdate.YYYY.MM.txt
Start MONTH: 01
End MONTH: 12
Please enter a generic destination file with MM for month and YYYY for year:
cldupdate6190/cldupdate6190.YYYY.MM.txt
File: cldupdate6190/cldupdate6190.1996.01.txt
crua6[/cru/cruts/version_3_0/secondaries/cld]
This is a station on the West coast of India; probably Mumbai. Unfortunately, as a coastal
station it runs the risk of missing the nearest land cell. The simple movenorms program is
about to become less simple.. but was do-able. The log file was empty at the end,
indicating that all 'damp' stations had found dry land:
crua6[/cru/cruts/version_3_0/secondaries/cld] ./movenorms
Please enter a generic source file with MM for month and YYYY for year:
cldupdatetxt/cldupdate.YYYY.MM.txt
Start MONTH: 1
End MONTH: 12
Please enter a generic destination file with MM for month and YYYY for year:
cldupdate6190/cldupdate6190.YYYY.MM.txt
crua6[/cru/cruts/version_3_0/secondaries/cld] wc -l movenorms.log
0 movenorms.log
crua6[/cru/cruts/version_3_0/secondaries/cld]
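For the record, the 'less simple' part is a nearest-land-cell search: if a station's cell
is sea, search outwards ring by ring until land is found, otherwise log it as still
'damp'. A Python sketch (the grid representation and search radius are my own, not
movenorms.for's):

```python
def nearest_land_cell(row, col, is_land, max_radius=2):
    """If (row, col) is sea, search outward in growing square rings for
    the nearest land cell; returns None if none is found within range.
    is_land is a function (row, col) -> bool on the half-degree grid."""
    if is_land(row, col):
        return row, col
    for r in range(1, max_radius + 1):
        best = None
        for dr in range(-r, r + 1):
            for dc in range(-r, r + 1):
                if max(abs(dr), abs(dc)) != r:
                    continue  # only visit the ring at radius r
                if is_land(row + dr, col + dc):
                    d2 = dr * dr + dc * dc
                    if best is None or d2 < best[0]:
                        best = (d2, row + dr, col + dc)
        if best:
            return best[1], best[2]
    return None  # genuinely 'damp': log it
```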
So.. now I should be able to do the final gridding of cloud for 1996-2006.
IDL>
quick_interp_tdm2,1996,2006,'cloudcomboglo/cld.',750,gs=0.5,dumpglo='dumpglo',synth_prefix='cldfromdtr25bin/cld.',pts_prefix='cldupdate6190/cldupdate6190.'
<output removed as re-done below with CDD=600>
crua6[/cru/cruts/version_3_0/secondaries/cld] ./glo2abs
uealogin1[/cru/cruts/version_3_0/secondaries/cld] ./makegrids
All files look alright. BUT. The NetCDF attributes (which are still bad) do say that
the CDD for cloud is 600. If it is, I will eat my screen, because I'll have to do all
the gridding again:
IDL>
quick_interp_tdm2,1996,2006,'cldfromdtr25bin/cld.',600,gs=2.5,dumpbin='dumpbin',pts_prefix='cldfromdtrtxt/cld.'
Defaults set
1996
% Compiled module: MAP_SET.
1997
1998
1999
2000
2001
2002
2003
2004
grid 2004 non-zero -23.9221 47.9721 145.4819 cells= 48179
2005
2006
IDL>
IDL>
quick_interp_tdm2,1996,2006,'cloudcomboglo/cld.',600,gs=0.5,dumpglo='dumpglo',synth_prefix='cldfromdtr25bin/cld.',pts_prefix='cldupdate6190/cldupdate6190.'
Defaults set
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
IDL>
crua6[/cru/cruts/version_3_0/secondaries/cld] ./glo2abs
Enter the path (if any) for the output files: cloudcomboabs/
Choose: 2
cloudcomboglo/cld.01.1996.glo
cloudcomboglo/cld.1996.01.glo
cld.1996.01.glo
cld.1996.02.glo
(etc)
cld.2006.11.glo
cld.2006.12.glo
crua6[/cru/cruts/version_3_0/secondaries/cld]
uealogin1[/cru/cruts/version_3_0/secondaries/cld] ./makegrids
start year with SSSS and end year with EEEE, and
Writing: cru_ts_3_00.1996.2000.cld.dat
cru_ts_3_00.1996.2000.cld.nc
Writing: cru_ts_3_00.2001.2006.cld.dat
cru_ts_3_00.2001.2006.cld.nc
uealogin1[/cru/cruts/version_3_0/secondaries/cld]
The question is, IS THIS ANY GOOD? Well, we currently have published cloud data
to 2002. So we can make comparisons between 1996 and 2002. Oh, my. I am sure
I used mmeangrid.for to calculate monthly mean fields (1996-2002) for both 2.10 and
3.00. The comparisons are less than ideal, though they could have been much worse.
Essentially, North America is totally different - cloudier in Feb/Mar/Apr, sunnier the
rest of the year. There are other differences, particularly in Northern Asia, but these
are patchier and don't extend throughout the year. So.. the obvious cause would be the
inclusion of DTR-derived cloud, since that would have significant station counts in
North America compared to CLD? Also, there seems to be horizontal banding.. not a good
sign given the nature of the DTR-to-CLD conversion! Naturally, the way to test this is
to make
3-2 How the DTR-derived synthetic CLD relates to the 'combo' CLD
IDL>
quick_interp_tdm2,1996,2002,'cldfromdtrglo05/cld.',600,gs=0.5,dumpglo='dumpglo',pts_prefix='cldfromdtrtxt/cld.'
Defaults set
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
IDL>
quick_interp_tdm2,1996,2002,'cldfromupdate6190glo/cld.',600,gs=0.5,dumpglo='dumpglo',pts_prefix='cldupdate6190/cldupdate6190.'
Defaults set
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
IDL>
crua6[/cru/cruts/version_3_0/secondaries/cld] ./glo2abs
Enter the path (if any) for the output files: cldfromdtrglo05abs/
Choose: 2
cldfromdtrglo05/cld.01.1996.glo
cldfromdtrglo05/cld.1996.01.glo
cld.1996.01.glo
cld.1996.02.glo
(etc)
cld.2002.11.glo
cld.2002.12.glo
crua6[/cru/cruts/version_3_0/secondaries/cld] ./glo2abs
Enter the path (if any) for the output files: cldfromupdate6190gloabs/
Choose: 2
cldfromupdate6190glo/cld.01.1996.glo
cldfromupdate6190glo/cld.1996.01.glo
cld.1996.01.glo
cld.1996.02.glo
(etc)
cld.2002.11.glo
cld.2002.12.glo
crua6[/cru/cruts/version_3_0/secondaries/cld]
uealogin1[/cru/cruts/version_3_0/secondaries/cld] ./makegrids
start year with SSSS and end year with EEEE, and
ending with '.dat', eg: cru_ts_3_00.SSSS.EEEE.tmp.dat :
cru_ts_3_00.SSSS.EEEE.cld_from_dtr_only.dat
CRU TS 3.00 Mean Temperature : CRU TS 3.00 Percentage Cloud Cover from DTR only
Writing: cru_ts_3_00.1996.2000.cld_from_dtr_only.dat
cru_ts_3_00.1996.2000.cld_from_dtr_only.nc
Writing: cru_ts_3_00.2001.2002.cld_from_dtr_only.dat
cru_ts_3_00.2001.2002.cld_from_dtr_only.nc
uealogin1[/cru/cruts/version_3_0/secondaries/cld] ./makegrids
CRU TS 3.00 Mean Temperature : CRU TS 3.00 Percentage Cloud Cover from SUN obs only
Writing: cru_ts_3_00.1996.2000.cld_from_sunobs_only.dat
cru_ts_3_00.1996.2000.cld_from_sunobs_only.nc
Writing: cru_ts_3_00.2001.2002.cld_from_sunobs_only.dat
cru_ts_3_00.2001.2002.cld_from_sunobs_only.nc
uealogin1[/cru/cruts/version_3_0/secondaries/cld] ./mmeangrid
uealogin1[/cru/cruts/version_3_0/secondaries/cld] ./mmeangrid
uealogin1[/cru/cruts/version_3_0/secondaries/cld]
1. cru_ts_2_10.1996.2002.cld.dat.mmeans
2. cru_ts_3_00.1996.2002.cld.dat.mmeans
3. cru_ts_3_00.1996.2002.cld_from_dtr_only.dat.mmeans
4. cru_ts_3_00.1996.2002.cld_from_sunobs_only.dat.mmeans
5. clim.6190.lan.cld.grid
And here are our target comparisons again, this time with notes:
-> major diffs globally, all months (lat. striping) c/w 3-1
The deduction so far is that the DTR-derived CLD is waaay off. The DTR looks OK, well
OK in the sense that it doesn't have prominent bands! So it's either the factors and
offsets from the regression, or the way they've been applied in dtr2cld.
Well, dtr2cld is not the world's most complicated program. Whereas cloudreg is, and I
immediately found a mistake! Scanning forward to 1951 was done with a loop that, for
of 600!!! That may have had something to do with it. I also noticed, as I was correcting
THAT, that I reopened the DTR and CLD data files when I should have been opening the
bloody station files!! I can only assume that I was being interrupted continually when
I was writing this thing. Running with those bits fixed improved matters somewhat,
though now there's a problem in that one 5-degree band (10S to 5S) has no stations! This
will be due to low station counts in that region, plus removal of duplicate values.
Had a think. Phil advised averaging the bands either side to fill the gap, but yuk! And
also the band to the North (ie, 5S to equator) is noticeably lower (extreme, even). So
<MAIL QUOTE>
Phil,
I've looked at why we're getting low counts for valid cloud cells in certain 5-degree
latitude bands.
The filtering algorithm omits any cell values where the station count is zero, for either
CLD or DTR. In general, it's the CLD counts that are zero and losing us the data.
However, in many cases, the cloud value in that cell on that month is not equal to the
climatology. And there is plenty of DTR data. So I'm wondering how accurate the station
counts are for secondary variables, given that they have to reflect observed and synthetic
inputs. Here's a brief example:
CLD------------------- DTR-------------------
So, I'm proposing to filter on only the DTR counts, on the assumption that PRE was
probably available if DTR was, so synthesis of CLD was likely to have happened, just
not shown in the station counts which are probably 'conservative'?
I didn't get an email back but he did verbally consent. So away we go!
Running with a DTR-station-only screening gives us lots of station values, even with
duplicate filtering turned back on. Niiice. It's still not exactly smooth, but it
crua6[/cru/cruts/version_3_0/secondaries/cld] ./dtr2cld
crua6[/cru/cruts/version_3_0/secondaries/cld]
IDL>
quick_interp_tdm2,1996,2006,'cldfromdtr25bin/cld.',600,gs=2.5,dumpbin='dumpbin',pts_prefix='cldfromdtrtxt/cld.'
Defaults set
1996
1997
1998
grid 1998 non-zero -31.6090 41.7502 182.8359 cells= 36481
1999
2000
2001
2002
2003
2004
2005
2006
IDL>
Final gridding with obs as well:
IDL>
quick_interp_tdm2,1996,2006,'cloudcomboglo/cld.',600,gs=0.5,dumpglo='dumpglo',synth_prefix='cldfromdtr25bin/cld.',pts_prefix='cldupdate6190/cldupdate6190.'
Defaults set
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
IDL>
crua6[/cru/cruts/version_3_0/secondaries/cld] ./glo2abs
Enter the path (if any) for the output files: cloudcomboabs/
Choose: 2
cloudcomboglo/cld.01.1996.glo
cloudcomboglo/cld.1996.01.glo
cld.1996.01.glo
cld.1996.02.glo
(etc)
cld.2006.11.glo
cld.2006.12.glo
crua6[/cru/cruts/version_3_0/secondaries/cld]
uealogin1[/cru/cruts/version_3_0/secondaries/cld] ./makegrids
start year with SSSS and end year with EEEE, and
Writing: cru_ts_3_00.1996.2000.cld.dat
cru_ts_3_00.1996.2000.cld.nc
Writing: cru_ts_3_00.2001.2006.cld.dat
cru_ts_3_00.2001.2006.cld.nc
uealogin1[/cru/cruts/version_3_0/secondaries/cld]
uealogin1[/cru/cruts/version_3_0/secondaries/cld] ./mmeangrid
uealogin1[/cru/cruts/version_3_0/secondaries/cld]
Back with cmpmgrids.m.. and things look MUCH better. Differences with the climatology,
or with the 2.10 release, are patchy and generally below 30%. Of course it would be
nice if the differences with the 2.10 release were negligible, since our regression
coefficients were based on 2.10 DTR and CLD.. though of course the sun hours component
is an unknown there, as is the fact that 2.10 used PRE as well as DTR for the synthetics.
Anyway it gets the thumbs-up. The strategy will be to just produce it for 2003-2006.06,
to tie in with the rest of the 3.00 release. So I just need to.. argh. I don't have any
way to create NetCDF files 1901-2006 without the .glo.abs files to work from! I'd have
to specially code a version that swallowed the existing 1901-2002 then added ours. Meh.
crua6[/cru/cruts/version_3_0/secondaries/cld/cld_final] ls -l
total 2414884
crua6[/cru/cruts/version_3_0/secondaries/cld/cld_final] gunzip cru_ts_3_00.2001.2006.cld.dat.gz
crua6[/cru/cruts/version_3_0/secondaries/cld/cld_final] wc -l cru_ts_3_00.2001.2006.cld.dat
25920 cru_ts_3_00.2001.2006.cld.dat
17280 cru_ts_3_00.2003.2006.cld.dat
crua6[/cru/cruts/version_3_0/secondaries/cld/cld_final] mv cru_ts_2_10.1901-2002.cld.grid cru_ts_3.00.1901.2006.cld.dat
crua6[/cru/cruts/version_3_0/secondaries/cld/cld_final] cat cru_ts_3_00.2003.2006.cld.dat >>cru_ts_3.00.1901.2006.cld.dat
crua6[/cru/cruts/version_3_0/secondaries/cld/cld_final] wc -l cru_ts_3.00.1901.2006.cld.dat
457920 cru_ts_3.00.1901.2006.cld.dat
crua6[/cru/cruts/version_3_0/secondaries/cld/cld_final]
Well to take a slightly different tack, I thought I'd look at the gridding end of
things. Specifically, how to run IDL in batch mode. I think I've got it: you create
a batch file with the command(s) in, then setenv IDL_STARTUP [name of batch file].
When you type 'idl' it runs the batch file, unfortunately it doesn't quit afterwards,
though adding an 'exit' line to the batch file does the trick! Of course, there is no
easy way to check it's working properly, since the random element (used when relaxing
crua6[/cru/cruts/version_3_0/secondaries/cld]
Still, the mechanism is so similar to that used to run other Fortran progs that we
can carry on, I guess. Naturally I would prefer to use the gridder I wrote, partly
because it does a much better, *documentable* job, but mainly because I don't want
Also looked at NetCDF production, as it's still looming. ncgen looks quite good, it
can work from a 'CDL' file (format is the same as the output from ncdump). It can
Ah well. Back to the 'incoming data' process. The fact that the mcdw2cruauto and
climat2cruauto programs worked fine for CLD is a big bonus; they read their runs
and date files and they wrote their results. Though the results didn't include the
names of the output databases, I've had second thoughts about that. I want the
update program to be in charge, so it should know what files have been produced
(assuming the result is 'OK'). If the conversion program sends back a list, then
the update program will have to parse it to find out which parameter is which,
and that's silly when it should know anyway!! The situation is different for
merging. I don't have a full strategy for file naming yet. Let's look at a typical
process for an unnamed (not tmn or tmx) primary parameter, ie the simple case:
File(s) / Process flow:
  mcdw update(s)                  --convert mcdw-->    mcdw db
  mcdw db + current db            --merge-->           current+mcdw db
  climat update(s)                --convert climat-->  climat db
  climat db + current+mcdw db     --merge-->           current+mcdw+climat db
  current+mcdw+climat db          --anomalise-->       anomaly files
  anomaly files                   --grid-->            gridded anomalies
  gridded anomalies + climatology --actualise-->       gridded actuals
  gridded actuals                 --> reformat into .dat and .nc
So, naming. Well the governing principle of the update process is that all files
have the same 10-digit datestamp. So the run can be uniquely identified, as can
all its files (data, log, etc). I am NOT changing that! A main problem is that
we will have to depart from the rigid database naming schema ('tla.datestr.dtb')
because we will have lots of databases in a single run. In the above example,
four databases will all have the same datestamp. Here's a possible name system:
mcdw db                   mcdw.tla.datestr.dtb
current+mcdw db           int1.tla.datestr.dtb
climat db                 clmt.tla.datestr.dtb
current+mcdw+climat db    int2.tla.datestr.dtb
final db                  tla.datestr.dtb      (eg, pre.0902161401.dtb)
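That schema is mechanical enough to generate; a Python sketch (the stage prefixes follow
the table above, while the interim counter is the bit the update program itself must own):

```python
def db_name(stage, tla, datestr):
    """Build a database filename under the run's single 10-digit
    datestamp. stage: 'mcdw', 'clmt', 'bom', an interim label such as
    'int1', or None for the final master database."""
    if stage is None:
        return f"{tla}.{datestr}.dtb"
    return f"{stage}.{tla}.{datestr}.dtb"

class InterimCounter:
    """Hands out 'int1', 'int2', ... so that the update program, not
    the merge program, keeps track of interim database numbering."""
    def __init__(self):
        self.n = 0
    def next(self):
        self.n += 1
        return f"int{self.n}"
```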
For secondary parameters it's even worse! I'm not super-keen on the use of 'int1'
('interim 1') and so on.. they give no useful information. But a more complicated
schema isn't going to be understood by anyone else anyway! And we should have the
Database Master List to refer to at all times.. okay. All interim databases will
be labeled 'int1', 'int2', and so forth. The update program will have to keep
track of numbering. And, of course - it will have to tell the merging program
It gets WORSE. The update program has to know which 'Master' database to pass to
the merge program. For MCDW, it's going to be the 'current' database for that
parameter. But for CLIMAT and BOM, it depends on whether MCDW or CLIMAT
(respectively) merges have gone before. And only for those parameters that are
precursored! More complexity. Well, I suppose I can take one of two approaches:
1. Test at each stage for each parameter (ie for BOM, test whether CLIMAT tmx/tmn
have just been done). This could be done by testing for the filenames or by
setting flags.
2. Maintain a list in memory of 'latest' databases for each parameter. A bit less
Okay. Because it is so complicated (well, for my brain anyway), I'm going to write
out the filenames that update is using and expecting, so I can check that the
two tally:
dtstr = 0902161655
par = TMP
source = MCDW
prev db = db/tmp/tmp.0809111204.dtb
CONVERSION
MERGING
db/tmp/tmp.0809111204.dtb Current/latest db
the necessary directories are being created yet, though.. they are now. Some
So, with half of the update program written, I got it all compiled, reset all
Of course, I immediately realised that I'd missed out the DTR conversion at the end.
And that.. didn't go any better than the rest of it, despite a quick conversion of
tmnx2dtrauto.for.
stuff revolves around the tmin and tmax databases being kept in absolute step. That is,
same stations, same coordinates and names, same data spans. Otherwise the job of
synching, and of converting to DTR, becomes horrendous. But look at what happens to
the line counts:
606244 tmn/tmn.0708071548.dtb
606244 tmx/tmx.0708071548.dtb
climat conversions
27090 climat.tmn.0902192248.dtb
27080 climat.tmx.0902192248.dtb
607692 int2.tmn.0902192248.dtb
604993 int2.tmx.0902192248.dtb
5388 bom.tmn.0902192248.dtb
5388 bom.tmx.0902192248.dtb
607692 int3.tmn.0902192248.dtb
604993 int3.tmx.0902192248.dtb
Sometimes life is just too hard. It's after midnight - again. And I'm doing all this
over VNC in 256 colours, which hurts. Anyway, the above line counts. I don't know
which is the more worrying - the fact that adding the CLIMAT updates lost us 1251
lines from tmax but gained us 1448 for tmin, or that the BOM additions added sod all.
And yes - I've checked, the int2 and int3 databases are IDENTICAL. Aaaarrgghhhhh.
I guess.. I am going to need one of those programs I wrote to sync the tmin and tmax
databases, aren't I?
Actually, it's worse than that. The CLIMAT merges for TMN and TMX look very similar:
(automatically: 2227)
(by operator: 0)
Rejects file:
updates/CLIMAT/db/db.0902192248/climat.tmn.0902192248.dtb.rejected
<END QUOTE>
(automatically: 2226)
(by operator: 0)
> Added as new Master stations: 566
Rejects file:
updates/CLIMAT/db/db.0902192248/climat.tmx.0902192248.dtb.rejected
<END QUOTE>
I don't see how we end up with such drastic differences in line counts!!
Well the first thing to do was to fix climat2cruauto so that it treated tmin and tmax as
inseparable. Thus the CLIMAT databases for these two should be identical (um, apart
from the data itself!).
OK, this is getting SILLY. Now the BOM and CLIMAT conversions are in sync, and the
original databases are unchanged:
originals
606244 db/tmn/tmn.0708071548.dtb
606244 db/tmx/tmx.0708071548.dtb
climat conversions
27080 updates/CLIMAT/db/db.0902201023/climat.tmn.0902201023.dtb
27080 updates/CLIMAT/db/db.0902201023/climat.tmx.0902201023.dtb
climat merged interims
607687 updates/CLIMAT/db/db.0902201023/int2.tmn.0902201023.dtb
604987 updates/CLIMAT/db/db.0902201023/int2.tmx.0902201023.dtb
5388 updates/BOM/db/db.0902201023/bom.tmn.0902201023.dtb
5388 updates/BOM/db/db.0902201023/bom.tmx.0902201023.dtb
607687 updates/BOM/db/db.0902201023/int3.tmn.0902201023.dtb
604987 updates/BOM/db/db.0902201023/int3.tmx.0902201023.dtb
So the behaviour of newmergedbauto is, for want of a better word, unpredictable. Oh, joy.
(automatically: 0)
(by operator: 0)
> Added as new Master stations: 0
Rejects file:
updates/BOM/db/db.0902201023/bom.tmn.0902201023.dtb.rejected
<END QUOTE>
(automatically: 0)
(by operator: 0)
Rejects file:
updates/BOM/db/db.0902201023/bom.tmx.0902201023.dtb.rejected
<END QUOTE>
I really thought I was cracking this project. But every time, it ends up worse than before.
OK, let's try and work out the order of events. I'm using getheads to look at metadata only.
crua6[/cru/cruts/..CLIMAT/db/db.0902201023]
crua6[/cru/cruts/version_3_0/update_top/db]
4848
crua6[/cru/cruts/../CLIMAT/db/db.0902201023]
Looking at the log files for the CLIMAT merging, they give identical stats! What differs
is the detail:
crua6[/cru/cruts/version_3_0/update_top/logs/logs.0902201023] diff merg.climat.tmn.0902201023.log merg.climat.tmx.0902201023.log | more
1,2c1,2
---
281c281
---
287c287
---
<END QUOTE>
..and so on. What's got me stumped is that the headers of both pairs of input databases
crua6[/cru/cruts/version_3_0/update_top/db]
You see? The HANNOVER 1930 date, and the BERLIN-TEMPELHOF 1991 date, are wrong!!
Christ. That's not even consistent: one's supposedly in the tmin file, the other in
the tmax one.
So, an apparently-random pollution of the start dates. And.. FOUND IT! As usual, the program is doing exactly what I asked it to do. When I wrote it I simply didn't consider the possibility of tmin and tmax needing to sync. So one of the first things it does, when reading in the existing database, is to truncate station data series where whole years are missing values. And for HANNOVER, tmax has 1927-1929 missing, but tmin has (some) data in those years. A-ha!
What to do.. I guess the logical thing to do is to not truncate for tmin and tmax! So I added a flag to newmergedbauto, that it passes to the 'getmos' subroutine, that stops it from replacing start and end years, and.. it worked!! Hurrah! Or, well.. it ran without giving any errors or crashing horribly. Yes, that's it. And here are all the 142 files (and directories) it created:
./results/results.0902201545
./results/results.0902201545/conv.mcdw.0902201545.res
./results/results.0902201545/merg.mcdw.tmp.0902201545.res
./results/results.0902201545/merg.mcdw.pre.0902201545.res
./results/results.0902201545/merg.mcdw.vap.0902201545.res
./results/results.0902201545/merg.mcdw.wet.0902201545.res
./results/results.0902201545/merg.mcdw.cld.0902201545.res
./results/results.0902201545/conv.climat.0902201545.res
./results/results.0902201545/merg.climat.tmp.0902201545.res
./results/results.0902201545/merg.climat.vap.0902201545.res
./results/results.0902201545/merg.climat.wet.0902201545.res
./results/results.0902201545/merg.climat.pre.0902201545.res
./results/results.0902201545/merg.climat.cld.0902201545.res
./results/results.0902201545/merg.climat.tmn.0902201545.res
./results/results.0902201545/merg.climat.tmx.0902201545.res
./results/results.0902201545/conv.bom.0902201545.res
./results/results.0902201545/merg.bom.tmn.0902201545.res
./results/results.0902201545/merg.bom.tmx.0902201545.res
./results/results.0902201545/mdtr.0902201545.res
./runs/runs.0902201545
./runs/runs.0902201545/conv.mcdw.0902201545.dat
./runs/runs.0902201545/merg.mcdw.0902201545.dat
./runs/runs.0902201545/conv.climat.0902201545.dat
./runs/runs.0902201545/merg.climat.0902201545.dat
./runs/runs.0902201545/conv.bom.0902201545.dat
./runs/runs.0902201545/merg.bom.0902201545.dat
./runs/runs.0902201545/mdtr.0902201545.dat
./db/tmp/tmp.0902201545.dtb
./db/tmn/tmn.0902201545.dtb
./db/tmx/tmx.0902201545.dtb
./db/dtr/dtr.0902201545.dtb
./db/pre/pre.0902201545.dtb
./db/vap/vap.0902201545.dtb
./db/wet/wet.0902201545.dtb
./db/cld/cld.0902201545.dtb
./updates/BOM/db/db.0902201545
./updates/BOM/db/db.0902201545/bom.tmn.0902201545.dtb
./updates/BOM/db/db.0902201545/bom.tmx.0902201545.dtb
./updates/BOM/db/db.0902201545/int3.tmn.0902201545.dtb
./updates/BOM/db/db.0902201545/bom.tmn.0902201545.dtb.rejected
./updates/BOM/db/db.0902201545/int3.tmx.0902201545.dtb
./updates/BOM/db/db.0902201545/bom.tmx.0902201545.dtb.rejected
./updates/BOM/db/db.0902201545/int3.dtr.0902201545.dtb
./updates/BOM/mergefiles/merg.bom.tmn.0902201545.mat
./updates/BOM/mergefiles/merg.bom.tmn.0902201545.act
./updates/BOM/mergefiles/merg.bom.tmn.0902201545.xrf
./updates/BOM/mergefiles/merg.bom.tmx.0902201545.mat
./updates/BOM/mergefiles/merg.bom.tmx.0902201545.act
./updates/BOM/mergefiles/merg.bom.tmx.0902201545.xrf
./updates/CLIMAT/mergefiles/merg.climat.tmp.0902201545.mat
./updates/CLIMAT/mergefiles/merg.climat.tmp.0902201545.act
./updates/CLIMAT/mergefiles/merg.climat.tmp.0902201545.xrf
./updates/CLIMAT/mergefiles/merg.climat.vap.0902201545.mat
./updates/CLIMAT/mergefiles/merg.climat.vap.0902201545.act
./updates/CLIMAT/mergefiles/merg.climat.vap.0902201545.xrf
./updates/CLIMAT/mergefiles/merg.climat.wet.0902201545.mat
./updates/CLIMAT/mergefiles/merg.climat.wet.0902201545.act
./updates/CLIMAT/mergefiles/merg.climat.wet.0902201545.xrf
./updates/CLIMAT/mergefiles/merg.climat.pre.0902201545.mat
./updates/CLIMAT/mergefiles/merg.climat.pre.0902201545.act
./updates/CLIMAT/mergefiles/merg.climat.pre.0902201545.xrf
./updates/CLIMAT/mergefiles/merg.climat.cld.0902201545.mat
./updates/CLIMAT/mergefiles/merg.climat.cld.0902201545.act
./updates/CLIMAT/mergefiles/merg.climat.cld.0902201545.xrf
./updates/CLIMAT/mergefiles/merg.climat.tmn.0902201545.mat
./updates/CLIMAT/mergefiles/merg.climat.tmn.0902201545.act
./updates/CLIMAT/mergefiles/merg.climat.tmn.0902201545.xrf
./updates/CLIMAT/mergefiles/merg.climat.tmx.0902201545.mat
./updates/CLIMAT/mergefiles/merg.climat.tmx.0902201545.act
./updates/CLIMAT/mergefiles/merg.climat.tmx.0902201545.xrf
./updates/CLIMAT/db/db.0902201545
./updates/CLIMAT/db/db.0902201545/climat.tmp.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.vap.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.wet.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.pre.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.cld.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.tmn.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.tmx.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/int2.tmp.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.tmp.0902201545.dtb.rejected
./updates/CLIMAT/db/db.0902201545/int2.vap.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.vap.0902201545.dtb.rejected
./updates/CLIMAT/db/db.0902201545/int2.wet.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.wet.0902201545.dtb.rejected
./updates/CLIMAT/db/db.0902201545/int2.pre.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.pre.0902201545.dtb.rejected
./updates/CLIMAT/db/db.0902201545/int2.cld.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.cld.0902201545.dtb.rejected
./updates/CLIMAT/db/db.0902201545/int2.tmn.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.tmn.0902201545.dtb.rejected
./updates/CLIMAT/db/db.0902201545/int2.tmx.0902201545.dtb
./updates/CLIMAT/db/db.0902201545/climat.tmx.0902201545.dtb.rejected
./updates/MCDW/mergefiles/merg.mcdw.tmp.0902201545.mat
./updates/MCDW/mergefiles/merg.mcdw.tmp.0902201545.act
./updates/MCDW/mergefiles/merg.mcdw.tmp.0902201545.xrf
./updates/MCDW/mergefiles/merg.mcdw.pre.0902201545.mat
./updates/MCDW/mergefiles/merg.mcdw.pre.0902201545.act
./updates/MCDW/mergefiles/merg.mcdw.pre.0902201545.xrf
./updates/MCDW/mergefiles/merg.mcdw.vap.0902201545.mat
./updates/MCDW/mergefiles/merg.mcdw.vap.0902201545.act
./updates/MCDW/mergefiles/merg.mcdw.vap.0902201545.xrf
./updates/MCDW/mergefiles/merg.mcdw.wet.0902201545.mat
./updates/MCDW/mergefiles/merg.mcdw.wet.0902201545.act
./updates/MCDW/mergefiles/merg.mcdw.wet.0902201545.xrf
./updates/MCDW/mergefiles/merg.mcdw.cld.0902201545.mat
./updates/MCDW/mergefiles/merg.mcdw.cld.0902201545.act
./updates/MCDW/mergefiles/merg.mcdw.cld.0902201545.xrf
./updates/MCDW/db/db.0902201545
./updates/MCDW/db/db.0902201545/mcdw.tmp.0902201545.dtb
./updates/MCDW/db/db.0902201545/mcdw.vap.0902201545.dtb
./updates/MCDW/db/db.0902201545/mcdw.wet.0902201545.dtb
./updates/MCDW/db/db.0902201545/mcdw.pre.0902201545.dtb
./updates/MCDW/db/db.0902201545/mcdw.sun.0902201545.dtb
./updates/MCDW/db/db.0902201545/mcdw.cld.0902201545.dtb
./updates/MCDW/db/db.0902201545/int1.tmp.0902201545.dtb
./updates/MCDW/db/db.0902201545/mcdw.tmp.0902201545.dtb.rejected
./updates/MCDW/db/db.0902201545/int1.pre.0902201545.dtb
./updates/MCDW/db/db.0902201545/int1.vap.0902201545.dtb
./updates/MCDW/db/db.0902201545/mcdw.vap.0902201545.dtb.rejected
./updates/MCDW/db/db.0902201545/int1.wet.0902201545.dtb
./updates/MCDW/db/db.0902201545/mcdw.wet.0902201545.dtb.rejected
./updates/MCDW/db/db.0902201545/int1.cld.0902201545.dtb
./updates/MCDW/db/db.0902201545/mcdw.cld.0902201545.dtb.rejected
./logs/logs.0902201545
./logs/logs.0902201545/conv.mcdw.0902201545.log
./logs/logs.0902201545/merg.mcdw.tmp.0902201545.log
./logs/logs.0902201545/merg.mcdw.pre.0902201545.log
./logs/logs.0902201545/merg.mcdw.vap.0902201545.log
./logs/logs.0902201545/merg.mcdw.wet.0902201545.log
./logs/logs.0902201545/merg.mcdw.cld.0902201545.log
./logs/logs.0902201545/conv.climat.0902201545.log
./logs/logs.0902201545/merg.climat.tmp.0902201545.log
./logs/logs.0902201545/merg.climat.vap.0902201545.log
./logs/logs.0902201545/merg.climat.wet.0902201545.log
./logs/logs.0902201545/merg.climat.pre.0902201545.log
./logs/logs.0902201545/merg.climat.cld.0902201545.log
./logs/logs.0902201545/merg.climat.tmn.0902201545.log
./logs/logs.0902201545/merg.climat.tmx.0902201545.log
./logs/logs.0902201545/conv.bom.0902201545.log
./logs/logs.0902201545/merg.bom.tmn.0902201545.log
./logs/logs.0902201545/merg.bom.tmx.0902201545.log
./logs/logs.0902201545/mdtr.0902201545.log
crua6[/cru/cruts/version_3_0/update_top]
So, this leaves the new databases in the db/xxx/ directories, and db/latest.versions.dat telling us which ones they are. Which should be all the next suite of programs needs to create the final
Well, for this 'half' of the process it's going to be 90% planning and strategy - because that's
Let's revisit the process list from earlier - just the database-onwards bits and interactivity removed:
* Produce Primary Parameters (TMP, TMN, TMX, DTR, PRE)
frs_gts_tdm
quick_interp_tdm2
glo2abs
makegrids
vap_gts_anom
anomdtb
quick_interp_tdm2
glo2abs
makegrids
rd0_gts_anom
anomdtb
quick_interp_tdm2
glo2abs
makegrids
movenorms
dtr2cld
quick_interp_tdm2
glo2abs
makegrids
enough to output both .glo and binary gridded files, simultaneously? This would simplify
and speed things up a bit. So, with absolutely no alarm bells ringing at all, I decided
to make a sample run for DTR, just for 2006, to compare simultaneous outputs with the
IDL> quick_interp_tdm2,2006,2006,'testdtrglo/dtr.',750,gs=0.5,pts_prefix='dtrtxt/dtr.',dumpglo='dumpglo',dumpbin='dumpbin'
Defaults set
2006
IDL> exit
crua6[/cru/cruts/version_3_0/primaries/dtr] ls -l testdtrglo/
total 43048
crua6[/cru/cruts/version_3_0/primaries/dtr]
So there, as hoped-for, binary and text output files. BUT. Comparisons with earlier
33484
crua6[/cru/cruts/version_3_0/primaries/dtr]
Sample comparison of lines 700-710 from old and new glo files:
crua6[/cru/cruts/version_3_0/primaries/dtr]
They're NOTHING LIKE EACH OTHER. I really do hate this whole project. Ran the gridder again, just
crua6[/cru/cruts/version_3_0/primaries/dtr]
Different again! Can this just be the random seed used in the gridding algorithm? If so, why aren't we seeing a consistent pattern of 0.0 vs non-0.0 values? Another reason - if one were needed - why we should dump this gridding approach altogether. But, er, not yet! No time to finish and test the fortran gridder, which will doubtless sink to some depth and never be seen again, we'll carry on
Spent a whole day knocking up an anomaly program - as I felt anomdtb was vastly overweight and supremely complicated to compile. Unfortunately, I got stuck trying to work out data and latlon factors for different parameters (argh! why?), and what percentage anomalies really were, and in the end GAVE UP and now I have to modify anomdtb after all. Actually - that looked even worse, so went back to anomauto and finished it off. And.. it works. Actually, a bit too well. For example, when deriving anomalies from the CLD database, this was the original (a few weeks ago!):
uealogin1[/cru/cruts/version_3_0/update_top] wc -l cld.2000.11.txt
606 cld.2000.11.txt
..and this is the new one, from the same source database of course:
uealogin1[/cru/cruts/version_3_0/update_top] wc -l
interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
1282 interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
..so, um - more than twice as many got through! Erk. Screening not tough enough! Results also not
OLD:
NEW:
uealogin1[/cru/cruts/version_3_0/update_top/interim_data/anoms/anoms.0902201545/cld
] head -10 cld.2000.11.txt
and the actual Nov 2000 value for this station (KARESUANDO, SWEDEN) is 987:
2000 887 800 900-9999-9999 812 762 825 625 825 987-9999
OK. So we read in 987. Then we multiply by the factor, which should be 0.1, giving us 98.7. Then we subtract the mean, giving us 98.7 - 90.14 = 8.56, which is what we're getting. So no mismatches between data, time, and metadata. Good. And the 95/02 mean is right, too (90.1375).
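The arithmetic above can be sketched end-to-end (a reconstruction for illustration, not the actual anomauto/anomdtb code; the 5-character field width and the 0.1 scale factor are assumptions based on the records shown):

```python
# Reconstruction (for illustration only) of the .dtb parse and anomaly step.
# Assumptions: year in the first 5 characters, then twelve 5-character monthly
# fields ("-9999" fills a field exactly, which is why values fuse together),
# values stored as tenths (factor 0.1).
def parse_dtb_line(line, width=5):
    """Split a fixed-width year + 12 monthly values record."""
    year = int(line[:width])
    vals = [int(line[width + i * width : width + (i + 1) * width])
            for i in range(12)]
    return year, vals

def anomaly(raw, mean, factor=0.1, missing=-9999):
    """Scale the raw integer and subtract the period mean; None if missing."""
    if raw == missing:
        return None
    return raw * factor - mean

line = " 2000  887  800  900-9999-9999  812  762  825  625  825  987-9999"
year, vals = parse_dtb_line(line)
nov = vals[10]                                # November is the 11th value
print(round(anomaly(nov, 90.1375), 2))        # 987 * 0.1 - 90.1375 -> 8.56
```

Slicing by position rather than splitting on whitespace is the important bit: once a value goes negative (or missing), the fields run together and a whitespace split would mis-parse the record.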
So, er. AH! solved it. Looking at the wrong 'old' cloud text files. tadaa:
Hurrah. Now I need to know why I'm producing too many. It's not as bad, though:
crua6[/cru/cruts/version_3_0/update_top] wc -l
../secondaries/cld/cldupdatetxt/cldupdate.2000.11.txt
760 ../secondaries/cld/cldupdatetxt/cldupdate.2000.11.txt
NEW:
uealogin1[/cru/cruts/version_3_0/update_top] wc -l
interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
1282 interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
Let's look at the first example, a station we let through that anomdtb kicked back:
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1995-9999-9999-9999-9999-9999-9999-9999-9999-9999 875-9999-9999
1996-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1997-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1998-9999-9999-9999-9999 575 675 762 762 675 837 775-9999
1999 1000 812 762 750 550 750 862 775 637 825 1000-9999
2000 1000 912 800 750 812 850 737 825 700 737 862-9999
2001 875 750 475 650 775 775 825 825 750 900 1000-9999
2002 800 862 750 737 612 612-9999 562 800 462 762-9999
2003 850 825 862 550 712-9999 525 775 762 750 825-9999
2004 937 875 762 525 637 725 787 675 837 750 1000-9999
2005 1000 812 762 700 737 775 687 800 850 850-9999-9999
2007 1000 712 750 837 762 687 675 812 850 975 950-9999
2008 1000 887 687-9999 750 775 675 612 725 887-9999-9999
Now, our limit for a valid normal is 75%, which for 1995-2002 should mean 6.
BODO VI has five valid values in November. So our limit is either wrong, or not being applied.
..yup:
uealogin1[/cru/cruts/version_3_0/update_top] ./anomauto
minn calculated as 7
Ho hum. Recalculated it to 6 (whilst checking that 1961-1990 still gave 23). Re-ran.
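For what it's worth, one formula consistent with both observed values ('minn calculated as 7' for the 8-year 1995-2002 period, yet 23 for the 30-year 1961-1990 period) is truncate-then-add-one; the actual Fortran isn't shown here, so this is a hypothetical reconstruction of the off-by-one:

```python
import math

# Hypothetical reconstruction of the minimum-count off-by-one: the 75% rule
# should give minn=6 for the 8-year 1995-2002 period and minn=23 for the
# 30-year 1961-1990 period. Truncate-then-add-one reproduces the buggy 7/23
# pair; a true ceiling gives the intended 6/23.
def minn_buggy(nyears, frac=0.75):
    return int(frac * nyears) + 1     # only wrong when frac*nyears is whole

def minn_fixed(nyears, frac=0.75):
    return math.ceil(frac * nyears)

print(minn_buggy(8), minn_fixed(8))     # 7 6
print(minn_buggy(30), minn_fixed(30))   # 23 23
```

Truncate-then-add-one only disagrees with a ceiling when the product is a whole number, which is exactly why a 30-year check (22.5 either way rounds up to 23) would not catch it.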
To my horror - if not surprise - that let EVEN MORE IN! Well of course it did, you silly sausage.
uealogin1[/cru/cruts/version_3_0/update_top] wc -l
interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
1404 interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
Aha. I wonder if I'm initialising the onestn() array in the wrong place? Because data is only added if not -9999, so it has to be prefilled with -9999 *every time*.. dammit. If
uealogin1[/cru/cruts/version_3_0/update_top] wc -l
interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
746 interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
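The onestn() bug above is a classic stale-buffer problem; a minimal Python sketch (not the Fortran itself) of why the prefill has to happen for *every* station:

```python
MISSING = -9999

# Minimal sketch (Python, not the actual Fortran) of the onestn() bug: values
# are only written where a station reports data, so without a reset before
# every station, months left unreported inherit the previous station's values.
def load_station(onestn, records):
    onestn[:] = [MISSING] * len(onestn)   # the fix: prefill *every time*
    for month, value in records:
        if value != MISSING:
            onestn[month] = value
    return onestn

buf = [MISSING] * 12
load_station(buf, [(0, 875), (1, 900)])   # first station reports Jan, Feb
load_station(buf, [(5, 700)])             # second station reports only Jun
print(buf[0])   # -9999 with the reset; without it, 875 would leak through
```

Leaked values look like real observations, which is exactly how extra "anomalies" would appear in the monthly output files.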
OLD RELIABLE:
NEW LATEST:
uealogin1[/cru/cruts/version_3_0/update_top] head
interim_data/anoms/anoms.0902201545/cld/cld.2000.11.txt
It's not going to be easy to find 14 missing stations, is it? Since the anomalies aren't exactly the same. Should I be worried about 14 lost series? Less than 2%. Actually, I noticed something interesting.. look at the anomalies. The anomdtb ones aren't *rounded* to 1dp, they're *truncated*! So, er - wrong!
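The rounding difference is easy to demonstrate; a sketch (Python, for illustration):

```python
# Truncation vs rounding to one decimal place (illustrative, not anomdtb):
def trunc1(x):
    return int(x * 10) / 10.0   # drops the second decimal: 8.56 -> 8.5

def round1(x):
    return round(x, 1)          # nearest tenth: 8.56 -> 8.6

print(trunc1(8.56), round1(8.56))   # 8.5 8.6
```

A systematic truncation biases every positive anomaly slightly low (and every negative one slightly high, since int() truncates toward zero), which is why truncated and rounded outputs never quite match.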
So let's say, anomalies are done. Hurrah. Onwards, plenty more to do!
start and end months, otherwise it just produces whole years with files of zeros for months with no anomaly file. And errors. And since this is likely to be a six-month update..
Re-planned the program layout. Not a major exercise, just putting different loops in to speed up and
2. Update Databases
2.1 Convert any MCDW bulletins to CRU format; merge into existing databases
2.2 Convert any CLIMAT bulletins to CRU format; merge into databases from 2.1
2.3 Convert any BOM bulletins to CRU format; merge into databases from 2.2
3. Update datasets
1876 lines including subroutines and notes. Ten Fortran and four IDL programs (plus indirect ones). All
So, to station counts. These will have to mirror section 3 above. Coverage of secondary parameters is particularly difficult - what is the best approach? To include synthetic coverage, when it's only at 2.5-degree?
No. I'm going to back my previous decision - all station count files reflect actual obs for that parameter only. So for secondaries, you get actual obs of that parameter (ie naff all for FRS). You get the info about synthetics that enables you to use the relevant primary counts if you want to. Of course, I'm going to have to provide a combined TMP and DTR station count to satisfy VAP & FRS users.
The problem is that the synthetics are incorporated at 2.5-degrees, NO IDEA why, so saying they affect particular 0.5-degree cells is harder than it should be. So we'll just gloss over that entirely ;0)
ARGH. Just went back to check on synthetic production. Apparently - I have no memory of this at all - we're not doing observed rain days! It's all synthetic from 1990 onwards. So I'm going to need conditionals in the update program to handle that. And separate gridding before 1989. And what TF
OH FUCK THIS. It's Sunday evening, I've worked all weekend, and just when I thought it was done I'm hitting yet another problem that's based on the hopeless state of our databases. There is no uniform data integrity, it's just a catalogue of issues that continues to grow as they're found.
rd0_gts_anom_05 will produce half-degree .glo files from gridded pre anoms. So if we call that, we can use it, and stncounts for PRE will be authentic (as it's the sole input). Final decision: coded update.for to produce WET from obs+syn until 12/1989, syn only thereafter. WET station counts only produced until 1989; PRE must be used (with caveats) after that point.
Wrote tmpdtrstnsauto.for to produce tmp.and.dtr station counts (ie you only get a count when both parameters have a count, and even then it's the min()). The resulting counts are the effective FRS counts, and the synthetic VAP counts.
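The tmp.and.dtr counting rule reduces to a guarded min(); a sketch (the function name is mine, but the rule is as described above):

```python
# Sketch of the combined-count rule described above (function name is mine):
# a cell gets a tmp.and.dtr count only when BOTH parameters have a count,
# and then it is the smaller of the two.
def combined_count(tmp_count, dtr_count):
    if tmp_count == 0 or dtr_count == 0:
        return 0
    return min(tmp_count, dtr_count)

print(combined_count(3, 5))   # 3
print(combined_count(0, 5))   # 0 - no TMP obs, so no combined count
```

Taking the min() is the conservative choice: a synthetic VAP or FRS value needs both inputs, so it can only be as well-supported as the weaker of the two.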
Onto PET. Tracked down the PET program from Dimitrios, way back in 2007! It uses TMP, TMN, TMX, VAP, CLD and WND (the latter as 61-90 normals from IPCC). Converted to f77 'automatic' (makepetauto.for).
Discovered that WMO codes are still a pain in the arse. And that I'd forgotten to match Australian updates by BOM code (last field in header) instead of WMO code - so I had to modify newmergedbauto.
Also found that running fixwmos.for was less than successful on VAP, because it's already screwed:
1001000 7093 -866 9 JAN MAYEN(NOR NAVY) NORWAY 1971 2003 -999
-999
uealogin1[/cru/cruts/version_3_0/update_top/db/vap]
diverted back onto getting the whole update process compiled and running end to end. Almost immediately found that match rates in the merging were mixed. Added a section to newmergedbauto that did a quick matchmaking exercise on any update stations that failed the code matching. Just lat/lon and character fields really. Didn't seem to make a lot of difference. Here are the merge results for all updates and parameters, in the order they would have happened:
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.mcdw.tmp.0903091631.log
OUTPUT(S) WRITTEN
(automatically: 1759)
(by operator: 0)
> Rejected: 0
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.mcdw.pre.0903091631.log
OUTPUT(S) WRITTEN
(automatically: 2783)
(by operator: 0)
> Rejected: 0
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.mcdw.vap.0903091631.log
(automatically: 2677)
(by operator: 0)
> Rejected: 3
Rejects file:
updates/MCDW/db/db.0903091631/mcdw.vap.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.mcdw.wet.0903091631.log
(automatically: 2634)
(by operator: 0)
> Rejected: 4
Rejects file:
updates/MCDW/db/db.0903091631/mcdw.wet.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.mcdw.cld.0903091631.log
(automatically: 2199)
(by operator: 0)
> Rejected: 5
Rejects file:
updates/MCDW/db/db.0903091631/mcdw.cld.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.climat.tmp.0903091631.log
New master database: updates/CLIMAT/db/db.0903091631/int2.tmp.0903091631.dtb
(automatically: 2629)
(by operator: 0)
> Rejected: 91
Rejects file:
updates/CLIMAT/db/db.0903091631/climat.tmp.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.climat.vap.0903091631.log
(automatically: 2912)
(by operator: 0)
> Rejected: 89
Rejects file:
updates/CLIMAT/db/db.0903091631/climat.vap.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.climat.wet.0903091631.log
New master database: updates/CLIMAT/db/db.0903091631/int2.wet.0903091631.dtb
(automatically: 2718)
(by operator: 0)
> Rejected: 97
Rejects file:
updates/CLIMAT/db/db.0903091631/climat.wet.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.climat.pre.0903091631.log
(automatically: 2801)
(by operator: 0)
> Rejected: 24
Rejects file:
updates/CLIMAT/db/db.0903091631/climat.pre.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.climat.cld.0903091631.log
(automatically: 1964)
(by operator: 0)
> Rejected: 3
Rejects file:
updates/CLIMAT/db/db.0903091631/climat.cld.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.climat.tmn.0903091631.log
(automatically: 2406)
(by operator: 0)
Rejects file:
updates/CLIMAT/db/db.0903091631/climat.tmn.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.climat.tmx.0903091631.log
(automatically: 2406)
(by operator: 0)
Rejects file:
updates/CLIMAT/db/db.0903091631/climat.tmx.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.bom.tmn.0903091631.log
(automatically: 783)
(by operator: 0)
> Rejected: 3
Rejects file:
updates/BOM/db/db.0903091631/bom.tmn.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631] tail
merg.bom.tmx.0903091631.log
(automatically: 783)
(by operator: 0)
> Rejected: 3
Rejects file:
updates/BOM/db/db.0903091631/bom.tmx.0903091631.dtb.rejected
uealogin1[/cru/cruts/version_3_0/update_top/logs/logs.0903091631]
Probably the worst story is temperature, particularly for MCDW. Over 1000 new stations! Highly unlikely. I am tempted to blame the different lat/lon scale, but for now it will have to rest.
Still hitting the problem with TMP lats and lons being a mix of deg*10 and deg*100, it's screwing up the station counts work (of course). Unfortunately, I did some tests and the 'original' TMP database has the trouble, it's not my update suite :-(((
Then.. I worked it out. Sample headers from the 'original' TMP db tmp.0705101334.dtb:
10010 709 -87 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
10080 783 155 28 Svalbard Lufthavn NORWAY 1911 2006 341911 -999.00
10260 697 189 100 Tromsoe NORWAY 1890 2006 341890 -999.00
0100100 7090 -870 10 Jan Mayen NORWAY 1921 2006 341921 -999.00
0102600 6970 1890 100 Tromsoe NORWAY 1890 2006 341890 -999.00
      locfac = 10                ! init location factor assuming lat & lon are in degs*10
      do i=1,10000
        read(10,'(a86)',end=12)buffy
        read(buffy,fmt)wmo,lat,lon,alt,sname,ctry,sy,ey,flag,extref
        if (lat.gt.900) goto 12  ! lat > 90.0 is impossible in degs*10, so header must be degs*100
        do j=sy,ey+norml         ! skip this station's data (and normals) lines
          read(10,'()')
        enddo
      enddo
So it was written with TMP in mind! Oh, for a memory. So we don't need to fret about TMP
And it's taken me until NOW to realise that the IDL synthetic generators (vap_gts_anom, frs_gts_tdm, rd0_gts_anom) all need to calculate 1961-1990 normals! So they will need TMP, DTR and/or PRE binary normals for 1961 to 1990. Which means anomalies will have to be automatically generated for that period regardless of the requested period!!! *Cries*
Introduced suitable conditionals to ensure that 61-90 anomalies and gridded binaries are
(this only appears once the vap_gts_anom.pro program has finished, so can't be identified)
Stuck on WET production, getting an error from rd0_gts_anom_05.pro, from the same bit of
% $MAIN$
Line 34:
rd0syn=float(prenorm)*0.0 & rd0syn(nsea)=-9999
Do you know, I actually worked this one out by myself. Preen. It turned out that nsea was -1,
When I looked - the min and max of rd0norm and prenorm were:
-32768 32514
..and I thought, what a coincidence, that's 2^16. Aha! Must be an Endian problem. Looked it up on the web, and the IDL ref manual, and found that adding:
,/swap_if_big_endian
..to the end of the openr statements in rdbin.pro, it all worked! :-)))
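The symptom is characteristic: 16-bit values written big-endian but read without a swap come back byte-swapped, blowing the range out toward the int16 limits. A Python sketch of the effect (the sample values are made up):

```python
import struct

# Sketch of the endian symptom (Python, not IDL; sample values made up):
# 16-bit normals written big-endian but unpacked little-endian come back
# byte-swapped, which is how mins/maxes end up near the int16 limits
# (-32768..32767) instead of in a sensible data range.
values = (287, -114, 1890)                 # plausible *10-scaled normals
raw = struct.pack('>3h', *values)          # file written big-endian
misread = struct.unpack('<3h', raw)        # read with no byte swap: garbage
fixed = struct.unpack('>3h', raw)          # the /swap_if_big_endian effect

print(fixed)     # (287, -114, 1890)
print(misread)   # byte-swapped junk, nothing like the input
```

The telltale sign is exactly what the log shows: a data range pinned near ±32768 rather than anything physically plausible.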
and then, of course, another problem I should have anticipated: half-degree gridding of synthetics needs half-degree primary binaries. So the precip binaries must be half-degree for WET (after 1989) and the usual 2.5-degrees earlier. More modifications to update.for!! And it took me a further 24 hours to cotton on that I'd need half-degree TMP and DTR binaries for FRS. VAP won't mind as it's using the synthetics as an adjunct to the observations - the exceptions are those secondaries where no observations can be used. WET after 1989, FRS, and CLD after 2002 (but CLD considerately works
So I'm going to have to produce half-degree gridded binary TMP and DTR anomalies, adding HALF AN HOUR to the run time. Bollocks. Though I could be clever and save it.. then I'd have to monitor
Got that done. Then got it all working (though outputs not tested). Whooo. Now for BADC.
...
Actually, BADC wasn't too bad. Took a day or so to get everything to compile, mainly having to shift to gfortran rather than f77, and also to use -w to suppress warnings. Discovered that the IDL there didn't look at IDL_STARTUP, bah, but then found a way to specify a startup file
..so that's all right then. Got it all running without errors at BADC. Well, I say that, I'm
Anyway the next items are the tricky saving of 2.5 and 0.5 binaries for 1961-1990, only regenerating them if the dbs have been altered. Requires multi-process cooperation, since we can't tell from the database timestamps which years were potentially changed. Admittedly, with this system that only accepts MCDW/CLIMAT/BOM updates, a pre-1991 change is all but impossible, but build for the case you can't anticipate..! Also up next is the deconstruction of the early cloud data (ie to 2002) so we can generate NetCDF files for the whole shebang.
degroupcld.for
then modified update (extensively) to skip anything to do with CLD (including station counts) before 2003. Then, at the anoms-to-absolutes stage, unzipped and copied over any pre-2003
I suppose I'll have to do CLD station counts (just 'n' obviously) at some stage, too.
Ran update, just for CLD, just for 1901-06/2006. Realised halfway through that I'd really have to do station counts as well because update does 'em for DTR anyway! That ought to cut out but
It's getting faster.. implementing the 'saved binaries' was easier than I thought as well. Lots to change but straightforward. Now the IDL synthetics generators will always look in the reference area for 1961-1990 gridded binaries, whether 2.5-degree or 0.5-degree. And those datasets *should* be regenerated if flags are set that 1961-1990 data has been changed in the databases.
Then, a big problem. Lots of stars ('*********') in the PET gridded absolutes. Wrote sidebyside.m to display the five input parameters; VAP looks like being the culprit, with unfeasibly large values (up to 10034 in fact). And that's after the standard /10. So, erm.. a drains-up on VAP is now required. Oh, joy. And CLD also looks unacceptable, despite all that work - big patches
Reassuringly, the 3_00 VAP and CLD that are published look fine, so it's something I've done in
Started chaining back through (initially) VAP. The gabs files were identical to the finals (now, if that had failed it would have been a problem!). The gridded anomaly files were a lot more interesting, because although they looked just as bad, their max values were exactly 9999. That ain't no coincidence!
Trailing further back.. VAP anoms are OK, so suspicion falls on the synthetics. And lo and behold, re-running the TMP and DTR 2.5-grid binary productions with quick_interp_tdm2 gives:
IDL> quick_interp_tdm2,2006,2006,'interim_data/gbins/gbins.0903201540/tmp/tmp.',1200,gs=2.5,pts_prefix='interim_data/anoms/anoms.0903201540/tmp/tmp.',dumpbin='dumpbin',startm=07,endm=12
Defaults set
2006
IDL> quick_interp_tdm2,2006,2006,'interim_data/gbins/gbins.0903201540/dtr/dtr.',750,gs=2.5,pts_prefix='interim_data/anoms/anoms.0903201540/dtr/dtr.',dumpbin='dumpbin',startm=07,endm=12
Defaults set
2006
IDL>
Those strings of numbers? They're supposed to be mean, average magnitude, and std dev! Should
IDL> quick_interp_tdm2,2006,2006,'testdtrglo/dtr.',750,gs=0.5,pts_prefix='dtrtxt/dtr.',dumpglo='dumpglo',dumpbin='dumpbin'
Defaults set
2006
2006
..confirming that the DTR (in this case) incoming anomalies are all within expected tolerances.
Ooh! Just found this a few thousand lines back, which may be relevant:
<QUOTE>
On a parallel track (this would really have been better as a blog), Tim O has found that the binary
'binfac' set to 10 for TMP and DTR. This may explain the poor performance and coverage of VAP in particular.
<END_QUOTE>
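As an aside, the integer-truncation problem that a binfac works around is easy to see; a sketch (my reconstruction - `to_binary_int` is a hypothetical stand-in for the binary-dump step):

```python
# Sketch of the integer-truncation issue a binfac addresses (reconstruction;
# to_binary_int is a hypothetical stand-in for the binary-dump step).
# Gridded anomalies are fractions of a degree, so int(anom) zeroes most of
# them; scaling by binfac=10 first keeps one decimal place, to be undone
# again by the reader.
def to_binary_int(anom, binfac):
    return int(anom * binfac)

anoms = [0.56, -0.34, 1.27]
print([to_binary_int(a, 1) for a in anoms])    # [0, 0, 1] - detail lost
print([to_binary_int(a, 10) for a in anoms])   # [5, -3, 12]
```

The catch, of course, is that writer and reader must agree on the factor: a mismatch shifts every value by a factor of 10.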
Did that help? Not much at all, unfortunately. This is frustrating, I can't see what's different. I even enabled a commented-out line that prints the ranges of pts2(*,2) and r, and they look OK:
IDL> quick_interp_tdm2,2006,2006,'interim_data/gbins/gbins.0903201540/dtr/dtr.',750,gs=2.5,pts_prefix='interim_data/anoms/anoms.0903201540/dtr/dtr.',dumpbin='dumpbin',startm=07,endm=12,info='info'
2006
-10.4367 17.5867
-7.15403 14.5088
-7.00800 25.6929
-4.89867 15.7781
-18.4621 26.2400
-15.1905 22.7162
-8.61333 18.5400
-6.05684 15.7678
-6.91852 33.3200
-5.10848 28.5915
-6.03609 21.9419
IDL>
OK, I *think* I've got it. It's the fact that we're writing a yearly binary file but only have data
..but I do see how vap_gts_anom might just read the *first* six months, which would all be -9999. So, we need to be able to write six-month binaries. Oh, my giddy aunt. What a crap crap system. We'll have to switch to monthly binaries, it's the only unambiguous way. Meaning major modifications to numerous IDL proglets. Fuck. Everything from the main progs (vap_gts_anom, quick_interp_tdm2, etc) to
constructed with binfac=10, because otherwise the integer bit renders most anomalies 0. Then, in the vap_gts_anom script, the values have to be divided by 100, to give degrees. Any other combination of scaling factors throws the sat vap pressure calculations into the weeds. Of course, monthly binaries will see that binaries are saved monthly, not yearly. Of course, the 2.5-degree TMP and DTR binary
So, rather than carry on with mods, I thought I'd mod update enough to fix VAP, then run it all again.
Well, it ran. Until PET production, where it crashed with the same (understandable) read error as before ('********' not being an integer). However, when I invoked the Matlab sidebyside proglet to examine the VAP, it was much improved on the previous VAP. The max was still 10000, just a shade too high, but the actual spatial pollution was much reduced. There's hope! I think this all stems from the sensitivity of the saturated vapour pressure calculations, where a factor of 10 error in an
Had to briefly divert to trick makegridsauto into thinking it was in the middle of a full 1901-2006 update, to get CLD NetCDF files produced for the whole period to June '06. Kept some important users in Bristol happy.
So, back to VAP. Tried dividing the incoming TMP & DTR binaries by 1000! Still no joy. Then had the bright idea of imposing a threshold on the 3.00 vap in the Matlab program. The result was that quite a lot of data was lost from 3.00, but what remained was a very good match for the 2.10 data
I think I've got it! Hey - I might be home by 11. I got quick_interp_tdm2 to dump a min/max for the synthetic grids. Guess what? Our old friend 32767 is here again, otherwise known as big-endian trauma. And sure enough, the 0.5 and 2.5 binary normals (which I inherited, I've never produced them),
openr,lun,fname,/swap_if_big_endian
..so I added that as an argument to rdbin, and used it wherever rdbin is called to open these normals.
So, I went through all the IDL routines. I added an integer-to-float conversion on all binary reads, and generally spruced things up. Also went through the parameters one by one and fixed (hopefully)
The PET problem, of unwriteable numbers, was solved by this tightening of secondaries, particularly VAP, and also putting in a clause to abs() any negative values from the wind climatology. I really
Finally I'm able to get a run of all ten parameters. The results, compared to 2.10 with sidebyside3col.m, are pretty good on the whole. Not really happy with FRS (range OK but mysterious banding in Southern Hemisphere), or PET:
pet
range210 = 0 573
range300 = 0 17.5000
So I've ended up with a range that doesn't scale simply to the 2.10 range. I also have no idea what the actual range ought to be. And they said PET would be easy. Next step has to be a comparison of max/min values of PET precursors vs. PET actuals for the two sources. Did that. No significant differences, except that of course the 2.10 PET was produced with uncorrected wind. When I took out the correction for 3.00, it shot up to even higher levels, so we'll just have to ignore 2.10

Still, a top whack of 17.5 isn't too good for PET. Printed out the ranges of the precursors:
tm -49.40 39.20
tn -52.80 39.50
tx -45.10 59.80
vp 0.00 36.60
wn 0.00 29.00
cl 0.00 1.00
So the temps are in degs C, vapour pressure's in hPa, wind's in m/s and cloud's fractional.
Then I thought about it. 17.5mm/day is pretty good - especially as it looks to be Eastern Sahara. As for FRS.. with those odd longitudinal stripes - I just tidied the IDL prog up and it, er.. Did a complete run for 7/06 to 12/06, ran the Matlab visuals, all params looked OK (if not special).
FTP'd the program suite and reference tree to BADC, replacing the existing ones, and tried the

Well the first thing I noticed was how slow it was! Ooops. Maybe 3x slower than uealogin1. Then, lots of error messages (see below). I had wondered whether the big endian scene was going to show,
<QUOTE>
date25: 0903270742
date05: 0903270742
last6190: 0901010001
Producing anomalies
Deriving PET
see: logs/completion/infolog.0904010108.dat
and: logs/logs.0904010108/update.0904010108.log
-bash-3.00$
<END_QUOTE>
Pulled back the output files and ran the sidebyside3col Matlab script to compare
tmp: BADC 300 m/m: -49.4 39.2, CRU 300 m/m: -49.4 39.2
tmn: BADC 300 m/m: -52.8 39.5, CRU 300 m/m: -52.8 39.5
tmx: BADC 300 m/m: -45.1 59.8, CRU 300 m/m: -45.1 59.8
I don't know which is more worrying - the VAP discrepancy or the fact that the minimum DTR is 1 degree (for both!), the maximum BADC CLD is 99.9%, and the maximum CRU WET is 30.95 days! Well I guess the VAP issue is the show-stopper, and
Now, these are IDL errors, and probably from our old pal vap_gts_anom_m.pro. So, the established procedure is to re-run just that program, with all the info
IDL> vap_gts_anom_m,2006,2006,dtr_prefix='interim_data/gbins/gbins.0904010108/dtr/dtr.',tmp_prefix='interim_data/gbins/gbins.0904010108/tmp/tmp.',outprefix='interim_data/syns/syns.0904010108/vap/vap.syn.',dumpbin=1,startm=07,endm=12
IDL>
Yes, it's back. Right back where we started with VAP at CRU, all those, er, days ago. Well last time it was big endian stuff, wasn't it? And presumably the little Linux box at BADC is big endian. So I might try changing those rdbin calls, just to see.. ..that didn't seem to help. Here's a dump of key array ranges, just before the main

So tmpgrd and dtrgrd look waaay too high, though could just be *100. v and vapsyn are shot. This does look like scaling. Boo hoo. I *fixed* that!! These are the ranges on UEALOGIN1:

Since normals are reading OK without it (CRU version has bigend=1, BADC version doesn't). Let's
IDL>
So, just tadj to 'fix', then? Though surely I should read the 2006 tmp & dtr the same way. Or is it that I copied the 61-90 over from here, but generated the 2006 there. Ah. Should probably regenerate the 61-90 binaries at BADC? Yes. Anyway, found the 'other' tmp/dtr

The CRU version has the same ranges, but some month stats differ: December in particular has quite a drift! No idea why, since the data going in

So, another full run, with regeneration of binary reference grids enforced:
tmp: BADC 300 m/m: -49.4 39.2, CRU 300 m/m: -49.4 39.2
tmn: BADC 300 m/m: -52.8 39.5, CRU 300 m/m: -52.8 39.5
tmx: BADC 300 m/m: -45.1 59.8, CRU 300 m/m: -45.1 59.8
I honestly don't think it'll get closer. So, I guess I'll clear out and reset

Well, BADC have had it for a good while, without actually doing anything. What a surprise. It's lucky actually, as I've ironed out a few bugs (including PET being garbage). One bug is eluding me, however - I can't get a full 1901-2008 run to complete! It gets stuck after producing the final TMP files (data plus
1901-2008 failed
1901-1910 worked
1901-1950 worked
1951-2008 worked
1901-2008 failed
**sigh** WHAT THE HELL'S GOING ON?! Well, time to ask the compiler. So I recompiled as follows:

Producing anomalies

Hurrah! In a way.. that bug was easy enough, I'd just forgotten to put an extra test (ipar.le.5) in the test for binary production, so as it was in a 1..8 loop, there was bound (ho ho) to be trouble. There was a second, identical, instance.
date25: 0903270742
date05: 0903270742
last6190: 0901010001
Producing anomalies
Deriving PET
see: logs/completion/infolog.0905070939.dat
and: logs/logs.0905070939/update.0905070939.log
uealogin1[/esdata/cru/f098/update_top]
..and in terms of disk usage (um, remember it's not *that* reliable):
uealogin1[/esdata/cru/f098/update_top] du -ks *
64 anomauto
32 batchdel
64 bom2cruauto
64 climat2cruauto
32 compile_all
629856 db
32 dtr2cldauto
64 glo2absauto
16108896 gridded_finals
13822176 interim_data
18368 logs
416 makegridsauto
64 makepetauto
64 mcdw2cruauto
32 movenormsauto
32 newdata.latest.date
288 newmergedbauto
2368 programs
1101088 reference
2848 results
3008 runs
32 saved_timings_090420_1716
64 stncountsauto
64 stncountsauto_safe
704 timings
32 tmnx2dtrauto
32 tmpdtrstnsauto
352 update
638432 updates
uealogin1[/esdata/cru/f098/update_top]
Meaning that a complete 1901-2008 run will need about 14gb of working data and the

Then, of course (or 'at last', depending on your perspective), Tim O had a look at the data with that analytical brain thingy he's got. Oooops. Lots of wild values, even for TMP and PRE - and that's compared to the previous output!! Yes, this is comparing the automated 1901-2008 files with the 1901-June 2006 files, not with CRU TS 2.1. So, you
First investigation was WET, where variance was far too low - usually indicative of a scaling issue, and thus it was. Despite having had a drains-up on scaling, WET seems to have escaped completely. The initial gridding (to binary) outputs at x10, which is absolutely fine. But the PRE-to-WET converters are not so simple. The 2.5-degree

The trouble is, when written to binary, these will be rounded to integer and a degree of accuracy will be lost. They should be x10. Then there's the 0.5-degree converter

These are basically 1000 times too small!!! How did this happen when I specifically

Aha. Not so silly. The 0.5 grids are saved as .glo files (because after 1989 it's all synthetic). So they're not rounded. On the other hand, they are still 100x too low for percentage anomalies. And the 2.5 grids are sent to the gridder as 'synthfac=100'!!

0.5-degree PRE/WET path is at x10 until the production of the synthetic WET, at which point it has to be x1 to line up with the pre-1990 output from the gridder (the gridder outputs .glo files as x1 only, we haven't used the 'actfac' parameter yet and we're
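The integer-rounding loss is worth a quick sketch. In Python (illustrative, not the Fortran converters; the x10 convention is as described above):

```python
def to_binary_int(value, scale=1):
    """Round a value to integer at the given scale, as happens when
    grids are written to the integer binary format."""
    return round(value * scale)

def from_binary_int(stored, scale=1):
    return stored / scale

wet_days = 12.7   # a WET value in days (invented)

# Stored at x1, the fraction is lost; stored at x10 it survives:
lost = from_binary_int(to_binary_int(wet_days, scale=1), scale=1)    # 13.0
kept = from_binary_int(to_binary_int(wet_days, scale=10), scale=10)  # 12.7
```

This is why every binary leg of the pipeline has to agree on its scale factor: a single x1 write destroys the tenths for good.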
Got all that fixed. Then onto the excessions Tim found - quite a lot that really should have triggered the 3/4 sd cutoff in anomauto.for. Wrote 'retrace.for', a proglet I've been looking for an excuse to write. It takes a country or individual cell, along with dates and a run ID, and performs a reverse trace from final output files to database. It's not complete yet but it already gives extremely helpful information - I was able to look at the first problem (Guatemala in Autumn 1995 has a massive spike) and find that a station in Mexico has a temperature of 78 degrees in November 1995! This gave a local anomaly of 53.23 (which would have been 'lost' amongst the rest of Mexico as Tim just did country averages) and an anomaly in Guatemala of 24.08 (which gave us the spike):
1996 219 232 235 256 285 276 280 226 285 260 247 235
Now, this is a clear indication that the standard deviation limits are not being applied. Which is extremely bad news. So I had a drains-up on anomauto.for.. and.. yup, my awful programming strikes again. Because I copied the anomdtb.f90 process, I failed to notice an extra section where the limit was applied to the whole station - I was only applying it to the normals period (1961-90)! So I fixed that and re-ran. Here are the before and
1995 11 3.23 -0.57 238 148 2232 53.23 217 172 2244
1995 11 22.39 12.80 243 148 7227000 78.00 217 172 7674100
1995 11 0.73 -0.57 238 148 2227 1.82 227 148 2231
1995 11 22.39 12.80 243 148 7227000 78.00 217 172 7674100
The column to be looking at is this one ---------------------^
Row 3 Climatology
Row 5 Anomalies
Col 3 Mean
Row 5:
Cols 4-7 Min with cell indices and line # in anoms file
Cols 8-11 Max with cell indices and line # in anoms file
Row 6:
Cols 4-7 Min with cell indices and WMO code in database
Cols 8-11 Max with cell indices and WMO code in database
In this case, the erroneous value of 78 degrees has been counted in the earlier run, giving an anomaly of 53.23. In the later run, it hasn't - the anomaly of 1.82 is from a

So, re-running improved matters. The extremes have vanished. But the means are still out, sometimes significantly.
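The screening bug can be sketched: which window you compute the mean/SD over decides which values survive the cutoff. A Python sketch (not the Fortran; the data, the 3-sd threshold and the window choice are invented for illustration):

```python
import statistics

def screen(values, window, n_sd=3.0):
    """Flag values more than n_sd standard deviations from the mean,
    where mean and sd are computed over `window` (a subset of values)."""
    mean = statistics.mean(window)
    sd = statistics.pstdev(window)
    return [v for v in values if abs(v - mean) > n_sd * sd]

# Monthly temps (degC) with one wild value, like the Mexican 78:
temps = [24.0, 25.1, 23.8, 24.6, 25.3, 78.0, 24.2, 24.9]
normals_period = temps[:5]   # stand-in for a clean 1961-90 subset

# With the outlier inside the SD window it inflates the SD enough to
# escape a 3-sd cutoff; screened against the clean normals-period
# stats it is caught. The window you pick changes what survives.
flagged_whole = screen(temps, temps)           # []
flagged_norms = screen(temps, normals_period)  # [78.0]
```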
I took the twelve 1990 anomaly files from the original 1901-2006 run (that was done with
/cru/cruts/version_3_0/primaries/tmp/tmptxt/*1990*
Then I modified the update 'latest databases' file to say that tmp.0705101334.dtb was the current database, and made a limited run of the update program for tmp only, killing it once it had produced the anomaly files. The run was #0908181048.

'manual' directory and an 'automatic' directory, each with twelve 1990 anomaly files. And how do they compare? NOT AT ALL!!!!!!!!!
crua6[/cru/cruts/version_3_0/fixing_tmp_and_pre/custom_anom_comparisons] head manual/tmp.1990.01.txt
crua6[/cru/cruts/version_3_0/fixing_tmp_and_pre/custom_anom_comparisons] head automatic/tmp.1990.01.txt
The numbers of values in each pair are not always identical, but are within 2 or 3 so that's

There are a number of things going on. The lats and lons are the same, just scaled (because originally the TMP coordinates were x10 not x100). We can ignore that problem. The real problem is the completely different results from the automated system - I don't understand this because I painstakingly checked the anomauto.for file to ensure it was doing the right job!! The overall pattern of anomalies is roughly the same - it's just that the actual values
Got anomauto to dump the first month of the first station (10010). The clue to the problem is in the first lines - we're only getting the full-length mean (used for SD calculations) and
WMO = 10010, im = 01
d( 1,0490,01) = 2.98
Aaaaand.. FOUND IT! What happened was this: in the original anomdtb.f90 program, there's a test for existing normals (in the header of each station). If they are present, then SD is calculated (to allow excessions to be screened out). If not, SD is calculated and then used to screen excessions, then a 61-90 normal is built provided there are enough values after the screening. However, in my version, I followed the same process - but crucially, I wasn't using the same variable to store the existing normals and the calculated ones!! So we were ending up with the 'full length' normal (n=86 in the above example) instead. We then get:
onestn(0490,01) = -1.50
d( 1,0490,01) = 4.20
Which is what we want. So, a complete re-run (just tmp) for 1990, still using the old db. Tadaa:
uealogin1[/cru/cruts/version_3_0/fixing_tmp_and_pre/custom_anom_comparisons/new_automatic] head tmp.1990.01.txt
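The effect of the variable mix-up reduces to which baseline gets subtracted. A Python sketch, with numbers chosen to reproduce the -1.50/2.98/4.20 figures from the dumps (the observed value itself is my invention, not taken from the log):

```python
value = 2.70          # observed monthly value, degC (invented)
normal_6190 = 4.20    # stored 1961-90 normal from the station header
mean_full = -0.28     # full-record mean (n=86), meant only for SD use

# Correct anomaly: subtract the 61-90 normal.
anomaly_correct = value - normal_6190   # -1.50

# Buggy anomaly: the wrong variable held the full-length mean.
anomaly_buggy = value - mean_full       # 2.98
```

Same input, a 4.5-degree swing in the anomaly: exactly the kind of "roughly the same pattern, completely different values" symptom described above.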
Mostly the same, but one noticeable exception is the hot 2003 JJA in Europe - it's much less extreme in the automated version. So I ran with the original database again. Thought I'd
1259 tmp.2003.06.txt
1216 tmp.2003.07.txt
1223 tmp.2003.08.txt
For 0909041051 (fixed anomauto and original June 2006 db, tmp.0705101334.dtb):
For 0909021348 (the 'fixed' anomauto and the latest db, tmp.0904151410.dtb):
1228 tmp.2003.08.txt (+ 5)
2003 6 -999.00 -999.00 -999 -999 -999 -999.00 -999 -999 -999
2003 6 -999.00 -999.00 -999 -999 -999 -999.00 -999 -999 -999
crua6[/cru/cruts/version_3_0/fixing_tmp_and_pre] cat retrace.France.0909041051.tmp.2003.07.stat
2003 7 -999.00 -999.00 -999 -999 -999 -999.00 -999 -999 -999
2003 7 -999.00 -999.00 -999 -999 -999 -999.00 -999 -999 -999
crua6[/cru/cruts/version_3_0/fixing_tmp_and_pre] cat retrace.France.0909041051.tmp.2003.08.stat
2003 8 -999.00 -999.00 -999 -999 -999 -999.00 -999 -999 -999
2003 8 -999.00 -999.00 -999 -999 -999 -999.00 -999 -999 -999
2003 7 20.70 9.20 272 375 6717000 26.40 266 379 7790000
2003 8 23.15 11.80 272 375 6717000 28.20 266 379 7790000
Well the differences certainly show up! And it looks like a database change. So.. I guess I need to look at changes in French stations. Argh. And that 'argh' was prescient, since, when I ran getcountry to extract the French stations from each database, I found:
crua6[/cru/cruts/version_3_0/fixing_tmp_and_pre] ./getcountry
Somehow, I've added 71 new French stations?! Surely I'd remember that. Especially as they'd have had to have arrived with the MCDW/CLIMAT bulletins. Sizes:
crua6[/cru/cruts/version_3_0/fixing_tmp_and_pre] wc -l *FRANCE
2725 tmp.0705101334.dtb.FRANCE
3700 tmp.0904151410.dtb.FRANCE
That's not so bad. Well the ratio's improved. Could be a lot of unmatched incoming stations?
Oh, ****. It's the bloody WMO codes again. **** these bloody non-standard, ambiguous,
First example, the beautiful city of Lille. Here are the appropriate headers:
tmp.0705101334.dtb.FRANCE:
70150 506 31 47 LILLE FRANCE 1851 2006 101851 -999.00
tmp.0904151410.dtb.FRANCE:
So.. just what I was secretly hoping for (not!) - a drains-up on the CLIMAT and MCDW

cat /cru/cruts/version_3_0/update_top/runs/runs.0904151410/merg.mcdw.0904151410.dat
db/cld/cld.0904021239.dtb
updates/MCDW/db/db.0904151410/mcdw.cld.0904151410.dtb
updates/MCDW/db/db.0904151410/int1.cld.0904151410.dtb
blind
..is for cld, but indicates that the input database was tmp.0904021239.dtb.
The MCDW database was mcdw.tmp.0904151410.dtb.
tmp.0904021239.dtb:
mcdw.tmp.0904151410.dtb:
tmp..dtb:
I'll bet this just updated the 'false' LILLE with another month or something. In fact:
conv.mcdw.0904151410.dat:
1 2009 1 2009
Before (tmp.0904021239.dtb):
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2009 7 37-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
After (tmp.0904151410.dtb):
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2009 7 37-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
The data is i-dentical. But ooh, lookit the WMO codes! Oh, Lordy! The update has changed

Let's look at all available headers for this LILLE station (ie the short modern one). Aha - it's not the same station! ;0) An input has updated the details, istr that can happen. Well that's no help. Why did it change? CLIMAT? But CLIMAT doesn't have station names!
MCDW is first:
Then CLIMAT:
Interestingly, we only seem to have the last three tmp databases, at least in terms of
This is so hard because I cannot remember the process. Have to dig some more..
OK, I think the key update is 0903091631. Here's the CLIMAT runfile:
conv.climat.0903091631.dat:
1 2000 12 2008
9 1994 11 2008
Still not there. One issue is that for some reason I didn't give the merg runfiles individual names for each parameter! So I might mod the update program to do that. The problem with re-running all updates, of course, is that I also fixed WMO codes. And, by Tim. Oh bugger.

Well, WMO code fixing is identifiable because you get a log file, ie, here's the tmp dir:

tmp.0705101334.dtb had its WMO codes fixed and became tmp.0903081416.dtb, which has the

Unfortunately, only tmp and pre have such log files. Here's the one for pre:
cld.0902101409.dtb
dtr.0708081052.dtb
pre.0903051740.dtb
tmn.0708071548.dtb
tmx.0708071548.dtb
tmp.0903081416.dtb
vap.0804231150.dtb
wet.0710161148.dtb
Well it's worth a try. Actually, let's compare those eight databases - assuming we can find

Oh, boy:

So, cld already has the problem, and it's the earliest version in the archive. Also vap. Well, looking back (er, up ^) we know what happened to cld - it was updated with newmergedb

updated first with derived-cloud data from MCDW (1994-2008), then with

'Discovered that WMO codes are still a pain in the arse. And that I'd forgotten to match Australian updates by BOM code (last field in header) instead of WMO code - so I had to modify newmergedbauto. Also found that running fixwmos.for was less than successful on VAP, because it's already screwed:
Ulp!
I am seriously close to giving up, again. The history of this is so complex that I can't get far enough into it before my head hurts and I have to stop. Each parameter has a tortuous history of manual and semi-automated interventions, so I simply cannot just go back to early versions and run the update prog. I could be throwing away all kinds of corrections - to lat/lons, to WMOs (yes!), and more.

So what the hell can I do about all these duplicate stations? Well, how about fixdupes.for? That would be perfect - except that I never finished it, I was diverted off to fight some other fire. Aarrgghhh.
What about the ones I used for the CRUTEM3 work with Phil Brohan? Can't find the bugger!! Looked everywhere, Matlab scripts aplenty but not the one that produced the plots I used in my CRU presentation in 2005. Oh, FUCK IT. Sorry. I will have to WRITE a program to find potential duplicates. It can show me pairs of headers, and correlations between the data, and I can say 'yay' or 'nay'. There is the finddupes.for program, though
c Further post-processing of the duplicates file - just to show how crap the
c program that produced it was! Well - not so much that but that once it was
c running, it took 2 days to finish so I couldn't really reset it to improve
c (1) Removes and squirrels away all segments where dates don't match;
c (4) Sorts based on total segment length for each station pair'
You see how messy it gets when you actually examine the problem?
This time around (dedupedb.for), I took as simple an approach as possible - and almost immediately hit a problem that's generic but which doesn't seem to get much attention: what's the minimum n for a reliable standard deviation?

I wrote a quick Matlab proglet, stdevtest2.m, which takes a 12-column matrix of values and, for each month, calculates standard deviations using sliding windows of increasing size - finishing with the whole vector
The results are depressing. For Paris, with 237 years, +/- 20% of the real value was possible with even 40 values. Winter months were more variable than Summer ones of course. What we really need, and I don't think it'll happen of course, is a set of metrics (by latitude band perhaps) so that we have a broad measure of the acceptable minimum value count for a given month and location. Even better, a confidence figure that

All that's beyond me - statistically and in terms of time. I'm going to have to say '30'.. it's pretty good apart from DJF. For the one station I've looked at.
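A Python analogue of the stdevtest2.m experiment, using a synthetic series rather than the Paris record (the Gaussian data and window sizes are assumptions):

```python
import random
import statistics

random.seed(1)
# Synthetic 237-year record for one month (stand-in for Paris):
series = [random.gauss(5.0, 2.0) for _ in range(237)]

true_sd = statistics.stdev(series)

def worst_relative_error(series, n):
    """Worst |sd(window)/sd(all) - 1| over all sliding windows of size n."""
    errs = [abs(statistics.stdev(series[i:i + n]) / true_sd - 1)
            for i in range(len(series) - n + 1)]
    return max(errs)

# Smaller windows give wilder SD estimates: the worst 40-value window
# typically strays much further from the full-record SD than the
# worst 150-value window does.
err40 = worst_relative_error(series, 40)
err150 = worst_relative_error(series, 150)
```

Even on well-behaved synthetic data the small-window estimates wander noticeably, which supports the depressing conclusion above: there is no comfortable universal minimum n.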
Back to the actual database issues - I need a day or two to think about the duplicate finder.
Let's just look at the year 2003, for all the French stations in each database! Duh.
2003-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999 x
2003-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999 x
2003 42 54 108 122 144 199 200 232 173 109 101 65
2003-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999 x
2003 51 58 123 142 160 227 219 252 190 129 109 77
2003 45 56 114 133 165 242 243 267 197 133 109 72
2003-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999 x
2003 54 63 111 141 194 261 263 279 201 153 121 85
2003 85 79 115 137 191 248 253 274 209 156 129 96
2003 73 73 118 140 181 247 262 275 205 155 125 94
2003 86 73 105 131 184 241 251 265 207 166 136 93
2003 42 54 108 122 144 199 200 232 173 109 101 65 *
2003 51 58 123 142 160 227 219 252 190 129 109 77 *
2003 45 56 114 133 165 242 243 267 197 133 109 72 *
2003 85 79 115 137 191 248 253 274 209 156 129 96 *
2003 73 73 118 140 181 247 262 275 205 155 125 94 *
2003 86 73 105 131 184 241 251 265 207 166 136 93 *
2003 42 54 108 122 144 199 200 232 173 109 101 65 Do
2003 51 58 123 142 160 227 219 252 190 129 109 77 Do
2003 45 56 114 133 165 242 243 267 197 133 109 72 Do
2003 57 60 110 133 184 245 256 267 196 147 113 81 DC
2003 54 63 111 141 194 261 263 279 201 153 121 85 *
2003 85 79 115 137 191 248 253 274 209 156 129 96 Do
2003 73 73 118 140 181 247 262 275 205 155 125 94 Do
2003 86 73 105 131 184 241 251 265 207 166 136 93 Do
2003-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999 -
2003 57 60 110 133 184 245 256 267 196 147 113 81 DC
2003 67 75 105 116 136 179 189 210 173 125 120 90
2003 61 67 113 128 151 197 207 229 188 136 120 89
2003 39 48 107 131 183 254 251 268 182 126 100 61
2003 81 71 111 133 183 244 246 272 200 149 128 101
2003 93 69 111 137 191 254 264 282 212 168 137 103
In the original db, I've x'd those lines missing in the new one. Just missing vals. In the new db, I've asterisked all the lines matching the old one, with duplicate matches labeled 'Do'. Any other duplicates are marked Da, DB, DC. We can see that all the original 2003 lines are included, *and replicated*. Three new lines are also replicated. A further 25 lines are apparently new (though could well have parents in the original db). This implies that very little matching is being performed!!
crua6[/cru/cruts/version_3_0/update_top] gunzip -c updates/MCDW/mergefiles/merg.mcdw.tmp.0904021106.act.gz | grep 'LILLE'
crua6[/cru/cruts/version_3_0/update_top] gunzip -c updates/MCDW/mergefiles/merg.mcdw.tmp.0904021239.act.gz | grep 'LILLE'
crua6[/cru/cruts/version_3_0/update_top] gunzip -c updates/MCDW/mergefiles/merg.mcdw.tmp.0904151410.act.gz | grep 'LILLE'
So.. what happened? Why did it behave differently? No idea. It was the same for pre though!
crua6[/cru/cruts/version_3_0/update_top] gunzip -c updates/MCDW/mergefiles/merg.mcdw.pre.0904021239.act.gz | grep 'LILLE'
crua6[/cru/cruts/version_3_0/update_top] gunzip -c updates/MCDW/mergefiles/merg.mcdw.pre.0904151410.act.gz | grep 'LILLE'
There was something very fishy about that run. Of course it was a single month - I wonder if that made a difference?
crua6[/cru/cruts/version_3_0/update_top] cat runs/runs.0904021239/conv.mcdw.0904021239.dat
9 1994 12 2008
crua6[/cru/cruts/version_3_0/update_top] cat runs/runs.0904151410/conv.mcdw.0904151410.dat
1 2009 1 2009
Also of interest - how did the program find a 2000-2009 station when the previous update was to 2008? Aha:
crua6[/cru/cruts/version_3_0/update_top] cat runs/runs.0904021239/conv.climat.0904021239.dat
1 2000 2 2009
The CLIMAT update did it!! It's that bloody no-metadata problem!! So I should be looking at the CLIMAT process for 0904021239, not the MCDW one. Duhh. So, the merge run:
crua6[/cru/cruts/version_3_0/update_top] cat runs/runs.0904021239/merg.climat.0904021239.dat
db/tmx/tmx.0708071548.dtb
updates/CLIMAT/db/db.0904021239/climat.tmx.0904021239.dtb
updates/CLIMAT/db/db.0904021239/int2.tmx.0904021239.dtb
blind
crua6[/cru/cruts/version_3_0/update_top]
crua6[/cru/cruts/version_3_0/update_top] gunzip -c updates/CLIMAT/db/db.0904021239/climat.tmp.0904021239.dtb.gz | grep -i 'lille'
crua6[/cru/cruts/version_3_0/update_top] gunzip -c updates/CLIMAT/db/db.0904021239/int2.tmp.0904021239.dtb.gz | grep -i 'lille'
(there's then the BOM section but it's all over by now)
So, the merging of climat.tmp.0904021106.dtb into int1.tmp.0904021106.dtb FAILED. WHY? Well, the WMO codes are the same as for MCDW: 0701500. So it can't be that. The lat and lon are ~slightly~ different, though. Remember, the DATABASE entry was originally (tmp.0903081416.dtb):

Now, the 'LILLE/LESQUIN' station header comes from the CLIMAT bulletins, ie, from the WMO reference file wmo.0710151633.dat. But it should have matched with the existing LILLE - the problem looks like the latitude shift (from 50.60 to 50.34) introduced by MCDW did the damage. Obviously, if we are going to trust MCDW metadata as being valid corrections, then the WMO reference file needs to be updated at the same time!! So, we'll need:
1. A file called 'wmoref.latest.dat' that contains the name of the latest WMO reference file.
- DONE
- EXTREMELY COMPLICATED
3. A routine to write a new WMO reference and to archive the old one.
- EXTREMELY COMPLICATED
- DONE
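The Lille failure also suggests what tolerant matching would look like: match on WMO code plus lat/lon within a tolerance, instead of exact metadata equality. A Python sketch (the tolerance value is my guess, not from the update code; Lille's coordinates are from the header above, the shifted latitude from the MCDW change):

```python
def headers_match(a, b, latlon_tol=0.5):
    """Match two station headers on WMO code, tolerating small
    lat/lon drift such as MCDW's 50.60 -> 50.34 shift for Lille."""
    return (a['wmo'] == b['wmo']
            and abs(a['lat'] - b['lat']) <= latlon_tol
            and abs(a['lon'] - b['lon']) <= latlon_tol)

db_lille = {'wmo': 701500, 'lat': 50.60, 'lon': 3.10}
climat_lille = {'wmo': 701500, 'lat': 50.34, 'lon': 3.10}

# Exact comparison fails on the 0.26-degree latitude shift, so the
# incoming record spawns a duplicate; tolerant matching would merge it.
exact = db_lille == climat_lille
tolerant = headers_match(db_lille, climat_lille)
```

How much drift to allow is exactly the open question raised below when metacmp is run over the whole archive.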
As part of the investigations, I found that I wasn't close()-ing off channel 10 when I used it in update.for. Now, I'm pretty sure that F77 follows the convention that an OPEN on an already-open channel performs an implicit CLOSE first, but who wants to take that chance with the variety of compilers we're subject to? So I went through and inserted an indecent number of close(10)s.
I guess that new stations should be added to the wmo reference file? They are pan-parameter (well the MCDW ones are) but I have an eerie feeling that I won't experience joy when headers are

compares headers when WMO codes match. If all WMO matches amongst the databases share common metadata (lat, lon, alt, name, country) then the successful header is written to a file. If, however, any one of the WMO matches fails on any metadata - even slightly! - the gaggle of disjointed headers is written to a second file. I know that leeway should be given, particularly with lats & lons, but as a first stab I just need to know how bad things are. Well, I got that:
crua6[/cru/cruts/version_3_0/update_top] ./metacmp
RESULTS:
Matched/unopposed: 2435
1250
279
41
92
83
129
552
Interesting, but not astounding. Roughly half are unpaired stations, with an impressive

Analysis of the 4000+ bad matches will be more complicated unfortunately. An initial re-run looking for lat/lon within half a degree, and/or station partial, will be useful. No, hang on. Easier to analyse the output from metacmp! And so.. postmetacmp.for:
2 in group: 642
3 in group: 71
4 in group: 188
5 in group: 625
6 in group: 183
7 in group: 411
8 in group: 1957
LAT:
1. 0
2. 3059
3. 276
4. 15
5. 0
6. 0
7. 0
8. 0
Maximum differences:
<0.1: 1233
<0.2: 726
<0.5: 1225
<1.0: 15
1.0+: 151
LON:
1. 0
2. 2996
3. 339
4. 30
5. 1
6. 0
7. 0
8. 0
Maximum differences:
<0.1: 1195
<0.2: 722
<0.5: 1242
<1.0: 30
1.0+: 177
ALT:
1. 0
2. 2035
3. 237
4. 17
5. 0
6. 0
7. 0
8. 0
Maximum differences:
<50m : 1767
<100m: 75
<500m: 121
<1km : 36
1km+ : 290
STATION NAME:
1. 0
2. 2167
3. 365
4. 43
5. 0
6. 0
7. 0
8. 0
<25% : 281
<50% : 385
<75% : 770
<100%: 276
100% : 863
COUNTRY NAME:
1. 0
2. 1475
3. 182
4. 41
5. 0
6. 0
7. 0
8. 0
Hmmm.. lots of groups that could be eliminated if we incorporated the WMO reference list, because then we could allow an element of 'drift' from a reference point. Decided to make it a bit quicker and easier as well, by removing tmn/tmx and letting