|
From: antonv <vas...@ya...> - 2009-01-17 07:16:58
|
Dear all,

I know this is not related to matplotlib, but this seems to be the only place where I have found people who know both NOAA data and Python, so please bear with me.

The .bull file that NOAA makes available is an ASCII file formatted for human readability, which creates a lot of issues when I try to parse it. Here is a link to one of these files: ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/wave/prod/wave.20090117/bulls.t00z/akw.46001.bull

Do you have any idea how to extract the data there into columns for plotting with matplotlib? If you look at the file you'll notice that it has both a header and a footer that need to be eliminated, and that the main columns also have sub-columns. Another issue is that some columns have missing data, which should keep its relationship with the time column. And the last issue: some of the values are preceded by a "*" sign that should simply be removed.

Any ideas are greatly appreciated!

Anton

-- View this message in context: http://www.nabble.com/NOAA-.bull-file-parsing-tp21513800p21513800.html Sent from the matplotlib - users mailing list archive at Nabble.com. |
|
From: Pierre GM <pgm...@gm...> - 2009-01-17 20:11:52
|
Anton,

You may want to check on the numpy list as well. I recently reimplemented a function to read text files as a combination of numpy.loadtxt and mlab.csv2rec; it handles missing data nicely. You can get it here for the moment: https://code.launchpad.net/~pierregm/numpy/numpy_addons

The function you would need is mafromtxt, in fromascii. Alternatively, you can try the scikits.timeseries package (http://pytseries.sourceforge.net/): recent SVN versions introduced tsfromtxt, which reads a text file and returns a time series.

However, neither of these options will work out of the box, because of the footer. What you could do is first write a function that gets rid of it (for example: open the file, read all the lines into a list, drop the first 7 rows (the header) and the last 8, and store the result in a file). Once you have only the data, use mafromtxt (for example) with space as the delimiter, and specify the columns you want with usecols (that way, you can get rid of the column with the '*'). The missing data should then be taken into account properly.

Let me know how it goes.

P. |
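The recipe in the message above (drop a fixed number of header and footer lines, discard the '*' markers, keep missing values aligned with the time column) can be sketched with the standard library alone. This is only an illustration, not the function either poster used: `parse_bull`, its `subcols` parameter, and the `SAMPLE` text are all invented here, and the sample layout is an assumption rather than a copy of a real NOAA bulletin. The sketch also departs from the suggestion of using space as the only delimiter: it splits each row on '|' first, so that an empty main column still yields the right number of missing values.

```python
import math

# Hypothetical, shortened stand-in for a .bull file (NOT real NOAA data):
# 3 header lines, 2 data rows, 2 footer lines.
SAMPLE = """\
 Location : 46001
 | day hr |  Hst |    Hs   Tp  dir |    Hs   Tp  dir |
 +--------+------+-----------------+-----------------+
 | 17  0  | 2.4  |   2.2 10.3 265  | * 0.9  8.1 190  |
 | 17  3  | 2.5  |   2.3 10.1 266  |                 |
 +--------+------+-----------------+-----------------+
 * = marker explained in the footer
""".splitlines()


def parse_bull(lines, subcols, n_header=7, n_footer=8):
    """Return one list of floats per data row.

    subcols gives the number of sub-columns in each '|'-delimited main
    column; sub-columns that are blank in a row are filled with NaN.
    """
    rows = []
    for line in lines[n_header:len(lines) - n_footer]:
        # Remove the '*' markers, then the outer '|' borders.
        cells = line.replace("*", " ").strip().strip("|").split("|")
        if len(cells) != len(subcols):
            continue  # not a data row (e.g. a separator line)
        row = []
        for cell, n in zip(cells, subcols):
            values = [float(tok) for tok in cell.split()]
            # Pad short or empty cells so every row has the same length.
            row.extend(values + [math.nan] * (n - len(values)))
        rows.append(row)
    return rows


rows = parse_bull(SAMPLE, subcols=(2, 1, 3, 3), n_header=3, n_footer=2)
columns = list(zip(*rows))  # transpose: one tuple per sub-column
```

With the real file, the defaults of 7 header and 8 footer lines follow the counts given in the message above and would need checking against the actual bulletin. Each entry of `columns` can be passed to matplotlib's `plot`, which leaves a gap in the line wherever a value is NaN.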
|
From: antonv <vas...@ya...> - 2009-01-20 00:22:06
|
Hi Pierre,

Thanks for the quick and thorough response! What I ended up doing is writing a custom function that does everything I needed without using numpy or mlab.

Anton |