From: John H. <jd...@gm...> - 2007-06-07 16:54:05
I just added support for native plotting of Python date and datetime objects (you still can, but no longer have to, use plot_date with date2num conversions). We will continue to convert to floats under the hood, but the conversion can be handled automagically.

I also added support for loading CSV files (or general space/tab/comma delimited files) into numpy record arrays, and the type conversions (int, float, date, etc.) happen automagically. The function assumes there is a header row, and these strings will be munged to give valid Python attribute names. It inspects the first checkrows lines after the header to try to infer the datatype and set the appropriate conversion function. It's not entirely bulletproof, but it should cover a lot of common use cases.

Here is an example (svn only):

    from matplotlib.mlab import csv2rec
    from pylab import figure, show

    a = csv2rec('data/msft.csv')
    fig = figure()
    ax = fig.add_subplot(111)
    ax.plot(a.date, a.adj_close, '-')
    fig.autofmt_xdate()
    show()

The autofmt_xdate call is optional, but it is a new function that does a few things you usually want in date plots: it turns off tick labels in the upper subplots (if any), rotates and right-aligns the tick labels on the lowest axes, and increases the bottom margin of the subplots adjustment to make room for the rotated tick labels.

Here is what the dtype looks like for the example above:

    In [3]: !head -3 data/msft.csv
    Date,Open,High,Low,Close,Volume,Adj. Close*
    19-Sep-03,29.76,29.97,29.52,29.96,92433800,29.79
    18-Sep-03,28.49,29.51,28.42,29.50,67268096,29.34

    In [4]: a = csv2rec('data/msft.csv')

    In [5]: a.dtype
    Out[5]: dtype([('date', '|O4'), ('open', '<f8'), ('high', '<f8'), ('low', '<f8'), ('close', '<f8'), ('volume', '<i4'), ('adj_close', '<f8')])

    In [6]: a.date[:2]
    Out[6]: array([2003-09-19 00:00:00, 2003-09-18 00:00:00], dtype=object)

I'll probably add a few performance features to the csv2rec function, mainly to let you skip columns and supply conversion functions where desired, because the automatic date parser is pretty slow if you want to parse date strings. But this is enough to make it useful. Another useful feature would be support for customizable, type-dependent NULL value conversion (e.g. convert to numpy.nan for floats, '0000-00-00' for dates, etc.).

Record arrays are your friend; have fun!
JDH
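To make the header munging and type inference described above more concrete, here is a minimal standard-library sketch. It is not the csv2rec implementation: the helper names (munge_name, infer_converter, load_csv) are hypothetical, the date converter assumes the single format used in the sample file (day-abbreviated month-two digit year, English month names), and the result is a plain list of tuples rather than a numpy record array.

```python
import csv
import io
import re
from datetime import datetime


def munge_name(header):
    """Turn a header such as 'Adj. Close*' into an attribute-style name."""
    name = header.strip().lower()
    name = re.sub(r"[^0-9a-z]+", "_", name)  # runs of other characters -> '_'
    name = name.strip("_")
    if not name or name[0].isdigit():
        name = "col_" + name  # attribute names cannot start with a digit
    return name


def to_date(text):
    """Assumed date format for this sketch only, e.g. '19-Sep-03'."""
    return datetime.strptime(text, "%d-%b-%y")


# Converters are tried from most to least specific; str always succeeds.
CONVERTERS = (int, float, to_date, str)


def infer_converter(samples):
    """Return the first converter that accepts every sample string."""
    for conv in CONVERTERS:
        try:
            for s in samples:
                conv(s)
        except ValueError:
            continue
        return conv
    return str


def load_csv(text, checkrows=5):
    """Parse CSV text with a header row; infer types from the first rows."""
    rows = list(csv.reader(io.StringIO(text)))
    names = [munge_name(h) for h in rows[0]]
    body = rows[1:]
    convs = [
        infer_converter([row[i] for row in body[:checkrows]])
        for i in range(len(names))
    ]
    records = [tuple(c(v) for c, v in zip(convs, row)) for row in body]
    return names, records


SAMPLE = """Date,Open,High,Low,Close,Volume,Adj. Close*
19-Sep-03,29.76,29.97,29.52,29.96,92433800,29.79
18-Sep-03,28.49,29.51,28.42,29.50,67268096,29.34
"""

names, records = load_csv(SAMPLE)
# names -> ['date', 'open', 'high', 'low', 'close', 'volume', 'adj_close']
```

Only the first checkrows data rows are inspected, so a later row that does not fit the inferred type raises an error during conversion. That is the same trade-off as the "not entirely bulletproof" caveat above.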
From: Lionel R. <lro...@li...> - 2007-06-08 07:08:19
Hi John,

a very interesting idea. Is there a way to attach extra information to the record array columns, such as the units and/or the desired labels for the resulting plotted lines, retrieved directly from the CSV files?

Cordially,

--
Lionel Roubeyrie - lro...@li...
Studies and maintenance officer
LIMAIR - la Surveillance de l'Air en Limousin
http://www.limair.asso.fr
From: John H. <jd...@gm...> - 2007-06-08 14:50:38
Attachments:
plotfile.png
On 6/8/07, Lionel Roubeyrie <lro...@li...> wrote:
> Hi John,
> very very interesting idea.
> Is there a way to add some extras informations on the records arrays columns,
> like the units or/and the desired labels for the resulting plotted lines,
> directly retrieved in the CSV files?

It could be done, but my goal here is not to create a persistence layer for record arrays, or a method of describing them or mpl labels, but rather a way to easily import 3rd party CSV files into numpy record arrays. I work with a lot of tab/space/ascii delimited files, and found myself duplicating a lot of code importing them into record arrays. This function is the distillation of that code.

It would be fairly easy to add designated rows for those who did want to decorate their CSV files. I think it might be most useful to support a row that provided a numpy dtype per column, or perhaps the name of a converter function...

One thing people coming from gnuplot miss is file plotting functionality. I just added a function to pylab called plotfile which uses the csv2rec functionality (with autolabeling etc.) to plot data from a file. E.g.,

    >>> plotfile(fname, (0,5,6))

plots columns 5 and 6 against column 0. And

    >>> plotfile(fname, ('date', 'volume', 'adj_close'), plotfuncs={'volume': 'bar'})

does the same using the names of the columns, using "plot" for adj_close (the default) and "bar" for volume (customization from the plotfuncs dictionary). The column names in either case are used to create default x and y labels. The 2nd command produces the attached plot.

This is just a first pass, so if people want to see a different interface or have an opinion on what should be returned, or where this function should live outside of pylab, feel free to comment or commit changes.

JDH
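The two plotfile calls above address columns either by index or by name and pick a plotting function per column. The sketch below shows one way that bookkeeping could work. It is an illustration only, not the plotfile source: resolve_columns and plan_plots are hypothetical helpers, the column names are the munged names from the msft.csv example, and no plotting is done.

```python
def resolve_columns(names, cols):
    """Map a mix of integer indices and column names to column names."""
    resolved = []
    for col in cols:
        if isinstance(col, int):
            resolved.append(names[col])  # positional column
        elif col in names:
            resolved.append(col)         # already a column name
        else:
            raise KeyError("unknown column: %r" % (col,))
    return resolved


def plan_plots(names, cols, plotfuncs=None):
    """Return the x column and a (y column, plot function name) list.

    The first entry of cols is the x axis; every other entry is plotted
    against it, using "plot" unless plotfuncs names another function.
    """
    plotfuncs = plotfuncs or {}
    xname, *ynames = resolve_columns(names, cols)
    return xname, [(y, plotfuncs.get(y, "plot")) for y in ynames]


names = ['date', 'open', 'high', 'low', 'close', 'volume', 'adj_close']

by_index = plan_plots(names, (0, 5, 6))
by_name = plan_plots(names, ('date', 'volume', 'adj_close'),
                     plotfuncs={'volume': 'bar'})
```

Each returned function name could then be looked up on an Axes with getattr(ax, funcname), and the column names reused as the default x and y labels.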
From: butterw <bu...@gm...> - 2011-04-27 04:09:35
Given a recarray r, r.dtype.names contains a tuple with the column names. It should be easy to do what you want using a loop.

briant100 wrote:
>
> Hey John - currently using matplotlib.mlab import csv2rec functionality in
> a script.
>
> Is there a tool or way to automate plotting of multiple y series contained
> in a csv data file (data in columns, header is first row, x axis is time,
> several y series) with varying column header names and varying numbers of
> columns depending on the individual data file?
> I particularly want to avoid manually typing individual series names -as
> this information is contained in the header row for each column of data it
> seems inefficient to have to type series names for plotting, only to have
> to retype series names for the next csv file which contains different
> column header names
>
> Plotfile came close, but doesnt seem to automatically label individual
> series by column header
> eg file formats (varying headers, and numbers of columns):
>
> file 1
> elapsedtime,AS2data,AS45data,SE34data,VB56data
>
> file 2
> elapsedtime,AS09data,VB24data
>

--
View this message in context: http://old.nabble.com/record-array-and-date-support-tp11011990p31483894.html
Sent from the matplotlib - users mailing list archive at Nabble.com.
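The loop suggested above could look roughly like the following sketch. It assumes the first column is the x axis, as in the example files in the question. The function name plot_all_series is hypothetical, and the small record array is built by hand with made-up values purely for illustration (in practice it would come from csv2rec or a similar loader). The ax argument is expected to be a matplotlib Axes.

```python
import numpy as np


def plot_all_series(r, ax, xname=None):
    """Plot every column of record array r against one x column.

    Each line is labelled with its column name, so nothing has to be
    typed by hand when the headers change from file to file.
    """
    names = r.dtype.names
    if xname is None:
        xname = names[0]  # assumption: first column is the x axis
    labels = []
    for name in names:
        if name == xname:
            continue
        ax.plot(r[xname], r[name], label=name)
        labels.append(name)
    ax.set_xlabel(xname)
    ax.legend()
    return labels


# Hand-built stand-in for a loaded file; the values are invented.
r = np.rec.fromarrays(
    [[0, 1, 2], [1.0, 1.5, 2.0], [3.0, 2.5, 2.0]],
    names="elapsedtime,AS2data,AS45data",
)
```

With matplotlib available, a call such as plot_all_series(r, fig.add_subplot(111)) followed by show() would draw one labelled line per data column.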
From: briant100 <btr...@ho...> - 2011-04-28 06:16:38
That should work, many thanks.
From: briant100 <btr...@ho...> - 2011-04-27 02:10:08
Hey John, I'm currently using the csv2rec function from matplotlib.mlab in a script.

Is there a tool or way to automate plotting of multiple y series contained in a CSV data file (data in columns, header in the first row, x axis is time, several y series), where the column header names and the number of columns vary from file to file?

I particularly want to avoid manually typing individual series names. As this information is contained in the header row for each column of data, it seems inefficient to have to type series names for plotting, only to have to retype them for the next CSV file, which contains different column header names.

plotfile came close, but doesn't seem to automatically label individual series by column header.

E.g. file formats (varying headers and numbers of columns):

file 1
elapsedtime,AS2data,AS45data,SE34data,VB56data

file 2
elapsedtime,AS09data,VB24data