From: Martin M. <mmo...@gm...> - 2013-10-10 13:19:51
|
Hi,
rendering some of my charts takes almost 50GB of RAM. Below, I believe, is a stack trace of one such situation, taken when the process had already used 15GB. Could somebody comment on what matplotlib is doing at this very moment? Why the recursion?

The charts contain 262422 data points in a 2D scatter plot, and each point has been assigned its own color. The points come in batches, so there are only 153 distinct colors, but nevertheless I assigned a color value to each data point. There are also 153 legend items (one color won't be used).

^CTraceback (most recent call last):
  ...
    _figure.savefig(filename, dpi=100)
  File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
    self.canvas.print_figure(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
    FigureCanvasAgg.draw(self)
  File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
    self.figure.draw(self.renderer)
  File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
    func(*args)
  File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
    a.draw(renderer)
  File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
    return Collection.draw(self, renderer)
  File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
    offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
  File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
    return self._edgecolors
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
    gc.collect()
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
    gc.collect()
KeyboardInterrupt

^C

Any clues as to what the code is doing? I use mpl-1.3.0.
Thank you,
Martin |
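For a sense of scale, here is a back-of-the-envelope sketch of how much memory the raw data described above needs. It assumes float64 x/y coordinates and one float64 RGBA row per point (matplotlib normalizes per-point colors to RGBA); the constant names are made up for this illustration. The result is roughly 12 MiB, so the raw arrays alone cannot account for tens of gigabytes, although the arithmetic says nothing about where the rest goes.

```python
# Rough size of the raw data for the scatter plot described above.
# Assumptions: x and y are float64, and every point carries its own
# RGBA color stored as four float64 values.
N_POINTS = 262422
BYTES_PER_FLOAT64 = 8

coords_bytes = N_POINTS * 2 * BYTES_PER_FLOAT64  # x and y offsets
colors_bytes = N_POINTS * 4 * BYTES_PER_FLOAT64  # one RGBA row per point
total_mib = (coords_bytes + colors_bytes) / 2**20

print(f"coordinates: {coords_bytes / 2**20:.1f} MiB")
print(f"colors:      {colors_bytes / 2**20:.1f} MiB")
print(f"total:       {total_mib:.1f} MiB")
```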
From: Benjamin R. <ben...@ou...> - 2013-10-10 13:34:05
|
On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ <mmo...@gm...> wrote:
> Hi,
> rendering some of my charts takes almost 50GB of RAM. [...]
> Why the recursion?
> [...]
> Clues what is the code doing? I use mpl-1.3.0.
> Thank you,
> Martin

Unfortunately, that stacktrace isn't very useful. There is no recursion there, but rather the perfectly normal drawing of the figure object, which has a child axes, which has child collections, which have child artist objects.

Without the accompanying code, it would be difficult to determine where the memory hog is.

Ben Root |
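Ben's description (the figure draws its axes, the axes draw their collections) is an ordinary walk down a tree of artists. The sketch below, assuming a reasonably recent matplotlib with the Agg backend, lists that tree with `Artist.get_children()`; `walk_artists` is a hypothetical helper written for this illustration, not a matplotlib function. Each `scatter()` call adds one `PathCollection` two levels below the figure, so a script that calls `scatter()` once per color batch ends up with one collection per batch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the script discussed here
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection


def walk_artists(artist, depth=0):
    """Yield (depth, artist) pairs for `artist` and everything below it.

    Parents are visited before their children, just as Figure.draw()
    calls Axes.draw(), which in turn calls each collection's draw().
    """
    yield depth, artist
    for child in artist.get_children():
        yield from walk_artists(child, depth + 1)


fig, ax = plt.subplots()
for i in range(3):  # three scatter() calls -> three PathCollections
    ax.scatter([i, i + 1], [i, i + 1], s=0.1)

found = [(d, a) for d, a in walk_artists(fig) if isinstance(a, PathCollection)]
collections = [a for _, a in found]
depths = {d for d, _ in found}
print(len(collections), depths)  # 3 {2}
plt.close(fig)
```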
From: Martin M. <mmo...@gm...> - 2013-10-10 13:46:10
|
Benjamin Root wrote:
> Unfortunately, that stacktrace isn't very useful. There is no recursion there, but rather the perfectly normal drawing of the figure object that has a child axes, which has child collections which have child artist objects.
>
> Without the accompanying code, it would be difficult to determine where the memory hog is.

Could there be places where gc.collect() could be introduced? Are there places where matplotlib could del() unnecessary objects right away? I think the problem is with huge lists or Python dicts. I saved 10GB of RAM when I converted one Python dict to a bsddb3 file taking just 10MB on disk. I speculate that matplotlib keeps the data in that code in some huge list, or more likely a dict, and that it is the same issue.

Are you sure you cannot see where the problem is? It happens (is visible) only with a huge number of dots, of course.
Thanks,
Martin |
From: Michael D. <md...@st...> - 2013-10-10 14:12:12
|
On 10/10/2013 09:47 AM, Martin MOKREJŠ wrote:
> Benjamin Root wrote:
>> [...]
>> Without the accompanying code, it would be difficult to determine where the memory hog is.
> Could there be places where gc.collect() could be introduced? Are there places where matplotlib
> could del() unnecessary objects right away? I think the problem is with huge lists or pythonic
> dicts. I could save 10GB of RAM when I converted one python dict to a bsddb3 file having just
> 10MB on disk. I speculate matplotlib in that code keeps the data in some huge list or more likely
> a dict and that is the same issue.
>
> Are you sure you cannot see where a problem is? It happens (is visible) only with huge number of
> dots, of course.

Matplotlib generally keeps data in Numpy arrays, not lists or dictionaries (though given that matplotlib predates Numpy, there are some corner cases we've found recently where arrays are converted to lists and back unintentionally).

As Ben said, the traceback looks quite normal -- and it doesn't show what any of the values are. If you can provide us with a script that reproduces this, that's the only way we can really plug in and see what might be going wrong. It doesn't have to include anything proprietary, such as your data. You can even start with one of the existing examples, if that helps.

Mike

http://www.droettboom.com |
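Mike's remark about Numpy arrays versus Python lists can be quantified with a small sketch. It assumes CPython on a 64-bit platform, where a list stores one pointer per element and every float is a separate heap object; the numbers are illustrative and are not a measurement of Martin's program.

```python
import sys

import numpy as np

N_POINTS = 262422  # number of points in the scatter plot described above

values = [float(i) for i in range(N_POINTS)]
as_array = np.asarray(values, dtype=np.float64)

# A list stores one pointer per element plus a separate float object each.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# An array stores the raw 8-byte values contiguously, with no per-element objects.
array_bytes = as_array.nbytes

print(f"list : {list_bytes / 2**20:.1f} MiB")
print(f"array: {array_bytes / 2**20:.1f} MiB")
```

On CPython the list version comes out at roughly four times the size of the array, which is the kind of overhead Mike is contrasting with matplotlib's array-based storage.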
From: Martin M. <mmo...@gm...> - 2013-10-10 14:20:18
|
Michael Droettboom wrote:
> On 10/10/2013 09:47 AM, Martin MOKREJŠ wrote:
>> [...]
> Matplotlib generally keeps data in Numpy arrays, not lists or
> dictionaries (though given that matplotlib predates Numpy, there are
> some corner cases we've found recently where arrays are converted to
> lists and back unintentionally).

Just a brief note: I don't use Numpy myself in my code, so keep that in mind while replicating my use case. ;) The code is merely what I think Tony Yu or Chao Yue, or somebody else (sorry, I don't remember now), proposed to me on this list in the past. I am writing this really off the top of my head, so maybe I remember rubbish. ;)
Martin |
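Even when the caller passes plain lists, matplotlib converts them: the positions of a scatter collection are kept as an N x 2 array of offsets. The sketch below (synthetic data, recent matplotlib and Numpy assumed) checks this. It suggests that building arrays once on the caller's side, rather than keeping large lists alive alongside matplotlib's own copies, is a cheap first step, though it is not a diagnosis of the 50GB case.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Plain Python lists on the caller's side (synthetic data, no Numpy used here).
xs = [0.5 * i for i in range(1000)]
ys = [float(i % 7) for i in range(1000)]

fig, ax = plt.subplots()
collection = ax.scatter(xs, ys, s=0.1)

# matplotlib stores the scatter positions as an (N, 2) array of offsets.
offsets = collection.get_offsets()
print(type(offsets).__name__, offsets.shape, offsets.dtype)
plt.close(fig)
```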
From: Michael D. <md...@st...> - 2013-10-10 13:34:26
|
Can you provide a complete, standalone example that reproduces the problem? Otherwise all I can do is guess.

The usual culprit is forgetting to close figures after you're done with them.

Mike

On 10/10/2013 09:05 AM, Martin MOKREJŠ wrote:
> Hi,
> rendering some of my charts takes almost 50GB of RAM. [...]
> Clues what is the code doing? I use mpl-1.3.0.
> Thank you,
> Martin

-- 
http://www.droettboom.com |
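The "forgot to close figures" pattern Mike mentions can be avoided with something like the following sketch. It assumes the pyplot interface and the Agg backend; `render_many` is a hypothetical name, and an in-memory buffer stands in for real output files. pyplot keeps a reference to every figure it creates until `plt.close()` is called, so `gc.collect()` alone cannot release them.

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def render_many(n_figures):
    """Render several figures, closing each one as soon as it is saved."""
    sizes = []
    for i in range(n_figures):
        fig, ax = plt.subplots()
        ax.scatter(range(10), [i * v for v in range(10)], s=1)
        buf = io.BytesIO()  # stands in for a real filename
        fig.savefig(buf, format="png", dpi=100)
        sizes.append(buf.tell())
        # pyplot holds a reference to every figure it creates; close() drops
        # it so the figure can actually be freed.
        plt.close(fig)
    return sizes


sizes = render_many(5)
print(len(sizes), plt.get_fignums())  # 5 []
```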
From: Martin M. <mmo...@gm...> - 2013-10-10 14:11:40
|
Michael Droettboom wrote:
> Can you provide a complete, standalone example that reproduces the
> problem. Otherwise all I can do is guess.
>
> The usual culprit is forgetting to close figures after you're done with
> them.

Thanks, I learned that when matplotlib-1.3.0 spat a warning message at me some weeks ago. Yes, I do call _figure.clear() and pylab.clf(), but only after savefig() returns, which does not happen here. I also call gc.collect() a lot throughout the code, especially before and after I draw every figure. That is not enough here.

import gc
import sys
from itertools import izip, imap, ifilter
from textwrap import wrap

import matplotlib
# Force matplotlib not to use any X-windows backend. This has to run before
# pylab is imported, otherwise it has no effect.
matplotlib.use('Agg')
import pylab

F = pylab.gcf()
# convert the view of numpy array to tuple
# http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html
DefaultSize = tuple(F.get_size_inches())


def draw_hist2d_plot(filename, mydata_x, mydata_y, colors, title_data, xlabel_data, ylabel_data, legends,
                     legend_loc='upper right', legend_bbox_to_anchor=(1.0, 1.0), legend_ncol=None,
                     xmin=None, xmax=None, ymin=None, ymax=None, fontsize=10, legend_fontsize=8,
                     dpi=100, tight_layout=False, legend_inside=False, objsize=0.1):
    # hist2d(x, y, bins = None, range=None, weights=None, cmin=None, cmax=None **kwargs)
    # Not shown here: get_ncol(), set_my_pylab_defaults(), set_limits(), myoptions.
    if len(mydata_x) != len(mydata_y):
        raise ValueError("%s: len(mydata_x) != len(mydata_y): %s != %s" % (filename, len(mydata_x), len(mydata_y)))
    if colors and len(mydata_x) != len(colors):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s: len(mydata_x) != len(colors): %s != %s.\n" % (filename, len(mydata_x), len(colors)))
    if colors and legends and len(colors) != len(legends):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s, len(colors) != len(legends): %s != %s.\n" % (filename, len(colors), len(legends)))

    if mydata_x and mydata_y and filename:
        if legends:
            if not legend_ncol:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = get_ncol(legends, fontsize=legend_fontsize)
            else:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = 3, 213, 313, legend_ncol
        else:
            _subfigs, _ax1_num, _legend_ncol = 3, 313, 0

        set_my_pylab_defaults()
        pylab.clf()
        _figure = pylab.figure()
        _figure.clear()
        _figure.set_tight_layout(True)
        gc.collect()

        if legends:
            # do not crash on too tall figures
            if 8.4 * _subfigs < 200:
                _figure.set_size_inches(11.2, 8.4 * (_subfigs + 1))
            else:
                # _figure.set_size_inches() silently accepts a large value but later on _figure.savefig() crashes with:
                # ValueError: width and height must each be below 32768
                _figure.set_size_inches(11.2, 200)
                sys.stderr.write("Warning: draw_hist2d_plot(): Wanted to set %s figure height to %s but is too high, forcing %s instead. You will likely get an incomplete image.\n" % (filename, 8.4 * _subfigs, 200))
            if myoptions.debug > 5:
                print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))
            _ax1 = _figure.add_subplot(_ax1_num)
            _ax2 = _figure.add_subplot(_ax2_num)
        else:
            _figure.set_size_inches(11.2, 8.4 * 2)
            _ax1 = _figure.gca()
            if myoptions.debug > 5:
                print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))

        _series = []
        #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
        for _x, _y, _c in izip(mydata_x, mydata_y, colors):
            # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
            _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
            _series.append(_my_PathCollection)

        # Note: both branches below call scatter() for every series again,
        # so each series ends up as two PathCollections on _ax1.
        if legends:
            #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l)
                _series.append(_my_PathCollection)
            _ax2.legend(_series, legends, loc='upper left', bbox_to_anchor=(0,0,1,1), borderaxespad=0., ncol=_legend_ncol, mode='expand', fontsize=legend_fontsize)
            _ax2.set_frame_on(False)
            _ax2.tick_params(bottom='off', left='off', right='off', top='off')
            pylab.setp(_ax2.get_yticklabels(), visible=False)
            pylab.setp(_ax2.get_xticklabels(), visible=False)
        else:
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _ax1.scatter(_x, _y, color=_c, s=objsize) #, marker='^')
                # keeps eating memory in:
                #
                # draw_hist2d_plot(filename, _data_xrow, _data_yrow, _my_colors, _title, _xlabel, _ylabel, [], xmin=None, xmax=None, ymin=None, ymax=None, fontsize=10, dpi=100)
                #   File "/blah.py", line 14080, in draw_hist2d_plot
                #     _ax1.scatter(_x, _y, color=_c, s=objsize) #, marker='^')
                #   File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 6247, in scatter
                #     self._process_unit_info(xdata=x, ydata=y, kwargs=kwargs)
                #   File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 1685, in _process_unit_info
                #     self.xaxis.update_units(xdata)
                #   File "/usr/lib64/python2.7/site-packages/matplotlib/axis.py", line 1332, in update_units
                #     converter = munits.registry.get_converter(data)

        # pylab.subplots_adjust(left = (5/25.4)/_figure.xsize, bottom = (4/25.4)/_figure.ysize, right = 1 - (1/25.4)/_figure.xsize, top = 1 - (3/25.4)/_figure.ysize)
        _ax1.set_xlabel(xlabel_data, fontsize=fontsize)
        _ax1.set_ylabel(ylabel_data, fontsize=fontsize)
        _ax1.set_xmargin(0.05)
        _ax1.set_ymargin(0.05)
        _ax1.set_autoscale_on(False)
        set_limits(_ax1, xmin, xmax, ymin, ymax)
        if fontsize == 10:
            _ax1.set_title('\n'.join(wrap(title_data, 100)), fontsize=fontsize+2)
        elif fontsize == 12:
            _ax1.set_title('\n'.join(wrap(title_data, 90)), fontsize=fontsize+2)
        else:
            _ax1.set_title('\n'.join(wrap(title_data, 100)), fontsize=fontsize+2)

        if legends:
            _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
            del(_my_PathCollection)
            del(_ax2)
        else:
            _figure.savefig(filename, dpi=100)
        del(_series)
        del(_ax1)
        _figure.clear()
        del(_figure)
        pylab.clf()
        pylab.close()
        # pylab.rcdefaults()
        gc.collect()

That's the whole function.
I used to suspect _ax1.scatter() in the past, but probably only because I hit the memory problems earlier. I have now worked around that by using an on-disk bsddb3 or gdbm file somewhere upstream. This particular function is nevertheless fed with just a huge list of numbers, and that is not the issue in itself.

I would be glad if I could tell matplotlib: here you have 100 colors, use them for all the data as you wish, just spread them evenly over the whole dataset so that the first 1/100th of the data gets the first color, the second 1/100th gets the second color, and so on. Optionally, I would like to be able to say: use the 100 colors in cycles for all data points, just loop through the colors for as long as you need them. In both scenarios, I could have avoided the two for loops in the above code and the need to generate those objects. The same goes for the legend.

Martin

-- 
Martin Mokrejs, Ph.D.
Bioinformatics
Donovalska 1658
149 00 Prague
Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs
|
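Martin's two wishes (spread N colors evenly over the data in contiguous blocks, or cycle through them) can be expressed as integer color indices computed with numpy, expanded into one RGBA row per point, and passed to a single scatter() call instead of one call per color. This is a minimal sketch assuming numpy and Python 3; `block_color_indices`, `cyclic_color_indices` and `point_colors` are hypothetical helpers, not matplotlib API, and the palette is assumed to be any (n_colors, 4) RGBA array.

```python
import numpy as np


def block_color_indices(n_points, n_colors):
    """Spread n_colors evenly: the first 1/n_colors of the points get
    color 0, the next 1/n_colors get color 1, and so on."""
    # floor(i * n_colors / n_points) maps 0..n_points-1 onto
    # 0..n_colors-1 in contiguous, nearly equal-sized blocks.
    return (np.arange(n_points) * n_colors) // n_points


def cyclic_color_indices(n_points, n_colors):
    """Reuse the colors in a cycle: 0, 1, ..., n_colors-1, 0, 1, ..."""
    return np.arange(n_points) % n_colors


def point_colors(palette, indices):
    """Expand an (n_colors, 4) RGBA palette into one RGBA row per point.

    Fancy indexing builds the per-point array in one vectorized step,
    so no Python-level loop over the points is needed.
    """
    palette = np.asarray(palette, dtype=float)
    return palette[indices]


if __name__ == "__main__":
    # Hypothetical palette of 3 opaque colors: red, green, blue.
    palette = np.array([[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]], dtype=float)
    print(block_color_indices(10, 3))   # [0 0 0 0 1 1 1 2 2 2]
    print(cyclic_color_indices(5, 3))   # [0 1 2 0 1]
    print(point_colors(palette, block_color_indices(10, 3)).shape)  # (10, 4)
```

With matplotlib, the result could then be drawn in one call such as `ax.scatter(x, y, c=point_colors(palette, idx), s=0.1)`, which creates a single PathCollection rather than one per color; a palette can be sampled from a colormap, e.g. `plt.get_cmap("viridis")(np.linspace(0, 1, 100))`. Per-color legend entries would still need their own handles, which this sketch does not cover.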
From: Michael D. <md...@st...> - 2013-10-10 14:23:18
|
Thanks. This is much more helpful.

What we need, however, is a "self contained, standalone example". The code you posted calls functions that are not present. See http://sscce.org/ for why this is so important. Again, I would have to guess what those functions do -- it may be relevant, it may not. If I have something that I can *just run*, then I can use various introspection tools to see what is going wrong.

Mike

--
http://www.droettboom.com
|
From: Benjamin R. <ben...@ou...> - 2013-10-10 14:47:55
|
On Thu, Oct 10, 2013 at 10:21 AM, Michael Droettboom <md...@st...> wrote:

> Thanks. This is much more helpful.
>
> What we need, however, is a "self contained, standalone example". The
> code below calls functions that are not present. See http://sscce.org/
> for why this is so important. Again, I would have to guess what those
> functions do -- it may be relevant, it may not. If I have something that I
> can *just run* then I can use various introspection tools to see what is
> going wrong.
>
> Mike
>

That being said, I do see a number of anti-patterns here that could be significant. For example:

    for _x, _y, _c in izip(mydata_x, mydata_y, colors):
        # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
        _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
        _series.append(_my_PathCollection)

could be more concisely written as:

    _series = [_ax1.scatter(_x, _y, color=_c, s=objsize)
               for _x, _y, _c in izip(mydata_x, mydata_y, colors)]

Python can then handle memory management more intelligently when allocating the memory for _series. You can then use _series.extend() when you are doing the scatter plots for _ax2, with a similar list comprehension (or even a generator expression).

I would also question the need to store _series in the first place. You use it for the call to legend(), but you could simply have passed a label to each call of scatter() as well.

Some other things of note:

1) The clear() call here is completely useless, as the figure is already clear:

    _figure = pylab.figure()
    _figure.clear()

2) When limits are set on an axis, autoscaling for that axis is automatically turned off anyway, so there is no need to turn it off yourself (I am also not sure why you are calling out to an external function here):

    _ax1.set_autoscale_on(False)
    set_limits(_ax1, xmin, xmax, ymin, ymax)

3) Finally, some discussion of the end of your function here:

    if legends:
        _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
        del(_my_PathCollection)
        del(_ax2)
    else:
        _figure.savefig(filename, dpi=100)

    del(_series)
    del(_ax1)
    _figure.clear()
    del(_figure)
    pylab.clf()
    pylab.close()

First, as discussed, you can easily eliminate the need for _my_PathCollection and possibly even _series. Second, when calling _figure.clear(), all of its axes objects are deleted for you, so you don't need to delete them yourself. Third, you delete the _figure object, but then call pylab.clf(). I haven't double-checked exactly what would happen, but I think you might run the risk of accidentally clearing some other existing figure by doing that. Lastly, you then call pylab.close(), to which the same caveat applies. Really, all you needed was pylab.close(), and you can eliminate the 5 preceding lines and the other two del()'s. All del() really does is remove the variable from scope. Once that object is out of everybody's scope, the gc can clean it up. Since the function was ending anyway, there is no point in deleting the variable.

I don't know if this would fix your problem, and there are a bunch of other style issues here (particularly, pylab really shouldn't be used this way), but hopefully this gives some food for thought.

Cheers!
Ben Root
|
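Ben's suggestions (pass label= to each scatter() instead of keeping a separate _series list, let set_xlim()/set_ylim() switch autoscaling off, and finish with a single close()) could look roughly like the sketch below. It uses the object-oriented pyplot API instead of the pylab state machine. The name `draw_scatter`, its arguments and the simplified layout (legend in a second subplot underneath) are hypothetical, and it targets Python 3 with a recent matplotlib, so details will differ from the 1.3.0 setup in this thread.

```python
import os
import tempfile
from itertools import repeat

import matplotlib
matplotlib.use("Agg")  # file output only, no display needed
import matplotlib.pyplot as plt


def draw_scatter(filename, series_x, series_y, colors, labels=None,
                 xlim=None, ylim=None, objsize=0.1, legend_ncol=2):
    """Draw one scatter series per color and save the figure to filename.

    Returns (x-autoscale state, number of legend entries) so the behavior
    can be checked without keeping the figure alive.
    """
    with_legend = labels is not None
    fig = plt.figure(figsize=(11.2, 8.4 * (3 if with_legend else 2)))
    ax1 = fig.add_subplot(211 if with_legend else 111)

    for x, y, c, label in zip(series_x, series_y, colors,
                              labels if with_legend else repeat(None)):
        # Attach the legend text to the artist itself; no _series list.
        kwargs = {"label": label} if label is not None else {}
        ax1.scatter(x, y, color=c, s=objsize, **kwargs)

    # set_xlim()/set_ylim() turn autoscaling off for that axis by default,
    # so a separate set_autoscale_on(False) call is unnecessary.
    if xlim is not None:
        ax1.set_xlim(*xlim)
    if ylim is not None:
        ax1.set_ylim(*ylim)

    n_entries = 0
    if with_legend:
        ax2 = fig.add_subplot(212)
        handles, texts = ax1.get_legend_handles_labels()
        ax2.legend(handles, texts, loc="upper left", bbox_to_anchor=(0, 0, 1, 1),
                   borderaxespad=0.0, ncol=legend_ncol, mode="expand", fontsize=8)
        ax2.set_axis_off()  # hides frame, ticks and tick labels in one call
        n_entries = len(handles)

    summary = (ax1.get_autoscalex_on(), n_entries)
    fig.savefig(filename, dpi=100)
    plt.close(fig)  # the only cleanup needed; no del() or clf() calls
    return summary


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "draw_scatter_demo.png")
    print(draw_scatter(out, [[0, 1], [2, 3]], [[0, 1], [1, 0]], ["red", "blue"],
                       labels=["first", "second"], xlim=(0, 3)))
```

The design choice here is that every artist is reachable only through the figure, so closing the figure is enough to let Python reclaim everything.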
From: Benjamin R. <ben...@ou...> - 2013-10-10 14:05:36
|
On Thu, Oct 10, 2013 at 9:47 AM, Martin MOKREJŠ <mmo...@gm...> wrote:

> Benjamin Root wrote:
> > Unfortunately, that stacktrace isn't very useful. There is no recursion
> > there, but rather the perfectly normal drawing of the figure object that
> > has a child axes, which has child collections which have child artist
> > objects.
> >
> > Without the accompanying code, it would be difficult to determine where
> > the memory hog is.
>
> Could there be places where gc.collect() could be introduced? Are there places where matplotlib
> could del() unnecessary objects right away? I think the problem is with huge lists or Python
> dicts. I could save 10GB of RAM when I converted one Python dict to a bsddb3 file of just
> 10MB on disk. I speculate that matplotlib in that code keeps the data in some huge list or, more likely,
> a dict, and that it is the same issue.
>
> Are you sure you cannot see where the problem is? It happens (is visible) only with a huge number of
> dots, of course.
>

I am not going to claim that matplotlib is the leanest graphing library out there, and we already know where we can make continued improvements, but the symptom you are describing (50 GB for a couple hundred thousand scatter points) is just unheard of for matplotlib. Without a simple, concise, complete code example that demonstrates your problem, we can only hazard guesses. For all I know, you might be "appending" to numpy arrays in a loop prior to plotting, which would eat up a significant amount of memory without it being matplotlib's fault.

As far as I am aware, we don't use very large dictionaries, so I doubt that is the issue either.

As a side note, I have typically found that situations where del() significantly improved memory usage were situations where I was "doing it wrong" in the first place, and a simple refactor of the code improved memory and (sometimes) speed, with the added benefit of improved readability. I have even seen situations where calling del() in the wrong places (say, for a list created at the beginning of the loop) actually hurt performance because Python couldn't recycle that chunk of memory.

Give us a code example that reproduces your problem, and then we can start doing some more serious debugging.

Ben Root

> Thanks,
> Martin
|
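Ben's example of memory being lost before matplotlib is even involved, "appending" to numpy arrays in a loop, can be illustrated as follows. np.append never grows an array in place: it allocates a new array and copies everything accumulated so far, so the total amount of copying grows quadratically with the number of steps, and each step briefly needs room for both the old and the new array. Collecting the pieces in a Python list and concatenating once, or filling a preallocated array when the final size is known, avoids that. A minimal sketch with hypothetical function names; nothing here is taken from Martin's code.

```python
import numpy as np


def grow_with_np_append(chunks):
    """Anti-pattern: np.append returns a new array on every call, copying
    all data accumulated so far, so total copying grows quadratically."""
    out = np.empty(0)
    for chunk in chunks:
        out = np.append(out, chunk)
    return out


def grow_with_list(chunks):
    """Collect the pieces in a list and concatenate once at the end."""
    parts = [np.asarray(chunk, dtype=float) for chunk in chunks]
    return np.concatenate(parts) if parts else np.empty(0)


def fill_preallocated(chunks, total):
    """Write the pieces into one array allocated up front (size known)."""
    out = np.empty(total)
    pos = 0
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        out[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    return out[:pos]


if __name__ == "__main__":
    chunks = [[1, 2], [3], [4, 5, 6]]
    print(grow_with_np_append(chunks))  # [1. 2. 3. 4. 5. 6.]
    print(grow_with_list(chunks))
    print(fill_preallocated(chunks, total=6))
```

All three return the same values; only the amount of intermediate allocation differs.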
From: Martin M. <mmo...@gm...> - 2013-10-10 14:17:02
|
Benjamin Root wrote:
> Give us a code example that reproduces your problem, and then we can start doing some more serious debugging.

It should be in your inboxes now. I have to rush to a meeting, so I did not include an example call to that function with sample data, but I hope I have already given enough detail, since I stated the number of dots and legend entries to be drawn. Yes, the number of columns (_legend_ncol) is determined elsewhere; just set that variable to 2. Surely one can rewrite the code, but ideally I would also propose that matplotlib be improved so that others with a similarly bad coding style do not hit the issue. ;)

Thank you for your time,
Martin
|
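Since the posted function arrived without sample data, the sketch below shows the kind of "just run" script Mike asked for: it generates random data of the size described in the thread (262422 points in 153 color groups), draws it either with one scatter() call per group (as in Martin's function) or with a single call, saves a PNG with the Agg backend, and reports the process's peak resident memory. It is an assumption-laden sketch, not Martin's code: the data are random, the figure size is copied from his function, `reproduce` and `peak_rss_mb` are hypothetical names, and the `resource` module is Unix-only, with `ru_maxrss` in kilobytes on Linux and bytes on macOS. Written for Python 3 with recent numpy and matplotlib.

```python
import os
import resource
import sys
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files only; no display required
import matplotlib.pyplot as plt
import numpy as np


def peak_rss_mb():
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    return peak / (1024.0 * 1024.0) if sys.platform == "darwin" else peak / 1024.0


def reproduce(n_points=262422, n_groups=153, per_group_calls=True,
              filename=None, seed=0):
    """Draw n_points random dots in n_groups colors and save them as PNG.

    Returns (path_to_png, number_of_collections, peak_rss_mb).
    """
    rng = np.random.default_rng(seed)
    x = rng.random(n_points)
    y = rng.random(n_points)
    # Contiguous blocks of points share a color, like Martin's batches.
    group = (np.arange(n_points) * n_groups) // n_points
    palette = plt.get_cmap("viridis")(np.linspace(0.0, 1.0, n_groups))

    fig, ax = plt.subplots(figsize=(11.2, 8.4))
    if per_group_calls:
        # One PathCollection per group, as in the function posted to the list.
        for g in range(n_groups):
            mask = group == g
            ax.scatter(x[mask], y[mask], color=tuple(palette[g]), s=0.1)
    else:
        # A single PathCollection with one RGBA row per point.
        ax.scatter(x, y, c=palette[group], s=0.1)
    n_collections = len(ax.collections)

    if filename is None:
        filename = os.path.join(tempfile.gettempdir(), "scatter_repro.png")
    fig.savefig(filename, dpi=100)
    plt.close(fig)  # drop pyplot's reference to the figure
    return filename, n_collections, peak_rss_mb()


if __name__ == "__main__":
    single = len(sys.argv) > 1 and sys.argv[1] == "single"
    path, n_coll, mb = reproduce(per_group_calls=not single)
    print("wrote %s with %d collection(s); peak RSS about %.0f MiB"
          % (path, n_coll, mb))
```

Running it once per mode (with and without the `single` argument) gives two peak-memory numbers that can be compared directly.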
From: Martin M. <mmo...@gm...> - 2013-10-10 22:10:34
|
Hi Ben,
thank you for your comments. Looks like I will sleep badly tonight. :( Some quick answers below.

Benjamin Root wrote:
> That being said, I do see a number of anti-patterns here that could be significant. For example:
>
>     for _x, _y, _c in izip(mydata_x, mydata_y, colors):
>         # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
>         _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
>         _series.append(_my_PathCollection)
>
> Could be more concisely written as:
>
>     _series = [_ax1.scatter(_x, _y, color=_c, s=objsize) for _x, _y, _c in izip(mydata_x, mydata_y, colors)]
>
> Python can then more intelligently handle memory management by intelligently allocating the memory for _series. You can then use _series.extend() for when you are doing the scatter plots for _ax2 with a similar list comprehension (or even a generator statement).

You are right that the .append() is ugly; maybe it is the real source of the trouble. Right now I do not understand myself why I use _ax1 instead of _ax2 under "if legends:". Weird. I actually stopped using legends with this function because my first guess was that they caused the memory issues. It seems the culprit is elsewhere, so I should add them back and fix the _ax1 vs. _ax2 error (most likely a copy/paste one). As you could have seen, I used label=_l in the past, but for some reason I switched to the current ugly code. I will try to find out why I did that. Hmm, I don't know what you mean by _series.extend() at the moment; I will read some Python intro on using lists. :(

> I would also question the need to store _series in the first place. You use it for the call to legend, but you could have simply passed a label to each call of scatter as well.

As I said, I used that in the past, but somehow it did not work. Maybe it is time to retry it.

> Some other things of note:
>
> 1) The clear() call here is completely useless as the figure is already clear.
>     _figure = pylab.figure()
>     _figure.clear()

Right, I was just trying to ensure that everything is cleared. I somewhat suspect that the Python garbage collector does not recycle often enough, and therefore I added more and more del() and gc.collect() calls.

> 2) When limits are set on an axis, autoscaling for that axis is automatically turned off anyway, so no need to turn it off yourself (also not sure why you are calling out to an external function here):
>     _ax1.set_autoscale_on(False)
>     set_limits(_ax1, xmin, xmax, ymin, ymax)

set_limits() is called because I got unstable coordinates in every figure. Sometimes matplotlib used a wider offset from the axes line and sometimes not. So I basically force the same layout wherever I expect the layouts to match.

> 3) Finally, some discussion on the end of your function here:
>     if legends:
>         _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
>         del(_my_PathCollection)
>         del(_ax2)
>     else:
>         _figure.savefig(filename, dpi=100)
>
>     del(_series)
>     del(_ax1)
>     _figure.clear()
>     del(_figure)
>     pylab.clf()
>     pylab.close()
> first, as discussed, you can easily eliminate the need for _my_PathCollection and possibly even _series. Second, when calling _figure.clear(), all of its axes objects are deleted for you, so you don't need to delete them yourself. Third, you delete the _figure object, but then call "pylab.clf()". I haven't double-checked exactly what would happen, but I think you might run the risk of accidentally clearing some other existing figure by doing that. Lastly, you then call pylab.close(), which I point out the same caveat as before. Really, all you needed was pylab.close() and you can eliminate the 5 preceding lines and the other two del()'s. All del() really does is remove the variable out of scope. Once that object is out of everybody's scope, then the gc can clean it up. Since the function was ending anyway, there is no point in deleting the variable.

Right, but I suspect that the garbage collector does not recycle unused objects quickly enough after the function is left. When I generate many figures in a loop, one after another, it appeared helpful to me to interleave the function calls with gc.collect() calls.

> I don't know if this would fix your problem, and there are a bunch of other style issues here (particularly, pylab really shouldn't be used this way), but hopefully this gives some food for thought.

I think tomorrow I will start by finishing the broken test case so that we can be sure where the culprit was. Then I should improve the function as you proposed. In some places I am not sure what you really mean, but hopefully I will work it out.

I was thinking about submitting several other functions like this one for discussion and improvement, so that such wrapper functions could be included in matplotlib. I am sure you would not like the many function arguments and would prefer kwargs instead, but something with the same API would be helpful if I want to switch easily between a scatter plot, a histogram and a pie chart. Actually, the hist2d substring in this function name is a remnant of my attempts to do 2D charts, but I did not take that route in the end. Just in case you were puzzled by the function name. ;)

Thank you,
Martin
|
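Whether extra del() and gc.collect() calls help depends on how CPython frees memory: an object is released as soon as its last reference disappears (reference counting), del only removes one name, and gc.collect() matters only for objects caught in reference cycles. matplotlib objects do form such cycles (an Axes refers to its figure via `ax.figure` while the figure lists it in `fig.axes`), so a figure is reclaimed by the cyclic collector once nothing else refers to it; plt.close() drops pyplot's own reference. The sketch below observes this with weak references. The `Payload` class is a hypothetical stand-in for a large object, and the exact timing shown is CPython-specific; other interpreters may behave differently.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object such as a figure."""

    def __init__(self, size=10 * 1024 * 1024):
        self.data = bytearray(size)


def del_removes_only_a_name():
    """Return (alive after deleting one of two names, freed after the last)."""
    obj = Payload()
    probe = weakref.ref(obj)   # observes the object without keeping it alive
    alias = obj                # second reference, e.g. an entry in a list
    del obj                    # drops a name; the object is still reachable
    alive_after_first_del = probe() is not None
    del alias                  # last reference gone: CPython frees it right away
    return alive_after_first_del, probe() is None


def cycle_needs_collector():
    """Return (alive after del, freed after gc.collect()) for a self-cycle."""
    was_enabled = gc.isenabled()
    gc.disable()               # keep the automatic collector from interfering
    try:
        obj = Payload()
        obj.me = obj           # reference cycle, like figure <-> axes
        probe = weakref.ref(obj)
        del obj                # refcount never reaches zero because of the cycle
        alive_after_del = probe() is not None
        gc.collect()           # the cyclic collector finds and frees it
        return alive_after_del, probe() is None
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(del_removes_only_a_name())   # (True, True) on CPython
    print(cycle_needs_collector())     # (True, True) on CPython
```

So in a long loop, an explicit gc.collect() can release closed figures sooner than the automatic collector would, while a del() at the very end of a function adds nothing, because the function's local names go away when it returns anyway.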
From: Martin M. <mmo...@gm...> - 2013-10-12 16:56:38
Attachments:
eatmem.py
|
Hi,
so here is a quick but working example. I added 2-3 (unused) functions as a bonus; you can easily call them from the main function using the same API (except the piechart). I hope this shows what I lack in matplotlib: a general API so that I could easily switch from a scatter plot to a piechart or barchart without altering the function arguments much. Messing with the returned objects (Line2D, PathCollection, Rectangle) is awkward, and I would like to stay away from matplotlib's internals. ;) Some can be sliced, some cannot; you will see in the code.

This eatmem.py will easily take all your memory. Drawing 300000 dots is not feasible with 16GB of RAM. While the example is surely inefficient in many places, generating the data in Python does not eat the RAM. That happens afterwards.

I would really like to hear whether matplotlib could be adjusted instead. ;) I already mentioned in this thread that it is awkward to pre-create colors before passing all data to a drawing function. I think we could all save a lot if matplotlib could fetch colors on the fly from a user-created generator, and the same for legend descriptions. I think my example code shows the inefficient approach here. If I had more time, I would randomize the sublist of each series a bit more so that the numbers in the legends would be more variable, but that is a cosmetic issue.

Probably due to my ignorance, you will see that figures with legends have different font sizes, and that the axes and the figure are rescaled. Of course I wanted the drawing to be the same with both approaches but failed badly. The files/figures with legends should just be accompanied by the legend "table" underneath, but the drawing itself should be the same. Maybe it is an issue with DPI settings, but not only that.

I placed some comments in the code; please don't take them personally. ;) Of course I am glad for the existing work and am happy to contribute my crap. I am fine if you revamp this ugly code into the matplotlib testsuite and provide a similar function (the API mentioned above) so that I could use your code directly. That would be great. I just tried to show multiple issues at once, which is why I included those unused functions. You will surely find a way to use them.

Regarding the "unnecessary" del() calls etc., I think I have to keep some, Ben, because the function is not always left soon enough. I could drop some, you are right, but for others I don't think so. Matplotlib cannot recycle the memory until I (upstream) delete the reference, so ... go and test this lousy code. Now you have a testcase. ;) Same with the gc.collect() calls. Actually, the main loop with 10 iterations is there just to show why I always want to clear a figure both when entering a function and when leaving it. Too many times I drew over an old figure, and others have also posted about this on this list a few times. That is weird behavior in my opinion. We users are just forced to use too low-level functions.

So, have fun eating your memory! :))
Martin
|
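Two of Martin's complaints, drawing over an old figure and pre-creating a color for every point, can be sidestepped without changing matplotlib. A minimal sketch, assuming matplotlib >= 3.5 (for `matplotlib.colormaps`): it builds a `matplotlib.figure.Figure` directly instead of going through pyplot, so there is no "current figure" to draw over, and it pulls one color per series lazily from a generator. `colormap_colors` and `plot_series` are hypothetical names.

```python
import matplotlib
from matplotlib.figure import Figure


def colormap_colors(n, name="viridis"):
    """Hypothetical generator: yield n evenly spaced RGBA colors from a colormap."""
    cmap = matplotlib.colormaps[name]
    for i in range(n):
        yield cmap(i / max(n - 1, 1))


def plot_series(filename, series, labels, colors=None):
    """Hypothetical wrapper: a fresh Figure per call, one color per series.

    colors may be any iterable (list, generator, itertools.cycle, ...).
    Returns the number of scatter collections drawn.
    """
    if colors is None:
        colors = colormap_colors(len(series))
    # Not registered with pyplot, so there is no old figure to draw over.
    fig = Figure(figsize=(8, 6))
    ax = fig.subplots()
    for (x, y), label, color in zip(series, labels, colors):
        # color= is a single color for the whole series: no per-point color list.
        ax.scatter(x, y, s=5, color=color, label=label)
    ax.legend()
    fig.savefig(filename, dpi=100)  # Figure.savefig works without pyplot (matplotlib >= 3.1)
    return len(ax.collections)
```

Because the `Figure` is never registered with pyplot, it is reclaimed by Python's normal garbage collection once nothing references it; no `close()`, `clf()` or `del` is involved.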
From: Michael D. <md...@st...> - 2013-10-14 17:20:41
|
Sorry to repeat myself, but please reduce this to a short, self-contained example that is the absolute minimum needed to demonstrate the problem. http://sscce.org/ should explain better what I'm after. I don't want to search for the needle in the haystack here; there is code in your example that doesn't even run, for example.

That said, are you really after creating a legend entry for each of the dots? (See below.) That just isn't going to work, and I'm not surprised it eats up excessive amounts of memory. I think you want to (and can) reduce this to a single scatter call.

    _series = [_ax1.scatter(_x, _y, color=_c, s=objsize, label=_l, hatch='.') for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends)] # returns PathCollection object

Mike

--
http://www.droettboom.com
|
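Mike's single-scatter suggestion can be sketched as follows, assuming the points are already in flat arrays `x`, `y` plus an integer series index `sid` per point (these names and `single_scatter` are hypothetical). Each point's color comes from a `ListedColormap` indexed by its series, so only one `PathCollection` is created, and the legend is built from lightweight `Line2D` proxy handles, one per series, rather than from one collection per scatter call.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from matplotlib.lines import Line2D


def single_scatter(ax, x, y, sid, labels):
    """Hypothetical helper: draw every series with a single scatter call."""
    nseries = len(labels)
    palette = matplotlib.colormaps["viridis"](np.linspace(0, 1, nseries))
    cmap = ListedColormap(palette)
    # With vmin=-0.5 and vmax=nseries-0.5, integer i maps to palette entry i.
    coll = ax.scatter(x, y, c=sid, cmap=cmap, vmin=-0.5, vmax=nseries - 0.5, s=2)
    # Proxy artists: legend entries that carry no data points of their own.
    handles = [Line2D([], [], marker="o", linestyle="", color=palette[i], label=lab)
               for i, lab in enumerate(labels)]
    ax.legend(handles=handles, ncol=4, fontsize="small")
    return coll
```

The per-point data is then just one float per point (the series index) instead of one color specification per point.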
From: Martin M. <mmo...@gm...> - 2013-10-14 22:47:08
|
Michael Droettboom wrote:
> That said, are you really after creating a legend entry for each of the dots? (See below). That just isn't going to work, and I'm not surprised it eats up excessive amounts of memory. I think you want (and can) reduce this to a single scatter call.
>
>     _series = [_ax1.scatter(_x, _y, color=_c, s=objsize, label=_l, hatch='.') for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends)] # returns PathCollection object

Are you sure? I think it was concluded on this list that scatter cannot (or does not) take nested lists of lists with series, the way histogram and piechart do. I cannot find the thread, but maybe you will be luckier. I even think that I already opened a bug report/feature request for this in the past. But maybe not.

Martin

--
Martin Mokrejs, Ph.D.
Bioinformatics
Donovalska 1658
149 00 Prague
Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs
|
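On Martin's question: `hist()` and `pie()` accept a sequence of datasets, whereas `scatter()` documents `x` and `y` as 1-D array-likes, so nested per-series lists have to be flattened first. A minimal sketch, assuming each series is a pair of equal-length sequences (`flatten_series` is a hypothetical helper): it concatenates the series and records which series every point came from.

```python
import numpy as np


def flatten_series(xs, ys):
    """Hypothetical helper: turn nested per-series lists into flat arrays.

    xs, ys: sequences of 1-D sequences, one pair per series (lengths may differ
    between series). Returns x, y and sid, where sid[k] is the index of the
    series that point k came from.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of series")
    lengths = [len(sx) for sx in xs]
    if lengths != [len(sy) for sy in ys]:
        raise ValueError("each x series must match its y series in length")
    x = np.concatenate([np.asarray(sx, dtype=float) for sx in xs])
    y = np.concatenate([np.asarray(sy, dtype=float) for sy in ys])
    sid = np.repeat(np.arange(len(xs)), lengths)  # series index for every point
    return x, y, sid
```

For example, `flatten_series([[1, 2], [3]], [[4, 5], [6]])` yields `x = [1., 2., 3.]`, `y = [4., 5., 6.]` and `sid = [0, 0, 1]`; the three arrays can then go into one `ax.scatter(x, y, c=sid, ...)` call.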
From: Daniele N. <da...@gr...> - 2013-10-14 23:56:21
|
On 10/10/2013 15:05, Martin MOKREJŠ wrote:
> Hi,
> rendering some of my charts takes almost 50GB of RAM. I believe below is a stacktrace
> of one such situation when it already took 15GB. Would somebody comment on what
> matplotlib is doing at that very moment? Why the recursion?
>
> The charts had to have 262422 data points in a 2D scatter plot, each point has assigned
> its own color. They are in batches so that there are 153 distinct colors but nevertheless,
> I assigned to each data point a color value. There are 153 legend items also (one color
> won't be used).

Hello Martin,

can I ask what the point is of a scatter plot with 200 thousand points in it? Either you visualize it on a screen much larger than mine, or you are not going to be able to distinguish the individual data points. Maybe you should rethink the visualization tool you are using.

Nevertheless, I'm perfectly able to plot a scatter plot with 262422 data points, each with its own color, and the Python process consumes a few hundred MB of RAM (with quite a few other datasets loaded in memory)::

    import numpy as np
    import matplotlib.pyplot as plt

    n = 262422
    x = np.random.rand(n)
    y = np.random.rand(n)
    c = np.random.rand(n)

    f = plt.figure()
    a = f.add_subplot(111)
    a.scatter(x, y, c=c, s=50)
    plt.show()

and a possible solution using exactly 153 different colors (but again, I don't see how you can distinguish between hundreds of different shades of color)::

    import numpy as np
    import matplotlib.pyplot as plt

    n = 262422
    ncolors = 153
    x = np.random.rand(n)
    y = np.random.rand(n)
    c = np.random.rand(ncolors)

    f = plt.figure()
    a = f.add_subplot(111)
    # One scatter call per chunk of ncolors points. The last chunk may be
    # shorter, so c is trimmed to match; fixed vmin/vmax keep the same
    # value-to-color mapping in every chunk.
    for start in range(0, n, ncolors):
        stop = min(start + ncolors, n)
        a.scatter(x[start:stop], y[start:stop], c=c[:stop - start],
                  vmin=0, vmax=1, s=50)
    plt.show()

Unfortunately the code you provide is too contrived to be useful for understanding the root cause of your problem.

Cheers,
Daniele
|
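Since Martin's memory blow-up happens inside `savefig()` (his traceback goes through `print_png`) rather than `show()`, a reproduction of Daniele's single-call test could render straight to PNG with the Agg backend and report the process's peak resident memory. A minimal sketch, assuming a Unix-like system (the `resource` module is not available on Windows); note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS. `peak_rss_mb` and `render_and_measure` are hypothetical names, and no particular memory figure is claimed here.

```python
import io
import resource
import sys

import matplotlib
matplotlib.use("Agg")  # off-screen rendering, the same path savefig() takes
import matplotlib.pyplot as plt
import numpy as np


def peak_rss_mb():
    """Peak resident set size of this process so far, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024


def render_and_measure(n=262422, seed=0):
    """Hypothetical: one scatter call with a per-point color value, saved as PNG.

    Returns (peak RSS before rendering, peak RSS after, PNG size in bytes).
    """
    rng = np.random.default_rng(seed)
    x, y, c = rng.random(n), rng.random(n), rng.random(n)
    before = peak_rss_mb()
    fig, ax = plt.subplots()
    ax.scatter(x, y, c=c, s=50)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=100)
    plt.close(fig)
    return before, peak_rss_mb(), buf.getbuffer().nbytes


if __name__ == "__main__":
    before, after, size = render_and_measure()
    print(f"peak RSS so far: {before:.0f} MB -> {after:.0f} MB, PNG: {size} bytes")
```

Running the same measurement on a stripped-down version of eatmem.py would show whether the extra memory comes from the single scatter call itself or from the per-series calls and legend handling around it.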