IPython: Memory leak with %matplotlib inline

Created on 19 Dec 2014  ·  23 Comments  ·  Source: ipython/ipython

Hey everyone,

I've found a problem. Run the code below and watch the memory usage. Then delete the %matplotlib inline line and run it again.

import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline

import base64
import StringIO

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150

OUTPUT_FILENAME = "Asd"

def printHTML(html):
    with open(OUTPUT_FILENAME, "a") as outputFile:
        outputFile.write(html if isinstance(html, str) else html.encode('utf8'))

def friendlyPlot():
    # subplot2grid implicitly creates a pyplot-managed figure
    ax = plt.subplot2grid((1, 2), (0, 0))
    ax.plot(range(1000), range(1000))

    #plt.show()
    fig = plt.gcf()

    # Render the figure to an in-memory PNG and embed it as base64 HTML
    imgdata = StringIO.StringIO()
    fig.savefig(imgdata, format='png')
    image = base64.b64encode(imgdata.getvalue())
    printHTML('<img src="data:image/png;base64,{0}" /><br />'.format(image))
    plt.close('all')
    imgdata.close()

# Truncate the output file, then generate 500 plots
open(OUTPUT_FILENAME, 'w').close()

for i in range(500):
    friendlyPlot()
Labels: bug, matplotlib

Most helpful comment

I'll second that a fix on this issue would be appreciated.

All 23 comments

I hit this bug as well. Is there any way to get inline plots without memory leaks? I don't want to launch a separate process for each plot, since the arrays are quite large.

Can you check this when memory usage increases:

len(IPython.kernel.zmq.pylab.backend_inline.show._to_draw)

That's a list where figures are being stored. They should be there only temporarily, but maybe they're building up without getting cleared.

len(IPython.kernel.zmq.pylab.backend_inline.show._to_draw)=0

BTW, I'm plotting using the .plot() method on pandas DataFrames.

OK, so much for that theory.

It's possible pandas also keeps some plot-related data alive internally. The original report doesn't involve pandas, though.

How much memory does each additional plot appear to add?
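(For anyone trying to quantify this: a minimal sketch using psutil, which is not mentioned in this thread, but it reports the current rather than the peak RSS, so per-plot deltas are easy to read.)

import psutil
import matplotlib.pyplot as plt

proc = psutil.Process()  # the current process, i.e. the kernel

def rss_mb():
    return proc.memory_info().rss / 1e6

for i in range(5):
    before = rss_mb()
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')
    print('plot {}: +{:.1f} MB'.format(i, rss_mb() - before))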

OK, this seems to be my case: I was using pandas 0.16.0, but the issue is fixed in master:

https://github.com/pydata/pandas/pull/9814

Great, thanks. Leaving open since the original report didn't involve pandas.

This can be reproduced more simply:

import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150

def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')

for i in range(500):
    friendlyPlot()

The following variant, which uses the Agg backend directly instead of the inline backend, does not leak memory, so it is something on the IPython side, not the pyplot side (I think).

import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150

def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')

for i in range(500):
    friendlyPlot()

@tacaswell With your test code, IPython on Windows 7 consumes approximately 1.7 GB here, which is not freed afterwards. Running with a slightly higher number of iterations leads to a memory error. So this is still an issue.

@asteppke The first or second block?

@tacaswell With your first test code (%matplotlib inline), memory consumption goes up to 1.7 GB. In contrast, when using the second piece (matplotlib.use('agg')), memory usage oscillates only between 50 MB and 100 MB.

Both tests are executed with Python 3.4 and IPython notebook version 4.0.5.

I've played with this a bit more. I notice that if I re-run the for loop in @tacaswell's example a few times, memory usage doesn't increase further; it seems to be the number of figures you create in a single cell that matters. IPython does keep a list of all the figures generated in a cell for the inline backend, but that list is quite definitely cleared after the cell runs, and clearing it doesn't make memory usage drop, even after calling gc.collect().

Could our code be interacting badly with something in matplotlib? I thought _pylab_helpers.Gcf looked likely, but it doesn't seem to be holding on to anything.

I tried grabbing a reference to one of the figures and calling gc.get_referrers() on it; apart from the reference I had in user_ns, all the others looked like mpl objects - presumably many of them are in reference loops. What object is it most likely something else would be inappropriately keeping a reference to?
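(Roughly this kind of probe, for anyone who wants to repeat it; the weakref is there so the probe itself doesn't keep the figure alive. This is my illustration, not code from the thread.)

import gc
import weakref
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
probe = weakref.ref(fig)  # weak reference: does not keep the figure alive
plt.close(fig)
del fig, ax
gc.collect()

fig_obj = probe()
if fig_obj is None:
    print('figure was collected')
else:
    # Something still references the figure; list the referrers' types.
    for r in gc.get_referrers(fig_obj):
        print(type(r))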

I'm dropping this to milestone 'wishlist'. We want to fix it, but at the moment we're not sure how to make further progress in identifying the bug, and I don't think it's worth holding up releases for it.

Anyone who can make progress gets brownie points. Also cake.

Not really progress, but the memory seems to be lost somewhere inside the kernel. Calling gc.collect() after or inside the loop doesn't help, and summary.print_(summary.summarize(muppy.get_objects())) doesn't find any of the leaked memory. Setting all the _N and _iN output-cache variables to None doesn't help either. It's really mysterious.
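(For completeness: the muppy call above comes from the pympler package, and the _N / _iN references live in IPython's output cache, which can be cleared in bulk with the %reset magic. A sketch of both checks:)

from pympler import muppy, summary

# Summarize every Python object the GC can see; the leaked memory
# does not show up here, which points away from live Python objects.
summary.print_(summary.summarize(muppy.get_objects()))

# Clear IPython's output cache (Out, _, __, _N, ...) without a restart:
%reset -f out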

I also wondered if it was creating uncollectable objects, but those should end up in gc.garbage when there are no other references to them, and that's still empty when I see it using up loads of RAM.
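(The check I mean, roughly:)

import gc

gc.set_debug(gc.DEBUG_UNCOLLECTABLE)  # log objects the collector cannot free
gc.collect()
print(len(gc.garbage))  # stays 0 here, even while the RSS keeps growing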

I think someone who knows about these things is going to have to use C-level tools to track down what memory is not getting freed. There's no evidence of extra Python objects being kept alive anywhere we can find.
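(One Python-side tool worth ruling things out with first is tracemalloc, available since Python 3.4: if it reports far less growth than the process RSS shows, the memory is being allocated outside the Python object allocator, which would fit the above. A sketch, assuming friendlyPlot from the earlier snippets:)

import tracemalloc

tracemalloc.start()
first = tracemalloc.take_snapshot()

for i in range(50):
    friendlyPlot()

# Show the ten biggest sources of new Python-level allocations.
second = tracemalloc.take_snapshot()
for stat in second.compare_to(first, 'lineno')[:10]:
    print(stat)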

I'll second that a fix on this issue would be appreciated.

We know, but at present no one has worked out the cause of the bug.

+1

+1

BTW, I'm still hitting this issue from time to time on the latest matplotlib, pandas, jupyter, and ipython. If anyone knows of a debugger that can help troubleshoot this multi-process communication, please let me know.

Could it perhaps have anything to do with the browser cache mechanism?

Good thought, but I don't think so. It's IPython's process taking up extra memory, not the browser, and @tacaswell's reproduction doesn't involve sending plots to the browser.

Hi, I believe I have found part of the culprit and a way to significantly, but not completely, reduce this problem!

After scrolling through the ipykernel/pylab/backend_inline.py code, I got the hunch that interactive mode does a lot of storing of "plot things". I don't understand it completely, though, so I can't pinpoint the exact reason with certainty.

Here is the code to verify this (based on @tacaswell's snippet above), which should be useful for anyone trying to implement a fix.

Initialization:

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

%matplotlib inline

matplotlib.rcParams['figure.figsize'] = (24, 6)
matplotlib.rcParams['figure.dpi'] = 150

# ru_maxrss below is the peak resident set size (reported in kB on Linux)
from resource import getrusage
from resource import RUSAGE_SELF

def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')

Actual test:

print("before any:  {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
friendlyPlot()
print("before loop: {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
for i in range(50):
    friendlyPlot()
print("after loop:  {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
import gc ; gc.collect(2)
print("after gc:    {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))

Running it for 50 iterations of the loop, I get:

before any:    87708 kB
before loop:  106772 kB
after loop:   786668 kB
after gc:     786668 kB

Running it for 200 iterations of the loop, I get:

before any:    87708 kB
before loop:  100492 kB
after loop:  2824316 kB
after gc:    2824540 kB

which shows an almost linear increase in memory with the number of iterations.

Now for the fix/workaround: call matplotlib.interactive(False) before running the test snippet.
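That is, run this first:

import matplotlib
matplotlib.interactive(False)  # turn off interactive mode before plotting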

With 50 iterations:

before any:    87048 kB
before loop:  104992 kB
after loop:   241604 kB
after gc:     241604 kB

And with 200 iterations:

before any:    87536 kB
before loop:  103104 kB
after loop:   239276 kB
after gc:     239276 kB

This confirms that only a constant increase (independent of the number of iterations) remains.

Using these numbers, I make a rough estimate of the leak size per iteration:

(786668-(241604 - 104992))/50   = 13001.12
(2824316-(241604 - 104992))/200 = 13438.52

And for a single iteration of the loop, I get 13560 kB. So the leak is roughly 13 MB per iteration, which is far larger than the PNG output (54 KB) and, notably, close to the size of one uncompressed RGBA render of the figure (see the estimate just below).
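(For reference, a back-of-the-envelope size for one uncompressed RGBA render at these rcParams; my arithmetic, not a figure from the thread:)

width_px = 24 * 150   # figure.figsize width  * figure.dpi = 3600 px
height_px = 6 * 150   # figure.figsize height * figure.dpi = 900 px
rgba_kb = width_px * height_px * 4 / 1024.0
print(rgba_kb)        # ~12656 kB, close to the ~13000 kB leaked per iteration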

Also, strangely, running a small-scale test (only a few iterations) repeatedly in the same cell without restarting the kernel is much less consistent; I have not been able to understand this or find a pattern.

I hope someone with more knowledge of the internals can take it from here, as I lack the time and knowledge to dive deeper into it right now.

it works
