I've narrowed it down to this little script. I haven't dug into it yet, but the smoking gun seems to be the new function compilation code:
from werkzeug.routing import Map, Rule


def main():
    while True:
        Map([Rule('/a/<string:b>')])


if __name__ == '__main__':
    exit(main())
@edk0
It seems you're saying that rules aren't being garbage-collected correctly?
I'm not clear on the real circumstances under which this would happen. Typically you define a set of rules and then use them; they're not arbitrarily created and deleted.
Our usage here involves a set of redirects defined in a configuration file. This configuration file changes periodically as we add or remove vanity routes.
Instead of deploying the application each time we want to add a vanity route, we simply update the configuration file, and the application rebuilds a route mapping (which a Flask app uses to serve redirects).
Hmm, that's usually discouraged because changes to the map aren't synchronized across multiprocess workers. I'd probably implement it as a 404 error handler that checks whether a redirect should be returned instead. Not saying it shouldn't be fixed, just that it's not a use case I'd heard of.
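The 404-fallback idea can be sketched framework-neutrally. In Flask this would typically be a function registered with @app.errorhandler(404); the sketch below assumes plain WSGI instead so it runs with the standard library only. RedirectOn404 and get_redirects are hypothetical names, not Werkzeug or Flask APIs, and the sketch buffers every response, which is fine for illustration but not for streaming.

```python
from typing import Callable, Dict, Iterable


class RedirectOn404:
    """Hypothetical WSGI middleware: let the wrapped app try first and, only
    if it answers 404, look the path up in a redirect table."""

    def __init__(self, app: Callable, get_redirects: Callable[[], Dict[str, str]]):
        self.app = app
        self.get_redirects = get_redirects  # returns {path: redirect target}

    def __call__(self, environ, start_response) -> Iterable[bytes]:
        captured = {}
        written = []  # data sent through the legacy write() callable

        def capture(status, headers, exc_info=None):
            # Defer the real start_response until the final status is known.
            captured["args"] = (status, headers, exc_info)
            return written.append

        body = self.app(environ, capture)
        try:
            chunks = list(body)  # buffer the whole response (sketch only)
        finally:
            if hasattr(body, "close"):
                body.close()

        status, headers, exc_info = captured["args"]
        target = self.get_redirects().get(environ.get("PATH_INFO", ""))
        if status.startswith("404") and target is not None:
            start_response("302 Found", [("Location", target), ("Content-Length", "0")])
            return [b""]
        start_response(status, headers, exc_info)
        return written + chunks
```

Because the redirect table is only consulted after normal routing fails, the routing Map itself never has to be rebuilt when the table changes.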
Each individual worker checks the configuration file periodically and reloads it; as far as I can tell, it isn't being injected directly into the Flask app.
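A per-worker reload like the one described can be sketched as an mtime check. This assumes a hypothetical JSON file mapping paths to redirect targets and a hypothetical ReloadingRedirects class; it polls with os.stat on every call, whereas a real worker might throttle the check to run only periodically.

```python
import json
import os
from typing import Dict, Optional


class ReloadingRedirects:
    """Hypothetical helper: re-read a JSON file of {path: target} redirects
    whenever its modification time changes."""

    def __init__(self, path: str):
        self.path = path
        self._mtime_ns: Optional[int] = None
        self._table: Dict[str, str] = {}

    def get(self) -> Dict[str, str]:
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            # File changed (or first call): reload the whole table.
            with open(self.path, encoding="utf-8") as f:
                self._table = json.load(f)
            self._mtime_ns = mtime_ns
        return self._table
```

Since get() returns a plain dict, a reload replaces one dictionary rather than constructing new Map and Rule objects.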
Either way, these objects probably shouldn't leak :laughing:. I'm looking into what could cause it. I suspect either the compiled functions have an issue or the hashing of them does (since the Map also seems to be involved in some way).
Of note, this leak doesn't happen without the <string:b> portion.
Here's the disassembly of the function objects it creates:
>>> x = Rule('/a/<string:b>')
>>> from werkzeug.routing import Map
>>> y = Map([x])
>>> x._build
<function <builder:'/a/<string:b>'> at 0x7f16d62d1730>
>>> import dis
>>> dis.dis(x._build)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_CONST               2 (<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f16d2c9a0f0>>)
              6 LOAD_FAST                0 (b)
              8 CALL_FUNCTION            1
             10 BUILD_STRING             2
             12 BUILD_TUPLE              2
             14 RETURN_VALUE
>>> dis.dis(x._build_unknown)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_CONST               2 (<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f16d2c9a0f0>>)
              6 LOAD_FAST                0 (b)
              8 CALL_FUNCTION            1
             10 LOAD_FAST                1 (.keyword_arguments)
             12 JUMP_IF_TRUE_OR_POP     20
             14 LOAD_CONST               0 ('')
             16 DUP_TOP
             18 JUMP_FORWARD            10 (to 30)
        >>   20 LOAD_CONST               3 (functools.partial(<function url_encode at 0x7f16d2d5d510>, charset='utf-8', sort=False, key=None))
             22 ROT_TWO
             24 CALL_FUNCTION            1
             26 LOAD_CONST               4 ('?')
             28 ROT_TWO
        >>   30 BUILD_STRING             4
             32 BUILD_TUPLE              2
             34 RETURN_VALUE
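To check which of a compiled function's constants aren't ordinary literals, co_consts can be inspected directly. non_literal_consts is a hypothetical helper, and its list of literal types is an assumption about what the CPython compiler normally emits. Going by the disassembly above, calling it on x._build would presumably report the bound to_url method, and on x._build_unknown the bound method plus the functools.partial.

```python
import types

# Types the CPython compiler normally stores in co_consts (tuples and
# frozensets are walked recursively below). This list is an assumption and
# may need extending for other Python versions.
_LITERAL_TYPES = (
    type(None), type(Ellipsis), bool, int, float, complex,
    str, bytes, slice, types.CodeType,
)


def non_literal_consts(func):
    """Return objects in func.__code__.co_consts that aren't ordinary literals."""
    found = []

    def walk(consts):
        for const in consts:
            if isinstance(const, (tuple, frozenset)):
                walk(const)
            elif not isinstance(const, _LITERAL_TYPES):
                found.append(const)

    walk(func.__code__.co_consts)
    return found
```

For functions produced by the normal compiler this should return an empty list; anything it does return is an object that LOAD_CONST hands out without the compiler having created it.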
Adjusting the script slightly:
import collections
import gc
import pprint

from werkzeug.routing import Map, Rule


def main():
    for _ in range(10000):
        Map([Rule('/a/<string:b>')])
    for _ in range(5):
        gc.collect()
    counts = collections.Counter(type(o) for o in gc.get_objects())
    pprint.pprint(counts.most_common(15))


if __name__ == '__main__':
    exit(main())
It looks like it's leaking a Map, a Rule, a functools.partial and a UnicodeConverter per call (at the very least; probably more in the other common types above them as well):
$ ./venv/bin/python t.py
[(<class 'dict'>, 62085),
(<class 'list'>, 50514),
(<class 'function'>, 24085),
(<class 'tuple'>, 21811),
(<class 'method'>, 20032),
(<class 'set'>, 10518),
(<class 'functools.partial'>, 10002),
(<class 'werkzeug.routing.UnicodeConverter'>, 10000),
(<class 'werkzeug.routing.Map'>, 10000),
(<class 'werkzeug.routing.Rule'>, 10000),
(<class 'weakref'>, 1306),
(<class 'wrapper_descriptor'>, 1131),
(<class 'method_descriptor'>, 879),
(<class 'builtin_function_or_method'>, 839),
(<class 'getset_descriptor'>, 740)]
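The counting approach in the script can be turned into a per-call figure with a small helper. growth_per_call is a hypothetical name; like the script, it only sees objects tracked by the cyclic GC, and it collects before each snapshot so that garbage which simply hasn't been collected yet doesn't count. With the factory from the script (lambda: Map([Rule('/a/<string:b>')])) one would expect values around 1.0 for Map, Rule, UnicodeConverter and functools.partial, in line with the 10000 of each above.

```python
import collections
import gc


def growth_per_call(factory, calls=1000):
    """Average increase in GC-tracked objects of each type per factory() call.

    Only objects returned by gc.get_objects() are counted, so objects the
    cyclic GC doesn't track (plain strings, code objects, ...) never show up.
    """
    gc.collect()
    before = collections.Counter(type(o) for o in gc.get_objects())
    for _ in range(calls):
        factory()
    gc.collect()
    after = collections.Counter(type(o) for o in gc.get_objects())
    after.subtract(before)
    # Keep only types whose population grew.
    return {t: n / calls for t, n in after.items() if n > 0}
```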
Here are some graphs of the things keeping this alive in the gc:
def graph(obj, ids, *, seen=None, indent='', limit=10):
    if seen is None:
        seen = set()
    for referrer in gc.get_referrers(obj):
        # the main frame which has a hard reference to this object
        if (
            type(referrer).__name__ == 'frame' and
            referrer.f_globals['__name__'] == '__main__'
        ):
            continue
        # objects only present due to traversal of gc referrers
        elif id(referrer) not in ids:
            continue
        elif id(referrer) in seen:
            print(f'{indent}(already seen) {id(referrer)}')
            continue
        seen.add(id(referrer))
        if indent == '':
            print('=' * 79)
        print(f'{indent}type: {type(referrer).__name__} ({id(referrer)})')
        fmted = repr(referrer)  # pprint.pformat(referrer)
        print(indent + fmted.replace('\n', f'\n{indent}'))
        if limit:
            graph(
                referrer, ids,
                seen=seen, indent='==' + indent, limit=limit - 1,
            )
...
ids = {id(o) for o in gc.get_objects()}
obj = next(iter(o for o in gc.get_objects() if isinstance(o, Map)))
graph(obj, ids)
===============================================================================
type: dict (140272699432680)
{'map': Map([<Rule '/a/<b>' -> None>]), 'regex': '[^/]{1,}'}
==type: UnicodeConverter (140272699175656)
==<werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>
====type: method (140272750734216)
====<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>
======type: tuple (140272726348928)
======('', '/a/', <bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>)
====type: method (140272749846664)
====<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>
======type: tuple (140272726372424)
======('', '/a/', <bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>, functools.partial(<function url_encode at 0x7f93c877eae8>, charset='utf-8', sort=False, key=None), '?')
====type: dict (140272723298056)
===={'b': <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>}
======type: dict (140272749460432)
======{'rule': '/a/<string:b>', 'is_leaf': True, 'map': Map([<Rule '/a/<b>' -> None>]), 'strict_slashes': True, 'subdomain': '', 'host': None, 'defaults': None, 'build_only': False, 'alias': False, 'methods': None, 'endpoint': None, 'redirect_to': None, 'arguments': {'b'}, '_trace': [(False, '|'), (False, '/a/'), (True, 'b')], '_converters': {'b': <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>}, '_regex': re.compile('^\\|\\/a\\/(?P<b>[^/]{1,})$'), '_argument_weights': [100], '_static_weights': [(0, -1)], '_build': <function <builder:'/a/<string:b>'> at 0x7f93c9d880d0>, '_build_unknown': <function <builder:'/a/<string:b>'> at 0x7f93c9d88158>}
========type: Rule (140272749480984)
========<Rule '/a/<b>' -> None>
==========type: list (140272699123848)
==========[<Rule '/a/<b>' -> None>]
============type: dict (140272726280736)
============{'_rules': [<Rule '/a/<b>' -> None>], '_rules_by_endpoint': {None: [<Rule '/a/<b>' -> None>]}, '_remap': False, '_remap_lock': <unlocked _thread.lock object at 0x7f93cb6d9da0>, 'default_subdomain': '', 'charset': 'utf-8', 'encoding_errors': 'replace', 'strict_slashes': True, 'redirect_defaults': True, 'host_matching': False, 'converters': {'default': <class 'werkzeug.routing.UnicodeConverter'>, 'string': <class 'werkzeug.routing.UnicodeConverter'>, 'any': <class 'werkzeug.routing.AnyConverter'>, 'path': <class 'werkzeug.routing.PathConverter'>, 'int': <class 'werkzeug.routing.IntegerConverter'>, 'float': <class 'werkzeug.routing.FloatConverter'>, 'uuid': <class 'werkzeug.routing.UUIDConverter'>}, 'sort_parameters': False, 'sort_key': None}
==============type: Map (140272749480872)
==============Map([<Rule '/a/<b>' -> None>])
================(already seen) 140272699432680
================(already seen) 140272749460432
==========type: list (140272700110792)
==========[<Rule '/a/<b>' -> None>]
============type: dict (140272726280808)
============{None: [<Rule '/a/<b>' -> None>]}
==============(already seen) 140272726280736
(already seen) 140272749460432
This is sufficient to break the cycle and eliminate the memory leak:
diff --git a/src/werkzeug/routing.py b/src/werkzeug/routing.py
index c7cff94d..8176ddfe 100644
--- a/src/werkzeug/routing.py
+++ b/src/werkzeug/routing.py
@@ -1254,13 +1254,13 @@ class BaseConverter(object):
     weight = 100
 
     def __init__(self, map):
-        self.map = map
+        self.charset = map.charset
 
     def to_python(self, value):
         return value
 
     def to_url(self, value):
-        return _fast_url_quote(text_type(value).encode(self.map.charset))
+        return _fast_url_quote(text_type(value).encode(self.charset))
 
 
 class UnicodeConverter(BaseConverter):
Though I believe the actual cause of the leak is that LOAD_CONST is being used in the code objects with objects that aren't themselves constants. I believe this confuses the gc (it treats the objects referenced from the code objects as live when they actually aren't).
via #1524