Werkzeug: Runaway memory usage on 0.15.x

Created on 2019-04-23  ·  11 comments  ·  Source: pallets/werkzeug

I've narrowed it down to this small script. I haven't dug in deeply yet, but the smoking gun is the new function-compilation code:

from werkzeug.routing import Map, Rule


def main():
    while True:
        Map([Rule('/a/<string:b>')])


if __name__ == '__main__':
    exit(main())
Labels: bug, routing


@edk0

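It seems you're saying that rules aren't being garbage collected correctly?

I'm not clear on what practical situation this would come up in. Typically you define a set of rules and then use them; they aren't created and deleted at will.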

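Our usage here involves a set of redirects defined in a configuration file, which changes periodically as we add or remove vanity routes.

Rather than deploying the application every time we want to add a vanity route, we just update the configuration file and the application rebuilds its routing map (the Flask application uses it to serve the redirects).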

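Hmm, that's generally discouraged, because changes to the map aren't synchronized across workers in a multi-process setup. I'd probably implement it as an error handler for 404 that checks whether a redirect should be returned. Not that this shouldn't be fixed, it's just not a use case I've heard of.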

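Each worker periodically checks the configuration file and reloads it. As far as I can tell, the map isn't injected directly into the Flask application.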
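For readers unfamiliar with this kind of setup, here is a minimal standard-library sketch of the reload-on-change pattern described above. The `RedirectTable` class, the JSON file format, and the mtime check are all assumptions made for illustration; the thread does not show the actual application code.

```python
import json
import os


class RedirectTable:
    """Hypothetical per-worker redirect table backed by a JSON file."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._redirects = {}

    def refresh(self):
        """Reload the file if its mtime changed; return True on reload."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        with open(self.path, encoding='utf-8') as f:
            self._redirects = json.load(f)
        self._mtime = mtime
        return True

    def lookup(self, request_path):
        """Return the redirect target for a path, or None if there is none."""
        return self._redirects.get(request_path)
```

Each worker would call `refresh()` on its own schedule, so workers may briefly disagree about the current redirects. That is the synchronization concern raised in the previous comment.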

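Either way, these objects probably shouldn't leak :laughing:. I'm investigating what might be causing this. I suspect something is wrong with the functions being compiled, or with their hashing (since Map seems to be involved somehow as well).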

Notably, the leak does not happen without the <string:b> part.

Here's the disassembly of the function objects it creates:

>>> x = Rule('/a/<string:b>')
>>> from werkzeug.routing import Map
>>> y = Map([x])
>>> x._build
<function <builder:'/a/<string:b>'> at 0x7f16d62d1730>
>>> import dis
>>> dis.dis(x._build)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_CONST               2 (<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f16d2c9a0f0>>)
              6 LOAD_FAST                0 (b)
              8 CALL_FUNCTION            1
             10 BUILD_STRING             2
             12 BUILD_TUPLE              2
             14 RETURN_VALUE
>>> dis.dis(x._build_unknown)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_CONST               2 (<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f16d2c9a0f0>>)
              6 LOAD_FAST                0 (b)
              8 CALL_FUNCTION            1
             10 LOAD_FAST                1 (.keyword_arguments)
             12 JUMP_IF_TRUE_OR_POP     20
             14 LOAD_CONST               0 ('')
             16 DUP_TOP
             18 JUMP_FORWARD            10 (to 30)
        >>   20 LOAD_CONST               3 (functools.partial(<function url_encode at 0x7f16d2d5d510>, charset='utf-8', sort=False, key=None))
             22 ROT_TWO
             24 CALL_FUNCTION            1
             26 LOAD_CONST               4 ('?')
             28 ROT_TWO
        >>   30 BUILD_STRING             4
             32 BUILD_TUPLE              2
             34 RETURN_VALUE

Adjusting the script slightly:

import collections
import gc
import pprint
from werkzeug.routing import Map, Rule


def main():
    for _ in range(10000):
        Map([Rule('/a/<string:b>')])
    for _ in range(5):
        gc.collect()
    counts = collections.Counter(type(o) for o in gc.get_objects())
    pprint.pprint(counts.most_common(15))


if __name__ == '__main__':
    exit(main())

It looks like it is leaking (at least; there are probably more among the other common types above them) a Map, a Rule, a functools.partial, and a UnicodeConverter per call:

$ ./venv/bin/python t.py
[(<class 'dict'>, 62085),
 (<class 'list'>, 50514),
 (<class 'function'>, 24085),
 (<class 'tuple'>, 21811),
 (<class 'method'>, 20032),
 (<class 'set'>, 10518),
 (<class 'functools.partial'>, 10002),
 (<class 'werkzeug.routing.UnicodeConverter'>, 10000),
 (<class 'werkzeug.routing.Map'>, 10000),
 (<class 'werkzeug.routing.Rule'>, 10000),
 (<class 'weakref'>, 1306),
 (<class 'wrapper_descriptor'>, 1131),
 (<class 'method_descriptor'>, 879),
 (<class 'builtin_function_or_method'>, 839),
 (<class 'getset_descriptor'>, 740)]
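The same counting trick can be turned into a before/after comparison, which makes growth easier to spot than a single snapshot. This is a sketch using only the standard library; `type_counts`, `leaked_types`, and the `Node` class are hypothetical names for illustration, not anything from Werkzeug.

```python
import collections
import gc


def type_counts():
    """Count live gc-tracked objects by type, after a full collection."""
    gc.collect()
    return collections.Counter(type(o) for o in gc.get_objects())


def leaked_types(before, after):
    """Return only the types whose instance count grew."""
    # Counter subtraction drops zero and negative counts.
    return after - before


class Node:
    """Stand-in for any object that is accidentally kept alive."""


if __name__ == '__main__':
    before = type_counts()
    kept = [Node() for _ in range(100)]  # simulate a leak by holding references
    after = type_counts()
    for typ, n in leaked_types(before, after).most_common(5):
        print(typ, n)
```

Taking the first snapshot after a warm-up iteration should filter out one-time allocations such as caches and imported modules.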

Here's a graph of what is keeping it alive in the gc:

def graph(obj, ids, *, seen=None, indent='', limit=10):
    if seen is None:
        seen = set()

    for referrer in gc.get_referrers(obj):
        # the main frame which has a hard reference to this object
        if (
                type(referrer).__name__ == 'frame' and
                referrer.f_globals['__name__'] == '__main__'
        ):
            continue
        # objects only present due to traversal of gc referrers
        elif id(referrer) not in ids:
            continue
        elif id(referrer) in seen:
            print(f'{indent}(already seen) {id(referrer)}')
            continue

        seen.add(id(referrer))

        if indent == '':
            print('=' * 79)
        print(f'{indent}type: {type(referrer).__name__} ({id(referrer)})')
        fmted = repr(referrer)  #pprint.pformat(referrer)
        print(indent + fmted.replace('\n', f'\n{indent}'))

        if limit:
            graph(
                referrer, ids,
                seen=seen, indent='==' + indent, limit=limit - 1,
            )

...

    ids = {id(o) for o in gc.get_objects()}
    obj = next(iter(o for o in gc.get_objects() if isinstance(o, Map)))
    graph(obj, ids)
===============================================================================
type: dict (140272699432680)
{'map': Map([<Rule '/a/<b>' -> None>]), 'regex': '[^/]{1,}'}
==type: UnicodeConverter (140272699175656)
==<werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>
====type: method (140272750734216)
====<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>
======type: tuple (140272726348928)
======('', '/a/', <bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>)
====type: method (140272749846664)
====<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>
======type: tuple (140272726372424)
======('', '/a/', <bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>, functools.partial(<function url_encode at 0x7f93c877eae8>, charset='utf-8', sort=False, key=None), '?')
====type: dict (140272723298056)
===={'b': <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>}
======type: dict (140272749460432)
======{'rule': '/a/<string:b>', 'is_leaf': True, 'map': Map([<Rule '/a/<b>' -> None>]), 'strict_slashes': True, 'subdomain': '', 'host': None, 'defaults': None, 'build_only': False, 'alias': False, 'methods': None, 'endpoint': None, 'redirect_to': None, 'arguments': {'b'}, '_trace': [(False, '|'), (False, '/a/'), (True, 'b')], '_converters': {'b': <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>}, '_regex': re.compile('^\\|\\/a\\/(?P<b>[^/]{1,})$'), '_argument_weights': [100], '_static_weights': [(0, -1)], '_build': <function <builder:'/a/<string:b>'> at 0x7f93c9d880d0>, '_build_unknown': <function <builder:'/a/<string:b>'> at 0x7f93c9d88158>}
========type: Rule (140272749480984)
========<Rule '/a/<b>' -> None>
==========type: list (140272699123848)
==========[<Rule '/a/<b>' -> None>]
============type: dict (140272726280736)
============{'_rules': [<Rule '/a/<b>' -> None>], '_rules_by_endpoint': {None: [<Rule '/a/<b>' -> None>]}, '_remap': False, '_remap_lock': <unlocked _thread.lock object at 0x7f93cb6d9da0>, 'default_subdomain': '', 'charset': 'utf-8', 'encoding_errors': 'replace', 'strict_slashes': True, 'redirect_defaults': True, 'host_matching': False, 'converters': {'default': <class 'werkzeug.routing.UnicodeConverter'>, 'string': <class 'werkzeug.routing.UnicodeConverter'>, 'any': <class 'werkzeug.routing.AnyConverter'>, 'path': <class 'werkzeug.routing.PathConverter'>, 'int': <class 'werkzeug.routing.IntegerConverter'>, 'float': <class 'werkzeug.routing.FloatConverter'>, 'uuid': <class 'werkzeug.routing.UUIDConverter'>}, 'sort_parameters': False, 'sort_key': None}
==============type: Map (140272749480872)
==============Map([<Rule '/a/<b>' -> None>])
================(already seen) 140272699432680
================(already seen) 140272749460432
==========type: list (140272700110792)
==========[<Rule '/a/<b>' -> None>]
============type: dict (140272726280808)
============{None: [<Rule '/a/<b>' -> None>]}
==============(already seen) 140272726280736
(already seen) 140272749460432

This is sufficient to break the cycle and eliminate the memory leak:

diff --git a/src/werkzeug/routing.py b/src/werkzeug/routing.py
index c7cff94d..8176ddfe 100644
--- a/src/werkzeug/routing.py
+++ b/src/werkzeug/routing.py
@@ -1254,13 +1254,13 @@ class BaseConverter(object):
     weight = 100

     def __init__(self, map):
-        self.map = map
+        self.charset = map.charset

     def to_python(self, value):
         return value

     def to_url(self, value):
-        return _fast_url_quote(text_type(value).encode(self.map.charset))
+        return _fast_url_quote(text_type(value).encode(self.charset))


 class UnicodeConverter(BaseConverter):

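That said, I believe the actual cause of the cycle is that LOAD_CONST is being used in a code object against objects which are not themselves constants, and I believe this confuses the gc (it assumes the objects referenced by the code objects are alive when they actually are not).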
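To illustrate why copying `charset` instead of storing the map removes the back-reference, here is a minimal sketch with stand-in classes. `FakeMap`, `BackRefConverter`, `CopyingConverter`, and `freed_without_gc` are hypothetical names, not Werkzeug's classes, and the sketch assumes CPython's reference counting. It does not reproduce the leak itself, where the cycle apparently survives `gc.collect()`; it only shows that the back-reference creates a cycle and that the patched shape does not.

```python
import gc
import weakref


class FakeMap:
    """Stand-in for a routing map that owns its converters."""

    def __init__(self, charset='utf-8'):
        self.charset = charset
        self.converters = []


class BackRefConverter:
    """Pre-patch shape: the converter points back at the map."""

    def __init__(self, map):
        self.map = map


class CopyingConverter:
    """Post-patch shape: the converter copies only what it needs."""

    def __init__(self, map):
        self.charset = map.charset


def freed_without_gc(converter_cls):
    """Return True if the map is freed by refcounting alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector out of the picture
    try:
        m = FakeMap()
        m.converters.append(converter_cls(m))
        probe = weakref.ref(m)
        del m
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up any cycle created above


if __name__ == '__main__':
    print('back-reference:', freed_without_gc(BackRefConverter))
    print('copied charset:', freed_without_gc(CopyingConverter))
```

On CPython the back-reference variant should stay alive until the collector runs, while the copying variant is released as soon as the last name is deleted.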

Via #1524
